<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Keyphrase Extraction from Scientific Chinese Medical Abstracts Based on Character-Level Sequence Labeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Liangping Ding</string-name>
          <xref ref-type="aff" rid="aff0">1</xref>
          <xref ref-type="aff" rid="aff1">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gaihong Yu</string-name>
          <xref ref-type="aff" rid="aff0">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huan Liu</string-name>
          <xref ref-type="aff" rid="aff0">1</xref>
          <xref ref-type="aff" rid="aff1">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Li</string-name>
          <xref ref-type="aff" rid="aff0">1</xref>
          <xref ref-type="aff" rid="aff1">2</xref>
        </contrib>
        <contrib contrib-type="author" corresp="yes">
          <string-name>Zhixiong Zhang</string-name>
          <email>zhangzhx@mail.las.ac.cn</email>
          <xref ref-type="aff" rid="aff0">1</xref>
          <xref ref-type="aff" rid="aff1">2</xref>
          <xref ref-type="aff" rid="aff2">3</xref>
        </contrib>
        <aff id="aff0">
          <label>1</label>
          <institution>National Science Library, Chinese Academy of Sciences</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>2</label>
          <institution>Department of Library, Information and Archives Management, University of Chinese Academy of Sciences</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>3</label>
          <institution>Wuhan Library, Chinese Academy of Sciences</institution>
          ,
          <addr-line>Wuhan</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>1</fpage>
      <lpage>4</lpage>
      <abstract>
        <p>Automatic keyphrase extraction (AKE) is an important task for quickly grasping the main points of a text. In this paper, we regard AKE from Chinese text as a character-level sequence labeling task to avoid the segmentation errors of Chinese tokenizers, and we initialize our model with the pretrained language model BERT, released by Google in 2018. We collect data from the Chinese Science Citation Database and construct a large-scale dataset in the medical domain, which contains 100,000 abstracts as the training set, 6,000 abstracts as the development set and 3,094 abstracts as the test set. We use unsupervised keyphrase extraction methods including term frequency (TF), TF-IDF and TextRank, and supervised machine learning methods including Conditional Random Field (CRF), Bidirectional Long Short-Term Memory Network (BiLSTM) and BiLSTM-CRF as baselines. Experiments are designed to compare word-level and character-level sequence labeling approaches on supervised machine learning models and BERT-based models. Compared with the best baseline model, character-level BiLSTM-CRF with an F1 score of 50.16%, our character-level sequence labeling model based on BERT obtains an F1 score of 59.80%, a 9.64% absolute improvement. We make our character-level IOB format dataset for automatic keyphrase extraction from scientific Chinese medical abstracts (AKESCMA) publicly available for the benefit of the research community at: https://github.com/possible1402/Dataset-For-Chinese-Medical-KeyphraseExtraction.</p>
      </abstract>
      <kwd-group>
        <kwd>Automatic Keyphrase Extraction</kwd>
        <kwd>Character-Level Sequence Labeling</kwd>
        <kwd>Pretrained Language Model</kwd>
        <kwd>Scientific Chinese Medical Abstracts</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>as development set and 3,094 abstracts as test set. We use
unsupervised keyphrase extraction methods including term
frequency (TF), TF-IDF, TextRank and supervised machine
learning methods including Conditional Random Field (CRF),
Bidirectional Long Short Term Memory Network (BiLSTM)
and BiLSTM-CRF as baselines. Experiments are designed
to compare word-level and character-level sequence
labeling approaches on supervised machine learning models and
BERT-based models. Compared with character-level
BiLSTMCRF, the best baseline model with F1 score of 50.16%, our
character-level sequence labeling model based on BERT
obtains F1 score of 59.80%, geting 9.64% absolute
improvement. We make our character-level IOB format dataset of
automatic keyphrase extraction from scientific Chinese
medical abstracts (AKESCMA) publicly available for the benefits
of research community, which is available at: https://github.
com/possible1402/Dataset-For-Chinese-Medical-KeyphraseExtraction.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction</title>
      <p>
        Automatic keyphrase extraction (AKE) is a task to extract
important and topical phrases from the body of a document
[
        <xref ref-type="bibr" rid="ref49">49</xref>
        ], which is the basis of information retrieval [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], text
summarization [
        <xref ref-type="bibr" rid="ref58">58</xref>
        ], text categorization [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], opinion
mining [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and document indexing [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. It can help us quickly go through large amounts of textual information and find the main points of the text. Appropriate keyphrases serve as a highly concise summary of the text and are beneficial for text retrieval.
      </p>
      <p>
        Classic keyphrase extraction algorithms usually contain
two steps [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The first step is to generate candidate keyphrases, in which plenty of manually designed heuristics are combined to select potential candidates. The second step is to determine which of these candidates are correct.
      </p>
      <p>A shared disadvantage of the above-mentioned two-step approaches is that the model performance in the second step depends on the quality of the candidate keyphrases generated in the first step. So some researchers reformulate keyphrase extraction as a sequence labeling task and validate the effectiveness of this formulation.</p>
      <p>
        In 2008, Zhang et al. [
        <xref ref-type="bibr" rid="ref56">56</xref>
        ] first reformulated keyphrase extraction as a sequence labeling task and constructed a CRF model to extract keyphrases from Chinese text, which skips the step of candidate keyphrase generation. They use 600 documents to train the model and design many features manually. Moreover, they use word-level rather than character-level sequence labeling, tagging words instead of characters. In Chinese, the word is the minimal unit for expressing semantics. The advantage of the word-level formulation is that we can model the relationship among words directly, while the disadvantage is that it still depends on the word segmentation results of a Chinese tokenizer.
      </p>
      <p>
        By virtue of automatically extracting features, deep learning methods have surpassed machine learning methods and gradually become the mainstream in many natural language processing (NLP) tasks. Transformer [
        <xref ref-type="bibr" rid="ref50">50</xref>
        ], an emerging model architecture for handling long-term dependencies, is an alternative to classic neural networks such as the Long Short-Term Memory network. In 2018, Google released BERT [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which is a language model pretrained on large-scale unannotated text, using the Transformer to capture deep semantic and syntactic features of the text. In 2019, Sahrawat et al. [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ] regarded AKE as a sequence labeling task and applied a range of pretrained language models including BERT to the English automatic keyphrase extraction task, showing the effectiveness of pretrained language models.
      </p>
      <p>Compared to English keyphrase extraction, Chinese keyphrase extraction faces two challenges: the lack of publicly available annotated datasets and the reliance on Chinese word segmentation tools. Firstly, supervised methods need the ground-truth keyphrases of the text to train the model, but there are few publicly available annotated Chinese keyphrase extraction datasets, which makes it difficult to perform objective evaluation across different studies. Secondly, English tokens are split by white space, while there is no delimiter among Chinese words.</p>
      <p>To address the above-mentioned challenges, in this paper we construct a high-quality dataset for Chinese automatic keyphrase extraction. We formulate keyphrase extraction from scientific Chinese medical abstracts as a character-level sequence labeling task, which doesn't rely on a Chinese tokenizer. We also design experiments to compare model performance under the word-level and character-level sequence labeling formulations, which has not been explored before.</p>
      <p>In addition, in scientific Chinese medical abstracts, English words are interspersed with Chinese words, which increases the difficulty of data preprocessing. So we use Unicode code points to distinguish English from Chinese, regarding each English word and each Chinese character as an elementary unit.</p>
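      <p>As an illustration, the following is a minimal sketch (our own code, not the authors' released implementation) of how mixed Chinese-English text can be split into elementary units by Unicode code points; the regular expression and function name are our assumptions:</p>
      <preformat>
import re

# One unit per Chinese character (CJK Unified Ideographs block),
# one unit per contiguous run of ASCII letters/digits (an English
# word or identifier), and one unit per remaining non-space symbol.
UNIT_PATTERN = re.compile(r'[\u4e00-\u9fff]|[a-zA-Z0-9]+|[^\s]')

def split_units(text):
    return UNIT_PATTERN.findall(text)

print(split_units('NR0B1基因突变'))
# ['NR0B1', '基', '因', '突', '变']
</preformat>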
      <p>
        Our key contributions are summarized as follows:
1. We regard AKE from scientific Chinese medical abstracts as a character-level sequence labeling task and fine-tune the parameters of BERT [<xref ref-type="bibr" rid="ref13">13</xref>] to make it adapt to our large-scale keyphrase extraction dataset. Our approach skips the step of candidate keyphrase generation and is independent of Chinese tokenizers. We also transfer the pretrained language model BERT to the downstream Chinese AKE task without complicated manually-designed features.
2. We design comparative experiments on the word-level and character-level sequence labeling formulations for Chinese keyphrase extraction to verify the effectiveness of the character-level formulation, especially under the general trend of pretrained language models. The comparative experiments are conducted on machine learning baseline models and a BERT-based model. We find that the performance of the character-level formulation is comparable to or even higher than the word-level formulation for traditional machine learning algorithms, while it has overwhelming advantages for the pretrained language model.
3. We process data from the Chinese Science Citation Database and construct a large-scale character-level dataset for AKE from scientific Chinese medical abstracts. The dataset is labeled using the Inside Outside Beginning tagging scheme (IOB format) [<xref ref-type="bibr" rid="ref43">43</xref>], a common tagging format in chunking tasks such as named entity recognition. Our proposed dataset contains 100,000 abstracts in the training set, 6,000 abstracts in the development set and 3,094 abstracts in the test set. We make our processed large-scale dataset (AKESCMA) publicly available for the benefit of the research community.
      </p>
    </sec>
    <sec id="sec-related">
      <title>2 Related Work</title>
      <sec id="sec-related-1">
        <title>2.1 Automatic Keyphrase Extraction</title>
        <p>Automatic keyphrase extraction has received lots of attention for more than 20 years. Over this time, classic methods have usually contained two steps: generating candidate keyphrases and determining which of these candidate keyphrases match the ground-truth keyphrases.</p>
        <p>In the first step, candidate keyphrase generation relies on heuristics such as extracting n-grams that appear in an external knowledge base [<xref ref-type="bibr" rid="ref18">18</xref>][<xref ref-type="bibr" rid="ref38">38</xref>] or extracting phrases that satisfy pre-defined lexical patterns [<xref ref-type="bibr" rid="ref2">2</xref>][<xref ref-type="bibr" rid="ref24">24</xref>][<xref ref-type="bibr" rid="ref32">32</xref>][<xref ref-type="bibr" rid="ref52">52</xref>]. The classic approaches in the second step can be divided into two categories: unsupervised approaches and supervised approaches.</p>
        <p>Unsupervised approaches can be divided into four types: statistics-based approaches [<xref ref-type="bibr" rid="ref6">6</xref>], graph-based approaches [<xref ref-type="bibr" rid="ref39">39</xref>][<xref ref-type="bibr" rid="ref18">18</xref>], embedding-based approaches [<xref ref-type="bibr" rid="ref35">35</xref>][<xref ref-type="bibr" rid="ref34">34</xref>] and language model-based approaches [<xref ref-type="bibr" rid="ref47">47</xref>]. Graph-based methods are the most popular ones, while statistics-based methods still hold the attention of the research community [<xref ref-type="bibr" rid="ref40">40</xref>].</p>
        <p>Statistics-based approaches don't need any training corpus; they rely on statistical features of the given text such as word frequency [<xref ref-type="bibr" rid="ref36">36</xref>], TF*IDF [<xref ref-type="bibr" rid="ref46">46</xref>], PAT-tree [<xref ref-type="bibr" rid="ref9">9</xref>] and word co-occurrences [<xref ref-type="bibr" rid="ref37">37</xref>], and they are suitable for a single document because no prior information is needed. In 1995, Cohen used N-gram statistical information to automatically index documents [<xref ref-type="bibr" rid="ref10">10</xref>]; the method doesn't use any stop list, stemmer or domain-specific external information, allowing for easy application in any language or domain with slight modification. In 1997, Chien used a PAT-tree and the mutual information between words to extract Chinese keyphrases [<xref ref-type="bibr" rid="ref9">9</xref>]. In 2009, Carpena et al. exploited word frequency and spatial distribution features, observing that keywords are clustered whereas irrelevant words distribute randomly in the text [<xref ref-type="bibr" rid="ref8">8</xref>]. These statistical approaches are usually easy to transfer to a new domain because no prior information is applied.</p>
        <p>For graph-based approaches, keyphrase extraction is substantially a ranking problem: the model scores each candidate for its likelihood of being a ground-truth keyphrase and returns the top-ranked keyphrases by setting a threshold. There are lots of popular unsupervised learning algorithms for keyphrase extraction, such as TextRank [<xref ref-type="bibr" rid="ref39">39</xref>], LexRank [<xref ref-type="bibr" rid="ref15">15</xref>], TopicRank [<xref ref-type="bibr" rid="ref5">5</xref>], SGRank [<xref ref-type="bibr" rid="ref12">12</xref>] and SingleRank [<xref ref-type="bibr" rid="ref51">51</xref>].</p>
        <p>As for supervised approaches, classic keyphrase extraction is formulated as a binary classification problem [<xref ref-type="bibr" rid="ref16">16</xref>][<xref ref-type="bibr" rid="ref48">48</xref>]: determining whether a potential candidate keyphrase matches a ground-truth keyphrase of the text or not. Traditional machine learning algorithms such as Naïve Bayes [<xref ref-type="bibr" rid="ref54">54</xref>], maximum entropy [<xref ref-type="bibr" rid="ref61">61</xref>], decision trees [<xref ref-type="bibr" rid="ref49">49</xref>], SVM [<xref ref-type="bibr" rid="ref59">59</xref>], bagging [<xref ref-type="bibr" rid="ref24">24</xref>] and boosting [<xref ref-type="bibr" rid="ref25">25</xref>] rely heavily on complicated manually-designed features, which can be broadly divided into two categories: within-collection features and external resource-based features [<xref ref-type="bibr" rid="ref20">20</xref>]. Within-collection features are textual features within the training data and can be further divided into statistical features such as term frequency [<xref ref-type="bibr" rid="ref24">24</xref>] and TF*IDF [<xref ref-type="bibr" rid="ref45">45</xref>], syntactic features such as linguistic patterns [<xref ref-type="bibr" rid="ref29">29</xref>], and structural features such as the locations in which keyphrases occur [<xref ref-type="bibr" rid="ref52">52</xref>]. External resource-based features draw on lexical knowledge bases such as Wikipedia [<xref ref-type="bibr" rid="ref18">18</xref>][<xref ref-type="bibr" rid="ref38">38</xref>], document citations [<xref ref-type="bibr" rid="ref7">7</xref>] and hyperlinks [<xref ref-type="bibr" rid="ref28">28</xref>]. These methods share a weakness: the prediction for each candidate keyphrase is independent of the others, which means the model can't capture the connections among keyphrases.</p>
        <p>These two-step keyphrase extraction approaches have several drawbacks. Firstly, error propagation: candidate keyphrase generation errors from the first step are passed to the second step and hurt the performance of the downstream methods. Secondly, the model performance relies heavily on heuristic settings such as thresholds, external resources (Wikipedia, domain ontologies, lexicon dictionaries etc.) and filtration patterns of POS tags, which make it difficult to transfer to a new domain. Thirdly, an optimal N value (the number of keyphrases to extract for a text) cannot be determined from the article contents, so it is usually set to a fixed parameter, and the keyphrase extraction performance varies with the chosen N. Fourthly, the number of keyphrases is then the same for every text, which ignores reality and brings in many redundant keyphrases or loses many important ones. Finally, in the second step, the model analyzes the semantic and syntactic properties of each candidate keyphrase separately while losing the meaning of the whole text.</p>
        <p>Zhang et al. [<xref ref-type="bibr" rid="ref56">56</xref>] first reformulated keyphrase extraction as a sequence labeling task, utilizing a user-defined tagging scheme to annotate each word in Chinese text with the chunk it belongs to. They use a Conditional Random Field model, which shows great performance on sequence labeling tasks, and design many manual features such as POS tags, TF*IDF and location features. Li et al. [<xref ref-type="bibr" rid="ref60">60</xref>] also use a word-level sequence labeling model to extract keyphrases from Chinese text in the automotive field. Casting keyphrase extraction as a sequence labeling task bypasses the step of candidate keyphrase generation and provides a unified method for automatic keyphrase extraction. Moreover, in sequence labeling, keyphrases are correlated with each other instead of being independent units.</p>
        <p>Supervised machine learning methods require precise feature engineering and rely heavily on manually-designed features, which are time-consuming to build. Using deep learning methods to automatically extract features has become the mainstream in many natural language processing tasks, and there are some practices for English AKE. In 2016, Zhang et al. [<xref ref-type="bibr" rid="ref57">57</xref>] cast keyphrase extraction as a sequence labeling task and proposed a joint-layer recurrent neural network model to extract keyphrases from tweets, which doesn't need complicated feature engineering. In 2019, Sahrawat et al. [<xref ref-type="bibr" rid="ref44">44</xref>] constructed a BiLSTM-CRF model and used contextualized word embeddings from pretrained language models to initialize the embedding layer. They evaluate model performance on three English benchmark datasets, Inspec [<xref ref-type="bibr" rid="ref24">24</xref>], SemEval-2010 [<xref ref-type="bibr" rid="ref30">30</xref>] and SemEval-2017 [<xref ref-type="bibr" rid="ref1">1</xref>], and their model achieves state-of-the-art results on all three.</p>
        <p>Compared with English AKE, Chinese AKE is more complicated owing to the fact that there is no delimiter among Chinese words. So there is an additional step in most Chinese AKE models: using a Chinese tokenizer to segment words. For traditional two-step keyphrase extraction models, generating Chinese candidate keyphrases requires segmenting words with a Chinese tokenizer first. For Chinese AKE models based on sequence labeling, existing methods still use word-level tagging and are thus restricted by the segmentation results of the Chinese tokenizer.</p>
      </sec>
      <sec id="sec-2-1">
        <title>2.2 Sequence Labeling Based on BERT</title>
        <p>With the improvement of computer hardware and the increase of available data, deep learning based methods have gradually occupied the dominant position in the field of natural language processing. Although deep neural networks can learn highly nonlinear features, they are prone to over-fitting without a large amount of annotated data. Moreover, the objective functions of almost all deep learning architectures are highly non-convex functions of the parameters, with the potential for many distinct local minima in the model parameter space [<xref ref-type="bibr" rid="ref14">14</xref>]. Thus, how to initialize parameters has been a problem that puzzles researchers. The breakthrough came in 2006 with the algorithms for deep belief networks [<xref ref-type="bibr" rid="ref21">21</xref>] and stacked auto-encoders [<xref ref-type="bibr" rid="ref3">3</xref>], which are all based on a similar approach: greedy layer-wise unsupervised pre-training followed by supervised fine-tuning.</p>
        <p>Compared with traditional supervised learning tasks that randomly initialize parameters and then learn language representations directly from annotated text, the pretraining-finetuning mode not only captures the syntactic and semantic features of tokens from large-scale unannotated text but also provides a good initial point for the downstream task, improving the generalization ability of the downstream supervised learning task.</p>
        <p>Recently, BERT, short for Bidirectional Encoder Representations from Transformers, a pretrained language model that has received widespread attention, is believed to be a milestone in NLP. BERT is pretrained on large-scale unlabeled data from BooksCorpus and English Wikipedia, containing more than 3.3 billion tokens in total. Using BERT to fine-tune downstream supervised tasks broke records on 11 NLP tasks including sentence classification, named entity recognition and natural language inference, which proves the feasibility of the pretraining-finetuning mode.</p>
        <p>
          Using pretrained language models [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ][
          <xref ref-type="bibr" rid="ref41">41</xref>
          ][
          <xref ref-type="bibr" rid="ref42">42</xref>
          ][
          <xref ref-type="bibr" rid="ref22">22</xref>
          ][
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] has
become a standard component of SOTA (state-of-the-art)
model architecture in many natural language processing tasks.
        </p>
        <p>Most previous works on sequence labeling are built upon different combinations of LSTM and CRF [<xref ref-type="bibr" rid="ref17">17</xref>][<xref ref-type="bibr" rid="ref19">19</xref>][<xref ref-type="bibr" rid="ref53">53</xref>]. Since the release of BERT [<xref ref-type="bibr" rid="ref13">13</xref>], some researchers have shown the effectiveness of applying BERT or BERT-based models to sequence labeling tasks such as named entity recognition. BERT has a simple architecture based on bidirectional Transformers [<xref ref-type="bibr" rid="ref50">50</xref>], which performs strongly on various tasks owing to its capability to capture long-term dependencies. Lee et al. introduce BioBERT [<xref ref-type="bibr" rid="ref33">33</xref>], which is pretrained on large-scale biomedical corpora using the same model architecture as BERT. They test BioBERT on several public datasets for named entity recognition such as NCBI disease and BC5CDR; the results show that BioBERT outperforms the state-of-the-art models on six of nine datasets.</p>
        <p>In this paper, we combine the benefits of formulating keyphrase extraction from Chinese medical abstracts as a character-level sequence labeling task with the advantages of the pretraining-finetuning mode, which can not only avoid errors from the Chinese tokenizer but also extract features automatically rather than relying on complicated manually-designed features.</p>
      </sec>
    </sec>
    <sec id="sec-method">
      <title>3 Methodology</title>
      <sec id="sec-method-1">
        <title>3.1 Task Definition</title>
        <p>We cast keyphrase extraction from Chinese medical abstracts as a character-level sequence labeling task and use the IOB format as the input format of the model. The task can be formally stated as follows. Let X = {x_1, x_2, ..., x_n} be an input text, where x_i represents the i-th element. If the input text mixes Chinese and English, an element is a character for Chinese and a word for English. Assign each x_i in the text one of the three class labels Y = {B, I, O}, where B denotes that x_i is located at the beginning of a keyphrase, I denotes that x_i is located inside or at the end of a keyphrase, and O denotes that x_i is not part of any keyphrase.</p>
        <p>For example, consider a sentence 'X … NR0B1 …' whose keyphrases are 'X …' and 'NR0B1 …'. After the IOB format transformation, the character-level tagging result of this sentence is shown in Table 1. As we can see, we split the sentence according to language, regarding each English word and each Chinese character as an elementary unit. This character-level formulation avoids the errors of the Chinese tokenizer, which have been a troublesome problem in Chinese keyphrase extraction.</p>
      </sec>
      <sec id="sec-method-2">
        <title>3.2 Evaluation Measures</title>
        <p>Although there is a suite of evaluation measures for sequence labeling tasks, in automatic keyphrase extraction what we really care about is whether we can extract the correct keyphrases from the provided text. So, as in previous studies [<xref ref-type="bibr" rid="ref30">30</xref>], we use precision, recall and F1-score based on matching the extracted keyphrases against the ground-truth keyphrases.</p>
        <p>Traditionally, automatic keyphrase extraction systems have been assessed using the proportion of top-N candidates that exactly match the ground-truth keyphrases [<xref ref-type="bibr" rid="ref13">13</xref>]. For keyphrase extraction based on sequence labeling, there is no need for an N value and we just use the keyphrases predicted by the model to evaluate the AKE performance. But we first need to recognize the keyphrases from the IOB format before evaluation.</p>
        <p>We concatenate the characters between a label 'B' and the last adjacent label 'I' that follows it into a predicted keyphrase.</p>
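        <p>A minimal sketch of this decoding step (our own illustrative code, assuming a label sequence aligned with the elementary units):</p>
        <preformat>
def decode_keyphrases(units, labels):
    """Recover predicted keyphrases from an IOB label sequence:
    each keyphrase is a 'B' unit concatenated with the adjacent
    'I' units that immediately follow it."""
    phrases, current = [], None
    for unit, label in zip(units, labels):
        if label == 'B':               # a new keyphrase starts here
            if current is not None:
                phrases.append(current)
            current = unit
        elif label == 'I' and current is not None:
            current += unit            # extend the current keyphrase
        else:                          # 'O', or a stray 'I'
            if current is not None:
                phrases.append(current)
            current = None
    if current is not None:
        phrases.append(current)
    return phrases
</preformat>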
        <p>We denote the total number of predicted keyphrases as r, the number of predicted keyphrases matching ground-truth keyphrases as c, and the number of ground-truth keyphrases as s. The evaluation measures are defined as follows:
precision: P = c / r
recall: R = c / s
F1-score: F1 = (2 × P × R) / (P + R)</p>
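        <p>In code, the three measures can be computed directly from these definitions (an illustrative sketch; exact matching of whole keyphrases is assumed):</p>
        <preformat>
def evaluate(predicted, ground_truth):
    """Precision, recall and F1 over exactly matching keyphrases.
    r: predicted keyphrases, c: correct predictions,
    s: ground-truth keyphrases."""
    r, s = len(predicted), len(ground_truth)
    c = len(set(predicted).intersection(ground_truth))
    precision = c / r if r else 0.0
    recall = c / s if s else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
</preformat>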
      </sec>
      <sec id="sec-2-2">
        <title>3.3 Dataset Construction</title>
        <p>We collect data from the Chinese Science Citation Database, a database containing more than 1,000 excellent journals published in mathematics, physics, chemistry, biology, medicine, health etc. We set some constraints to restrict the data to Chinese medical records and to exclude incomplete and duplicated records, ensuring the quality of the data. The constraints are listed as follows:
1. According to the Chinese Library Classification (CLC), the CLC code of medical data starts with the capital letter 'R'. So we restrict the data to records whose CLC code metadata field starts with the capital letter 'R'.
2. The metadata field of language is set to Chinese.
3. The metadata fields of title, abstract and keyphrases are not null. Here, keyphrases refer to author-assigned keyphrases.</p>
        <p>Statistics show that there are 757,277 records meeting the above-mentioned constraints in total. The title and the abstract of each article are concatenated as the source input text. Furthermore, there are two types of keyphrases: extractive keyphrases, which are present in the source input text, and abstractive keyphrases, which are absent from it. Because we formulate keyphrase extraction as a character-level sequence labeling task and can only extract keyphrases that are present in the source input text, we consider only the extractive keyphrases.</p>
        <p>For a given text, we expect all author-assigned keyphrases to be extractive keyphrases, so that we can annotate as many keyphrases as possible. To achieve that, we first match each author-assigned keyphrase against the given text and check whether all author-assigned keyphrases can be found in the text. Then we limit our dataset to records in which all author-assigned keyphrases are extractive keyphrases. After filtration, there are 169,094 records in total. We aim to construct a large-scale dataset for our deep neural network model because, although deep neural networks can learn highly nonlinear features, they are prone to over-fitting compared with traditional machine learning methods.</p>
        <p>We choose 100,000 records as our training set, 6,000 records as our development set and 3,094 records as our test set. The training set is used for training the keyphrase extraction model. The development set is used during training to monitor the generalization error of the model and to tune hyper-parameters. The test set is used to test the performance of the model. Note that there is no overlap among the data sets. Next, we process these three data sets into the IOB format to make them suitable for modeling the sequence labeling task.</p>
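        <p>A sketch of the split (illustrative only; the shuffling and seed are our assumptions, the set sizes are the paper's):</p>
        <preformat>
import random

def split_records(records, seed=42):
    """Shuffle once, then cut disjoint train/dev/test sets of
    100,000 / 6,000 / 3,094 records, as used in this paper."""
    records = list(records)
    random.Random(seed).shuffle(records)
    train = records[:100000]
    dev = records[100000:106000]
    test = records[106000:109094]
    return train, dev, test
</preformat>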
        <p>In this paper, we are going to compare the word-level and character-level formulations for Chinese keyphrase extraction, so we construct datasets for character-level and word-level sequence labeling separately.</p>
        <p>Before generating the character-level IOB format for each character, we apply some preprocessing steps:
1. Using Unicode code points to distinguish Chinese and English. To address the problem that English words and Chinese words are mixed together in Chinese medical abstracts, we use Unicode code points to distinguish English from Chinese. Our proposed data sets handle the splitting of English words and Chinese characters well, treating each English word and each Chinese character as a minimal unit.
2. Converting punctuation from half width to full width. Punctuation in Chinese medical text comes in two formats: full width and half width. Authors may neglect the format of punctuation, which causes the problem that keyphrases can't be matched against the abstract. For example, the authors might provide the keyphrase 'er:yag …' but write 'er：yag …' in the abstract, where the colon is in full-width format. So we transform all half-width punctuation to full-width punctuation, except the full stop.
3. Dealing with special characters. There are lots of special characters in scientific Chinese medical abstracts, and sometimes there are space characters next to these special characters while sometimes not. To unify the format, we drop all space characters next to special characters.
4. Lowercasing. We transform all English words to their lowercase format.</p>
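        <p>Step 2 can be sketched as a simple character mapping (illustrative code; the mapping table lists only a few marks, and the full stop is deliberately excluded, as described above):</p>
        <preformat>
# Common half-width punctuation mapped to its full-width form;
# the full stop '.' is deliberately left unchanged.
HALF_TO_FULL = {':': '：', ';': '；', ',': '，', '?': '？',
                '!': '！', '(': '（', ')': '）'}

def normalize_punct(text):
    return ''.join(HALF_TO_FULL.get(ch, ch) for ch in text)

print(normalize_punct('er:yag'))  # 'er：yag'
</preformat>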
        <p>After preprocessing, we run the tagging process, in which we match keyphrases against the source input text to find the locations of keyphrases present in the text. Characters within these locations are tagged with label 'B' or 'I', and characters outside them with label 'O': the first character of a keyphrase is tagged with label 'B', and the remaining characters of the keyphrase are tagged with label 'I'.</p>
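        <p>A sketch of this tagging step (our own illustrative code, reusing the split_units helper sketched in the introduction; it covers only the basic matching, not the two special occasions described below):</p>
        <preformat>
def tag_iob(units, keyphrases):
    """Label every elementary unit: 'B' for the first unit of a
    matched keyphrase, 'I' for its remaining units, 'O' elsewhere.
    Longer keyphrases are matched first (Maximum Matching Rule)."""
    labels = ['O'] * len(units)
    for phrase in sorted(keyphrases, key=len, reverse=True):
        p_units = split_units(phrase)   # same splitting as the text
        n = len(p_units)
        for i in range(len(units) - n + 1):
            window_free = all(l == 'O' for l in labels[i:i + n])
            if units[i:i + n] == p_units and window_free:
                labels[i] = 'B'
                labels[i + 1:i + n] = ['I'] * (n - 1)
    return labels
</preformat>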
        <p>Figure 1 is an example of character-level IOB format generation. In this example, the keyphrase is 'X …'. We match the keyphrase and obtain the location span from 2 to 14. So we tag the character at location 2 with label 'B' and the characters located between 3 and 14 with label 'I'. The other characters, outside the span, are tagged with label 'O'.</p>
        <p>Note that there are two special occasions in our tagging process, and we apply some tricks to them.</p>
        <p>1. Given two author-assigned keyphrases of the input text, if there is a containment relationship between the location spans of the two keyphrases, we use the Maximum Matching Rule and tag the longer keyphrase. For example, if one keyphrase spans locations 8 to 9 while the other spans locations 8 to 11, we tag the characters within the longer keyphrase with label 'B' or 'I'.
2. If the first few characters of a keyphrase equal the last few characters of another keyphrase, and the former appears right after the latter in the given text, we concatenate these two keyphrases by their common characters and tag the concatenation. This step means that our dataset is suitable for flat keyphrase extraction rather than nested keyphrase extraction: each character is assigned only one label.
For word-level sequence labeling, we use the Chinese tokenizer Jieba to segment words. The tagging process is almost the same as in the character-level dataset construction, except that we tag words rather than characters.</p>
        <p>To examine the quality of our data sets, we count the number of recognized keyphrases, the number of correctly recognized keyphrases and the number of ground-truth keyphrases in our generated data sets, and we use the evaluation measures from Section 3.2 to assess the IOB generation performance. The IOB generation results for the character-level and word-level data sets are summarized in Table 3 and Table 4 respectively.</p>
        <p>As we can see, the F1-score of each character-level generated data set is more than 5 percentage points higher than that of the corresponding word-level data set. For the character-level data sets, owing to the above-mentioned tricks applied during IOB generation, the evaluation measures don't reach 100%, but the character-level IOB generation results on all three data sets still show that our data sets are of good quality. For the word-level sequence labeling data sets, the segmentation errors of the Chinese tokenizer are a critical reason that the evaluation measures are lower than those of the character-level data sets. Taking the sentence from Section 3.1 as an example, the word-level tagging result is shown in Table 2: there is one incorrect keyphrase 'nr0b1 …' which is supposed to be 'nr0b1 …'. Besides incorrectly tagged keyphrases, there may also be missing keyphrases because of segmentation errors in word-level sequence labeling.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.4 Model Architecture</title>
        <p>
          We initialize our sequence labeling keyphrase extraction model with the pretrained BERT model. The architecture of BERT is based on a multi-layer bidirectional Transformer [
          <xref ref-type="bibr" rid="ref50">50</xref>
          ].
Instead of the traditional left-to-right language modeling objective, BERT is pretrained on two tasks: predicting randomly masked tokens and predicting whether two sentences follow each other. Our sequence labeling keyphrase extraction model follows the same architecture as BERT and is optimized on scientific Chinese medical abstracts. We use a feed-forward neural network, acting as a linear classifier layer on top of the representations from the last layer of BERT, to compute character-level IOB probabilities. Our model architecture is shown in Figure 2.
        </p>
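        <p>A minimal sketch of this architecture in PyTorch with the HuggingFace transformers library (our assumptions: the library interface and the bert-base-chinese checkpoint name; the original implementation may differ):</p>
        <preformat>
import torch.nn as nn
from transformers import BertModel

class BertForIOBTagging(nn.Module):
    """BERT encoder plus a linear classifier over the last-layer
    representations, emitting B/I/O logits for every token."""
    def __init__(self, pretrained='bert-base-chinese', num_labels=3):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        self.classifier = nn.Linear(self.bert.config.hidden_size,
                                    num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask)
        hidden = out.last_hidden_state   # (batch, seq_len, 768)
        return self.classifier(hidden)   # (batch, seq_len, 3)
</preformat>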
        <p>
          For a given token, its input representation is constructed by summing its WordPiece embedding [
          <xref ref-type="bibr" rid="ref55">55</xref>
          ], segment embedding and position embedding. The first token of each sequence is always the special token [CLS]. The segment embedding is useful in sentence-pair tasks such as question answering to differentiate the two sentences: sentence pairs are separated by a special token [SEP], and a sentence A embedding is added to each token of the first sentence while a sentence B embedding is added to each token of the second sentence. Our task is a single-sentence task, so we only use sentence A embeddings. The position embedding indicates the location of the token in the text and supports lengths only below 512. A visual representation of our character-level input representations is given in Figure 3.
        </p>
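        <p>The input representation can be sketched as the sum of three embedding lookups (toy code with illustrative token ids; 21128 is the vocabulary size of the Chinese BERT checkpoint we assume):</p>
        <preformat>
import torch
import torch.nn as nn

vocab_size, max_len, hidden = 21128, 512, 768
tok_emb = nn.Embedding(vocab_size, hidden)   # WordPiece embedding
seg_emb = nn.Embedding(2, hidden)            # sentence A / B
pos_emb = nn.Embedding(max_len, hidden)      # position embedding

ids = torch.tensor([[101, 1, 2, 102]])       # toy ids: [CLS] x1 x2 [SEP]
segments = torch.zeros_like(ids)             # single sentence: all A
positions = torch.arange(ids.size(1)).unsqueeze(0)
x = tok_emb(ids) + seg_emb(segments) + pos_emb(positions)
print(x.shape)                               # torch.Size([1, 4, 768])
</preformat>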
        <p>In addition, BERT can only take input with a maximum length of 512. Owing to this limitation, some source input texts will be truncated, causing the problem that the model might predict some single characters as keyphrases. In most cases a single Chinese character makes no sense, but we find that some single Chinese characters are meaningful, including chemical elements of the periodic table, organs and animals. So we design a user-defined lexicon to store meaningful Chinese characters for further filtration.</p>
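        <p>The filtration can be sketched as follows (the lexicon entries here are illustrative examples of an element, organs and animals, not the paper's actual lexicon):</p>
        <preformat>
# Illustrative user-defined lexicon of meaningful single characters
# (a chemical element, organs, animals); the real lexicon is larger.
SINGLE_CHAR_LEXICON = {'钙', '肝', '肺', '猪', '鼠'}

def filter_predictions(keyphrases):
    """Drop single-character predictions unless the character
    appears in the user-defined lexicon."""
    return [kp for kp in keyphrases
            if len(kp) > 1 or kp in SINGLE_CHAR_LEXICON]
</preformat>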
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4 Experiments &amp; Results</title>
      <sec id="sec-3-1">
        <title>4.1 Experimental Design</title>
        <p>In this paper, we first conduct unsupervised baseline experiments to demonstrate that traditional unsupervised two-step keyphrase extraction methods are sensitive to the N value and the lexicon scale, and thus depend on precise manual settings. Then, before applying the sequence labeling formulation to the Chinese keyphrase extraction task, we design comparative experiments using the word-level and character-level formulations on supervised machine learning baseline methods and BERT-based methods to verify the effectiveness of the character-level formulation. Finally, we compare the best unsupervised baseline model, the best character-level machine learning baseline model and our character-level BERT-based sequence labeling keyphrase extraction model to demonstrate the strength of the sequence labeling formulation and the pretrained language model.</p>
        <p>Regarding the unsupervised baselines, we use some traditional approaches including term frequency, TF*IDF based on a single document, TF*IDF based on multiple documents, and TextRank. Here, TF*IDF based on a single document means that we consider candidate keyphrases' term frequency and inverse document frequency within one document only, while TF*IDF based on multiple documents means that we calculate the statistics over the whole data set. As we know, the performance of traditional unsupervised approaches varies with the value of N (the number of top-ranked keyphrases), which is a parameter set manually. And traditional unsupervised Chinese keyphrase extraction relies on a Chinese tokenizer to generate candidate keyphrases; usually, a user-defined lexicon makes a great difference to the results of Chinese word segmentation.</p>
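        <p>The distinction between the two TF*IDF variants can be sketched as follows (illustrative code; candidate generation and ranking are omitted):</p>
        <preformat>
import math
from collections import Counter

def tfidf_single(doc_tokens):
    """TF*IDF from one document only: with a single document the
    idf term is constant, so scores reduce to term frequency."""
    tf = Counter(doc_tokens)
    total = len(doc_tokens)
    return {t: count / total for t, count in tf.items()}

def tfidf_multi(doc_tokens, corpus):
    """TF*IDF with idf computed over the whole data set."""
    tf = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus)
    return {t: (count / total) *
               math.log(n_docs / (1 + sum(1 for d in corpus if t in d)))
            for t, count in tf.items()}
</preformat>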
        <p>So we design two groups of experiments for the unsupervised baselines using the control variable method, according to the N value and the lexicon scale. Group 1 keeps the same lexicon scale and compares the performance of the baseline approaches at different N values of 3 and 5 to test the stability of the baseline approaches. Group 2 keeps the same N value and compares the performance of the baseline approaches when the lexicon scale for the Chinese tokenizer differs, to test the transferability of the baseline approaches. We set two kinds of lexicon scales: one using all ground-truth keyphrases in the training, development and test sets as the lexicon, the other using only the ground-truth keyphrases in the training set.</p>
        <p>Regarding the supervised machine learning baselines, we cast keyphrase extraction as a sequence labeling task instead of a binary classification task and use the CRF, BiLSTM and BiLSTM-CRF algorithms as machine learning baselines.</p>
      </sec>
      <sec id="sec-3-2">
        <title>4.2 Experimental Settings</title>
        <p>For the unsupervised baseline approaches, we use Jieba for Chinese word segmentation. Before generating candidate keyphrases, we apply some preprocessing steps, such as removing stop words and some special characters. We restrict candidate keyphrases to noun phrases and entries in our user-defined lexicon.</p>
        <p>
          Of the three machine learning baseline approaches, CRF[
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]
is trained by regularized maximum likelihood estimation and uses the Viterbi algorithm to find the optimal sequence of labels. BiLSTM and BiLSTM-CRF[
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] are trained with Stochastic Gradient Descent (SGD). The learning rate is set to 5e-4 and the models are trained for 15 epochs with early stopping. The hidden layers have 512 units and the embedding size is 768 in both models. In addition, the batch size is set to 64.
        </p>
        <p>For our BERT-based keyphrase extraction model, due to system memory constraints, the batch size is set to 7 and we use SGD to optimize the cross-entropy loss. The initial learning rate is set to 5e-5 and gradually decreases to 5e-8 as training progresses, and the model is trained for 3 epochs.</p>
        <p>In this paper, we use the F1-score to evaluate model performance; it is the harmonic mean of precision and recall, taking both into account.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3 Unsupervised Baseline Experiments</title>
        <p>For the traditional unsupervised baselines, we conduct two groups of comparative experiments according to the N value and the lexicon scale, as described in Section 4.1.</p>
        <sec id="sec-3-3-1">
          <title>Method</title>
        </sec>
        <sec id="sec-3-3-2">
          <title>Term Frequency TF*IDF Based on Single Document TF*IDF Based on Multi Documents TextRank</title>
        <p>For the group of N value experiments, we fix the lexicon scale to the whole lexicon, which contains the author-assigned keyphrases of the training, development and test sets, as the user-defined lexicon for Jieba word segmentation. Table 5 provides the results of the N value comparison experiments for the baseline approaches. Increasing the N value improves recall but lowers precision. We find that the F1-score of the baseline approaches varies with the N value, but TF*IDF based on multiple documents achieves the best performance among all baseline models no matter the N value. When the N value is 3, the F1-score of TF*IDF based on multiple documents is 44.59%, which is higher than when the N value is 5.</p>
        <p>For the group of lexicon scale experiments, we fix the N value to 3 to compare the baseline approaches at different lexicon scales. Table 6 presents the results of the lexicon scale comparative experiments. As we can see, for all unsupervised baseline approaches, the performance when using a lexicon that only contains the keyphrases of the training set for Jieba word segmentation drops by at least 7% compared with using the whole lexicon. The results show that traditional keyphrase extraction approaches for Chinese medical abstracts have poor transferability: when transferring traditional models to a new domain where no lexicon is available, the keyphrase extraction performance will be poor.</p>
      </sec>
      <sec id="sec-3-4a">
        <title>4.4 Word-Level and Character-Level Sequence Labeling Comparative Experiments</title>
        <p>We use the word-level and character-level sequence labeling datasets separately to train and evaluate the supervised machine learning baseline models and the BERT-based models.</p>
        <sec id="sec-3-4a-1">
          <title>4.4.1 Supervised Machine Learning Baseline Models</title>
          <p>The F1-score metrics of the word-level and character-level comparative experiments on the machine learning baseline models are listed in Table 7. As we can see, the word-level sequence labeling formulation is better than the character-level formulation for the CRF and BiLSTM algorithms, while it is a little lower than the character-level formulation for the BiLSTM-CRF algorithm. The reason might be that BiLSTM-CRF is a more powerful model for capturing the contextual relationships among characters, making up for the disadvantage that the character-level formulation doesn't model the relationships among words directly.</p>
        </sec>
        <sec id="sec-3-4a-2">
          <title>4.4.2 BERT-based Models</title>
          <p>The precision, recall and F1-score metrics of the word-level and character-level sequence labeling comparative experiments on the BERT-based models are listed in Table 8. For the word-level sequence labeling formulation, we just use the hidden state corresponding to the first character of the word as input to the linear classifier, which is the same approach used in [<xref ref-type="bibr" rid="ref13">13</xref>] for the named entity recognition task. We find that the precision of the word-level formulation is far lower than that of the character-level formulation, and the F1-score of the word-level formulation is more than 20% lower. We conducted a detailed analysis of this result. We assume it is because Chinese BERT uses the WordPiece tokenizer, which tokenizes each Chinese word into characters during pretraining. So Chinese BERT is character-level and has learned good semantic representations of Chinese characters through pretraining, which maximizes the advantages of the character-level sequence labeling formulation and avoids its shortcomings.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>4.5 BERT-based Character-Level Experiments</title>
        <p>From the results of the above word-level and character-level comparative experiments, we decide to apply the character-level formulation to our BERT-based Chinese keyphrase extraction model; the best character-level machine learning baseline model is BiLSTM-CRF. We compare the best unsupervised method, TF*IDF, with our character-level sequence labeling BiLSTM-CRF model and find that the sequence labeling formulation is beneficial for the Chinese keyphrase extraction task. We then compare the character-level BiLSTM-CRF with our character-level BERT-based model; the performance results are summarized in Table 9. Compared with BiLSTM-CRF, our BERT-based model achieves an F1-score of 59.80%, exceeding the baseline approach by 9.64%, which shows that the pretrained language model captures rich features that are useful for the downstream keyphrase extraction task. We additionally remove single Chinese characters that are not in the user-defined lexicon; after removal, the keyphrase extraction performance of our adjusted model reaches 60.56%.</p>
        <p>We also compare the predicted keyphrases with the author-assigned ground-truth keyphrases and find that some predicted phrases are concatenations of author-assigned keyphrases; for example, when two adjacent author-assigned keyphrases share characters, our model may extract their concatenation as one keyphrase. These examples indicate that although our model gets an F1-score of 59.80%, it can achieve good practical performance. They also indicate that the calculation of the evaluation measure is an issue we need to consider further: using the proportion of predicted phrases that exactly match the ground-truth keyphrases to assess the model is not entirely appropriate, because author-assigned keyphrases carry some biases and sometimes the phrases predicted by our model are also concise descriptions of the text.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusions</title>
      <p>In this paper, we formulate automatic keyphrase extraction as a character-level rather than word-level sequence labeling task and fine-tune the pretrained language model BERT for keyphrase extraction on scientific Chinese medical abstracts. Through our experimental work, we demonstrate the benefits of this formulation with this architecture, which bypasses the Chinese tokenizer and leverages the power of the pretrained language model. In addition, we design comparative experiments to verify that the character-level formulation is more suitable for the Chinese keyphrase extraction task under the trend of pretrained language models.</p>
      <p>Our approach only deals with keyphrase extraction rather than keyphrase generation, so it can handle only extractive keyphrases. In the future, we plan to build a keyphrase generation model to also produce abstractive keyphrases. We will also explore solutions to the limitation of BERT's maximum sequence length, to avoid truncation. We expect some of the findings in this paper to provide valuable experience for automatic keyphrase extraction and other NLP problems such as document summarization and term extraction.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work is supported by the project "Research on Methods and Technologies of Scientific Researcher Entity Linking and Subject Indexing" (Grant No. G190091) from the National Science Library, Chinese Academy of Sciences, and the project "Design and Research on a Next Generation of Open Knowledge Services System and Key Technologies" (2019XM55).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Isabelle</given-names>
            <surname>Augenstein</surname>
          </string-name>
          ,
          <string-name>
            <surname>Mrinal Das</surname>
          </string-name>
          ,
          <string-name>
            <surname>Sebastian Riedel</surname>
          </string-name>
          , Lakshmi Vikraman, and
          <string-name>
            <surname>Andrew McCallum</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>SemEval 2017 Task 10: ScienceIE - Extracting keyphrases and relations from scientific publications</article-title>
          .
          <source>arXiv preprint arXiv:1704.02853</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Ken</given-names>
            <surname>Barker</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nadia</given-names>
            <surname>Cornacchia</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Using noun phrase heads to extract document keyphrases</article-title>
          .
          <source>In conference of the canadian society for computational studies of intelligence</source>
          . Springer,
          <fpage>40</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Pascal Lamblin, Dan Popovici, and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Larochelle</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Greedy layer-wise training of deep networks</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <volume>153</volume>
          -
          <fpage>160</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Gábor</given-names>
            <surname>Berend</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Opinion expression mining by exploiting keyphrase extraction</article-title>
          . (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Adrien</given-names>
            <surname>Bougouin</surname>
          </string-name>
          , Florian Boudin, and
          <string-name>
            <given-names>Béatrice</given-names>
            <surname>Daille</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Topicrank: Graph-based topic ranking for keyphrase extraction</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Campos</surname>
          </string-name>
          , Vítor Mangaravite, Arian Pasquali, Alípio Mário Jorge, Célia Nunes, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Jatowt</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A text feature based automatic keyword extraction method for single documents</article-title>
          .
          <source>In European Conference on Information Retrieval</source>
          . Springer,
          <fpage>684</fpage>
          -
          <lpage>691</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Cornelia</given-names>
            <surname>Caragea</surname>
          </string-name>
          , Florin Adrian Bulgarov,
          Andreea Godea, and Sujatha Das Gollapalli
          .
          <year>2014</year>
          .
          <article-title>Citation-enhanced keyphrase extraction from research papers: A supervised approach</article-title>
          . (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Carpena</surname>
          </string-name>
          , Pedro Bernaola-Galván, Michael Hackenberg,
          AV Coronado, and JL Oliver
          .
          <year>2009</year>
          .
          <article-title>Level statistics of words: Finding keywords in literary texts and symbolic sequences</article-title>
          .
          <source>Physical Review E 79</source>
          ,
          <issue>3</issue>
          (
          <year>2009</year>
          ),
          <fpage>035102</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
<string-name>
            <given-names>Lee-Feng</given-names>
            <surname>Chien</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>PAT-tree-based keyword extraction for Chinese information retrieval</article-title>
          .
          <source>In Proceedings of the 20th annual international ACM SIGIR conference on Research and development in information retrieval</source>
          .
<fpage>50</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
<string-name>
            <given-names>Jonathan D</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <year>1995</year>
          .
<article-title>Highlights: Language- and domain-independent automatic indexing terms for abstracting</article-title>
          .
<source>Journal of the American Society for Information Science 46</source>
          ,
          <issue>3</issue>
          (
          <year>1995</year>
          ),
          <fpage>162</fpage>
          -
          <lpage>174</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
<string-name>
            <given-names>Andrew M</given-names>
            <surname>Dai</surname>
          </string-name>
          and
          <string-name>
            <given-names>Quoc V</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Semi-supervised sequence learning</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
<fpage>3079</fpage>
          -
          <lpage>3087</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
<string-name>
            <given-names>Soheil</given-names>
            <surname>Danesh</surname>
          </string-name>
          , Tamara Sumner, and
          <string-name>
            <given-names>James H</given-names>
            <surname>Martin</surname>
          </string-name>
          .
          <year>2015</year>
          .
<article-title>SGRank: Combining statistical and graphical methods to improve the state of the art in unsupervised keyphrase extraction</article-title>
          .
          <source>In Proceedings of the fourth joint conference on lexical and computational semantics</source>
          .
<fpage>117</fpage>
          -
          <lpage>126</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
<string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
<article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
.
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
<string-name>
            <given-names>Dumitru</given-names>
            <surname>Erhan</surname>
          </string-name>
          , Yoshua Bengio, Aaron Courville,
          <string-name>
            <given-names>Pierre-Antoine</given-names>
            <surname>Manzagol</surname>
          </string-name>
          , Pascal Vincent, and
          <string-name>
            <given-names>Samy</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Why does unsupervised pre-training help deep learning?</article-title>
          <source>Journal of Machine Learning Research</source>
          <volume>11</volume>
          ,
<issue>Feb</issue>
          (
          <year>2010</year>
          ),
          <fpage>625</fpage>
          -
          <lpage>660</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
<string-name>
            <given-names>Günes</given-names>
            <surname>Erkan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dragomir R</given-names>
            <surname>Radev</surname>
          </string-name>
          .
          <year>2004</year>
          .
<article-title>LexRank: Graph-based lexical centrality as salience in text summarization</article-title>
          .
          <source>Journal of artificial intelligence research 22</source>
          (
          <year>2004</year>
          ),
          <fpage>457</fpage>
          -
          <lpage>479</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
<string-name>
            <given-names>Eibe</given-names>
            <surname>Frank</surname>
          </string-name>
          , Gordon Paynter, Ian Witten, Carl Gutwin, and
          <string-name>
            <given-names>Craig</given-names>
            <surname>Nevill-Manning</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Domain-Specific Keyphrase Extraction</article-title>
          . (07
          <year>1999</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
<string-name>
            <given-names>John M</given-names>
            <surname>Giorgi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gary D</given-names>
            <surname>Bader</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Transfer learning for biomedical named entity recognition with neural networks</article-title>
          .
          <source>Bioinformatics</source>
          <volume>34</volume>
          ,
          <issue>23</issue>
          (
          <year>2018</year>
          ),
          <fpage>4087</fpage>
          -
          <lpage>4094</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Maria</surname>
            <given-names>Grineva</given-names>
          </string-name>
          , Maxim Grinev, and
          <string-name>
            <given-names>Dmitry</given-names>
            <surname>Lizorkin</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Extracting key terms from noisy and multitheme documents</article-title>
          .
<source>In Proceedings of the 18th international conference on World Wide Web</source>
          .
          <fpage>661</fpage>
          -
          <lpage>670</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
<string-name>
            <given-names>Maryam</given-names>
            <surname>Habibi</surname>
          </string-name>
          , Leon Weber,
          <string-name>
            <given-names>Mariana</given-names>
            <surname>Neves</surname>
          </string-name>
          , David Luis Wiegandt, and
          <string-name>
            <given-names>Ulf</given-names>
            <surname>Leser</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Deep learning with word embeddings improves biomedical named entity recognition</article-title>
          .
          <source>Bioinformatics</source>
          <volume>33</volume>
          ,
          <issue>14</issue>
          (
          <year>2017</year>
          ),
          <fpage>i37</fpage>
          -
          <lpage>i48</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
<string-name>
            <given-names>Kazi Saidul</given-names>
            <surname>Hasan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Vincent</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Automatic keyphrase extraction: A survey of the state of the art</article-title>
          .
          <source>In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          .
          <fpage>1262</fpage>
          -
          <lpage>1273</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
<string-name>
            <given-names>Geoffrey E</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Simon</given-names>
            <surname>Osindero</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yee-Whye</given-names>
            <surname>Teh</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>A fast learning algorithm for deep belief nets</article-title>
          .
          <source>Neural computation 18</source>
          ,
          <issue>7</issue>
          (
          <year>2006</year>
          ),
          <fpage>1527</fpage>
          -
          <lpage>1554</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Howard</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Ruder</surname>
          </string-name>
          .
          <year>2018</year>
          .
<article-title>Universal language model fine-tuning for text classification</article-title>
.
          <source>arXiv preprint arXiv:1801.06146</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
<string-name>
            <given-names>Zhiheng</given-names>
            <surname>Huang</surname>
          </string-name>
          , Wei Xu, and
          <string-name>
            <given-names>Kai</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Bidirectional LSTM-CRF models for sequence tagging</article-title>
          .
<source>arXiv preprint arXiv:1508.01991</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Anete</given-names>
            <surname>Hulth</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Improved automatic keyword extraction given more linguistic knowledge</article-title>
          .
          <source>In Proceedings of the 2003 conference on Empirical methods in natural language processing. Association for Computational Linguistics</source>
          ,
          <fpage>216</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
<string-name>
            <given-names>Anete</given-names>
            <surname>Hulth</surname>
          </string-name>
          , Jussi Karlgren, Anna Jonsson, Henrik Boström, and
          <string-name>
            <given-names>Lars</given-names>
            <surname>Asker</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Automatic keyword extraction using domain knowledge</article-title>
          .
          <source>In International Conference on Intelligent Text Processing and Computational Linguistics</source>
          . Springer,
          <fpage>472</fpage>
          -
          <lpage>482</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
<string-name>
            <given-names>Anete</given-names>
            <surname>Hulth</surname>
          </string-name>
          and
          <string-name>
            <given-names>Beáta B</given-names>
            <surname>Megyesi</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>A study on automatically extracted keywords in text categorization</article-title>
          .
<source>In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics</source>
          . Association for Computational Linguistics,
          <fpage>537</fpage>
          -
          <lpage>544</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Steve</given-names>
            <surname>Jones</surname>
          </string-name>
          and Mark S Staveley.
          <year>1999</year>
          .
          <article-title>Phrasier: a system for interactive document retrieval using keyphrases</article-title>
          .
          <source>In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval</source>
          .
<fpage>160</fpage>
          -
          <lpage>167</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kelleher</surname>
          </string-name>
          and
          <string-name>
            <given-names>Saturnino</given-names>
            <surname>Luz</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Automatic hypertext keyphrase detection</article-title>
          .
          <source>In IJCAI</source>
          , Vol.
          <volume>5</volume>
          .
          <fpage>1608</fpage>
          -
          <lpage>1609</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
<string-name>
            <given-names>Su Nam</given-names>
            <surname>Kim</surname>
          </string-name>
          and
          <string-name>
            <given-names>Min-Yen</given-names>
            <surname>Kan</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Re-examining automatic keyphrase extraction approaches in scientific articles</article-title>
          .
<source>In Proceedings of the workshop on multiword expressions: Identification, interpretation, disambiguation and applications</source>
          . Association for Computational Linguistics,
          <fpage>9</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
<string-name>
            <given-names>Su Nam</given-names>
            <surname>Kim</surname>
          </string-name>
          , Olena Medelyan,
          <string-name>
            <given-names>Min-Yen</given-names>
            <surname>Kan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Semeval-2010 task 5: Automatic keyphrase extraction from scientific articles</article-title>
          .
          <source>In Proceedings of the 5th International Workshop on Semantic Evaluation</source>
          .
          <fpage>21</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
<string-name>
            <given-names>John</given-names>
            <surname>Lafferty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>McCallum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Fernando CN</given-names>
            <surname>Pereira</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          . (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
<string-name>
            <given-names>Tho Thi Ngoc</given-names>
            <surname>Le</surname>
          </string-name>
          , Minh Le Nguyen, and
          <string-name>
            <given-names>Akira</given-names>
            <surname>Shimazu</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Unsupervised keyphrase extraction: Introducing new kinds of words to keyphrases</article-title>
          .
          <source>In Australasian Joint Conference on Artificial Intelligence</source>
          . Springer,
          <fpage>665</fpage>
          -
          <lpage>671</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Jinhyuk</given-names>
            <surname>Lee</surname>
          </string-name>
          , Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and
          <string-name>
            <given-names>Jaewoo</given-names>
            <surname>Kang</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BioBERT: pre-trained biomedical language representation model for biomedical text mining</article-title>
.
          <source>arXiv preprint arXiv:1901.08746</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
<string-name>
            <given-names>Zhiyuan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Wenyi Huang, Yabin Zheng, and
          <string-name>
            <given-names>Maosong</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Automatic keyphrase extraction via topic decomposition</article-title>
          .
          <source>In Proceedings of the 2010 conference on empirical methods in natural language processing. Association for Computational Linguistics</source>
          ,
          <fpage>366</fpage>
          -
          <lpage>376</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
<string-name>
            <given-names>Zhiyuan</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peng</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yabin</given-names>
            <surname>Zheng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Maosong</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Clustering to find exemplar terms for keyphrase extraction</article-title>
          .
          <source>In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1. Association for Computational Linguistics</source>
          ,
          <fpage>257</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
<string-name>
            <given-names>Hans Peter</given-names>
            <surname>Luhn</surname>
          </string-name>
          .
          <year>1957</year>
          .
          <article-title>A statistical approach to mechanized encoding and searching of literary information</article-title>
          .
          <source>IBM Journal of research and development 1</source>
          ,
          <issue>4</issue>
          (
          <year>1957</year>
          ),
          <fpage>309</fpage>
          -
          <lpage>317</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Yutaka</given-names>
            <surname>Matsuo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mitsuru</given-names>
            <surname>Ishizuka</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Keyword extraction from a single document using word co-occurrence statistical information</article-title>
          .
          <source>International Journal on Artificial Intelligence Tools</source>
          <volume>13</volume>
          ,
          <issue>01</issue>
          (
          <year>2004</year>
          ),
          <fpage>157</fpage>
          -
          <lpage>169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
<string-name>
            <given-names>Olena</given-names>
            <surname>Medelyan</surname>
          </string-name>
          , Eibe Frank, and
          <string-name>
            <given-names>Ian H</given-names>
            <surname>Witten</surname>
          </string-name>
          .
          <year>2009</year>
          .
<article-title>Human-competitive tagging using automatic keyphrase extraction</article-title>
          .
          <source>In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3-Volume 3. Association for Computational Linguistics</source>
          ,
          <fpage>1318</fpage>
          -
          <lpage>1327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>Rada</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          and
          <string-name>
            <given-names>Paul</given-names>
            <surname>Tarau</surname>
          </string-name>
          .
          <year>2004</year>
          .
<article-title>TextRank: Bringing order into text</article-title>
          .
          <source>In Proceedings of the 2004 conference on empirical methods in natural language processing</source>
          .
<fpage>404</fpage>
          -
          <lpage>411</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Eirini</given-names>
            <surname>Papagiannopoulou</surname>
          </string-name>
          and
          <string-name>
            <given-names>Grigorios</given-names>
            <surname>Tsoumakas</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>A review of keyphrase extraction</article-title>
          .
          <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>
          <volume>10</volume>
          ,
          <issue>2</issue>
          (
          <year>2020</year>
          ),
<fpage>e1339</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
<string-name>
            <given-names>Matthew E</given-names>
            <surname>Peters</surname>
          </string-name>
          , Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep contextualized word representations</article-title>
          .
<source>arXiv preprint arXiv:1802.05365</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
<string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Karthik Narasimhan, Tim Salimans, and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Improving language understanding with unsupervised learning</article-title>
          .
          <source>Technical report, OpenAI</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
<string-name>
            <given-names>Lance A</given-names>
            <surname>Ramshaw</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mitchell P</given-names>
            <surname>Marcus</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Text chunking using transformation-based learning</article-title>
          .
          <source>In Natural language processing using very large corpora</source>
          . Springer,
          <fpage>157</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
<string-name>
            <given-names>Dhruva</given-names>
            <surname>Sahrawat</surname>
          </string-name>
          , Debanjan Mahata, Mayank Kulkarni, Haimin Zhang, Rakesh Gosangi, Amanda Stent, Agniv Sharma, Yaman Kumar, Rajiv Ratn Shah, and
          <string-name>
            <given-names>Roger</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Keyphrase Extraction from Scholarly Articles as Sequence Labeling using Contextualized Embeddings</article-title>
.
          <source>arXiv preprint arXiv:1910.08840</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>Gerard</given-names>
            <surname>Salton</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Buckley</surname>
          </string-name>
          .
          <year>1988</year>
          .
          <article-title>Term-weighting approaches in automatic text retrieval</article-title>
          .
          <source>Information processing &amp; management 24</source>
          ,
          <issue>5</issue>
          (
          <year>1988</year>
          ),
          <fpage>513</fpage>
          -
          <lpage>523</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
<string-name>
            <given-names>Gerard</given-names>
            <surname>Salton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chung-Shu</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Clement T</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>1975</year>
          .
          <article-title>A theory of term importance in automatic text analysis</article-title>
          .
<source>Journal of the American Society for Information Science</source>
          <volume>26</volume>
          ,
          <issue>1</issue>
          (
          <year>1975</year>
          ),
          <fpage>33</fpage>
          -
          <lpage>44</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>Takashi</given-names>
            <surname>Tomokiyo</surname>
          </string-name>
          and
<string-name>
            <given-names>Matthew</given-names>
            <surname>Hurst</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>A language model approach to keyphrase extraction</article-title>
          .
          <source>In Proceedings of the ACL 2003 workshop on Multiword expressions: analysis, acquisition and treatment</source>
          .
<fpage>33</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
<string-name>
            <given-names>Peter D</given-names>
            <surname>Turney</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Learning to extract keyphrases from text</article-title>
          .
<source>NRC Technical Report ERB-1057. National Research Council</source>
          , Canada (
          <year>1999</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
<string-name>
            <given-names>Peter D</given-names>
            <surname>Turney</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Learning algorithms for keyphrase extraction</article-title>
          .
          <source>Information retrieval 2</source>
          ,
          <issue>4</issue>
          (
          <year>2000</year>
          ),
          <fpage>303</fpage>
          -
          <lpage>336</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
<string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez,
          <string-name>
            <given-names>Łukasz</given-names>
            <surname>Kaiser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention is all you need</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
<fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>Xiaojun</given-names>
            <surname>Wan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jianguo</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <year>2008</year>
          .
<article-title>Single Document Keyphrase Extraction Using Neighborhood Knowledge</article-title>
          .
          <source>In AAAI</source>
          , Vol.
          <volume>8</volume>
          .
          <fpage>855</fpage>
          -
          <lpage>860</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
<string-name>
            <given-names>Minmei</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bo</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yihua</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2016</year>
          .
<article-title>PTR: Phrase-based topical ranking for automatic keyphrase extraction in scientific publications</article-title>
          .
          <source>In International Conference on Neural Information Processing</source>
          . Springer,
          <fpage>120</fpage>
          -
          <lpage>128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
<string-name>
            <given-names>Xuan</given-names>
            <surname>Wang</surname>
          </string-name>
          , Yu Zhang, Xiang Ren, Yuhao Zhang, Marinka Zitnik, Jingbo Shang, Curtis Langlotz, and Jiawei Han.
          <year>2019</year>
          .
          <article-title>Cross-type biomedical named entity recognition with deep multi-task learning</article-title>
          .
          <source>Bioinformatics</source>
          <volume>35</volume>
          ,
          <issue>10</issue>
          (
          <year>2019</year>
          ),
          <fpage>1745</fpage>
          -
          <lpage>1752</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
<string-name>
            <given-names>Ian H</given-names>
            <surname>Witten</surname>
          </string-name>
          , Gordon W Paynter, Eibe Frank, Carl Gutwin, and
          <string-name>
            <given-names>Craig G</given-names>
            <surname>Nevill-Manning</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Kea: Practical automated keyphrase extraction</article-title>
          .
          <source>In Design and Usability of Digital Libraries: Case Studies in the Asia Pacific. IGI global</source>
          ,
<fpage>129</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [55]
<string-name>
            <given-names>Yonghui</given-names>
            <surname>Wu</surname>
          </string-name>
          , Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao,
          <string-name>
            <given-names>Qin</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Klaus</given-names>
            <surname>Macherey</surname>
          </string-name>
          , et al.
          <year>2016</year>
          .
          <article-title>Google's neural machine translation system: Bridging the gap between human and machine translation</article-title>
          .
          <source>arXiv preprint arXiv:1609.08144</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>Chengzhi</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Automatic keyword extraction from documents using conditional random fields</article-title>
          .
          <source>Journal of Computational Information Systems 4</source>
          ,
          <issue>3</issue>
          (
          <year>2008</year>
          ),
          <fpage>1169</fpage>
          -
          <lpage>1180</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [57]
<string-name>
            <given-names>Qi</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yeyun</given-names>
            <surname>Gong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xuan-Jing</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Keyphrase extraction using deep recurrent neural networks on twitter</article-title>
          .
          <source>In Proceedings of the 2016 conference on empirical methods in natural language processing</source>
          .
<fpage>836</fpage>
          -
          <lpage>845</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          [58]
<string-name>
            <given-names>Yongzheng</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Nur Zincir-Heywood, and
          <string-name>
            <given-names>Evangelos</given-names>
            <surname>Milios</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>World wide web site summarization</article-title>
          .
          <source>Web Intelligence and Agent Systems: An International Journal 2</source>
          ,
          <issue>1</issue>
          (
          <year>2004</year>
          ),
          <fpage>39</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          [59]
<string-name>
            <given-names>Wayne Xin</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jing</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jing</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yang</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Palakorn</given-names>
            <surname>Achananuparp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ee-Peng</given-names>
            <surname>Lim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaoming</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2011</year>
          .
<article-title>Topical keyphrase extraction from Twitter</article-title>
          .
          <source>In Proceedings of the 49th annual meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1</source>
          . Association for Computational Linguistics,
          <fpage>379</fpage>
          -
          <lpage>388</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          [60] , , , and .
          <year>2013</year>
          . .
          <source>Ph.D. Dissertation.</source>
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          [61] , , , and .
          <year>2004</year>
          . .
          <source>Ph.D. Dissertation.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>