<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extracting Domain Entities from Scientific Papers Leveraging Author Keywords</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jiabin Peng</string-name>
          <email>2542505085@qq.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jing Chen</string-name>
          <email>chenjinguuu@126.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guo Chen</string-name>
          <email>delphi1987@qq.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Economics &amp; Management, Nanjing University of Science and Technology</institution>
          ,
          <addr-line>Nanjing, Jiangsu</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>41</fpage>
      <lpage>45</lpage>
      <abstract>
        <p>Current methods of domain entity extraction from scientific texts rely heavily on manually annotated corpora and thus have poor generalization ability. In this paper, we propose a two-stage methodology that makes good use of the existing author keywords of a given domain to solve this problem. Firstly, the author keyword set is used to mark the boundaries of candidate entities, and then their features are integrated to classify their entity types. In an experiment on artificial intelligence (AI) documents from WOS, our approach obtains an F1 value of 0.753 without manual annotation, slightly lower than the BERT-BiLSTM-CRF baseline model (F1=0.772) trained on a manually annotated corpus, showing the practical usability of our approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Extraction</kwd>
        <kwd>Domain Named Entity Recognition</kwd>
        <kwd>Analysis of Scientific Papers</kwd>
        <kwd>Author Keywords</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Computing methodologies • Artificial intelligence • Natural language processing • Information extraction</p>
    </sec>
    <sec id="sec-1b">
      <title>Introduction</title>
      <p>At present, there have been many studies on knowledge entity extraction from scientific papers, and the biggest problem is the lack of labeled data[1]. Scientific papers usually belong to a specific domain, so manual annotation requires corresponding domain knowledge, which makes annotation more expensive, and many popular named entity recognition (NER) models cannot deliver their inherent excellent performance. To ensure the generalization ability of NER models, it is necessary to reduce their dependence on manual annotation. Thanks to the rapid development of databases and the Internet, a large number of knowledge resources have been accumulated in many domains, such as knowledge bases, gazetteers, glossaries, and dictionaries. These resources are widely used in NER models based on distant supervision[2] or semi-supervised learning[3], which reduces the dependence of models on labeled data and improves their generalization ability to a certain extent.</p>
      <p>Actually, domain entity extraction can be divided into two subtasks: entity boundary recognition and entity type classification. Taking the domain of artificial intelligence (AI) as an example, we first used a domain glossary to help identify entity boundaries, and then constructed low-cost training data to classify the entities. Problems and solutions are viewed as the key insights of scientific papers[1], so we took them as the main entity types in the experiment. Following related studies, we summarized the research objectives, domains, applications, and tasks in technical papers as problems, and the methods, schemes, models, technologies, tools, software, algorithms, and theories used to solve these problems as solutions[4][5][6]. The experimental results showed that our methodology obtained a good score without manual annotation, with an F1-measure of 0.753.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Studies</title>
      <p>At present, the mainstream methods of domain NER fall into two categories: methods based on statistical machine learning (ML) and methods based on deep learning (DL). The NER method based on ML is essentially classification: given multiple types of named entities, models are used to classify the entities in the text. There are two ideas for the implementation.</p>
      <p>One is to first identify the boundaries of all named entities in the text and then classify them into different types, as in CoBoost[7]. The other is sequence annotation: each word in the text is given several candidate type labels, which correspond to its position in various entities. The classical ML-based NER models using sequence annotation include HMM[8], CRF[9], etc. The DL-based NER models use pre-trained word vectors to represent words, which can solve the problem of data sparsity in high-dimensional vector spaces. Meanwhile, pre-trained word vectors contain more semantic information than manually selected features and can obtain feature representations in a unified vector space from heterogeneous texts, which has driven strong progress in sequence annotation tasks, especially NER[10].</p>
      <p>Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>The biggest problem of domain NER nowadays is the lack of labeled corpora. When a general NER method is applied to a specific domain, corresponding adjustment strategies need to be taken according to the domain corpus. A common idea is to use transfer learning to share data and models among domains. Ni et al. projected labeled data and distributed word representations into the target domain without manual annotation[11]. Giorgi et al. transferred the source domain model parameters to the target domain for initialization, and then fine-tuned the parameters to fit the task[12]. Another idea is to make full use of the existing knowledge resources in a domain to automatically build datasets and carry out distant supervision, semi-supervision, weak supervision, etc. Nooralahzadeh et al. adopted a technique of partial annotation and implemented a reinforcement learning strategy with a neural network policy in distantly supervised NER[2]. Peters et al. demonstrated a general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems and applied it to NER[3]. Lison et al. relied on a broad spectrum of labeling functions to automatically annotate texts from the target domain[13].</p>
      <p>From the above research, it can be seen that various domain resources have been widely used to reduce the manual annotation cost as much as possible, achieving good results. However, domain NER models based on transfer learning, semi-supervision, etc. still cannot avoid manual participation in the construction of datasets. Therefore, after analyzing the essence of the NER task, domain NER was divided into two subtasks in this paper, which avoids manual annotation with the help of domain resources. In addition, some new ideas such as zero-shot learning[14] and learning with noisy labels[15] have also been applied to domain NER to help further reduce labor costs.</p>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
    </sec>
    <sec id="sec-4">
      <title>Framework</title>
      <p>Traditionally, NER is regarded as a sequence labeling task, which assigns the corresponding entity type and location label to each token in the text. In fact, NER can be regarded as two subtasks: boundary recognition and entity classification. That is, we can first identify the boundaries of named entities in the text, and then classify them into different types. The NER method based on sequence labeling treats the two subtasks as a whole, in which the same labeled data is shared by both subtasks, so the quality requirement for that data is quite high. As a result, many classic NER methods cannot be applied in some subdivided domains. In addition, the NER method based on sequence labeling cannot effectively integrate existing domain resources; at present, the common practice is only to use domain terms as auxiliary data to help roughly label data. On the contrary, by dividing NER into boundary recognition and entity classification, we can make full use of existing domain knowledge resources.</p>
      <p>Entity boundary recognition can be regarded as a word segmentation task, which requires large-scale resources (i.e., a user-defined lexicon). There usually exist some domain glossaries and a large-scale author keyword set in a given domain, which can help solve the word segmentation task. Compared with word segmentation, entity classification requires smaller-scale resources (i.e., training data). At present, much domain knowledge can be obtained easily through online databases or knowledge graphs, which can provide the necessary training data for entity type classification without manual annotation. To sum up, the framework of this paper is shown in Figure 1.</p>
      <p>[Figure 1: the two-stage framework. Inputs: domain resources (documents, glossary, author keywords, abstracts). Stage 1: entity boundary recognition (user-defined lexicon, segmentation). Stage 2: entity classification (training data; features: word vector, part of speech, word case; model; evaluation and optimization).]</p>
      <p>The framework was divided into three parts. The first was the acquisition of domain resources. The domain resources used in this paper included a domain glossary and domain documents. The domain glossary could be obtained directly through search engines or relevant domain knowledge websites (such as Wikipedia, Baidu Baike, etc.). In addition, we also obtained the types of terms when constructing the domain glossary. Domain documents could be obtained through databases (such as WOS, CNKI, etc.), from which author keywords and abstracts were extracted. Author keywords were the indispensable large-scale resource for entity boundary recognition, and abstracts could be used to construct features for the training data. The second was entity boundary recognition, which was regarded as a word segmentation task. The user-defined lexicon for the word segmentation task was constructed by combining the domain glossary and the author keyword set, and helped to realize entity boundary recognition at a low cost. The third was entity classification. The training data required for classification were extracted from the domain glossary, and the text features were obtained from the abstracts by training or counting.</p>
    </sec>
    <sec id="sec-5">
      <title>Implementation</title>
      <p>3.2.1 Entity Boundary Recognition. As mentioned above, entity boundary recognition was transformed into word segmentation. Inspired by Chinese automatic word segmentation methods, the forward maximum matching algorithm based on string matching was used in this paper. Slightly different from Chinese word segmentation, it was necessary to stem the English words before segmentation to avoid the influence of word forms on the result. In addition, there would be some noise when using the segmentation lexicon to label candidate entities, so the following entity classification task was actually multi-class.</p>
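<p>A minimal sketch of the forward maximum matching step over stemmed tokens may clarify the idea (the function names, the toy one-rule stemmer, and the toy lexicon below are ours for illustration, not the paper's code; a real pipeline would use a proper stemmer such as Porter's):

```python
def stem(word):
    # Toy stemmer for illustration: lowercase and strip a trailing "s".
    w = word.lower()
    return w[:-1] if w.endswith("s") else w

def segment(tokens, lexicon, max_len=5):
    """Greedy forward maximum matching: at each position, take the
    longest multi-word phrase (up to max_len words) whose stemmed
    form appears in the stemmed lexicon."""
    stems = [stem(t) for t in tokens]
    lex = {tuple(stem(w) for w in phrase.split()) for phrase in lexicon}
    out, i = [], 0
    while i < len(stems):
        for n in range(min(max_len, len(stems) - i), 0, -1):
            if n > 1 and tuple(stems[i:i + n]) in lex:
                out.append(" ".join(tokens[i:i + n]))  # candidate entity
                i += n
                break
        else:  # no phrase matched; emit the single token
            out.append(tokens[i])
            i += 1
    return out

lexicon = {"support vector machine", "neural network"}
print(segment("We trained support vector machines daily".split(), lexicon))
# → ['We', 'trained', 'support vector machines', 'daily']
```

Stemming both the lexicon and the text lets "support vector machines" match the lexicon entry "support vector machine" despite the plural form.</p>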
      <p>3.2.2 Entity Classification. Entity classification was essentially word classification, a typical supervised task, for which training data was indispensable. At present, it is difficult to construct a large amount of high-quality classification data in a given domain. However, a small amount of high-quality data can usually be obtained at a low cost with the help of domain knowledge bases or domain experts.
1) Construct training data. Training data consisted of positive samples and negative samples. Positive samples consisted of entities and their corresponding types. Since the pre-constructed glossary contained the types of terms, we directly extracted some high-quality terms and their types as the positive samples. Negative samples, i.e. non-entities, were randomly extracted from keyword sets and texts.
2) Construct text features. According to the task, a word vector, a part-of-speech (POS) feature, and a word case feature were constructed. Word vectors could be obtained by training on a large-scale unlabeled domain corpus, giving discrete words semantic information according to their context. The POS feature was obtained by counting the corpus without word segmentation. The acquisition of the case feature was basically consistent with the POS feature, but the corpus with word segmentation was used and the cases needed to be self-defined.</p>
      <sec id="sec-5-1">
        <title>3) Model selection, training, evaluation, and optimization.</title>
        <p>According to the task, the models we used included four
classical machine learning models: Random Forest (RF),
KNearest Neighbor (KNN), Support Vector Machine (SVM),
Multilayer Perceptron (MLP), and TextCNN, which
performed well in sentence classification[16]. The detailed
steps of our experiment were as follows: ① feeding the
training data to models to obtain the basic results; ②
optimizing the word vector according to the model effect and
adding features in the training process; ③ evaluating the
effect of models to decide whether to continue optimizing.
3.3</p>
        <p>Feature Processing
3.3.1 Word Vector. At present, common models for training word vectors include Word2Vec, GloVe, ELMo, GPT, and BERT. The word vectors trained by the first two models are context-independent. Because our core task was phrase classification, which did not need context information, we chose Word2Vec to train word vectors. Before training, we concatenated the words of each phrase in the segmented corpus with underscores, to ensure that phrases were treated as a whole when training word vectors.</p>
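<p>The underscore trick can be sketched as follows (a hedged illustration: the function name and toy data are ours, and the matching here is a simplified exact-token version of the segmentation described earlier):

```python
# Join lexicon phrases with "_" so a Word2Vec trainer sees each phrase
# as a single vocabulary item.

def join_phrases(tokens, lexicon, max_len=5):
    """Replace any lexicon phrase in `tokens` with one underscored token."""
    lex = {tuple(p.split()): "_".join(p.split()) for p in lexicon}
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            if key in lex:
                out.append(lex[key])  # e.g. "support_vector_machine"
                i += n
                break
        else:
            out.append(tokens[i])
            i += 1
    return out

sent = "we apply support vector machine to text".split()
print(join_phrases(sent, {"support vector machine"}))
# → ['we', 'apply', 'support_vector_machine', 'to', 'text']
```

The transformed sentences would then be fed to a Word2Vec trainer (e.g., gensim) so that each phrase receives its own vector.</p>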
        <p>Word2Vec includes two algorithms: Skip-gram and CBOW. Research has shown that Skip-gram captures more semantic information, while CBOW captures more grammatical information[17]. The window size is also very important for training word vectors, and the commonly used window sizes are 5 and 10. Therefore, we first explored these two factors affecting word vector quality in the classification experiment. Following related studies[18], the other parameters are shown in Table 1. In addition, to make the word vectors more robust, we used the stemmed corpus to train Word2Vec.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Table 1: Word2Vec Training Parameters</title>
        <p>sg: 1 / 0; window size (w): 5 / 10; min count: 5; iteration number: 20; embedding size: 200.</p>
        <p>3.3.2 POS Feature. The POS of words in sentences was obtained through the Python third-party package nltk. There are 36 kinds of POS tags in nltk, so the length of the POS vector of a single word is 36. In the classification experiment, the POS vector of a training sample was obtained by concatenating the POS vectors of its component words. To avoid inconsistent lengths of POS vectors, we counted the lengths of phrases (the number of words in each phrase) in the segmentation lexicon to obtain the maximum phrase length. When a training sample was shorter than the maximum phrase length, its POS vector was padded with 0. Finally, the length of the POS vector of a training sample was 36 × the maximum phrase length in the lexicon. POS vectors were used by concatenating them with word vectors in the experiment.</p>
        <p>3.3.3 Case Feature. Three types of phrase cases were defined in this paper: initial uppercase, all uppercase, and all lowercase. The length of the case vector of a training sample was 3. Similarly, case vectors were used by concatenating them with word vectors.</p>
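        <p>The POS and case features above can be sketched in a few lines (an illustration under assumptions: the tag list is truncated to five stand-in tags instead of nltk's 36, and the helper names are ours):

```python
TAGS = ["NN", "NNS", "JJ", "VB", "IN"]  # stand-in for nltk's 36 POS tags

def pos_vector(word_tags, max_len):
    """Concatenate one-hot POS vectors and zero-pad to max_len words,
    mirroring the 36 * max-phrase-length layout described above."""
    vec = []
    for tag in word_tags:
        vec.extend(1 if t == tag else 0 for t in TAGS)
    vec.extend([0] * (len(TAGS) * (max_len - len(word_tags))))
    return vec

def case_vector(phrase):
    """3-dim case feature: [initial uppercase, all uppercase, all lowercase]."""
    return [int(phrase[0].isupper() and not phrase.isupper()),
            int(phrase.isupper()),
            int(phrase.islower())]

print(pos_vector(["JJ", "NN"], max_len=3))
# → [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(case_vector("BERT"), case_vector("transformer"))
# → [0, 1, 0] [0, 0, 1]
```

Both vectors would be appended to the phrase's word vector before being fed to the classifiers.</p>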
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Classification Models</title>
      <p>Classification models used in this paper included RF, KNN, SVM, MLP, and TextCNN. The first four models were implemented with sklearn in Python. In the RF, the number of decision trees was set to 100. All parameters of the KNN took the default values. In the SVM, probability was set to True, that is, probability estimation was enabled. In the MLP, the numbers of neurons in the hidden layers were (100, 50). TextCNN was originally used to classify sentences, and phrases can be regarded as shorter sentences. The inputs of TextCNN (implementation: https://github.com/cjymz886/text-cnn) were the vectors generated by Word2Vec. The embedding size, sequence length, batch size, and training epochs of TextCNN were set to 200, 10, 32, and 20, respectively. Parameters not mentioned above took their default values.</p>
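      <p>The four sklearn configurations listed above can be sketched as follows (assuming scikit-learn's standard estimators; all unstated parameters are left at their defaults, as in the paper):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

models = {
    "RF": RandomForestClassifier(n_estimators=100),    # 100 decision trees
    "KNN": KNeighborsClassifier(),                     # all defaults
    "SVM": SVC(probability=True),                      # enable probability estimates
    "MLP": MLPClassifier(hidden_layer_sizes=(100, 50)),
}
```

Each model would then be trained on the concatenated feature vectors via `models[name].fit(X_train, y_train)`.</p>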
    </sec>
    <sec id="sec-7">
      <title>Experiment</title>
      <p>To verify the effectiveness of our methodology, we took the domain of AI as an example in the experiment.</p>
    </sec>
    <sec id="sec-8">
      <title>Data Acquisition and Preprocessing</title>
      <p>Firstly, we obtained the bibliography data of the AI domain. The
data was from the category of AI in the core collection of WOS
(Web of Science). Documents were retrieved with WC = computer
science and WC = artificial intelligence, and the time range was
set as from 1996 to 2020. Then, abstracts and keywords were
extracted from the bibliography data, including 927675 abstracts
and 161169 keywords.</p>
      <p>Secondly, we constructed a glossary of the AI domain. The data came from a knowledge website (https://paperswithcode.com/), from which we obtained all problem and solution entities. The problem entities came from the tasks on the website's Browse State-of-the-Art page, and the solution entities came from the machine learning components on its Methods page. After removing duplicates, 1887 problem entities and 1209 solution entities remained.</p>
      <p>Finally, we processed the above data to get the final experimental data. The user-defined lexicon for English word segmentation was constructed by merging the keyword set and the domain glossary. The training data of the classifiers consisted of entities and non-entities: 360 entities of each type were manually extracted from the glossary, and 360 non-entities were manually constructed. Non-entities included phrases and words, in which phrases were extracted from high-frequency keywords and words were constructed randomly. The ratio of phrases to words in non-entities was about 2:1; because almost all entities were phrases, more phrase-level non-entities were needed to help train the models. Finally, 1080 pieces of classification data were obtained. The training set and validation set were randomly divided according to the ratio of 5:1. In addition, to evaluate the performance of our methodology, we set our baseline as the traditional BERT-BiLSTM-CRF NER model. A previously annotated corpus containing 3000 sentences was used for the baseline model, in which 2000 sentences were randomly selected as the training data and the remaining 1000 sentences were used as the common test set.</p>
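      <p>The random 5:1 split described above can be sketched with the standard library (the function name and fixed seed are ours; the 1080-sample dataset is simulated):

```python
import random

def split_5_to_1(data, seed=42):
    """Shuffle and split so that train:validation ≈ 5:1."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 5 // 6
    return shuffled[:cut], shuffled[cut:]

train, val = split_5_to_1(list(range(1080)))
print(len(train), len(val))  # → 900 180
```

With 1080 samples this yields 900 training and 180 validation samples.</p>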
    </sec>
    <sec id="sec-9">
      <title>Result Analysis</title>
      <p>The macro average of precision, recall, and F1-measure were used
to evaluate the models.</p>
      <p>Word vectors were the basic input of the models, so we first explored the influence of word vectors trained by the two algorithms with different window sizes; the results are shown in Table 2.</p>
      <p>[Table 2: precision, recall, and F1-measures of RF, KNN, SVM, MLP, and TextCNN with word vectors trained under sg = 1/0 and w = 5/10; the numeric cells were not preserved in this version.]</p>
      <p>Firstly, we compared the results in Table 2 vertically. When sg=1, the five models achieved good results on the whole in both window sizes; only with w=5 did KNN perform poorly. However, when sg=0, the performance of all models decreased, especially KNN and MLP. The possible reason is that Skip-gram focuses on semantics, which is more conducive to the NER task than CBOW. In addition, KNN and MLP have higher requirements for data quality, which word vectors trained by CBOW could not meet. Secondly, we compared the results in Table 2 horizontally: the best F1-measure of each model was about 0.7 (highlighted in bold in Table 2). In the following experiments, the word vector that made each model achieve its best performance was used, and the POS and case features were added to the models. The results are shown in Table 3. It can be seen in Table 3 that the addition of the two features effectively improved the F1-measures. When all features were fused, the optimal results were obtained in all models. Among the five models, SVM had the best performance, with an F1-measure of 0.753. This might be because the underlying training mechanism of SVM makes it more suitable for small-sample classification. When the word vector parameters were sg=1 and w=10, the voting model had the best performance, with an F1-measure of 0.752. The SVM or the voting model could be selectively used in practical applications. The best F1-measure of TextCNN was 0.715, which was far from its performance in sentence classification; one possible reason is that phrases are much shorter than sentences.</p>
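      <p>The paper does not spell out the voting model; a plausible reading is majority voting over the five classifiers' predicted labels, sketched below (the function name, tie-breaking rule, and toy labels are our assumptions):

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one predicted label per model, for a single sample.
    Returns the most frequent label; ties go to the earliest prediction."""
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

votes = ["problem", "solution", "problem", "non-entity", "problem"]
print(majority_vote(votes))  # → problem
```

A probability-weighted variant (averaging the models' class probabilities) would be another reasonable interpretation.</p>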
      <p>The baseline BERT-BiLSTM-CRF (implementation: https://github.com/macanv/BERT-BiLSTM-CRF-NER) performed well on the domain NER task. Its F1-measure was 0.772, far below its performance on general NER tasks but a very good result in a subdivided domain. This result was 0.019 higher than that of our optimal model. From the experimental results, there is still a gap in our methodology, but considering the cost of the experimental data, the gap is acceptable. In future work, we can further optimize the word vectors and add more features to improve the performance.</p>
    </sec>
    <sec id="sec-10">
      <title>Conclusion</title>
      <p>Aiming at the problem that current domain NER models rely heavily on manually annotated data and thus have poor domain generalization ability, we propose a two-stage knowledge entity extraction methodology that gets rid of the dependence on manually annotated data. Experiments on WOS documents in the AI domain showed that, using our approach, good results can be achieved in extracting problem and solution entities without manual annotation.</p>
      <p>In general, our approach has good domain generalization because
it does not need manual annotation, and can be applied to many
subdivided domains at a low cost. However, the performance of
our scheme still has some room for improvement. In the follow-up
work, we can try to use better word vectors and more features to
improve the accuracy of entity extraction, and gradually extend
the model to the extraction of more knowledge types.</p>
    </sec>
    <sec id="sec-11">
      <title>ACKNOWLEDGMENTS</title>
      <p>This study is supported by the MOE (Ministry of Education in
China) Project of Humanities and Social Sciences.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Zara</given-names>
            <surname>Nasar</surname>
          </string-name>
          ,
          <source>Syed Waqar Jaffry and Muhammad Kamran Malik</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>Information extraction from scientific articles: a survey</article-title>
          .
          <source>Scientometrics 117</source>
          <volume>3</volume>
          (
          <issue>2018</issue>
          ),
          <fpage>1931</fpage>
          -
          <lpage>1990</lpage>
          . DOI: https://doi.org/10.1007/s11192-018-2921-5.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Nooralahzadeh</given-names>
            <surname>Farhad</surname>
          </string-name>
          ,
          <source>Lønning Tore Jan and Øvrelid Lilja</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <article-title>Reinforcement-based denoising of distantly supervised NER with partial annotation</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP</source>
          .
          <fpage>225</fpage>
          -
          <lpage>233</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Peters E.</given-names>
            <surname>Matthew</surname>
          </string-name>
          , Ammar Waleed and
          <string-name>
            <given-names>Bhagavatula</given-names>
            <surname>Chandra</surname>
          </string-name>
          , et al.,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <article-title>Semi-supervised sequence tagging with bidirectional language models</article-title>
          .
          <source>In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics</source>
          .
          <fpage>1756</fpage>
          -
          <lpage>1765</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Gupta</given-names>
            <surname>Sonal and Manning</surname>
          </string-name>
          <string-name>
            <surname>D</surname>
          </string-name>
          ,
          <year>2011</year>
          .
          <article-title>Analyzing the dynamics of research by extracting key aspects of scientific papers</article-title>
          .
          <source>In Proceedings of the 5th International Joint Conference on Natural Language Processing. 1-9.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Singh</given-names>
            <surname>Mayank</surname>
          </string-name>
          , Dan Soham, Agarwal Sanyam,
          <source>Goyal Pawan and Mukherjee Animesh</source>
          ,
          <year>2017</year>
          .
          <article-title>AppTechMiner: Mining Applications and Techniques from Scientific Articles</article-title>
          .
          <source>In Proceedings of the Joint Conference on Digital Libraries Joint Conference on Digital Libraries. 1-8.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Heffernan</given-names>
            <surname>Kevin and Teufel Simone</surname>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Identifying Problems and Solutions in Scientific Text</article-title>
          .
          <source>Scientometrics 116</source>
          <volume>2</volume>
          (
          <issue>2018</issue>
          ),
          <fpage>1367</fpage>
          -
          <lpage>1382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>DOI: https://doi.org/10.1007/s11192-018-2718-6.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Collins and Yoram Singer</surname>
          </string-name>
          ,
          <year>1999</year>
          .
          <article-title>Unsupervised models for named entity classification</article-title>
          .
          <source>In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora.</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Zhou</given-names>
            <surname>Guodong and Su Jian</surname>
          </string-name>
          ,
          <year>2002</year>
          .
          <article-title>Named entity recognition using an HMMbased chunk tagger</article-title>
          .
          <source>In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. Stroudsburg: Association for Computational Linguistics</source>
          ,
          <fpage>473</fpage>
          -
          <lpage>480</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>McCallum</given-names>
            <surname>Andrew and Li Wei</surname>
          </string-name>
          ,
          <year>2003</year>
          .
          <article-title>Early results for named entity recognition with conditional random fields, feature induction and webenhanced lexicons</article-title>
          .
          <source>In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL. Stroudsburg: Association for Computational Linguistics</source>
          ,
          <fpage>188</fpage>
          -
          <lpage>191</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Cherry</given-names>
            <surname>Colin and Guo Hongyu</surname>
          </string-name>
          ,
          <year>2015</year>
          .
          <article-title>The unreasonable effectiveness of word representations for Twitter named entity recognition</article-title>
          .
          <source>In The 2015 Annual Conference of the North American Chapter of the ACL. Stroudsburg: Association for Computational Linguistics</source>
          ,
          <fpage>735</fpage>
          -
          <lpage>745</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Jian</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Georgiana</given-names>
            <surname>Dinu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Radu</given-names>
            <surname>Florian</surname>
          </string-name>
          ,
          <year>2017</year>
          .
          <article-title>Weakly Supervised Cross-Lingual Named Entity Recognition via Effective Annotation and Representation Projection</article-title>
          .
          <source>In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics</source>
          .
          <fpage>1470</fpage>
          -
          <lpage>1480</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>John M</given-names>
            <surname>Giorgi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gary D</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <year>2018</year>
          .
          <article-title>Transfer learning for biomedical named entity recognition with neural networks</article-title>
          .
          <source>Bioinformatics</source>
          <volume>34</volume>
          ,
          <issue>23</issue>
          (
          <year>2018</year>
          ),
          <fpage>4087</fpage>
          -
          <lpage>4094</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Lison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jeremy</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aliaksandr</given-names>
            <surname>Hubin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Samia</given-names>
            <surname>Touileb</surname>
          </string-name>
          ,
          <year>2020</year>
          .
          <article-title>Named Entity Recognition without Labelled Data: A Weak Supervision Approach</article-title>
          .
          <source>In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>
          .
          <fpage>1518</fpage>
          -
          <lpage>1533</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Damai</given-names>
            <surname>Dai</surname>
          </string-name>
          , et al.
          <year>2020</year>
          .
          <article-title>Inductively Representing Out-of-Knowledge-Graph Entities by Optimal Estimation Under Translational Assumptions</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>David Ifeoluwa</given-names>
            <surname>Adelani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michael A.</given-names>
            <surname>Hedderich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dawei</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Esther</given-names>
            <surname>van den Berg</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dietrich</given-names>
            <surname>Klakow</surname>
          </string-name>
          ,
          <year>2020</year>
          .
          <article-title>Distant Supervision and Noisy Label Learning for Low Resource Named Entity Recognition: A Study on Hausa and Yorùbá</article-title>
          .
          <source>arXiv preprint arXiv:2003.08370</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <year>2014</year>
          .
          <article-title>Convolutional Neural Networks for Sentence Classification</article-title>
          .
          <source>arXiv preprint</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kai</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Greg</given-names>
            <surname>Corrado</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <year>2013</year>
          .
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781v3</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Siwei</given-names>
            <surname>Lai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kang</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Liheng</given-names>
            <surname>Xu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jun</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <year>2016</year>
          .
          <article-title>How to Generate a Good Word Embedding</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>31</volume>
          ,
          <issue>6</issue>
          (
          <year>2016</year>
          ),
          <fpage>5</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>