=Paper= {{Paper |id=Vol-2658/paper9 |storemode=property |title=IEKM-MD: An Intelligent Platform for Information Extraction and Knowledge Mining in Multi-Domains |pdfUrl=https://ceur-ws.org/Vol-2658/paper9.pdf |volume=Vol-2658 |authors=Yu Li,Tao Yue,Wu Zhenxin |dblpUrl=https://dblp.org/rec/conf/jcdl/YuTW20 }} ==IEKM-MD: An Intelligent Platform for Information Extraction and Knowledge Mining in Multi-Domains== https://ceur-ws.org/Vol-2658/paper9.pdf
                    EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents




    IEKM-MD: An Intelligent Platform for Information Extraction
           and Knowledge Mining in Multi-Domains
                Yu Li†                                            Tao Yue                                           Wu Zhenxin
National Science Library, Chinese                   National Science Library, Chinese                    National Science Library, Chinese
      Academy of Sciences                                 Academy of Sciences                                  Academy of Sciences
          Beijing, China                                     Beijing, China                                        Beijing, China
      yul@mail.las.ac.cn                                 taoyue@mail.las.ac.cn                                 wuzx@mail.las.ac.cn



ABSTRACT

The terminologies in different disciplines vary greatly, and annotated corpora are scarce, which limits the portability of information extraction models. The content of scientific articles is still underutilized. This paper constructs an intelligent platform for information extraction and knowledge mining, namely IEKM-MD. Two innovative technologies are proposed. Firstly, a phrase-level scientific entity extraction model combining a neural network and active learning is designed, which reduces the model's dependence on large-scale corpora. Secondly, a translation-based relation prediction model is provided, which improves the relation embeddings by optimizing the loss function. In addition, the platform integrates an advanced entity recognition model (spaCy.NER) and a keyword extraction model (RAKE). It provides abundant services for fine-grained and multi-dimensional knowledge, including problem discovery, method recognition, relation representation and hot spot detection. We carried out experiments in three different domains: Artificial Intelligence, Nanotechnology and Genetic Engineering. The average accuracies of scientific entity extraction are 0.91, 0.52 and 0.76, respectively.

CCS CONCEPTS
• Computing methodologies • Artificial intelligence • Natural language processing • Information extraction

KEYWORDS
Information extraction, Relation prediction, Active learning, Translation embedding, Neural network

1 Introduction

With the progress of science and technology, there are more and more fields and scientific articles. Information extraction and knowledge mining in a specific field enable scholars to quickly grasp the overall outline of information and to track the development of fine-grained knowledge. There are many mature models to extract information from texts, such as BiLSTM-CNN [1], CNN-BiLSTM-CRF [2] and LM-LSTM-CRF [3], which have achieved high scores in various natural language processing tasks. However, these supervised learning models inevitably consume large amounts of high-quality annotated corpus in order to fully learn the characteristics of natural language representation. In most cases, the annotated corpus in one specific field is constructed manually by several experts, which is time-consuming and laborious. Therefore, it is hard to directly apply a well-trained model to other domains.

How to extract information without a massive annotated corpus is a big challenge. Active Learning (AL) [4] has been proved to be an effective way to solve the problem of corpus scarcity in classification tasks [5, 6]. However, it has not been validated on the sequence labelling task, where the optimal result is more difficult to find because the complexity increases exponentially [7]. In this paper, we introduce multiple active learning strategies into information extraction for the first time, so as to explore a cheap and efficient solution for recognizing fine-grained entities in multiple domains.

Relation prediction is another basic technology for knowledge organization. Translation models, which see a relation as a process of translating the head entity to the tail entity, have been widely used to predict relations. Several classic translation models have been proposed from different perspectives: TransE [8] is the first translation embedding model and has fewer parameters. TransH [9] is presented to solve the problem of complex relation representation. TransR [10] distinguishes the semantic embeddings for different types of relations, which achieved a better F-score. TransD [11] simplifies the projection process of TransR and improves the computing efficiency.

This paper aims to construct an intelligent platform for information extraction and knowledge mining, which can be used in multiple domains without much human intervention. The main contributions are as follows: 1) With a limited annotated corpus, an effective method combining a neural network with active learning recognizes scientific entities in multiple domains; 2) By optimizing the loss function, an improved translation model represents the semantic vectors more accurately and converges faster, with a smaller loss score than the original model.

2 Intelligent Platform: IEKM-MD

The technology framework of our platform is shown in Figure 1. This platform includes two innovative technologies: 1) the model




      Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




combining a neural network with active learning extracts "problem" and "method" entities, 2) the improved translation model predicts relations between "problem" and "method" entities. At the same time, the platform integrates two excellent tools (spaCy.NER¹ and RAKE²) to recognize named entities and keywords. Finally, this platform provides a variety of knowledge services for researchers, including problem discovery, method recognition, relation representation and hot spot detection. Besides, analysts can perform richer downstream tasks based on our platform, such as discipline analysis, trend explosion, new technology detection, and so on.

Figure 1: Technology framework of IEKM-MD
[Figure 1 layers, top to bottom: Portal (Platform: IEKM-MD); Services (problem discovery, method recognition, relation representation, hotspot detection); Functions (scientific entity extraction, relation prediction, named entity extraction, keyword extraction); Databases (AI, GIS, Bio, ……); Infrastructure (platform for storing and computing big data).]

2.1 Scientific Entity Recognition

Scientific entity recognition extracts phrases from scientific articles. These phrases consist of several words that describe the focus of the article or the method proposed by the author. In order to reduce the dependence on annotated corpus, this paper provides a semi-supervised learning model combining a neural network with active learning.

The framework of the information extraction model is shown in Figure 2. Firstly, the learning engine trains the parameters of the neural network by using a small number of annotated samples (dozens of abstracts with semantic labels). Then, the trained neural network predicts the labels of the unannotated samples and passes the predicted scores to the selecting engine. Secondly, according to the active learning strategies, the selecting engine decides which samples are valuable and should be annotated manually. Only the top 10% most valuable samples are labelled by experts. Thirdly, the manually annotated samples are added to the training set to re-train the neural network, in order to improve the performance of label prediction. The whole process runs repeatedly until the model's performance no longer improves significantly. Finally, the trained model predicts the "problems" and "methods" for all the unlabeled articles. More details about parameter setting will be discussed in Section 3.1.

Figure 2: Information extraction model combining neural network with active learning
[Figure 2 components: the Learning Engine takes labeled samples (example sentence "we provide a novel method for face emotion recognition"), applies CNN character encoding and Bi-LSTM word encoding, joins the features, and uses CRF decoding to output predicted labels (we/O provide/O a/O novel/O method/O for/O face/B-task emotion/I-task recognition/I-task) with a predicted score. The Selecting Engine takes the predicted scores of the unlabeled samples (sent-1 … sent-n), applies Margin, NSE, MNLP and LWP to obtain value scores, and passes the selected samples to expert annotation. The loop stops when the loss score is below a threshold or all samples are labelled, and the model then outputs methods and problems.]

Here we choose CNN-BiLSTM-CRF [12] as the learning engine. CNN captures morphological features such as the prefixes and suffixes of words. BiLSTM learns long-distance dependencies between words by using two groups of long short-term memory networks running in opposite directions. CRF decides the optimal label sequence under rational linguistic constraints.

In addition, we propose a hybrid approach for the selecting engine. Firstly, the value score of each unlabeled sample is computed by four different active learning strategies, and the sum of the four scores is set as the final value score. Secondly, the value scores are sorted in descending order, and only the top 10% most valuable samples are selected to be annotated manually in each iteration.

This paper picked out three classical strategies from the uncertainty sampling methods: margin [13], N-best sequence entropy [14] and maximum normalized log-probability [15]. Additionally, we propose a novel strategy, namely label weighted probability, which emphasizes the importance of the number of labels. The more labels of problems or methods there are in a sentence, the more valuable the sentence is.

2.2 Entity Relation Prediction

Relation prediction decides whether a "problem" and a "method" are related or not. If a "problem" is related to a "method", the method can be used to solve this problem.
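As an illustration of the selecting engine described in Section 2.1 (four strategy scores summed per sample, top 10% sent to annotators), the following is a minimal Python sketch rather than the authors' implementation. The `Prediction` container, the function names and the exact scoring formulas are assumptions made for illustration. In particular, the paper does not give a formula for label weighted probability, so `lwp_score` simply uses the share of tokens tagged as an entity, and the three uncertainty scores are oriented so that a higher value means a more valuable sample.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    """Hypothetical container for what the learning engine returns per sentence."""
    sent_id: str
    labels: List[str]         # best label sequence, e.g. ["O", "B-task", "I-task"]
    nbest_probs: List[float]  # probabilities of the N best label sequences, best first


def margin_score(p: Prediction) -> float:
    # A small gap between the two best sequences means high uncertainty.
    return 1.0 - (p.nbest_probs[0] - p.nbest_probs[1])


def nse_score(p: Prediction) -> float:
    # Entropy over the renormalised N-best sequence probabilities.
    total = sum(p.nbest_probs)
    return -sum((q / total) * math.log(q / total) for q in p.nbest_probs if q > 0)


def mnlp_score(p: Prediction) -> float:
    # Negated length-normalised log-probability of the best sequence.
    return -math.log(p.nbest_probs[0]) / len(p.labels)


def lwp_score(p: Prediction) -> float:
    # Assumed stand-in for label weighted probability: share of entity tokens.
    entity_tokens = sum(1 for label in p.labels if label != "O")
    return entity_tokens / len(p.labels)


def value_score(p: Prediction) -> float:
    # The final value score is the sum of the four strategy scores.
    return margin_score(p) + nse_score(p) + mnlp_score(p) + lwp_score(p)


def select_for_annotation(preds: List[Prediction], fraction: float = 0.10) -> List[Prediction]:
    # Sort by value score in descending order and keep the top fraction.
    ranked = sorted(preds, key=value_score, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k]
```

Calling `select_for_annotation(predictions)` on the predictions for the unlabeled pool returns the sentences that would be handed to the experts in one iteration. In practice the four scores live on different scales, so a real system would probably normalise them before summing; the sketch keeps the plain sum described in the text.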
¹ https://spacy.io/
² https://github.com/aneesha/RAKE

The translation model treats the relation in a triple (head entity, relation, tail entity) as a translation between the two entities. There is a series of translation models. TransE [8] has few parameters and is low in complexity, but cannot distinguish two tail entities








with the same relation. TransH [9] uses different vectors to represent one entity under various relations, which solves the problem of complex relation representation (1-N, N-1, N-N). TransR [10] supposes that different relations lie in different semantic spaces. Thus, this model first projects entities into their relation spaces and then builds the translation process. However, it greatly increases the time cost because of its many parameters. TransD [11] creates separate projection matrices for the head entity and the tail entity. It not only combines the effects of both entities and relations on projection, but also improves the computing efficiency.

After comparing the performance of various translation models, we choose TransH to predict relations, as it keeps a balance between accuracy and efficiency. To solve the problems of one-to-many, many-to-one and many-to-many relations, TransH generates the relation-specific translation vector $d_r$ in the relation-specific hyperplane $w_r$ rather than in the same space as the entity embeddings.

Figure 3: TransH projection [9]
[Figure 3 shows the embeddings $h$ and $t$ projected onto the hyperplane $w_r$ as $h_\perp$ and $t_\perp$, which are connected by the translation vector $d_r$.]

As shown in Figure 3, the relation $r$ in its hyperplane $w_r$ has a translation vector $d_r$, and the head embedding $h$ and the tail embedding $t$ have projection vectors $h_\perp$ and $t_\perp$ in $w_r$. The defined score function is $\| h_\perp + d_r - t_\perp \|_2^2$.

However, the original TransH model does not match our goal exactly. We made three improvements.

1) TransH constructs the negative samples by replacing the head or tail entity with others in the positive samples. However, the replaced one may also be correct because of synonyms, which introduces many false negative labels into training. Considering that there are only two types of relationships, we simply construct the negative samples by modifying the correct relationship into its antonym. With this change, it is more convenient to construct a balanced annotated corpus. Moreover, the score function $f_r(h, t)$ is re-defined as Equation (1), which aims to move the attention from entity to relation.

$$ f_r(h, t) = \| \mathrm{abs}(h_\perp - t_\perp) - d_r \|_2^2 \qquad (1) $$

2) Compared with the original model, which initializes the entities with random vectors, we use the word2vec model to generate the semantic representation of all head and tail entities.

3) To improve the ability of feature learning for the unknown entities, we add one hidden layer of linear transformation for the head entities and one for the tail entities.

2.3 Named Entity Recognition and Keyword Extraction

We use the enterprise-grade open-source toolkit spaCy.NER to recognize named entities. spaCy.NER implements a very fast and efficient system based on statistical machine learning algorithms, which can recognize 18 entity types, such as person, organization, location and geopolitical entity.

Furthermore, keyword extraction is achieved by the open-source toolkit RAKE (Rapid Automatic Keyword Extraction). RAKE is an automatic keyword extraction technique. Based on a statistical method, RAKE outperformed TextRank and other supervised learning models, obtaining a high F value [16] while being more efficient.

3 Platform Evaluation and Display

We evaluate the information extraction performance of IEKM-MD in the field of Artificial Intelligence (AI). Two datasets are used.

1) The top 100 AI conferences were picked out by domain experts, and their abstracts were acquired from the NSTL database³, 9753 sentences in total. Next, we built the ground-truth dataset. Each sentence is annotated in parallel by two students from the corresponding subject with three labels (task, method or other). The annotation results are checked by one expert. The annotation format is shown in Figure 4. The AI annotated corpus contains 260,000 tokens.

We    use    active      learning    to    extract    information
O     O      B-method    I-method    O     B-task     I-task

Figure 4: An example of annotation format

2) The FTD dataset⁴, shared by Stanford University in the field of Computational Linguistics. It comes from the Conference of the Association for Computational Linguistics, ranges from 1965 to 2009, and contains four types of labels (focus, technique, domain and other), 2628 sentences in total.

³ https://www.las.ac.cn
⁴ https://nlp.stanford.edu/pubs/FTDDataset_v1.txt

In addition, we show the effect of knowledge mining in three different kinds of domains. We choose three popular keywords (Neural Networks, Nano Structure and Genetic Engineering), which respectively represent the subjects of Computer Science, Materials Science and Medicine, to acquire abstracts from the NSTL database. 200








abstracts of each subject are randomly selected from SCI journals                                        The results reflect that Neural Networks achieved the best
and are used to verify the practical application effect of IEKM-                                     performance with 0.93 accuracy of problem extraction and 0.89
MD.                                                                                                  accuracy of method extraction. The average accuracy of three
                                                                                                     fields reveals that problem extraction has a better score than
3.1 Scientific Entity Recognition                                                                    method extraction. The first reason is that the total mentions of
We set the baseline using only the CNN-BiLSTM-CRF (CBC)                                              problems are fewer than those of methods, and they are usually described
model trained on all annotated samples. For each dataset (AI or                                      as noun phrases, which gives the model an easier pattern to
FTD), its best performance serves as the baseline, so as to test                                     learn. The second reason is that one article may
whether active learning helps reduce the scale of annotated corpus                                   contain multiple methods, which are modified by multiple
for supervised learning models. The scale of training sets and the                                   attributives or adverbials, making it more challenging to recognize
best F1 scores of CBC model are shown in Table 1.                                                    the complete methods.
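The averages quoted in the text can be checked with a few lines of arithmetic. The sketch below copies the six accuracies reported in Table 3 and averages them per domain and per entity type; the variable and function names are illustrative only. The per-domain means come to about 0.91, 0.52 (0.515 before rounding) and 0.76, which agrees with the figures given in the abstract, and the problem mean (about 0.77) is higher than the method mean (about 0.69).

```python
# Accuracies copied from Table 3 as (problem, method) pairs per domain.
TABLE3 = {
    "AI": (0.93, 0.89),
    "Nano Structure": (0.61, 0.42),
    "Genetic Engineering": (0.77, 0.75),
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Average over the two entity types within each domain.
domain_avg = {domain: mean(pair) for domain, pair in TABLE3.items()}

# Average over the three domains for each entity type.
problem_avg = mean(problem for problem, _ in TABLE3.values())
method_avg = mean(method for _, method in TABLE3.values())

for domain, avg in domain_avg.items():
    print(f"{domain}: {avg:.3f}")
print(f"problem: {problem_avg:.3f}, method: {method_avg:.3f}")
```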

Table 1: Best F1 of three datasets trained by CBC model

Metric                       AI                    FTD
                             Problem    Method     Focus     Technique   Domain
Instances in training set    5763       12041      1740      1986        1652
Best F1 score                73.70%     71.24%     55.33%    51.33%      57.73%

Table 3: Accuracies of scientific entity recognition

Metric      AI                   Nano Structure       Genetic Engineering
            Problem    Method    Problem    Method    Problem    Method
Accuracy    0.93       0.89      0.61       0.42      0.77       0.75

                                                                                                        However, our platform performed worst in the field of Nano
                                                                                                     Structure. This may be because the articles of Nano Structure
   In the model of IEKM-MD, initially only 0.01% annotated                                           include many complex and specialized terms in the subjects of
samples are used to carry out the cold starting process, then the                                    biology, physics, chemistry, electronics, and metrology. Our
most valuable samples (top 10%) are added to the training set in                                     platform still lacks the professional knowledge to learn the
each iteration. Only if the F1 score of IEKM-MD reaches the                                          specific features.
baseline, can the learning process be stopped. The label scales and
F1 scores of AI and FTD datasets in each iteration are shown in                                         The extracted top 10 problems of three fields are shown in
Table 2.                                                                                             Table 4, which reveal that Neural Networks focuses on the
                                                                                                     classification, prediction and recognition problems of data and
Table 2: Learning effect of IEKM-MD in each iteration                                                images in the subject of Computer Science. Nano Structure covers
                                                                                                     a wide range, including physics, biology, chemistry, and so on,
                                               AI                       FTD                          which focuses on the applications on the basic disciplines.
            Step        Metric
                                    Problem Method            Focus   Technique Domain
                                                                                                     Therefore, the extracted problems involve detection, analysis and
                                                                                                     prediction of energy, atom and medicine. The scope of Genetic
           Initial      Labels           31           67        6        8           6               Engineering is relatively narrow and is related to drug
                        Labels          694          1303      272      284         371              development, disease treatment, and biological manufacturing in
         Iteration-1                                                                                 the biomedical field.
                            F1      64.20%          60.23%   42.81%    42.33%      47.59%

                        Labels          1232         2713      428      403         452
         Iteration-2
                            F1      68.18%          66.43%   46.27%    49.02%      53.20%            Table 4: Problem recognition in multiple domains
                        Labels          1729         3866      564      618         573
        Itereation-3                                                                                   Top      Neural Network               Nano Structure          Genetic Engineering
                            F1      75.87%          72.57%   57.41%    50.70%    58.00%
                                                                                                        1         Classification                Detection              Drug discovery
                        Labels           -            -         -       821          -
         Iteration-4                                                                                    2           Prediction                 Optimization             Identification
                            F1           -            -         -      52.23%        -
                                                                                                                                          Energy storage chemical
                                                                                                        3       Pattern recognition                                   Disease resistance
   Table 1 and 2 reveal that after combing supervised learning                                                                                  prediction
model with active learning strategies, the annotated samples can                                        4        Feature selection          Sensitive detection        Crop protection
be cut down 60%-70%.
                                                                                                        5         Optimization                Remote sensing            Drug delivery
   After IEKM-MD achieves the best performance as that CBC
                                                                                                        6         Datum mining                 UV detection          Genetic engineering
model did, the model extracts problems and methods from Neural
Networks, Nano Structure and Genetic Engineering datasets. We                                           7
                                                                                                                     Binary                Hydrothermal clinical
                                                                                                                                                                     Biodiesel production
                                                                                                                  classification                diagnosis
manually checked the top 30 problems and methods and evaluated
their accuracies as shown in Table 3.                                                                   8        Computer vision              Determination
                                                                                                                                                                          Cancer
                                                                                                                                                                       immunotherapy




                                                                                                76
                          EEKE 2020 - Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents




                                    Excitation limit of
   9     Feature extraction                                    Biofuel production
                                        detection

                Image
  10                              Atomic layer deposition          Biomedical
            classification
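The 60%-70% reduction in annotated samples reported from Tables 1 and 2 can be
checked with a few lines of arithmetic. The sketch below is only an
illustration: it takes the training-set sizes from Table 1 and the label counts
of the last iteration in Table 2, and the dictionary keys and the `reduction`
helper are names introduced here, not part of the platform.

```python
# Training-set sizes (Table 1) and labels used at the final iteration (Table 2).
# The keys are illustrative names for the five entity types.
full_labels = {
    "AI/Problem": 5763,
    "AI/Method": 12041,
    "FTD/Focus": 1740,
    "FTD/Technique": 1986,
    "FTD/Domain": 1652,
}
used_labels = {
    "AI/Problem": 1729,    # Iteration-3
    "AI/Method": 3866,     # Iteration-3
    "FTD/Focus": 564,      # Iteration-3
    "FTD/Technique": 821,  # Iteration-4
    "FTD/Domain": 573,     # Iteration-3
}


def reduction(used: int, full: int) -> float:
    """Fraction of the annotation effort saved relative to the full training set."""
    return 1.0 - used / full


savings = {name: reduction(used_labels[name], full_labels[name]) for name in full_labels}

for name, value in savings.items():
    print(f"{name}: {value:.1%} fewer annotated samples")
```

Under these figures the savings fall between roughly 59% (FTD Technique) and
70% (AI Problem), which is consistent with the range stated in the text.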

    Table 5 shows the top 10 extracted methods. In Neural Networks, they are
mostly machine learning models, such as support vector machines, random forests
and deep learning. The techniques in Nano Structure rely on specific
instruments, such as microscopes, spectrographs and rays. For Genetic
Engineering, gene editing, manipulation and recombination are the three main
techniques.

Table 5: Method recognition in multiple domains

 Top   Neural Network              Nano Structure                                   Genetic Engineering
  1    Machine learning            X-Ray diffraction (XRD)                          Polymerase Chain Reaction (PCR)
  2    Support vector machine      Transmission electron microscopy (TEM)           Genetic engineering strategy
  3    Classification              Scanning electron microscopy (SEM)               Gene therapy
  4    Random forest               Raman spectroscopy                               Southern blot analysis
  5    Neural network              Fourier transform infrared spectroscopy (FTIR)   Biotechnology
  6    Deep learning               Atomic force microscopy (AFM)                    Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)
  7    Decision tree               High Performance Liquid Chromatography (HPLC)    Enzyme-linked immunosorbent assay (ELISA)
  8    Feature selection           Elemental analysis                               Genetic transformation
  9    Datum mining                X-ray photoelectron spectroscopy (XPS)           Genetic manipulation
 10    Artificial neural network   Hydrothermal atomic force microscopy             Recombinant DNA

3.2 Entity Relation Prediction
By predicting the relations between problems and methods, we construct
method-problem networks for the different domains. As shown in Figure 5,
methods and problems that appear separately in Neural Networks articles are
linked by relation prediction. The red dots denote methods, and the blue dots
denote problems.

Figure 5: Problem-method relation network in Neural Networks

   The network can also be explored in more detail. With the X-Ray Diffraction
(XRD) method set as the center, Figure 6 shows which problems are solved by
XRD, including Assisted Synthesis, Biomedical Application, and Biosynthesis of
Silver Nanoparticles, among others.

Figure 6: The problems solved by XRD method in Nano Structure

3.3 Hotspot Detection
Hotspots are the most popular research topics. We use the extracted keywords to
identify the hotspots in multiple domains. To qualify as a hotspot, a term's
total number of occurrences in articles should either increase year by year or
keep a steadily high rank over the last three years. Following this rule,
Figure 7 shows the hotspots in the field of Neural Networks. These hotspots
differ from the scientific entities recognized in Section 3.1: they have no
semantic type but reflect how popular a term is.

Figure 7: Hotspots in AI
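The hotspot rule of Section 3.3 can be sketched in Python as follows. This is a
minimal illustration under stated assumptions, not the platform's
implementation: the function names, the `top_k` rank cutoff, the choice to
apply the three-year `window` to both conditions, and the sample counts are all
hypothetical and chosen only to make the rule concrete.

```python
def yearly_ranks(counts_by_year, year):
    """Rank the terms of one year by occurrence count (rank 1 = most frequent)."""
    ordered = sorted(counts_by_year[year].items(), key=lambda kv: (-kv[1], kv[0]))
    return {term: position + 1 for position, (term, _) in enumerate(ordered)}


def find_hotspots(counts_by_year, top_k=3, window=3):
    """Return terms whose counts rise every year, or that stay in the top_k, over the window."""
    years = sorted(counts_by_year)[-window:]
    ranks = {year: yearly_ranks(counts_by_year, year) for year in years}
    terms = set().union(*(counts_by_year[year] for year in years))
    hotspots = []
    for term in sorted(terms):
        series = [counts_by_year[year].get(term, 0) for year in years]
        # Condition 1: the occurrence count increases year by year.
        rising = len(series) > 1 and all(a < b for a, b in zip(series, series[1:]))
        # Condition 2: the term keeps a top rank in every year of the window.
        steady_top = all(ranks[year].get(term, top_k + 1) <= top_k for year in years)
        if rising or steady_top:
            hotspots.append(term)
    return hotspots


# Made-up keyword counts per year, for illustration only.
sample_counts = {
    2017: {"deep learning": 40, "svm": 60, "image classification": 10, "fuzzy logic": 30},
    2018: {"deep learning": 55, "svm": 58, "image classification": 18, "fuzzy logic": 20},
    2019: {"deep learning": 90, "svm": 61, "image classification": 25, "fuzzy logic": 12},
}

print(find_hotspots(sample_counts, top_k=2))
```

In this toy data, "deep learning" and "image classification" qualify because
their counts rise every year, "svm" qualifies because it stays within the top
two ranks, and "fuzzy logic" is excluded because it satisfies neither
condition.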








4    Conclusion
This paper introduced IEKM-MD, an intelligent platform for extracting
information and mining knowledge from scientific articles in multiple domains.
One contribution is a hybrid active learning strategy that addresses the
scarcity of annotated corpora for supervised learning models. Another is an
improved translation embedding approach, based on the TransH model, that
optimizes relation prediction performance. Experiments on three datasets, in
Neural Networks, Nano Structure and Genetic Engineering, show that our platform
can deliver various knowledge services with high accuracy in multiple domains.

ACKNOWLEDGMENTS
This work is supported by the project “Annotation and evaluation of the
semantic relationship between geographical entities in Chinese web texts”
(Grant No. 41801320) from the National Natural Science Foundation of China
Youth Science Foundation.

REFERENCES
[1] Chiu Jason, Nichols Eric. 2015. Named entity recognition with bidirectional
     LSTM-CNNs. Transactions of the Association for Computational Linguistics
     6 (Nov. 2015). DOI: https://doi.org/10.1162/tacl_a_00104.
[2] Ma Xuezhe, Eduard Hovy. 2016. End-to-end sequence labeling via bi-
     directional LSTM-CNNs-CRF. arXiv:1603.01354. Retrieved from
     https://arxiv.org/abs/1603.01354.
[3] Liyuan Liu, Jingbo Shang, Frank F. Xu, Xiang Ren, Huan Gui, Jian Peng,
     Jiawei Han. 2017. Empower sequence labeling with task-aware neural language
     model. arXiv:1709.04109. Retrieved from https://arxiv.org/abs/1709.
[4] Kulkarni Sanjeev, Mitter Sanjoy, Tsitsiklis John. 1993. Active learning
     using arbitrary binary valued queries. Machine Learning 11, 1 (Apr. 1993),
     23-35. DOI: https://doi.org/10.1023/A:1022627018023.
[5] Vijayanarasimhan Sudheendra, Grauman Kristen. 2012. Active frame selection
     for label propagation in videos. In Proceedings of the 12th European
     Conference on Computer Vision (ECCV’12), Florence, Italy. Springer-Verlag,
     Berlin, Heidelberg, 496-509. https://doi.org/10.1007/978-3-642-33715-4_36.
[6] Deng Yue, Dai Qionghai, Liu Risheng, Zhang Zengke, Hu Sanqing. 2013. Low-
     rank structure learning via non-convex heuristic recovery. IEEE Transactions
     on Neural Networks and Learning Systems 24, 3, 383-396. DOI:
     https://doi.org/10.1109/TNNLS.2012.2235082.
[7] Deng Yue, Chen Kawai, Shen Yilin, Jin Hongxia. 2018. Adversarial active
     learning for sequences labeling and generation. In Proceedings of the 27th
     International Joint Conference on Artificial Intelligence (IJCAI-18), July,
     2018, Stockholm, Sweden. California, 4012-4018.
     https://doi.org/10.24963/ijcai.2018/558.
[8] Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, Oksana
     Yakhnenko. 2013. Translating embeddings for modeling multi-relational data.
     In Proceedings of NIPS. MIT Press, Cambridge, MA, 2787-2795.
[9] Zhen Wang, Jianwen Zhang, Jianlin Feng, Zheng Chen. 2014. Knowledge
     graph embedding by translating on hyperplanes. In Proceedings of the 28th
     AAAI Conference on Artificial Intelligence (AAAI’14), June, 2014. AAAI
     Press, Menlo Park, CA, 1112-1119. https://doi.org/10.5555/2893873.2894046.
[10] He Shizhu, Liu Kang, Ji Guoliang, Zhao Jun. 2015. Learning to represent
     knowledge graphs with Gaussian embedding. In Proceedings of CIKM. ACM,
     New York, 623-632. https://doi.org/10.1145/2806416.2806502.
[11] Ji Guoliang, He Shizhu, Xu Liheng, Liu Kang, Zhao Jun. 2015. Knowledge
     graph embedding via dynamic mapping matrix. In Proceedings of ACL. ACL,
     Stroudsburg, PA, 687-696. https://doi.org/10.3115/v1/P15-1067.
[12] Xuezhe Ma, Eduard Hovy. 2016. End-to-end sequence labeling via bi-
     directional LSTM-CNNs-CRF [OL]. arXiv:1603.01354. Retrieved from
     https://arxiv.org/abs/1603.01354.
[13] Yanyao Shen, Hyokun Yun, Zachary C. Lipton, Yakov Kronrod,
     Animashree Anandkumar. 2017. Deep active learning for named entity
     recognition. arXiv:1707.05928. Retrieved from
     https://arxiv.org/abs/1707.05928.
[14] Seokhwan Kim, Yu Song, Kyungduk Kim, Jeong-Won Cha, Gary Geunbae
     Lee. 2006. MMR-based active machine learning for bio entities. In Proceedings
     of the Human Language Technology Conference of the NAACL, Companion
     Volume: Short Papers, June, 2006, New York, New York, USA, 69-72.
[15] Balcan Maria-Florina, Broder Andrei, Zhang Tong. 2007. Margin based active
     learning. In Proceedings of the 20th Annual Conference on Learning Theory
     (COLT’07), 2007, San Diego, CA, USA. Springer-Verlag, Berlin, Heidelberg,
     35-50. https://doi.org/10.5555/1768841.1768848.
[16] Stuart Rose, Dave Engel, Nick Cramer, Wendy Cowley. 2010. Automatic
     keyword extraction from individual documents. Text Mining: Applications and
     Theory 20, 1 (Mar. 2010), 1-20. DOI:
     https://doi.org/10.1002/9780470689646.ch1.



