<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">IEKM-MD: An Intelligent Platform for Information Extraction and Knowledge Mining in Multi-Domains</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Yu</forename><surname>Li</surname></persName>
						</author>
						<author role="corresp">
							<persName><forename type="first">Tao</forename><surname>Yue</surname></persName>
							<email>taoyue@mail.las.ac.cn</email>
						</author>
						<author>
							<persName><forename type="first">Zhenxin</forename><surname>Wu</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">National Science Library</orgName>
								<orgName type="institution">Chinese Academy of Sciences</orgName>
								<address>
									<settlement>Beijing</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">National Science Library</orgName>
								<orgName type="institution">Chinese Academy of Sciences</orgName>
								<address>
									<settlement>Beijing</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff2">
								<orgName type="department">National Science Library</orgName>
								<orgName type="institution">Chinese Academy of Sciences</orgName>
								<address>
									<settlement>Beijing</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">IEKM-MD: An Intelligent Platform for Information Extraction and Knowledge Mining in Multi-Domains</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">CC848333B5FCAE5703764DAF4A74394B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T04:08+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Computing methodologies</term>
					<term>Artificial intelligence</term>
					<term>Natural language processing</term>
					<term>Information extraction</term>
					<term>Relation prediction</term>
					<term>Active learning</term>
					<term>Translation embedding</term>
					<term>Neural network</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The terminologies of different disciplines vary greatly and annotated corpora are scarce, which limits the portability of information extraction models; the content of scientific articles remains underutilized. This paper constructs an intelligent platform for information extraction and knowledge mining, namely IEKM-MD. Two innovative technologies are proposed. Firstly, a phrase-level scientific entity extraction model combining a neural network with active learning is designed, which reduces the model's dependence on a large-scale corpus. Secondly, a translation-based relation prediction model is provided, which improves the relation embeddings by optimizing the loss function. In addition, the platform integrates an advanced entity recognition model (spaCy.NER) and a keyword extraction model (RAKE). It provides abundant services for fine-grained and multi-dimensional knowledge, including problem discovery, method recognition, relation representation and hot spot detection. We carried out experiments in three different domains: Artificial Intelligence, Nanotechnology and Genetic Engineering. The average accuracies of scientific entity extraction are 0.91, 0.52 and 0.76 respectively.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>With the progress of science and technology, the number of fields and scientific articles keeps growing. Information extraction and knowledge mining in a specific field enable scholars to quickly grasp the overall outline of information and track the development of fine-grained knowledge. There are many mature models for extracting information from texts, such as BiLSTM-CNN <ref type="bibr" target="#b0">[1]</ref>, CNN-BiLSTM-CRF <ref type="bibr" target="#b1">[2]</ref> and LM-LSTM-CRF <ref type="bibr" target="#b2">[3]</ref>, which have achieved high scores in various natural language processing tasks. However, these supervised learning models inevitably consume large amounts of high-quality annotated corpus in order to fully learn the characteristics of natural language representation. In most cases, the annotated corpus for a specific field is constructed manually by a few experts, which is time-consuming and laborious. Therefore, it is hard to transfer a well-trained model directly to other domains.</p><p>How to extract information without a massive annotated corpus is a big challenge. Active Learning (AL) <ref type="bibr" target="#b3">[4]</ref> has proved to be an effective way to relieve corpus scarcity in classification tasks <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b6">6]</ref>. However, it has not been validated on sequence labelling, where finding the optimal result is harder because the complexity grows exponentially <ref type="bibr" target="#b7">[7]</ref>. In this paper, we introduce multiple active learning strategies into information extraction for the first time, so as to explore a cheap and efficient solution for recognizing fine-grained entities in multiple domains.</p><p>Relation prediction is another basic technology for knowledge organization. Translation models see a relation as a process of translating the head entity to the tail entity, and they have been widely used to predict relations. Several classic translation models approach the task from different perspectives: TransE <ref type="bibr" target="#b8">[8]</ref> is the first translation embedding model and has few parameters. TransH <ref type="bibr" target="#b9">[9]</ref> was presented to solve the problem of complex relation representation. TransR <ref type="bibr" target="#b10">[10]</ref> distinguishes the semantic embeddings of different types of relations and achieved a better F-score. TransD <ref type="bibr" target="#b11">[11]</ref> simplifies the projection process of TransR and improves the computing efficiency.</p><p>This paper constructs an intelligent platform for information extraction and knowledge mining that can be used in multiple domains without much human intervention. The main contributions are as follows: 1) with a limited annotated corpus, an effective method combining a neural network with active learning recognizes scientific entities in multiple domains; 2) by optimizing the loss function, an improved translation model represents semantic vectors more accurately and reaches convergence faster with a smaller loss than the original model.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> shows the technology framework of IEKM-MD: 1) the model combining a neural network with active learning extracts "problem" and "method" entities; 2) the improved translation model predicts relations between "problem" and "method" entities. At the same time, the platform integrates two excellent tools (spaCy.NER<ref type="foot" target="#foot_0">1</ref> and RAKE<ref type="foot" target="#foot_1">2</ref>) to recognize named entities and keywords. Finally, the platform provides a variety of knowledge services for researchers, including problem discovery, method recognition, relation representation and hot spot detection. Besides, analysts can perform richer downstream tasks based on our platform, such as discipline analysis, trend exploration, new technology detection, and so on.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Scientific Entity Recognition</head><p>Scientific entity recognition extracts phrases from scientific articles. These phrases consist of several words that describe the focus of an article or the method proposed by the author. In order to reduce the dependence on annotated corpus, this paper provides a semi-supervised learning model combining a neural network with active learning.</p><p>The framework of the information extraction model is shown in Figure <ref type="figure">2</ref>. Firstly, the learning engine trains the parameters of the neural network using a small number of annotated samples (dozens of abstracts with semantic labels). The trained neural network then predicts the labels of unannotated samples and passes the predicted scores to the selecting engine. Secondly, according to the active learning strategies, the selecting engine decides which samples are valuable and should be annotated manually; only the top 10% most valuable samples are labelled by experts. Thirdly, the manually annotated samples are added to the training set to re-train the neural network and improve label prediction. The whole process repeats until the performance of the model shows no significant improvement. Finally, the trained model predicts the "problems" and "methods" for all unlabeled articles. More details about parameter settings are discussed in Section 3.1.</p></div>
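The iterative train-select-annotate process described above can be sketched as a generic loop. This is a minimal illustration, not the platform's actual code: `train` and `value_score` are hypothetical callables standing in for the learning engine and the selecting engine.

```python
def active_learning_loop(labeled, unlabeled, train, value_score,
                         select_frac=0.10, max_rounds=5, min_gain=0.005):
    """Sketch of the loop in Figure 2 (all names are illustrative).

    train(labeled)        -> (model, f1_on_dev)
    value_score(model, s) -> informativeness of unlabeled sample s
    """
    model, best_f1 = train(labeled)
    for _ in range(max_rounds):
        if not unlabeled:
            break
        # rank unlabeled samples by value and send the top 10% to annotators
        ranked = sorted(unlabeled, key=lambda s: value_score(model, s),
                        reverse=True)
        k = max(1, int(select_frac * len(ranked)))
        newly_annotated, unlabeled = ranked[:k], ranked[k:]
        labeled.extend(newly_annotated)   # experts label only these samples
        model, f1 = train(labeled)        # re-train on the enlarged set
        if f1 - best_f1 < min_gain:       # stop when no significant gain
            break
        best_f1 = f1
    return model, best_f1
```

With toy `train` and `value_score` functions the loop grows the labeled pool by 10% of the remaining unlabeled samples per round and stops once the dev F1 plateaus.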
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2: Information extraction model combining neural network with active learning</head><p>Here we choose CNN-BiLSTM-CRF <ref type="bibr" target="#b12">[12]</ref> as the learning engine. The CNN focuses on morphological features, namely the prefix and suffix of a word. The BiLSTM learns long-distance dependency relationships between words using two groups of long short-term memory networks running in opposite directions. The CRF decides the optimal labeling sequence with rational linguistic logic.</p><p>In addition, we propose a hybrid approach for the selecting engine. Firstly, the value score of each unlabeled sample is computed by four different types of active learning strategies, and their sum is taken as the final value score. Secondly, the value scores are sorted in descending order, and only the top 10% most valuable samples are selected for manual annotation in each iteration. We picked three classical strategies from the uncertainty sampling methods: margin <ref type="bibr" target="#b13">[13]</ref>, N-best sequence entropy <ref type="bibr" target="#b14">[14]</ref> and maximum normalized log-probability <ref type="bibr" target="#b15">[15]</ref>. Additionally, we propose a novel strategy, namely label weighted probability, which emphasizes the number of labels: the more problem or method labels a sentence contains, the more valuable it is.</p></div>
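The hybrid scoring could be combined as below. The four functions are simplified stand-ins for the cited strategies (margin, N-best sequence entropy, maximum normalized log-probability, and the proposed label weighted probability), not the paper's exact formulas; the sample dictionary keys are hypothetical.

```python
import math

def margin_score(probs):
    # uncertainty: a small gap between the two most likely labels
    top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top2[0] - top2[1])

def sequence_entropy(nbest_probs):
    # entropy over an N-best list of sequence probabilities
    z = sum(nbest_probs)
    return -sum(p / z * math.log(p / z) for p in nbest_probs if p > 0)

def min_token_confidence(token_probs):
    # least-confident-token proxy for maximum normalized log-probability
    return 1.0 - min(token_probs)

def label_weight(n_entity_labels, n_tokens):
    # proposed strategy: more problem/method labels => more valuable sentence
    return n_entity_labels / max(1, n_tokens)

def hybrid_value(sample):
    # final value score = sum of the four strategy scores
    return (margin_score(sample["label_probs"])
            + sequence_entropy(sample["nbest_probs"])
            + min_token_confidence(sample["token_probs"])
            + label_weight(sample["n_labels"], sample["n_tokens"]))
```

Under this scheme, a sentence with more problem/method labels receives a strictly higher value score than an otherwise identical sentence, matching the intent of the label weighted probability strategy.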
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Entity Relation Prediction</head><p>Relation prediction decides whether a "problem" and a "method" are related. If a "problem" is related to a "method", the method can be used to solve that problem.</p><p>A translation model sees the relation in a triple (head entity, relation, tail entity) as a translation between the two entities. There is a series of translation models. TransE <ref type="bibr" target="#b8">[8]</ref> has few parameters and low complexity, but cannot distinguish two tail entities that share the same relation. TransH <ref type="bibr" target="#b9">[9]</ref> uses different vectors to represent one entity under various relations, which solves the problem of complex relation representation (1-N, N-1, N-N). TransR <ref type="bibr" target="#b10">[10]</ref> supposes that different relations lie in different semantic spaces, so it first projects entities into their relation spaces and then builds the translation process; however, this greatly increases the time cost because of the many parameters. TransD <ref type="bibr" target="#b11">[11]</ref> creates separate projection matrices for the head entity and the tail entity. It not only combines the effects of both entities and relations on projection, but also improves computing efficiency.</p><p>After comparing the performance of the various translation models, we chose TransH to predict relations, as it keeps a balance between accuracy and efficiency. To handle one-to-many, many-to-one and many-to-many relations, TransH generates the relation-specific translation vector 𝑑 𝑟 in the relation-specific hyperplane 𝑤 𝑟 rather than in the same space as the entity embeddings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 3: TransH projection [9]</head><p>As shown in Figure <ref type="figure">3</ref>, the relation 𝑟 in its hyperplane 𝑤 𝑟 has a translation vector 𝑑 𝑟 , and the head embedding ℎ and the tail embedding 𝑡 have projection vectors ℎ ⊥ and 𝑡 ⊥ in 𝑤 𝑟 . The score function is defined as:</p><formula xml:id="formula_0">||ℎ ⊥ + 𝑑 𝑟 − 𝑡 ⊥ || 2 2 .</formula><p>However, the original TransH model does not exactly match our goal, so we made three improvements.</p><p>1) TransH constructs negative samples by replacing the head or tail entity of a positive sample with another entity. However, the replacement may still be correct because of synonyms, which introduces many false negative labels into training. Considering that there are only two types of relationships, we simply construct negative samples by changing the correct relationship into its antonym, which also makes it more convenient to build a balanced annotated corpus. Moreover, the score function 𝑓 𝑟 (ℎ, 𝑡) is redefined as Equation <ref type="bibr" target="#b0">(1)</ref>, which moves the attention from the entities to the relation.</p><formula xml:id="formula_1">𝑓 𝑟 (ℎ, 𝑡) = ||𝑎𝑏𝑠(ℎ ⊥ − 𝑡 ⊥ ) − 𝑑 𝑟 || 2 2<label>(1)</label></formula><p>2) Whereas the original model initializes the entities with random vectors, we use the word2vec model to generate the semantic representations of all head and tail entities.</p><p>3) To improve the feature learning ability for unknown entities, we add one hidden layer of linear transformation for the head entities and the tail entities respectively.</p></div>
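Equation (1) can be checked with a small numeric sketch. `project` and `score` are illustrative helpers (not the platform's code), assuming 𝑤 𝑟 is the unit normal of the relation hyperplane.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(v, w):
    # project embedding v onto the hyperplane with normal w (h_perp, t_perp)
    n = dot(w, w) ** 0.5
    w = [x / n for x in w]
    c = dot(v, w)
    return [vi - c * wi for vi, wi in zip(v, w)]

def score(h, t, w_r, d_r):
    # Equation (1): f_r(h, t) = || abs(h_perp - t_perp) - d_r ||_2^2
    h_p, t_p = project(h, w_r), project(t, w_r)
    return sum((abs(a - b) - d) ** 2 for a, b, d in zip(h_p, t_p, d_r))
```

A related pair should make the element-wise gap abs(h⊥ − t⊥) coincide with 𝑑 𝑟 , driving the score toward zero; unrelated pairs leave a positive residual.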
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Named Entity Recognition and Keyword Extraction</head><p>We use the open source toolkit spaCy.NER to recognize named entities. spaCy.NER implements a fast and efficient system based on statistical machine learning algorithms, which can recognize 18 entity types, such as Person, Organization, Location and Geopolitical Entity.</p><p>Furthermore, keyword extraction is performed with the open source toolkit RAKE (Rapid Automatic Keyword Extraction). RAKE is an automatic keyword extraction technique based on statistical methods; it outperformed TextRank and other supervised learning models with a high F value <ref type="bibr" target="#b16">[16]</ref> and is more efficient.</p></div>
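For intuition, RAKE's core degree/frequency scoring can be re-implemented in a few lines. This is a minimal sketch of the published algorithm, not the linked toolkit's code, and the stopword list here is a toy assumption.

```python
import re

STOPWORDS = {"the", "of", "a", "an", "and", "or", "in", "on",
             "for", "to", "is", "are", "with", "by"}

def rake_keywords(text, stopwords=STOPWORDS):
    # split text into candidate phrases at stopwords and punctuation
    words = re.split(r"[^a-zA-Z0-9]+", text.lower())
    phrases, current = [], []
    for w in words:
        if not w or w in stopwords:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # word score = degree / frequency; phrase score = sum of word scores
    freq, degree = {}, {}
    for phrase in phrases:
        for w in phrase:
            freq[w] = freq.get(w, 0) + 1
            degree[w] = degree.get(w, 0) + len(phrase)
    scored = {" ".join(p): sum(degree[w] / freq[w] for w in p)
              for p in phrases}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)
```

Longer multi-word phrases accumulate higher degree scores, which is why RAKE tends to surface terminology-like keyphrases rather than single frequent words.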
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Platform Evaluation and Display</head><p>We evaluate the information extraction performance of IEKM-MD in the field of Artificial Intelligence (AI). Two datasets are used.</p><p>1) The top 100 AI conferences were picked out by domain experts, and their abstracts were acquired from the NSTL database<ref type="foot" target="#foot_2">3</ref>, 9,753 sentences in total. Next, we built the ground-truth datasets. Each sentence was annotated in parallel by two students in the corresponding subjects (task, method or other), and the annotation results were checked by one expert. The annotation format is shown in Figure <ref type="figure">4</ref>. The AI annotated corpus contains 260,000 tokens.</p><p>In addition, we show the effect of knowledge mining in three different domains. We chose three popular keywords (Neural Networks, Nano Structure and Genetic Engineering), which respectively represent the subjects of Computer Science, Material and Medicine, to acquire abstracts from the NSTL database. 200 abstracts per subject were randomly selected from SCI journals and used to verify the practical application effect of IEKM-MD.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Scientific Entity Recognition</head><p>We set the baselines using only the CNN-BiLSTM-CRF (CBC) model trained on all annotated samples. For each dataset (AI or FTD), the best performance serves as the baseline, so as to detect whether active learning helps reduce the scale of annotated corpus needed by supervised learning models. The scale of the training sets and the best F1 scores of the CBC model are shown in Table <ref type="table" target="#tab_0">1</ref>. In the IEKM-MD model, only 0.01% of the annotated samples are initially used for the cold-start process; then the most valuable samples (10%) are added to the training sets in each iteration. The learning process stops only when the F1 score of IEKM-MD reaches the baseline. The label scales and F1 scores of the AI and FTD datasets in each iteration are shown in Table <ref type="table">2</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 2: Learning effect of IEKM-MD in each iteration</head><p>Tables <ref type="table" target="#tab_0">1</ref> and 2 reveal that, after combining the supervised learning model with active learning strategies, the annotated samples can be cut down by 60%-70%.</p><p>After IEKM-MD matches the best performance of the CBC model, it extracts problems and methods from the Neural Networks, Nano Structure and Genetic Engineering datasets. We manually checked the top 30 problems and methods and evaluated their accuracies, as shown in Table <ref type="table" target="#tab_3">3</ref>.</p><p>The results show that Neural Networks achieved the best performance, with an accuracy of 0.93 for problem extraction and 0.89 for method extraction. The average accuracies over the three fields reveal that problem extraction scores better than method extraction. The first reason is that there are fewer mentions of problems than of methods, and problems are usually described in noun phrases, an easier pattern for the model to catch. The second reason is that one article may contain multiple methods, modified by multiple attributives or adverbials, which makes it more challenging to recognize the complete methods. Our platform performed worst in the field of Nano Structure. This may be because Nano Structure articles include many complex and specialized terms from biology, physics, chemistry, electronics and metrology, and our platform still lacks the professional knowledge to learn these specific features.</p><p>The extracted top 10 problems of the three fields are shown in Table <ref type="table" target="#tab_4">4</ref>. Neural Networks focuses on classification, prediction and recognition problems for data and images in Computer Science. Nano Structure covers a wide range, including physics, biology and chemistry, and focuses on applications in the basic disciplines; therefore, the extracted problems involve detection, analysis and prediction of energy, atoms and medicine. The scope of Genetic Engineering is relatively narrow, relating to drug development, disease treatment and biological manufacturing in the biomedical field. Table <ref type="table" target="#tab_5">5</ref> shows the extracted top 10 methods. In the field of Neural Networks, they are mostly machine learning models, such as support vector machine, random forest and deep learning. The technologies in Nano Structure are specific instruments, such as microscopes, spectrographs and rays. For Genetic Engineering, gene editing, manipulation and recombination are the three main techniques.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Entity Relation Prediction</head><p>By predicting the relations between problems and methods, we construct method-problem networks for different domains. As shown in Figure <ref type="figure" target="#fig_3">5</ref>, the methods and problems that were separate in the Neural Network articles are linked by relation prediction. The red dots refer to methods, and the blue dots refer to problems. We can also get more details from this network: setting the method X-Ray Diffraction (XRD) as a center, Figure <ref type="figure" target="#fig_4">6</ref> reveals which problems are solved by XRD, such as Assisted Synthesis, Biomedical Application and Biosynthesis of Silver Nanoparticles.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Hotspot Detection</head><p>Hotspots are the most popular research topics. We use the extracted keywords to pick out the hotspots in multiple domains. To qualify as a hotspot, a term's total number of occurrences in articles should increase year by year, or the term should keep a steady top rank over the last three years. According to this rule, Figure <ref type="figure" target="#fig_5">7</ref> shows the hotspots in the field of Neural Networks. They are distinct from the scientific entities recognized in Section 3.1: they have no semantic type but reflect the popularity of terms.</p><p>This paper introduced an innovative and intelligent platform, IEKM-MD, to extract information and mine knowledge from scientific articles in multiple domains. One contribution is a hybrid active learning strategy that relieves the scarcity of annotated corpus for supervised learning models. Another contribution is an improved translation embedding approach based on the TransH model that optimizes the performance of relation prediction. Three datasets in Neural Networks, Nano Structure and Genetic Engineering show that our platform is able to provide various knowledge services with high accuracy in multiple domains.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Technology framework of IEKM-MD</figDesc></figure>
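The hotspot selection rule stated above can be written as a small predicate. `is_hotspot` and its inputs are hypothetical names for illustration, not the platform's API.

```python
def is_hotspot(yearly_counts, top_ranks, top_k=10):
    """Sketch of the rule: a hotspot either grows year by year
    or keeps a steady top rank over the last three years.

    yearly_counts: keyword frequency per year, oldest first
    top_ranks: the keyword's rank per year (1 = most frequent)
    """
    increasing = all(a < b for a, b in zip(yearly_counts, yearly_counts[1:]))
    steady_top = len(top_ranks) >= 3 and all(r <= top_k for r in top_ranks[-3:])
    return increasing or steady_top
```

Either branch alone suffices, so a declining but persistently top-ranked term still counts as a hotspot.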
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 4: An example of annotation format</head><label>4</label><figDesc>Figure 4: An example of annotation format. 2) FTD datasets<ref type="foot" target="#foot_3">4</ref> shared by Stanford University in the field of Computational Linguistics: they come from the Conference of the Association for Computational Linguistics, range from 1965 to 2009, and contain four types of labels (focus, technique, domain and other), 2,628 sentences in total.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Problem-method relation network in Neural Networks</figDesc><graphic coords="5,333.55,76.77,208.83,104.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: The problems solved by XRD method in Nano Structure</figDesc><graphic coords="5,336.65,293.59,202.69,122.25" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Hotspots in AI</figDesc><graphic coords="5,348.48,580.01,179.00,91.25" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 : Best F1 of three datasets trained by CBC model</head><label>1</label><figDesc></figDesc><table><row><cell></cell><cell>AI</cell><cell></cell><cell></cell><cell>FTD</cell><cell></cell></row><row><cell>Metric</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>Problem</cell><cell>Method</cell><cell>Focus</cell><cell>Technique</cell><cell>domain</cell></row><row><cell>Instances in training set</cell><cell>5763</cell><cell>12041</cell><cell>1740</cell><cell>1986</cell><cell>1652</cell></row><row><cell>Best F1 score</cell><cell>73.70%</cell><cell>71.24%</cell><cell>55.33%</cell><cell>51.33%</cell><cell>57.73%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Metric AI FTD Problem Method Focus Technique Domain</head><label></label><figDesc></figDesc><table><row><cell>Initial</cell><cell>Labels</cell><cell>31</cell><cell>67</cell><cell>6</cell><cell>8</cell><cell>6</cell></row><row><cell></cell><cell>Labels</cell><cell>694</cell><cell>1303</cell><cell>272</cell><cell>284</cell><cell>371</cell></row><row><cell>Iteration-1</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>F1</cell><cell cols="5">64.20% 60.23% 42.81% 42.33% 47.59%</cell></row><row><cell></cell><cell>Labels</cell><cell>1232</cell><cell>2713</cell><cell>428</cell><cell>403</cell><cell>452</cell></row><row><cell>Iteration-2</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>F1</cell><cell cols="5">68.18% 66.43% 46.27% 49.02% 53.20%</cell></row><row><cell></cell><cell>Labels</cell><cell>1729</cell><cell>3866</cell><cell>564</cell><cell>618</cell><cell>573</cell></row><row><cell>Iteration-3</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>F1</cell><cell cols="5">75.87% 72.57% 57.41% 50.70% 58.00%</cell></row><row><cell></cell><cell>Labels</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>821</cell><cell>-</cell></row><row><cell>Iteration-4</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>F1</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>52.23%</cell><cell>-</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 : Accuracies of scientific entity recognition</head><label>3</label><figDesc></figDesc><table><row><cell></cell><cell>AI</cell><cell></cell><cell cols="2">Nano Structure</cell><cell cols="2">Genetic Engineering</cell></row><row><cell>Metric</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell>Problem</cell><cell>Method</cell><cell>Problem</cell><cell>Method</cell><cell>Problem</cell><cell>Method</cell></row><row><cell>Accuracy</cell><cell>0.93</cell><cell>0.89</cell><cell>0.61</cell><cell>0.42</cell><cell>0.77</cell><cell>0.75</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 : Problem recognition in multiple domains</head><label>4</label><figDesc></figDesc><table><row><cell>Top</cell><cell>Neural Network</cell><cell>Nano Structure</cell><cell>Genetic Engineering</cell></row><row><cell>1</cell><cell>Classification</cell><cell>Detection</cell><cell>Drug discovery</cell></row><row><cell>2</cell><cell>Prediction</cell><cell>Optimization</cell><cell>Identification</cell></row><row><cell>3</cell><cell>Pattern recognition</cell><cell>Energy storage chemical prediction</cell><cell>Disease resistance</cell></row><row><cell>4</cell><cell>Feature selection</cell><cell>Sensitive detection</cell><cell>Crop protection</cell></row><row><cell>5</cell><cell>Optimization</cell><cell>Remote sensing</cell><cell>Drug delivery</cell></row><row><cell>6</cell><cell>Datum mining</cell><cell>UV detection</cell><cell>Genetic engineering</cell></row><row><cell>7</cell><cell>Binary classification</cell><cell>Hydrothermal clinical diagnosis</cell><cell>Biodiesel production</cell></row><row><cell>8</cell><cell>Computer vision</cell><cell>Determination</cell><cell>Cancer immunotherapy</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5 : Method recognition in multiple domains</head><label>5</label><figDesc></figDesc><table><row><cell>Top</cell><cell>Neural Network</cell><cell>Nano Structure</cell><cell>Genetic Engineering</cell></row><row><cell>1</cell><cell>Machine learning</cell><cell>X-Ray diffraction (XRD)</cell><cell>Polymerase Chain Reaction (PCR)</cell></row><row><cell>2</cell><cell>Support vector machine</cell><cell>Transmission electron microscopy (TEM)</cell><cell>Genetic engineering strategy</cell></row><row><cell>3</cell><cell>Classification</cell><cell>Scanning electron microscopy (SEM)</cell><cell>Gene therapy</cell></row><row><cell>4</cell><cell>Random forest</cell><cell>Raman spectroscopy</cell><cell>Southern blot analysis</cell></row><row><cell></cell><cell></cell><cell>Fourier transform</cell><cell></cell></row><row><cell>5</cell><cell>Neural network</cell><cell>infrared spectroscopy</cell><cell>Biotechnology</cell></row><row><cell></cell><cell></cell><cell>(FTIR)</cell><cell></cell></row><row><cell></cell><cell></cell><cell></cell><cell>Clustered Regularly</cell></row><row><cell>6</cell><cell>Deep learning</cell><cell>Atomic force microscopy (AFM)</cell><cell>Interspaced Short Palindromic Repeats</cell></row><row><cell></cell><cell></cell><cell></cell><cell>(CRISPR)</cell></row><row><cell>7</cell><cell>Decision tree</cell><cell>High Performance Liquid Chromatography (HPLC)</cell><cell>enzyme-linked immunosorbent assay (ELISA)</cell></row><row><cell>8</cell><cell>Feature selection</cell><cell>Elemental analysis</cell><cell>Genetic transformation</cell></row><row><cell>9</cell><cell>Datum mining</cell><cell>X-ray photoelectron spectroscopy (XPS)</cell><cell>Genetic manipulation</cell></row><row><cell>10</cell><cell>Artificial neural network</cell><cell>Hydrothermal atomic force microscopy</cell><cell>Recombinant DNA</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://spacy.io/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/aneesha/RAKE</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://www.las.ac.cn</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://nlp.stanford.edu/pubs/FTDDataset_v1.txt</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>ACKNOWLEDGMENTS</head><p>This work is supported by the project "Annotation and evaluation of the semantic relationship between geographical entities in Chinese web texts" (Grant No. 41801320) from the Youth Science Fund of the National Natural Science Foundation of China.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Named entity recognition with bidirectional LSTM-CNNs</title>
		<author>
			<persName><forename type="first">Jason</forename><surname>Chiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nichols</forename><surname>Eric</surname></persName>
		</author>
		<idno type="DOI">10.1162/tacl_a_00104</idno>
		<ptr target="https://doi.org/10.1162/tacl_a_00104" />
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguist</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<date type="published" when="2015-11">2015. Nov. 2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">Xuezhe</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eduard</forename><surname>Hovy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1603.01354</idno>
		<ptr target="https://arxiv.org/abs/1603.01354" />
		<title level="m">End-to-end sequence labeling via bidirectional LSTM-CNNs-CRF</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Empower sequence labeling with task-aware neural language model</title>
		<author>
			<persName><forename type="first">Liyuan</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jingbo</forename><surname>Shang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><forename type="middle">F</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiang</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Huan</forename><surname>Gui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jian</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jiawei</forename><surname>Han</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1709.04109</idno>
		<ptr target="https://arxiv.org/abs/1709.04109" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Active Learning Using Arbitrary Binary Valued Queries</title>
		<author>
			<persName><forename type="first">Sanjeev</forename><surname>Kulkarni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sanjoy</forename><surname>Mitter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">John</forename><surname>Tsitsiklis</surname></persName>
		</author>
		<idno type="DOI">10.1023/A:1022627018023</idno>
		<ptr target="https://doi.org/10.1023/A:1022627018023" />
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="23" to="35" />
			<date type="published" when="1993-04">Apr. 1993</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Active frame selection for label propagation in videos</title>
		<author>
			<persName><forename type="first">Sudheendra</forename><surname>Vijayanarasimhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristen</forename><surname>Grauman</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-33715-4_36</idno>
		<ptr target="https://doi.org/10.1007/978-3-642-33715-4_36" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 12th European Conference on Computer Vision (ECCV&apos;12)</title>
				<meeting>the 12th European Conference on Computer Vision (ECCV&apos;12)<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<pubPlace>Berlin; Heidelberg</pubPlace>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="496" to="509" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Low-rank structure learning via nonconvex heuristic recovery</title>
		<author>
			<persName><forename type="first">Yue</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Qionghai</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Risheng</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zengke</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sanqing</forename><surname>Hu</surname></persName>
		</author>
		<idno type="DOI">10.1109/TNNLS.2012.2235082</idno>
		<ptr target="https://doi.org/10.1109/TNNLS.2012.2235082" />
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Neural Networks and Learning Systems</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="383" to="396" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Adversarial active learning for sequences labeling and generation</title>
		<author>
			<persName><forename type="first">Yue</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kawai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yilin</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hongxia</forename><surname>Jin</surname></persName>
		</author>
		<idno type="DOI">10.24963/ijcai.2018/558</idno>
		<ptr target="https://doi.org/10.24963/ijcai.2018/558" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 27th International Joint Conference on Artificial Intelligence</title>
				<meeting>the 27th International Joint Conference on Artificial Intelligence<address><addrLine>Stockholm, Sweden</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018-07">July 2018</date>
			<biblScope unit="page" from="4012" to="4018" />
		</imprint>
	</monogr>
	<note>IJCAI-18</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Translating embeddings for modeling multi-relational data</title>
		<author>
			<persName><forename type="first">Antoine</forename><surname>Bordes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nicolas</forename><surname>Usunier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alberto</forename><surname>Garcia-Duran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Oksana</forename><surname>Yakhnenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NIPS</title>
				<meeting>NIPS<address><addrLine>Cambridge, MA</addrLine></address></meeting>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="2787" to="2795" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Knowledge graph embedding by translating on hyperplanes</title>
		<author>
			<persName><forename type="first">Zhen</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianwen</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianlin</forename><surname>Feng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zheng</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.5555/2893873.2894046</idno>
		<ptr target="https://doi.org/10.5555/2893873.2894046" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI&apos;14)</title>
				<meeting>the 28th AAAI Conference on Artificial Intelligence (AAAI&apos;14)<address><addrLine>Menlo Park, CA</addrLine></address></meeting>
		<imprint>
			<publisher>AAAI Press</publisher>
			<date type="published" when="2014-06">June 2014</date>
			<biblScope unit="page" from="1112" to="1119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Learning to represent knowledge graphs with Gaussian embedding</title>
		<author>
			<persName><forename type="first">Shizhu</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kang</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guoliang</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun</forename><surname>Zhao</surname></persName>
		</author>
		<idno type="DOI">10.1145/2806416.2806502</idno>
		<ptr target="https://doi.org/10.1145/2806416.2806502" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of CIKM. ACM</title>
				<meeting>CIKM. ACM<address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="623" to="632" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Knowledge graph embedding via dynamic mapping matrix</title>
		<author>
			<persName><forename type="first">Guoliang</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shizhu</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liheng</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kang</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jun</forename><surname>Zhao</surname></persName>
		</author>
		<idno type="DOI">10.3115/v1/P15-1067</idno>
		<ptr target="https://doi.org/10.3115/v1/P15-1067" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of ACL. ACL</title>
				<meeting>ACL. ACL<address><addrLine>Stroudsburg, PA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="687" to="696" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">End-to-end sequence labeling via bidirectional LSTM-CNNs-CRF</title>
		<author>
			<persName><forename type="first">Xuezhe</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eduard</forename><surname>Hovy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1603.01354</idno>
		<ptr target="https://arxiv.org/abs/1603.01354" />
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">Yanyao</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hyokun</forename><surname>Yun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zachary</forename><forename type="middle">C</forename><surname>Lipton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yakov</forename><surname>Kronrod</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Animashree</forename><surname>Anandkumar</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1707.05928</idno>
		<ptr target="https://arxiv.org/abs/1707.05928" />
		<title level="m">Deep active learning for named entity recognition</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">MMR-based active machine learning for bio named entity recognition</title>
		<author>
			<persName><forename type="first">Seokhwan</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yu</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kyungduk</forename><surname>Kim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeong-Won</forename><surname>Cha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gary</forename><forename type="middle">Geunbae</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers</title>
				<meeting>the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers<address><addrLine>New York, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006-06">June 2006</date>
			<biblScope unit="page" from="69" to="72" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Margin based active learning</title>
		<author>
			<persName><forename type="first">Maria-Florina</forename><surname>Balcan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrei</forename><surname>Broder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tong</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="DOI">10.5555/1768841.1768848</idno>
		<ptr target="https://doi.org/10.5555/1768841.1768848" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 20th Annual Conference on Learning Theory (COLT&apos;07)</title>
				<meeting>the 20th Annual Conference on Learning Theory (COLT&apos;07)<address><addrLine>San Diego, CA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="35" to="50" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Automatic keyword extraction from individual documents</title>
		<author>
			<persName><forename type="first">Stuart</forename><surname>Rose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dave</forename><surname>Engel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nick</forename><surname>Cramer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wendy</forename><surname>Cowley</surname></persName>
		</author>
		<idno type="DOI">10.1002/9780470689646.ch1</idno>
		<ptr target="https://doi.org/10.1002/9780470689646.ch1" />
	</analytic>
	<monogr>
		<title level="m">Text Mining: Applications and Theory</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="20" />
			<date type="published" when="2010-03">Mar. 2010</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
