<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transfer Learning for Industrial Applications of Named Entity Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lingzhen Chen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Moschitti</string-name>
          <email>alessandro.moschitti@unitn.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giuseppe Castellucci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Favalli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raniero Romagnoli</string-name>
          <email>r.romagnoli@almawave.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Almawave Srl.</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Trento, POVO</institution>
          ,
          <addr-line>Trento 38123</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>129</fpage>
      <lpage>140</lpage>
      <abstract>
        <p>In this paper, we propose a Transfer Learning technique for Named Entity Recognition that is able to flexibly deal with domain changes. The proposed technique is able to manage both the case when the set of named entities does not change and the case when the set of named entities changes in the target domain. In particular, we focus on the case when the target data contains only the annotation of a target named entity, and the source data is no longer available for the target task. Our solution consists in transferring the parameters from a source model, which are then fine-tuned with the target data. The model architecture is modified when recognizing a new category by adding properly new neurons to the model. Our experiments show that it is possible to effectively transfer learned parameters in both the scenarios, resulting in strong performances over the target categories without degrading the performances on the other named entities.</p>
      </abstract>
      <kwd-group>
        <kwd>Transfer Learning</kwd>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Sequence Labeling</kwd>
        <kwd>Recurrent Neural Network</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Standard Named Entity Recognition (NER) models are supposed to be trained and
applied on data coming from similar sources (i.e., in-domain data). The application of
a NER model on out-of-domain data will inevitably result in poor performances [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
For example, the contexts in which a Person entity occurs could be slightly
different (from a morpho-syntactic point of view) in each domain. Moreover, in industrial
scenarios, different domains (e.g., different customers) often require recognizing
disjoint named entity sets. For example, in finance, target entities would probably
include Companies and Banks, while in politics, target entities could include Senators and
Ministries. Thus, general-purpose models, for example those trained on
general-purpose datasets (e.g., Wikipedia), cannot be an optimal solution. A typical solution to
the out-of-domain problem is to train a NER model from scratch over different
annotated datasets, i.e., one for each domain. In this way, each NER model can focus on
the morpho-syntactic variations of the texts characterizing the occurrences of the
different categories in each domain. However, this solution has two main disadvantages:
1) annotating a dataset from scratch for each domain can be costly; 2) there is no reuse
of any knowledge acquired from the existing data (e.g., common entity categories could
share some linguistic characteristics that can be useful for training a new model).
      </p>
      <p>Motivated by the problems described above, in this work, we define a new paradigm
of Transfer Learning for NER models. Without loss of generality, our setting consists
of two steps: (i) an initial step, where a source model MS is trained to recognize a set
of named entities on source data DS; (ii) a subsequent step, where the target model MT
needs to recognize the new NE categories in the target data DT, which are unseen in
DS. Notice that DT is typically much smaller in size compared to DS and it will contain
only the annotations of the target entities. Moreover, in the update step DS is no longer
available. These are strong requirements often appearing in industrial scenarios, where
a NER model provider wants to make available model update strategies to its customers,
without distributing the original data used to train the source model.</p>
      <p>
        These problems lie in the research area of Transfer Learning (TL), which aims to
leverage the gained knowledge on a task in a source domain, to improve the learning of
a task in a target domain [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. To tackle the problem, we aim at applying a knowledge
transfer in a deep neural network setting. Given an initial model for NER trained on
source data, we modify the layers of the network to include new neurons for learning
the new categories and continue to train with DT. More specifically, we implement a
Bidirectional LSTM (BiLSTM) with Conditional Random Fields (CRF) as MS. In the
update step, we modify such architecture to build MT. The weights from MS are
transferred to MT and then fine-tuned with DT, where both seen and unseen NE categories
could be observed.
      </p>
      <p>We conduct extensive experiments on updating NER models, analyzing the
performance of both methods on a custom dataset. Our experiments show that the training and
fine-tuning phases can effectively learn new knowledge about existing or new entities,
without substantially degrading the performance on the unchanged entities.</p>
      <p>The rest of the paper is structured as follows: Sections 2 and 3 describe the adopted
Deep Neural Network and the transfer learning methodology. Section 4 presents the
experimental evaluation. Section 5 discusses related work. Finally, Section 6 draws
the conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>Deep Neural Network for Named Entity Recognition</title>
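      <p>The NER problem can be defined as a sequence labeling problem: given an input
sequence <inline-formula><tex-math>X = x_1, x_2, \ldots, x_n</tex-math></inline-formula> (<inline-formula><tex-math>x_i \in \mathcal{X}</tex-math></inline-formula>), the model predicts the output sequence <inline-formula><tex-math>Y = y_1, y_2, \ldots, y_n</tex-math></inline-formula>
(<inline-formula><tex-math>y_i \in \mathcal{Y}</tex-math></inline-formula>). <inline-formula><tex-math>\mathcal{X}</tex-math></inline-formula> and <inline-formula><tex-math>\mathcal{Y}</tex-math></inline-formula> represent the input and output spaces, respectively. Typically, the
model learns to maximize the conditional probability <inline-formula><tex-math>P(Y \mid X)</tex-math></inline-formula>.</p>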
      <p>
        The model adopted to label a token sequence with respect to the NE categories
is the state-of-the-art neural model made of a Bidirectional LSTM + CRF [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. This
model consists of several layers: (i) a character-level Bidirectional LSTM (BiLSTM) that
computes a vector representation of each word from the characters composing it; (ii) a
token-level BiLSTM layer that computes the representation of the sequence; (iii)
a Conditional Random Field (CRF) layer for making sequential predictions. A detailed
description of the model is presented in the remainder of this section.</p>
      <p>Word &amp; Character Representations. A word in the input sequence is represented by
both its word-level and character-level embeddings. The former are used to capture
semantic information about words, while the latter are adopted to model the morphological
variations of words (e.g., capitalization, inflections, etc.), which have been shown to provide
useful information for the NER task [
        <xref ref-type="bibr" rid="ref15">15</xref>
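        ]. Token-level embeddings map each word <inline-formula><tex-math>t_i</tex-math></inline-formula>
(through a lookup table) to a vector <inline-formula><tex-math>w_i</tex-math></inline-formula>. Character-level embeddings instead map each
character <inline-formula><tex-math>c_{ij}</tex-math></inline-formula> of the <inline-formula><tex-math>i</tex-math></inline-formula>-th word (through a lookup table) to a vector <inline-formula><tex-math>e_{ij}</tex-math></inline-formula>. The
character-level representation <inline-formula><tex-math>e_i</tex-math></inline-formula> of a word is obtained by applying a BiLSTM (described in the
Bidirectional LSTM Layer paragraph) over the <inline-formula><tex-math>e_{ij}</tex-math></inline-formula> vectors. The
final representation of the <inline-formula><tex-math>i</tex-math></inline-formula>-th word <inline-formula><tex-math>x_i</tex-math></inline-formula> in the input sequence is the concatenation of its
word-level embedding <inline-formula><tex-math>w_i</tex-math></inline-formula> and its character-level representation <inline-formula><tex-math>e_i</tex-math></inline-formula>.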
      </p>
      <p>Bidirectional LSTM Layer. BiLSTM is a neural architecture in which two LSTM cells
are applied in the two directions of a sequence. In particular, it is composed of a
forward LSTM (<inline-formula><tex-math>\overrightarrow{\mathrm{LSTM}}</tex-math></inline-formula>) and a backward LSTM (<inline-formula><tex-math>\overleftarrow{\mathrm{LSTM}}</tex-math></inline-formula>), which read the input sequence
(represented as the word vectors described in the previous paragraph) in left-to-right
and reverse order, respectively.</p>
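      <p>The output of the BiLSTM <inline-formula><tex-math>h_t</tex-math></inline-formula> is obtained by concatenating the forward and
backward outputs, <inline-formula><tex-math>h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t]</tex-math></inline-formula>, where
<disp-formula><label>(1)</label><tex-math>\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1})</tex-math></disp-formula>
and
<disp-formula><label>(2)</label><tex-math>\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1})</tex-math></disp-formula>
so that <inline-formula><tex-math>h_t</tex-math></inline-formula> captures both the left and the right context of <inline-formula><tex-math>x_t</tex-math></inline-formula>.</p>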
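      <p>As an illustration of how the two directions are combined, the following minimal sketch reads a toy sequence left-to-right and right-to-left and pairs the two hidden states at each position. The function toy_cell is a hypothetical scalar stand-in for an LSTM cell (it is not an LSTM and is not the implementation used in this work); only the bidirectional wiring mirrors Equations (1) and (2).</p>
      <preformat><![CDATA[
```python
import math


def toy_cell(x, h_prev):
    # Hypothetical stand-in for an LSTM cell: a single tanh recurrence.
    return math.tanh(0.5 * x + 0.5 * h_prev)


def bidirectional_states(xs, cell=toy_cell, h0=0.0):
    """Return [(forward_h_t, backward_h_t)] for every position t."""
    forward, h = [], h0
    for x in xs:                # left-to-right pass: h_t depends on h_{t-1}
        h = cell(x, h)
        forward.append(h)
    backward, h = [], h0
    for x in reversed(xs):      # right-to-left pass: h_t depends on h_{t+1}
        h = cell(x, h)
        backward.append(h)
    backward.reverse()          # realign with the original positions
    return list(zip(forward, backward))


states = bidirectional_states([1.0, -2.0, 0.5])
```
]]></preformat>
      <p>Each element of states plays the role of the concatenation of the forward and backward outputs at one position: the first component has only seen the left context, the second only the right context.</p>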
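      <p>Prediction Layer. The representation computed by the BiLSTM is then passed through
a fully-connected layer that computes the representation <inline-formula><tex-math>p_t</tex-math></inline-formula> for each word. The final
prediction <inline-formula><tex-math>y_t</tex-math></inline-formula> is obtained by applying a softmax over the output <inline-formula><tex-math>p_t</tex-math></inline-formula> of the fully-connected layer:
<disp-formula><label>(3)</label><tex-math>P(y_t = c \mid p_t) = \frac{e^{W_{o,c}\, p_t}}{\sum_{c' \in C} e^{W_{o,c'}\, p_t}},</tex-math></disp-formula>
where <inline-formula><tex-math>W_o</tex-math></inline-formula> are the parameters to be learned in the output layer and <inline-formula><tex-math>C</tex-math></inline-formula> represents the set of
all possible output labels.</p>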
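      <p>The softmax in Equation (3) can be sketched in a few lines. The label names and score values below are hypothetical, and the function takes the already computed per-label scores (the products of the output weights and the token representation) as input; subtracting the maximum score is a standard numerical-stability step that does not change the result.</p>
      <preformat><![CDATA[
```python
import math


def softmax_over_labels(scores):
    """Turn per-label scores into probabilities P(y_t = c | p_t)."""
    m = max(scores.values())    # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical scores for one token over a small IOB label set.
scores = {"O": 0.2, "B-Person": 2.1, "I-Person": -0.3}
probs = softmax_over_labels(scores)
prediction = max(probs, key=probs.get)
```
]]></preformat>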
      <p>
        CRF Layer. CRF [
        <xref ref-type="bibr" rid="ref14">14</xref>
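        ] is a powerful model for sequence labeling tasks such as NER,
as it considers neighboring information of the categories when making predictions. We
implement a linear-chain CRF over the output of the BiLSTM to improve the
prediction ability of the model, i.e., by taking the neighboring predictions into account while
classifying each time step. Recall that we aim at maximizing the conditional
probability <inline-formula><tex-math>P(Y \mid X)</tex-math></inline-formula> of the output label sequence <inline-formula><tex-math>Y</tex-math></inline-formula> given the input word sequence <inline-formula><tex-math>X</tex-math></inline-formula>. In the CRF
setting, <inline-formula><tex-math>P(Y \mid X)</tex-math></inline-formula> is computed by
<disp-formula><label>(4)</label><tex-math>P(Y \mid X) = \frac{e^{\mathrm{score}(X, Y)}}{\sum_{Y'} e^{\mathrm{score}(X, Y')}}</tex-math></disp-formula>
where <inline-formula><tex-math>Y'</tex-math></inline-formula> ranges over all possible label sequences and <inline-formula><tex-math>\mathrm{score}(\cdot)</tex-math></inline-formula> is calculated by adding up
the transition and emission scores for a label sequence. More specifically, the emission
score is the probability of predicting a label <inline-formula><tex-math>y_t</tex-math></inline-formula> for the <inline-formula><tex-math>t</tex-math></inline-formula>-th word in the sequence, and is given
by the prediction made from the BiLSTM. The transition score is the probability of
transiting from the previously predicted label <inline-formula><tex-math>y_{t-1}</tex-math></inline-formula> to the current label <inline-formula><tex-math>y_t</tex-math></inline-formula>. The output <inline-formula><tex-math>p_t</tex-math></inline-formula> of the
fully-connected layer at time step <inline-formula><tex-math>t</tex-math></inline-formula> provides the emission scores for all possible values of
<inline-formula><tex-math>y</tex-math></inline-formula>. A square matrix <inline-formula><tex-math>P</tex-math></inline-formula> of size C + 2 is used to store the transition probabilities among the C
output labels, as well as a start label and an end label. Hence,
<disp-formula><label>(5)</label><tex-math>\mathrm{score}(X, Y) = \sum_{t} \left( p_t[y_t] + P_{y_t, y_{t-1}} \right)</tex-math></disp-formula>
      </p>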
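      <p>To make Equations (4) and (5) concrete, the sketch below computes the score of a label sequence and normalizes it by brute-force enumeration of all label sequences. This is only feasible for toy inputs (practical implementations use the forward algorithm instead), and the label set, the emission values and the transition values are all hypothetical. The scores are treated as unnormalized real numbers, and transitions from a start label and to an end label are included as described above.</p>
      <preformat><![CDATA[
```python
import itertools
import math

START, END = "START", "END"


def sequence_score(emissions, transitions, labels_seq):
    """score(X, Y): emission plus transition scores along the label path."""
    score, prev = 0.0, START
    for p_t, y_t in zip(emissions, labels_seq):
        score += p_t[y_t] + transitions[(y_t, prev)]   # p_t[y_t] + P[y_t, y_{t-1}]
        prev = y_t
    return score + transitions[(END, prev)]


def sequence_probability(emissions, transitions, labels, labels_seq):
    """P(Y | X), normalizing over every possible label sequence (brute force)."""
    all_scores = [sequence_score(emissions, transitions, y)
                  for y in itertools.product(labels, repeat=len(emissions))]
    m = max(all_scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in all_scores))
    return math.exp(sequence_score(emissions, transitions, labels_seq) - log_z)


labels = ["O", "B", "I"]
# Hypothetical transition scores: zero everywhere, with a penalty for an
# "I" label that follows "O" or opens the sentence.
transitions = {(cur, prev): 0.0
               for cur in labels + [END] for prev in labels + [START]}
transitions[("I", "O")] = -5.0
transitions[("I", START)] = -5.0
emissions = [{"O": 0.1, "B": 1.5, "I": 0.0},
             {"O": 0.2, "B": 0.0, "I": 1.0}]

p_bi = sequence_probability(emissions, transitions, labels, ("B", "I"))
total = sum(sequence_probability(emissions, transitions, labels, y)
            for y in itertools.product(labels, repeat=2))
```
]]></preformat>
      <p>In this toy setting the probabilities of all nine label sequences sum to one, and the transition penalty is what makes the sequence (O, I) unlikely even though the second token has a high emission score for I.</p>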
    </sec>
    <sec id="sec-3">
      <title>Updating Algorithm for NER Neural Models</title>
      <p>The NER architecture described in the previous section is usually adopted to train a
model on one domain, so that it recognizes named entities within texts of that same
domain. Our goal is to make this architecture adaptable to different domains without
the need for a full re-training of the model. Recall, in fact, that we are assuming that the
dataset used to train the base model may not be available in the update step.</p>
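      <p>Let us consider data coming from a source domain DS containing named entity
annotations for the entities <inline-formula><tex-math>s \in S</tex-math></inline-formula> and data coming from a target domain DT. The target
data contains annotations of entities <inline-formula><tex-math>t \in T</tex-math></inline-formula>, where <inline-formula><tex-math>T = S \cup O</tex-math></inline-formula> and <inline-formula><tex-math>O</tex-math></inline-formula> is a possibly empty
set of new entity categories.</p>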
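      <p>In the initial step, the model is trained on DS until the optimal parameters <inline-formula><tex-math>\hat{\theta}^{S} = \hat{\theta}^{S}_{\neg o} \cup \hat{\theta}^{S}_{o}</tex-math></inline-formula>
are obtained and saved. <inline-formula><tex-math>\hat{\theta}^{S}_{\neg o}</tex-math></inline-formula> contains the parameters of the network except
those of the output layer, while <inline-formula><tex-math>\hat{\theta}^{S}_{o}</tex-math></inline-formula> contains only the parameters of the output layer.
There are two different TL cases, depending on the nature of the set <inline-formula><tex-math>T</tex-math></inline-formula>. The first is the case
where <inline-formula><tex-math>T = S \cup O</tex-math></inline-formula> and <inline-formula><tex-math>O</tex-math></inline-formula> is empty, i.e., the set of entities does not change between
source and target domains. In this setting, what changes is only the domain of the data
in which the entities of the set <inline-formula><tex-math>T</tex-math></inline-formula> need to be recognized. In this case, the
neural model can be adapted by updating the weights of the network through
a training stage over DT. In this way, the weights of the network will shift towards
the new data DT, and the network training will accommodate within them the relevant
morpho-syntactic information of the new domain.</p>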
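      <p>The second scenario occurs when <inline-formula><tex-math>T = S \cup O</tex-math></inline-formula> and <inline-formula><tex-math>O</tex-math></inline-formula> is a non-empty set. In this case,
the model structure needs to be changed in order to accommodate the new categories.
To make the model able to recognize the new NE categories contained in <inline-formula><tex-math>O</tex-math></inline-formula>,
the major modification is made to the output layer after the BiLSTM. In the standard
BiLSTM architecture, the fully-connected layer maps the outputs <inline-formula><tex-math>h</tex-math></inline-formula> of the BiLSTM to a
vector <inline-formula><tex-math>p</tex-math></inline-formula> of size <inline-formula><tex-math>nC + 1</tex-math></inline-formula>, where <inline-formula><tex-math>C</tex-math></inline-formula> here denotes the number of NE categories and <inline-formula><tex-math>n</tex-math></inline-formula> is a factor depending on the tagging format of the
dataset. For example, <inline-formula><tex-math>n = 2</tex-math></inline-formula> if the data is annotated with the IOB format: a word
can be Outside a NE, or can be at the Beginning of or Inside an annotation. For recognizing
the target NEs, we build the same model architecture for all layers before the output layer
and initialize all their parameters with the <inline-formula><tex-math>\hat{\theta}^{S}_{\neg o}</tex-math></inline-formula> obtained in the initial step. We extend the output
layer by <inline-formula><tex-math>nE</tex-math></inline-formula> units, where <inline-formula><tex-math>E</tex-math></inline-formula> is the number of new NE categories. The extended part is
initialized with weights drawn from the random distribution <inline-formula><tex-math>X \sim \mathcal{N}(\mu, \sigma^2)</tex-math></inline-formula>, where <inline-formula><tex-math>\mu</tex-math></inline-formula>
and <inline-formula><tex-math>\sigma</tex-math></inline-formula> are the mean and standard deviation of the pre-trained weights in the same layer.
In this way, the associated weight matrix <inline-formula><tex-math>W^{s}_{o}</tex-math></inline-formula> of the fully-connected layer is also updated
from its original shape <inline-formula><tex-math>(nC + 1) \times p</tex-math></inline-formula> to a new matrix <inline-formula><tex-math>W^{s\prime}_{o}</tex-math></inline-formula> of shape <inline-formula><tex-math>(nC + nE + 1) \times p</tex-math></inline-formula>. Note
that the parameters <inline-formula><tex-math>\hat{\theta}^{S}_{o}</tex-math></inline-formula> and <inline-formula><tex-math>\hat{\theta}^{S}_{\neg o}</tex-math></inline-formula> are essentially the weights in the matrices <inline-formula><tex-math>W^{s}_{o}</tex-math></inline-formula>
and <inline-formula><tex-math>W^{s}_{\neg o}</tex-math></inline-formula>. In the BiLSTM + CRF architecture, the transition matrix <inline-formula><tex-math>P</tex-math></inline-formula> must also be modified
in order to include the new rows and columns related to
the new entity categories. The new model parameters are updated in the same way as in
the initial step, by performing training steps over DT.</p>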
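      <p>The output-layer extension described above can be sketched with plain Python lists. The helper names, the toy sizes and the zero fill value for the new transition entries are assumptions made for illustration (the text does not state how the new transition entries are initialized, and a real implementation would also have to keep the start and end labels at consistent positions). The new rows of the weight matrix are sampled from a normal distribution whose mean and standard deviation are those of the pretrained weights, while the pretrained rows are copied unchanged.</p>
      <preformat><![CDATA[
```python
import random
import statistics


def extend_output_layer(weights, n_new_rows, rng):
    """Append rows drawn from N(mu, sigma^2) of the pretrained weights."""
    flat = [w for row in weights for w in row]
    mu, sigma = statistics.mean(flat), statistics.pstdev(flat)
    width = len(weights[0])
    new_rows = [[rng.gauss(mu, sigma) for _ in range(width)]
                for _ in range(n_new_rows)]
    return [row[:] for row in weights] + new_rows   # pretrained rows are kept as-is


def extend_transitions(matrix, n_new, fill=0.0):
    """Grow a square transition matrix by n_new rows and columns.

    The fill value for the new entries is a hypothetical choice.
    """
    grown = [row[:] + [fill] * n_new for row in matrix]
    size = len(matrix) + n_new
    return grown + [[fill] * size for _ in range(n_new)]


# Toy sizes: IOB tagging (n = 2), C = 2 source categories, E = 1 new category,
# and an input width of p = 3 for the output layer.
n, C, E, p = 2, 2, 1, 3
rng = random.Random(0)
W_source = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n * C + 1)]
W_target = extend_output_layer(W_source, n * E, rng)

P_source = [[0.0] * (n * C + 1 + 2) for _ in range(n * C + 1 + 2)]
P_target = extend_transitions(P_source, n * E)
```
]]></preformat>
      <p>With these toy sizes the output layer grows from 5 to 7 rows and the transition matrix from 7 x 7 to 9 x 9; all extended parameters would then be fine-tuned on DT together with the transferred ones.</p>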
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>We divide the experiments on dynamically updating the NER model into three sub-tasks,
(i), (ii) and (iii). For task (i), we experiment on updating MT with regard to a target NE
category with more training data coming from a new domain (i.e., DT). For task (ii), we
experiment on forgetting and retraining an entity class (seen in DS) with new training
data (i.e., DT) from a different domain. Note that the difference between task (i) and task
(ii) is the morphology of the target NE in DT. For task (i), the morphology of the target
NE is expected to be the same in DS and DT, while it is expected to be different for
task (ii) (hence the forgetting and retraining). Lastly, for task (iii), we experiment on
updating MT on a target NE that is unseen in DS, with new training data DT.</p>
      <sec id="sec-4-1">
        <title>Data</title>
        <p>
          The experiments are carried out with a dataset provided by Almawave. The dataset
consists of sentences coming from chat conversations in Italian between users and
customer service representatives. The dataset contains conversations coming from
two different customers/domains. Each sentence is annotated with respect to 8 named
entity categories. The specification of each named entity category can be found in
Table 1. The annotation was carried out by Almawave in the context of the Italian
educational project Alternanza Scuola-Lavoro. In this project, two groups of students,
coming from a scientific and a language high school, annotated the dataset with the WebAnno
annotation tool [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] in two weeks (one week for each group): each group was split
into 9 teams, the annotations of each team were checked against the other. Finally, an
annotation review process was performed by Almawave’s employees.
        </p>
        <p>The dataset is split into three parts, reflecting the three sub-tasks. Table 2 provides the
details of the dataset. The target NE categories for the three sub-tasks are
PaymentInformation, CustomerIdentification and Address,
respectively. Note that in DS all NEs are annotated, but in DT only the annotations of the
target NE of the given sub-task are provided. That is to say, other NEs do occur
in DT, but they are annotated as "O". We therefore first use the pretrained
MS to annotate these NEs before we train MT on DT. There is no specific
preprocessing of the dataset, except for replacing all digits with 0 to reduce the size of the
vocabulary.
</p>
        <table-wrap>
          <label>Table 1</label>
          <caption><p>Named Entity Categories and Examples.</p></caption>
          <table>
            <thead>
              <tr><th>Entity</th><th>Description</th><th>Example</th></tr>
            </thead>
            <tbody>
              <tr><td>Address</td><td>The occurrence of a postal address in a sentence.</td><td>L'indirizzo sul contratto è Via di Casal Boccone 188. (The address on the contract is Via di Casal Boccone 188.)</td></tr>
              <tr><td>CustomerIdentification</td><td>The occurrence of information of the customer, e.g., client id, username.</td><td>codice cliente P10AA210645 (client id P10AA210645)</td></tr>
              <tr><td>Location</td><td>The occurrence of a location.</td><td>La mia residenza è a Roma, ma il contratto è registrato a Latina. (My residency is in Rome, but the contract is registered in Latina.)</td></tr>
              <tr><td>Organization</td><td>The occurrence of the name of an organization.</td><td>Ho ricevuto un'offerta migliore da Fastweb. (I got a better offer from Fastweb.)</td></tr>
              <tr><td>PaymentInformation</td><td>The occurrence of information regarding a payment, e.g., a credit card number or an IBAN.</td><td>Sì, certo può fare l'accredito sul conto IT17X 000011111000001234 000. (Sure, you can credit on IT17X 000011111000001234 000.)</td></tr>
              <tr><td>Person</td><td>The occurrence of information regarding the name of a person.</td><td>La linea è registrata a nome di Mario Rossi. (The line is registered to Mario Rossi.)</td></tr>
              <tr><td>PersonalInformation</td><td>The occurrence of information regarding personal information, e.g., date of birth.</td><td>Nato a Veroli provincia di Roma il 23/05/1956. (I am born in Veroli province of Rome on 23/05/1956.)</td></tr>
              <tr><td>Phone</td><td>The occurrence of information regarding phone numbers.</td><td>Mi può contattare al 333447755 o al 00667788. (You can reach me at 333447755 or at 00667788.)</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
We use 300-dimensional GloVe pretrained embeddings<sup>3</sup> for Italian to initialize the weights
of the embedding layer. Since we do not lowercase the tokens in the input sequence, we
map words that have no direct match in the pretrained word embeddings to their
lowercased counterpart, if one is found in the pretrained word embeddings.</p>
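        <p>The pre-annotation of DT described in the Data subsection (gold annotations for the target NE only, with the other NEs labelled by the pretrained MS) can be sketched as a token-level merge of two tag sequences. The exact merging rule is not specified in the text, so the rule below is an assumption: gold tags take precedence, MS predictions fill the positions that are "O" in the gold data, and MS predictions of the target category itself are discarded in favour of the gold annotation. The tag sequences are hypothetical.</p>
        <preformat><![CDATA[
```python
def merge_annotations(gold_tags, predicted_tags, target):
    """Keep gold tags for the target NE; elsewhere use source-model tags."""
    merged = []
    for gold, pred in zip(gold_tags, predicted_tags):
        if gold != "O":
            merged.append(gold)     # gold annotation of the target entity
        elif pred.endswith("-" + target):
            merged.append("O")      # assumption: trust gold for the target category
        else:
            merged.append(pred)     # tag predicted by the source model
    return merged


gold = ["O", "O", "B-Address", "I-Address", "O"]
pred = ["B-Person", "I-Person", "O", "B-Address", "B-Phone"]   # hypothetical MS output
merged = merge_annotations(gold, pred, target="Address")
```
]]></preformat>
        <p>Because the merge works token by token, it does not repair IOB sequences that become inconsistent after merging; a fuller implementation would probably operate on entity spans.</p>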
        <p>We map infrequent words (words that appear in the dataset fewer than two times) to
&lt;UNK&gt;, as well as the unknown words appearing in the test set. The word embedding
for &lt;UNK&gt; is drawn from a uniform distribution over [-0.25, 0.25]. The character
embedding lookup table is randomly initialized with an embedding size of 25. The hidden
size of the character-level BiLSTM is 25, while that of the word-level one is 128.</p>
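        <p>The vocabulary handling described in this subsection (digit replacement, lowercased fallback for the pretrained embeddings, and mapping of infrequent or unseen words to &lt;UNK&gt;) can be sketched as follows. The function names, the toy sentences and the order in which the rules are applied are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
import re
from collections import Counter

UNK = "<UNK>"


def normalize_digits(token):
    """Replace every digit with 0 to shrink the vocabulary."""
    return re.sub(r"\d", "0", token)


def build_vocab(sentences, min_count=2):
    """Keep tokens seen at least min_count times; the rest map to <UNK>."""
    counts = Counter(normalize_digits(tok) for sent in sentences for tok in sent)
    return {tok for tok, c in counts.items() if c >= min_count}


def lookup_key(token, vocab, pretrained):
    """Choose the embedding-table key for a token."""
    token = normalize_digits(token)
    if token not in vocab:
        return UNK                  # infrequent or unseen word
    if token in pretrained:
        return token
    if token.lower() in pretrained:
        return token.lower()        # fall back to the lowercased form
    return token                    # in vocabulary, but no pretrained vector


train = [["Chiamo", "il", "333447755"], ["Chiamo", "il", "006677889"], ["Roma"]]
vocab = build_vocab(train)
pretrained = {"chiamo", "il"}       # hypothetical pretrained vocabulary
keys = [lookup_key(t, vocab, pretrained) for t in ["Chiamo", "123456789", "Roma"]]
```
]]></preformat>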
        <p>
          We apply a dropout regularization on the word embeddings with a rate of 0.5. All
models are implemented in TensorFlow [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], as an extension of NeuroNER [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. In the
first step, we use the Adam [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] optimizer with a learning rate of 0.05 and gradient clipping
of 5.0 to minimize the categorical cross entropy, with a maximum of 100 epochs
at each step. The neural hyperparameters of the update step are the same as those of
the first step, except for the learning rate. In fact, we observed that it is better
to lower the learning rate in order to avoid catastrophic forgetting.
In our experiments for the update step, we set the learning rate to
0.005.<fn><label>3</label><p>http://hlt.isti.cnr.it/wordembeddings/</p></fn> Finally, the models are evaluated with the F1 score<sup>4</sup> as in the official CoNLL
2003 shared task [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
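        <p>As a rough illustration of the entity-level F1 used for evaluation, the sketch below extracts entity spans from IOB tag sequences and computes micro-averaged precision, recall and F1 with exact span matching. It is written in the spirit of the conlleval script but is not a reimplementation of it, and the tag sequences are hypothetical.</p>
        <preformat><![CDATA[
```python
def extract_entities(tags):
    """Return the set of (start, end, category) spans encoded by IOB tags."""
    spans, start, cat = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):    # sentinel closes the last span
        inside = tag.startswith("I-") and cat == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, cat))              # the current span ends here
            start, cat = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, cat = i, tag[2:]                 # a new span begins here
    return spans


def f1_score(gold_sentences, predicted_sentences):
    """Micro-averaged entity-level precision, recall and F1."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, predicted_sentences):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = [["B-Person", "I-Person", "O", "B-Phone"]]
pred = [["B-Person", "I-Person", "O", "O"]]
precision, recall, f1 = f1_score(gold, pred)
```
]]></preformat>
        <p>In this toy example the single predicted entity is correct but one gold entity is missed, so precision is higher than recall.</p>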
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
        <p>In this section, we present results on the three experiments.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Fine-tuning seen NE with Additional Data from a different domain</title>
        <p>First of all, we show in Table 3 the results of task (i), i.e., training MT with new training data on the target
NE.</p>
        <p>In terms of the test F1 score in the initial step, we see that the model performs well on
categories that have a substantial number of instances in DS, such as Organization,
Person and Phone. One unusual case is Location, which has 1932 training
instances but reaches only 12.31 in test F1 in the initial step. This is possibly due to the
fact that there are only 16 instances in the test set, and most of them are not commonly
seen in the training set. For example, the surface forms Manfredonia, Aprilia Latina, Acilia
Roma and Veroli never appear in the training set, which makes it quite difficult for
the model to correctly distinguish them from non-NE words, especially in a different
context. The surface form Trento appears many times in the training set but is annotated as
O, which also causes confusion for the model.</p>
        <p>Phone is a NE category with plenty of training instances, but its F1 score is not
as high as expected. After comparing the training and test data, we found
that the low performance could be due to the different formats of the phone numbers. In
the training set, phone numbers often appear as a single token of several digits in the
format 000000000, where each 0 stands for a digit, whilst in the test set they appear
as several separate tokens with fewer digits each.</p>
        <p>With regard to fine-tuning the model on DT, the F1 of the target NE category
PaymentInformation increased by 1 point with the new training data in DT. For
some of the NE categories, the test F1 decreases very slightly, while for
some others, the test F1 improves. This result suggests the effectiveness
of the TL technique in transferring the knowledge learned on DS to MT. The technique is also able
to further improve on both the target NE category and several other categories without
catastrophic forgetting.<fn><label>4</label><p>https://www.clips.uantwerpen.be/conll2000/chunking/conlleval.txt</p></fn></p>
      </sec>
      <sec id="sec-4-4">
        <title>Retraining seen NE with additional data from a different domain</title>
        <p>In Table 4, we present the results of task (ii), where the target model MT is trained on DT with the
same set of NEs as DS but in a different domain. In terms of the test F1 in the initial
step, the result is quite similar to that of task (i). This is reasonable because the training
procedure for the source model is the same in the two tasks. In the subsequent
step of task (ii), we see a consistent improvement in the test F1 for all NE categories.
This indicates that MT is able to adapt fairly well to the new domain, with the
parameters transferred from MS.</p>
        <p>More importantly, since Dtest in task (ii) comes from the same domain as DT,
task (ii) simulates the industrial scenario where a company has at hand a pretrained
model and some in-domain training data for its target task. The good performance of the
proposed TL technique suggests that the model performance on the target task can be
improved with a small amount of target data, compared to using only the pretrained
model.</p>
      </sec>
      <sec id="sec-4-5">
        <title>Recognizing new NE with new training data</title>
        <p>Last but not least, we present in Table 5 the results of task (iii), on recognizing a NE that is unseen in DS. As seen from
the results, in the subsequent step we obtain a fairly good performance on the
new target NE, along with improvements in the test F1 scores on other seen NE categories.
It is worth noting that, in DT, there are not many training examples of those seen NE
categories (for example, only 36 for PaymentInformation). The test F1 score only
sees a minor decrease in two categories, CustomerIdentification and Phone.</p>
        <p>Hence, the TL techniques are able to transfer well the learned knowledge in the source
model MS to the target model MT.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Related Work</title>
      <p>Our work is related to research in Named Entity Recognition and Transfer Learning.
We report on both in the following sections.</p>
      <sec id="sec-5-1">
        <title>Named Entity Recognition</title>
        <p>
          In the earlier years of the study on Named Entity Recognition, most work approached
the task by engineering linguistic features [
          <xref ref-type="bibr" rid="ref2 ref3">3, 2</xref>
          ], such as lexical features, case
information, orthographic patterns and so on. Models such as Maximum Entropy models,
Perceptrons and CRFs are often used for the task [
          <xref ref-type="bibr" rid="ref10 ref3 ref7 ref9">9, 3, 7, 10</xref>
          ].
        </p>
        <p>
          Recent approaches also include neural models. Current state-of-the-art methods on
NER are mostly Recurrent Neural Network models that incorporate word and character
level embeddings and/or additional morphological features. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] uses a BiLSTM
combined with a CRF to establish state-of-the-art performance on NER (90.10 in terms
of test F1 on the CoNLL 2003 NER dataset). In their system, they used a variety of
handcrafted features together with word embeddings as the input to the model. Later, [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]
implemented the same CRF over BiLSTM model without using any handcrafted features.
They reported a test F1 of 90.94 on the same dataset. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] also implemented a similar
BiLSTM model with Convolutional filters as character feature extractor, achieving 91.62 in
the F1 score using also lexical features.
        </p>
        <p>In this work, we opted to adopt a BiLSTM + CRF in order to test whether our
proposed methods can be applied to state-of-the-art models.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Transfer Learning</title>
        <p>
          Neural network-based TL has proven to be very effective for image recognition [
          <xref ref-type="bibr" rid="ref19 ref8">8,
19</xref>
          ]. In the NLP area, [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] showed that TL can also be successfully applied to
semantically equivalent NLP tasks. Research has also been carried out on TL for NER. [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]
explored TL for NER with different NE categories (different output spaces). They
pretrain a linear-chain CRF on a large amount of annotated data in the source domain. A
neural network with two linear layers is used to learn the discrepancy between the source and
target label distributions. Finally, they initialize another CRF for the target domain with the weight
parameters learned in the linear layers. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] experimented with transferring features
and model parameters between similar domains, where the label types are different but
may have semantic similarity. Their main approach is to construct label embeddings to
automatically map the source and target label types to help improve the transfer. In [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ],
the authors also use an adapter to help the transfer. They propose progressive networks to
solve a sequence of reinforcement learning tasks while being immune to forgetting,
by leveraging learned knowledge through the adapter, which is an additional connection
between the new model and the learned models. This connection is realized by a feed-forward
neural layer with non-linear activation.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In this paper, we study the Named Entity Recognition problem in a domain adaptation
setting involving a two-step process. The first step regards the acquisition of a source
model, while the second step regards the update of this model.</p>
      <p>The main characteristics of our setting are: (i) the set of named entities between
source and target domain can possibly change; (ii) the dataset used to acquire the source
model is no longer available in the update step; (iii) the dataset used to train the target
model is only annotated with target NE. These are strong requirements often appearing
in industrial scenarios, where a NER model provider wants to make available model
update strategies to its customers, without distributing the original data used to train the
source model.</p>
      <p>We verified that our method can be applied to current state-of-the-art neural models
for Named Entity Recognition, i.e., a Bidirectional LSTM + CRF applied over word
and character embedding inputs. We carried out extensive experiments to analyze the
effect of the transfer approach on performance. The empirical results show the
effectiveness of the proposed methods and techniques. In particular, the update step we
studied proved effective in recognizing a specific target entity, while not
degrading the performance on the other categories.</p>
      <p>In future work, we should investigate how to update a model with respect to more
than one named entity. Moreover, we believe that this approach can be also applied to
different and more complex tasks that we aim to investigate, e.g., coreference resolution.</p>
    </sec>
  </body>
  <back>
  </back>
</article>