=Paper=
{{Paper
|id=Vol-1747/BP03_ICBO2016
|storemode=property
|title=Label Embedding Approach for Transfer Learning
|pdfUrl=https://ceur-ws.org/Vol-1747/BP03_ICBO2016.pdf
|volume=Vol-1747
|authors=Rasha Obeidat,Xiaoli Fern,Prasad Tadepalli
|dblpUrl=https://dblp.org/rec/conf/icbo/ObeidatFT16
}}
==Label Embedding Approach for Transfer Learning==
Label Embedding for Transfer Learning

Rasha Obeidat, Xiaoli Fern, Prasad Tadepalli
School of Electrical Engineering and Computer Science, Oregon State University
{obeidatr, xfern, tadepall}@eecs.oregonstate.edu

Abstract. Automatically tagging textual mentions with the concepts, types, and entities that they represent is an important task for which supervised learning has been found to be very effective. In this paper, we consider the problem of exploiting multiple sources of training data with variant ontologies. We present a new transfer learning approach based on embedding multiple label sets in a shared space, and using it to augment the training data.

Keywords—transfer learning, label embedding.

I. INTRODUCTION

Automatically tagging textual mentions with the ontological concepts, types, and entities that they represent is useful in many knowledge-intensive fields such as biology and medicine. This problem is studied under the names of Named Entity Recognition, Entity Linking, and Wikification. Supervised learning from annotated training data has been found to be an effective method to tackle this task. However, in most fields in general, and biology in particular, there are often multiple ontologies. For example, different ontologies such as the Cell Type Ontology, the Protein Ontology, the Sequence Ontology, and the Gene Ontology might overlap, use different vocabulary, and provide complementary information [11]. Each ontology comes with its own annotated training data, which presents the problem of reconciling the different ontologies and effectively using the training data for the old (source) ontologies in training for a new (target) ontology.

The above problem is an instance of transfer learning, which aims to leverage the training data from one or more source domains to improve the sample efficiency in a related target domain. Domain adaptation is transfer learning where the source and target domains use the same label set but have different distributions [1]. Transfer learning where the label sets vary across domains is far less studied. In many real-world applications, the ontologies or label sets of different tasks can be (implicitly) overlapping and/or intricately related. For example, one biological application of natural language processing is to tag natural texts with proteins from a given protein ontology. In a related task, we might need to tag the text with genes based on a specific gene ontology. The two ontologies are clearly related and may provide useful information toward one another. For such tasks, we need a transfer learning approach that can be applied with variant ontologies/label sets, which will learn simultaneously from both domains and thus enhance the efficiency of learning.

Standard domain adaptation techniques [3,4] are not directly applicable to this problem because they assume that the label sets are invariant. Recent work proposed a solution based on finding a mapping between the labels using Canonical Correlation Analysis (CCA), and then reducing the problem to the standard domain adaptation setting [5]. We develop a method that embeds the source and target labels in a shared space and takes advantage of that shared space to transfer the knowledge. Instead of using the label embeddings to produce a mapping between the source and target labels, we directly employ them to augment the feature representation of the target examples with the embeddings of the predicted source labels. After that, a model is trained on the target side. We conducted a preliminary study on the task of Named Entity Recognition in which we used two datasets with different but related annotation schemes. We show that our approach significantly outperforms several baselines.

II. PROBLEM SETUP

A domain Di = (Xi, P(Xi)) consists of two components: the feature space Xi and the corresponding marginal distribution P(Xi). Let Ti = (Yi, fi(·)) be task i, where Yi is the label set of domain i, and let fi(·): Xi → Yi be a function that maps Xi to Yi. The goal of transfer learning is to use the knowledge of fs learned from the source domain-task pair (Ds, Ts) to improve the learning of ft on the target side (Dt, Tt).

In standard domain adaptation (aka transductive transfer learning [3,4,6]), the source and target tasks are the same, i.e., Ts = Tt, while the domains differ (either Xs ≠ Xt or P(Xs) ≠ P(Xt)). On the other hand, in the inductive transfer learning setting [7,5], which includes our work, the domains are the same or closely related, but the tasks differ, i.e., Ts ≠ Tt.
III. TRANSFER LEARNING VIA LABEL EMBEDDING

In this section, we describe our approach to learning label embeddings and using them to transfer learning across domains. We follow the method presented in Kim et al. [5] to induce the label embeddings. Specifically, we use Canonical Correlation Analysis (CCA) [8] to project both the source and target labels into a shared space where the correlation between the projected vectors is maximized. The projection vectors can then be used to reduce the dimensionality of the labels by projecting them into a k-dimensional space, where k is a parameter to be tuned. We then employ these embeddings to transfer knowledge from the source domain to the target domain.

To use the extracted embeddings in transferring the knowledge, we propose a method that works as follows: first, we train a model on the source domain and use it to make predictions on the target domain. Then, we augment the feature space of each instance in the target domain with the label embedding corresponding to its predicted source label. Finally, a model is trained on the target domain.

A nice property of this method is that it can be applied regardless of the type of relationship between the source and target labels. It works with 1-to-1, n-to-1, and 1-to-n relationships, and it is also applicable if the label types overlap.

IV. EXPERIMENTAL SETUP

In this section, we describe our experimental setup and results on the task of Named Entity Recognition (NER).

Dataset. We used the CoNLL 2003 NER benchmark dataset (http://www.cnts.ua.ac.be/conll2003/ner/) as the source domain and a small dataset, the TAC-KBP2015 NER dataset (http://www.nist.gov/tac/2015/KBP/), as the target. CoNLL2003 defines four entity types: Person (PER), Organization (ORG), Location (LOC), and Miscellaneous (MISC). TAC-KBP2015 defines six entity types: Person (PER), Title (TTL), Organization (ORG), Geopolitical Entity (GPE), Location (LOC), and Facility (FAC). Our approach does not need any prior knowledge of the matching types between CoNLL 2003 and TAC-KBP2015.

Evaluation. We follow the CoNLL exact-match evaluation protocol for the NER task [9]. In particular, we calculate the recall, precision, and F1-score for each entity type, and then micro-average the recalls, precisions, and F1-scores.

Features and Training. We employ the standard set of features used by the Stanford NLP group to train their NER models (http://nlp.stanford.edu/projects/biNER/en.prop). The feature set includes word features, orthographic features, feature conjunctions, and others. We train our models using the Stanford NER system (http://nlp.stanford.edu/software/CRF-NER.shtml), which provides a general implementation of Conditional Random Fields [10]. We use label embeddings of size 5 in all of our experiments.

Baselines. To investigate the effectiveness of our method, AugmntTr, we compare it to two baselines:
TargetOnly: train a model on the target dataset only.
Pred: use the output of the source predictor as an additional feature to train a model on the target dataset.

V. RESULTS AND DISCUSSION

In this section, we present the experimental results of all approaches under study. The results are summarized in Table I. It shows that our method AugmntTr produces about 7% and 9% F1-score improvements over the TargetOnly and Pred methods, respectively. This illustrates the ability of CCA to discover the relationship between the label types of the CoNLL2003 and TAC-KBP2015 datasets. Augmenting the feature space of the TAC-KBP2015 dataset with the label embeddings of the CoNLL2003 labels transfers the knowledge from CoNLL2003 to TAC-KBP2015 via these embeddings.

TABLE I. MICRO-AVERAGED AND MACRO-AVERAGED RECALL, PRECISION, AND F1-SCORES OF THE METHODS TARGETONLY, PRED, AND AUGMNTTR ON THE TASK OF NAMED ENTITY RECOGNITION.

Baseline     Avg-R   Avg-P   Avg-F1
TargetOnly   0.618   0.753   0.679
Pred         0.576   0.756   0.654
AugmntTr     0.745   0.746   0.745

VI. CONCLUSION

We presented an approach to transfer learning with different label sets between the source and target domains. Our approach makes use of label embeddings induced by CCA. We augment the feature space of the target data with the embeddings of the predicted source labels, and then train a model on the target domain. We find that CCA is able to produce high-quality label embeddings that are capable of transferring knowledge across domains, which explains the superiority of our approach over the baselines.

ACKNOWLEDGMENTS

We gratefully acknowledge the support of DARPA and AFRL under contract number FA8750-13-2-0033.

REFERENCES

[1] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[2] G. Schweikert, G. Rätsch, C. Widmer, and B. Schölkopf, "An empirical analysis of domain adaptation algorithms for genomic sequence analysis," in Advances in Neural Information Processing Systems, 2009, pp. 1433–1440.
[3] H. Daumé III, "Frustratingly easy domain adaptation," arXiv preprint arXiv:0907.1815, 2009.
[4] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2006, pp. 120–128.
[5] Y.-B. Kim, K. Stratos, R. Sarikaya, and M. Jeong, "New transfer learning techniques for disparate label sets," in ACL. Association for Computational Linguistics, 2015.
[6] J. Jiang and C. Zhai, "Instance weighting for domain adaptation in NLP," in ACL, vol. 7, 2007, pp. 264–271.
[7] S. J. Pan, Z. Toh, and J. Su, "Transfer joint embedding for cross-domain named entity recognition," ACM Transactions on Information Systems (TOIS), vol. 31, no. 2, p. 7, 2013.
[8] H. Hotelling, "Relations between two sets of variates," Biometrika, vol. 28, no. 3/4, pp. 321–377, 1936.
[9] D. Nadeau and S. Sekine, "A survey of named entity recognition and classification," Lingvisticae Investigationes, vol. 30, no. 1, pp. 3–26, 2007.
[10] R. Leaman, G. Gonzalez et al., "BANNER: an executable survey of advances in biomedical named entity recognition," in Pacific Symposium on Biocomputing, vol. 13. Citeseer, 2008, pp. 652–663.
[11] C.-T. Tsai and D. Roth, "Concept grounding to multiple knowledge bases via indirect supervision," Transactions of the Association for Computational Linguistics, vol. 4, pp. 141–154, 2016.
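As a compact summary, the AugmntTr recipe of Section III (train on the source, predict on the target, append the predicted label's embedding, retrain on the target) can be sketched as follows. The random data, random embedding table, and logistic-regression stand-in for the paper's CRF models are assumptions made purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in data; the real experiments use CRF models over NER features.
rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(300, 10)), rng.integers(0, 4, 300)  # source: 4 labels
Xt, yt = rng.normal(size=(60, 10)), rng.integers(0, 6, 60)    # target: 6 labels

# Hypothetical k-dim embedding per source label (in the paper: CCA-induced, k=5).
k = 5
label_embed = rng.normal(size=(4, k))

# 1) Train a model on the source domain.
src_model = LogisticRegression(max_iter=1000).fit(Xs, ys)

# 2) Predict a source label for each target instance.
pred = src_model.predict(Xt)

# 3) Augment each target instance with its predicted label's embedding.
Xt_aug = np.hstack([Xt, label_embed[pred]])

# 4) Train the final model on the augmented target data.
tgt_model = LogisticRegression(max_iter=1000).fit(Xt_aug, yt)
print(Xt_aug.shape)  # (60, 15)
```

Because only the embedding lookup couples the two label sets, the same pipeline applies unchanged whether the source-to-target label relationship is 1-to-1, n-to-1, or 1-to-n.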