<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-Arm Active Transfer Learning for Telugu Sentiment Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Subba Reddy Oota</string-name>
          <email>oota.subba@students.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vijaysaradhi Indurthi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mounika Marreddy</string-name>
          <email>mounika0559@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sandeep Sricharan Mukku</string-name>
          <email>sandeep.mukku@research.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Radhika Mamidi</string-name>
          <email>radhika.mamidi@iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>International Institute of Information Technology</institution>
          ,
          <addr-line>Hyderabad</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Quadratyx</institution>
          ,
          <addr-line>Hyderabad</addr-line>
        </aff>
      </contrib-group>
      <fpage>62</fpage>
      <lpage>63</lpage>
      <abstract>
        <p>Transfer learning algorithms can be used when a sufficient amount of training data is available in the source domain and only limited training data is available in the target domain. Transferring knowledge from one domain to another requires the two domains to be similar. In many resource-poor languages, it is rare to find labeled training data in both the source and target domains. Active learning algorithms, which query additional labels from an oracle, can be used effectively to train on the source domain when an oracle is available for the source domain but not for the target domain. Active learning strategies are subjective because they are designed by humans: designing a strategy can be time-consuming, and the result can vary from one designer to another. To tackle these problems, we design a learning algorithm that connects transfer learning and active learning with the well-known multi-armed bandit problem by querying the most valuable information from the source domain. The advantage of our method is that it combines the best active query selection, obtained through multi-armed active learning, with distribution matching between the two domains through transfer learning. The effectiveness of the proposed method is validated by experiments on three Telugu domain-specific datasets for sentiment analysis.</p>
      </abstract>
      <kwd-group>
        <kwd>Active Learning</kwd>
        <kwd>Transfer Learning</kwd>
        <kwd>Multi-Arm Bandit</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>People use online reviews, blog posts and social media to comment on
trending activities in their regional languages. Many tools, resources and
corpora are available for analyzing this content in English. However, few
tools and resources exist for resource-poor
languages such as Telugu. Given the dearth of annotated sentiment data in
Telugu, the existing labeled datasets need to be extended
across different domains. However, manually annotating abundant unlabeled data is
very time-consuming, costly and resource-intensive.</p>
      <p>
        To address the above problems, we propose a Multi-Arm Active Transfer
Learning (MATL) algorithm, which involves transfer learning [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and a
combination of query selection strategies in active learning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. One of the prerequisites
for transfer learning is that the source and target domains should be closely
related. We use Maximum Mean Discrepancy (MMD) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to measure how close
the distributions of the source and target domains are. In
this paper, we experiment with sentiment analysis on three Telugu domain-specific
datasets: Movies (M), Political (P) and Sports (S)<sup>1</sup>. Treating each domain as
either the source or the target domain gives a total of 6 domain pairs: M-P, M-S, P-M,
P-S, S-M, S-P. Figure 1 shows the results for two of these pairs, P-S and S-P. We evaluate
three classification techniques, namely support vector machines
(SVM), extreme gradient boosting (XGBoost) and gradient boosted trees (GBT),
together with meta-learning over all three, and record the accuracy of each.
      </p>
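      <p>As an illustration of how MMD can quantify the closeness of two domains, the following minimal sketch computes a biased estimate of the squared MMD with a Gaussian (RBF) kernel. It is not the implementation used in the paper: the kernel choice, the bandwidth parameter gamma, the helper names and the randomly generated feature vectors are all assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(a, b, gamma):
    """Average kernel value over all pairs (one vector from a, one from b)."""
    total = sum(rbf(x, y, gamma) for x in a for y in b)
    return total / (len(a) * len(b))


def mmd_squared(source, target, gamma=0.1):
    """Biased estimate of the squared MMD between two samples."""
    return (
        mean_kernel(source, source, gamma)
        + mean_kernel(target, target, gamma)
        - 2.0 * mean_kernel(source, target, gamma)
    )


def sample(rng, mean, n=60, dim=5):
    """Hypothetical stand-in for the feature vectors of one domain."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
movies = sample(rng, 0.0)          # assumed source domain
close_domain = sample(rng, 0.1)    # target with a similar distribution
distant_domain = sample(rng, 2.0)  # target with a shifted distribution

mmd_close = mmd_squared(movies, close_domain)
mmd_distant = mmd_squared(movies, distant_domain)
print(f"close pair: {mmd_close:.4f}, distant pair: {mmd_distant:.4f}")
```
]]></preformat>
      <p>A smaller value indicates that the source and target samples are harder to tell apart, which is the situation in which transfer between the two domains is expected to work better.</p>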
    </sec>
    <sec>
      <title>Approach &amp; Results</title>
      <p>
The Multi-Arm Active Transfer Learning approach takes three inputs: the source domain
S = {unlabeled data instances S<sub>U</sub>, labeled data instances S<sub>L</sub>}; the target
domain T = {unlabeled data instances T<sub>U</sub>, labeled data instances T<sub>L</sub>, test
data instances T<sub>T</sub>}, where T<sub>T</sub> is used to measure classification accuracy at each
iteration; and the number of iterations n. A decision-making model is built alongside
this approach to predict the posterior probability of each instance in S<sub>U</sub>. After
the sampling query distribution φ(S(n)) is calculated, the multi-armed bandit
approach selects the best sample instance x<sub>i<sub>n</sub></sub> ∈ S for querying. If x<sub>i<sub>n</sub></sub> ∈ S<sub>U</sub>,
an oracle/labeler assigns it the label y<sub>i<sub>n</sub></sub>
and it is added to S<sub>L</sub>. The classifier C<sub>n</sub> is then trained on the combined set {updated
S<sub>L</sub>, T<sub>L</sub>}. Using MMD [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the distance between the source and target distributions is calculated.
This process is repeated until the query budget is reached. The classification model C<sub>n</sub>
is tested on the target test data T<sub>T</sub> to measure the accuracy. The reward r<sub>n</sub>(a<sub>k</sub>(n))
and the observation o<sub>n</sub>(a<sub>k</sub>(n)) are updated by comparing the label y<sub>i<sub>n</sub></sub> given by the
oracle/labeler with the classifier's prediction C<sub>n</sub>(x<sub>i<sub>n</sub></sub>).
      </p>
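      <p>The loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: a nearest-centroid classifier stands in for SVM, XGBoost and GBT, an epsilon-greedy rule stands in for the bandit update (which the text does not specify), only two query strategies are used as arms, the MMD step is omitted, and the data are synthetic. All function and variable names are hypothetical.</p>
      <preformat><![CDATA[
```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(labeled):
    """Nearest-centroid classifier: one mean vector per sentiment label."""
    model = {}
    for label in sorted({y for _, y in labeled}):
        rows = [x for x, y in labeled if y == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict(model, x):
    return min(model, key=lambda label: sq_dist(model[label], x))


def uncertainty_arm(model, pool, rng):
    """Index of the pool instance closest to the decision boundary."""
    def margin(x):
        nearest, second = sorted(sq_dist(c, x) for c in model.values())[:2]
        return second - nearest
    return min(range(len(pool)), key=lambda i: margin(pool[i][0]))


def random_arm(model, pool, rng):
    """Index of a uniformly random pool instance."""
    return rng.randrange(len(pool))


def make_domain(rng, n, shift):
    """Hypothetical 2-D features; labels alternate so both classes occur."""
    data = []
    for i in range(n):
        y = i % 2
        center = (1.0 if y == 1 else -1.0) + shift
        data.append(([rng.gauss(center, 1.0), rng.gauss(center, 1.0)], y))
    return data


def matl_sketch(budget=40, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    source_pool = make_domain(rng, 200, shift=0.0)    # S_U (labels held by the oracle)
    target_labeled = make_domain(rng, 10, shift=0.3)  # T_L
    target_test = make_domain(rng, 200, shift=0.3)    # T_T
    source_labeled = []                               # S_L
    arms = [uncertainty_arm, random_arm]
    pulls = [0] * len(arms)
    reward_sum = [0.0] * len(arms)
    history = []
    model = train(source_labeled + target_labeled)
    for _ in range(budget):
        if 0 in pulls:
            k = pulls.index(0)                        # try every arm once
        elif rng.random() < epsilon:
            k = rng.randrange(len(arms))              # explore
        else:
            k = max(range(len(arms)), key=lambda a: reward_sum[a] / pulls[a])
        # The chosen arm picks an instance; the oracle reveals its label.
        x, y = source_pool.pop(arms[k](model, source_pool, rng))
        source_labeled.append((x, y))
        model = train(source_labeled + target_labeled)
        # Reward: does the retrained classifier agree with the oracle?
        pulls[k] += 1
        reward_sum[k] += 1.0 if predict(model, x) == y else 0.0
        correct = sum(predict(model, tx) == ty for tx, ty in target_test)
        history.append(correct / len(target_test))
    return history, pulls


history, pulls = matl_sketch()
print(f"final target accuracy: {history[-1]:.2f}, pulls per arm: {pulls}")
```
]]></preformat>
      <p>Each entry of history is the accuracy on the target test set after one query, which corresponds to one point on an accuracy-versus-queries curve; pulls records how often each query strategy was chosen.</p>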
    </sec>
    <sec id="sec-2">
      <title>Figure 1</title>
      <p>Figure 1. Accuracy (%) against the number of queried instances (0 to 500) for the domain pairs (a) P-S and (b) S-P, comparing Uncertainty Sampling, Random Sampling, QUIRE, QBC, DWUS and MATL.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Gong</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Discriminatively learning domain-invariant features for unsupervised domain adaptation</article-title>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Gretton</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smola</surname>
            ,
            <given-names>A.J.</given-names>
          </string-name>
          :
          <article-title>A kernel method for the two-sample-problem</article-title>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Settles</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Active learning literature survey</article-title>
          .
          <source>Tech. rep.</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
    <fn-group>
      <fn id="fn1">
        <label>1</label>
        <p>https://github.com/subbareddy248/Datasets/tree/master</p>
      </fn>
    </fn-group>
  </back>
</article>