<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Active Feature Acquisition for Opinion Stream Classification under Drift</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>R. Shivakumaraswamy</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Beyer</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vishnu Unnikrishnan</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Myra Spiliopoulou</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Otto-von-Guericke-University Magdeburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Active stream learning is frequently used to acquire labels for instances and less frequently to determine which features should be considered as the stream evolves. We introduce a framework for active feature selection, intended to adapt the feature space of a polarity learner over a stream of opinionated documents. We report on the first results of our framework on substreams of reviews on different product categories.</p>
      </abstract>
      <kwd-group>
        <kwd>Active Feature Acquisition</kwd>
        <kwd>Opinion Stream Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Opinion stream classification algorithms assign a polarity label to each arriving
opinionated document. The feature space over the stream may change, though,
e.g. when new products appear and the words/phrasing used by the customers who
review them change. Feature space adaptation can benefit from an active
learning approach, where a human expert specifies the features of importance.</p>
      <p>
        Contardo et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] use reinforcement learning to acquire features, and also
consider feature acquisition cost. Huang et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] take uncertainty into account.
The “sequential feature acquisition framework” of Shim et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] acquires one
feature at a time until the desired model confidence is achieved. These approaches
are for static data, though, which are processed in their entirety to build the
model. In the stream context, Barddal et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] survey methods that detect
feature drift and select features for learning, under the assumption that all features
are known in advance. We do not make this assumption. Rather, whenever drift
is detected, we use words from recent documents and rebuild the feature space.
      </p>
      <p>We propose a framework for active feature selection on a stream. It consists
of: an active learner of features (ALF) that ranks features on importance; a
recommender (RALF) that invokes ALF and then recommends a feature subspace
to be replaced with the new features; a drift monitor that invokes RALF when
model quality decreases. In the next section we present our framework. Section
3 contains our first results. Section 4 concludes our study.</p>
      <p>© 2019 for this paper by its authors. Use permitted under CC BY 4.0.</p>
    </sec>
    <sec id="sec-2">
      <title>Workflow Over the Document Stream</title>
      <p>Our framework slides a window W of n epochs (here: weeks) over the stream,
learning on these n epochs and testing on epoch n + 1.</p>
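The sliding-window protocol can be sketched as follows (a minimal illustration; the generator name and epoch labels are ours, not from the paper):

```python
from collections import deque

def sliding_windows(epochs, n=5):
    """Yield (train_window, test_epoch) pairs: learn on n epochs,
    test on epoch n + 1, then shift the window by one epoch."""
    window = deque(maxlen=n)  # the least recent epoch is forgotten automatically
    for epoch in epochs:
        if len(window) == n:
            yield list(window), epoch  # train on the window, test on the next epoch
        window.append(epoch)

# toy stream of 8 weekly epochs
stream = [f"week{i}" for i in range(1, 9)]
pairs = list(sliding_windows(stream, n=5))
# the first pair trains on weeks 1-5 and tests on week 6
```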
      <p>
        Module ALF for Feature Ranking: Our active feature selector ALF ranks features
on importance. Feature ranking methods include mutual information, information
gain, document frequency thresholding (DFT) and chi-square, as discussed by
Basu et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Distinguishing Feature
Selector (DFS), Odds Ratio and Normalized Difference Measure (NDM) as studied
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], Gini-index, signed chi-square and signed information gain [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the
stratified feature ranking method of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and the approach proposed by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. We opted
for the Distinguishing Feature Selector (ALF-DFS) and the Gini (ALF-Gini)
because they were found to have the most competitive performance [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
Module RALF for Feature Subspace Recommendation: The recommender takes
as input the size M of the subspace to be replaced and invokes ALF for feature
ranking. Currently we use M = FeatureSpaceSize/2. We have four variants of RALF:
– Baseline: invokes ALF-Gini on the data inside the current window
– Oracle-Random: picks M features randomly from the feature space of the
next epoch (the epoch n + 1, i.e. the first epoch in the future)
– Oracle-Gini: invokes ALF-Gini on epoch n + 1 and returns the top-M features
– Oracle-DFS: similar to Oracle-Gini, but invokes ALF-DFS on epoch n + 1
Hence, the Oracle variants simulate an expert who knows which features will
become important in the immediate future. We use the top-M of these features
to replace the least important ones of the current feature space, thus still
preserving the presently informative features.
      </p>
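To illustrate the kind of ranking ALF-Gini produces, here is a minimal sketch (our own simplification, not the exact Gini or DFS formulas of the cited works): each word is scored by the sum of squared class-conditional presence probabilities, so words concentrated in a single polarity class rank highest.

```python
import numpy as np

def gini_rank(X, y):
    """Rank features by a Gini-style purity score: for each feature (word),
    sum over classes of P(class | word present)^2. Higher scores mean the
    word concentrates in fewer classes and is more discriminative.
    X: binary document-term matrix (n_docs, n_feats); y: class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.zeros(X.shape[1])
    df = X.sum(axis=0)  # document frequency of each feature
    for c in np.unique(y):
        df_c = X[y == c].sum(axis=0)  # docs of class c containing the word
        scores += np.divide(df_c, df, out=np.zeros_like(scores), where=df > 0) ** 2
    return np.argsort(scores)[::-1]  # feature indices, most informative first

# toy example: feature 0 occurs only in "pos" documents
X = [[1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1]]
y = ["pos", "pos", "neg", "neg"]
order = gini_rank(X, y)
```

In the framework, the bottom-M indices of such a ranking identify the feature subspace that RALF replaces.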
      <p>
        Stream Classification Core: The opinion stream learner replaces the least
informative features (according to ALF’s ranking) with the features suggested by
RALF. It re-learns on the current window and uses the next epoch for testing.
Then, the window shifts by one epoch, forgetting the least recent one.
Drift-driven Feature Space Update: The drift monitor invokes RALF if and only
if drift occurs. For drift detection we use the method of Gama et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
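For orientation, a minimal sketch of the drift detection scheme of Gama et al. (DDM): it monitors the online error rate and signals drift when the rate rises significantly above its historical minimum, following the published 3-sigma rule. Class and variable names are ours.

```python
class DDM:
    """Minimal sketch of the drift detector of Gama et al.: track the online
    error rate p and its standard deviation s; signal drift when p + s
    exceeds p_min + 3 * s_min observed so far."""
    def __init__(self, min_samples=30):
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0
        self.s = 0.0
        self.p_min, self.s_min = float("inf"), float("inf")

    def update(self, error):
        """error: 1 if the classifier misclassified the instance, else 0.
        Returns True when drift is detected (the detector then resets)."""
        self.n += 1
        self.p += (error - self.p) / self.n                # incremental error rate
        self.s = (self.p * (1 - self.p) / self.n) ** 0.5   # binomial std estimate
        if self.n < self.min_samples:
            return False
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s        # new best level
        if self.p + self.s > self.p_min + 3 * self.s_min:  # 3-sigma drift rule
            self.reset()
            return True
        return False

# toy stream: 100 correct predictions, then the model starts failing
detector = DDM()
stream_errors = [0] * 100 + [1] * 100
drift_at = next((i for i, e in enumerate(stream_errors) if detector.update(e)), None)
```

In our framework, a `True` from such a monitor is what triggers the call to RALF.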
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <p>
        We compared the RALF variants to a default model that does not change the
feature space. We performed prequential evaluation and aggregated the SGD log
loss values every two months. We used the Friedman test with the Iman-Davenport
modification, rejecting H0 for p-values ≤ 0.01, and then applied the Nemenyi
post-hoc test. All experiments and results are in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
Data Setup: We use the “clothing, shoes and jewelry” reviews (substream C),
“health and personal care” (substream H) and “sports and outdoors” (S) from
the Amazon data set of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (http://jmcauley.ucsd.edu/data/amazon/), from
01/2011 to 01/2013. There were very few reviews before 2011 and a steep
increase of positive ones from 2013 on: this product-independent drift calls for
conventional classifier adaption, which is beyond our scope. We map ratings 1
and 2 to “Negative”, 4 and 5 to “Positive”, and 3 to “Neutral”.
Feature Drift Imputation: We start and stop the substream of each product
category at specific time points (see Fig. 1). Hence, product-specific words appear
only at given time intervals. We slide a window of 5 weeks in one-week steps over
this stream. We build an initial model from the first three weeks, i.e. only from
substream C. The first drift occurs when substream H starts.
Setup of the Components: As classification core we use Stochastic Gradient
Descent (SGD) of scikit-learn (alpha = 0.001, l2 penalty and hinge loss). For text
preparation, we use the components of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We build the feature space using
bag-of-words (“words”: 3-grams) and TF-IDF, and invoke the dictionary vectorizer of
scikit-learn. We vary the feature space size Mfull = 500, 1000, 5000, 10000, 15000,
so RALF replaces the M = Mfull/2 least important features.
      </p>
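The classification core described above can be sketched with scikit-learn as follows (the toy documents and labels are illustrative; the original pipeline additionally uses 3-grams and the dictionary vectorizer, simplified here):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier

# SGD core as in the setup: hinge loss, l2 penalty, alpha = 0.001,
# over TF-IDF bag-of-words features.
texts = ["great quality shoes", "terrible fit, poor quality",
         "love these shoes", "poor stitching, terrible"]
labels = ["Positive", "Negative", "Positive", "Negative"]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)                       # documents -> TF-IDF matrix
clf = SGDClassifier(loss="hinge", penalty="l2", alpha=0.001, random_state=0)
clf.fit(X, labels)
pred = clf.predict(vec.transform(["great shoes"]))
```

For prequential evaluation, `partial_fit` would be called epoch by epoch instead of a single `fit`.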
      <p>Results: The default model always had inferior performance. Hence, updating the
feature space is beneficial as a response to drift caused by the introduction
of new products. Oracle-DFS performed best. Oracle-Gini was within the critical
distance to it. Oracle-Random improved as the feature space size increased.</p>
      <p>The Baseline, which uses ALF-Gini without benefiting from an Oracle, is
comparable to Oracle-Gini and Oracle-Random. It is better than the default
model except for Mfull = 500 (where it is within the critical distance from the
default model). Hence, ALF-Gini can improve model performance by replacing
the least informative features in the current window when feature drift occurs.</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>We presented an active feature selection framework for a stream of opinionated
documents. Upon drift detection, our framework re-ranks the features with the help
of the Oracle and replaces the least informative old features with the most
informative new ones. We evaluated our framework by simulating topic drift. We
found that replacing a feature subspace in the presence of drift is beneficial,
even if there is no Oracle. We next plan to vary the size and position of the
feature subspace to be replaced. Replacing the currently most informative features
instead of the least informative ones might be better under concept shift.</p>
      <p>Acknowledgement: This work is partially funded by the German Research Foundation, project
OSCAR “Opinion Stream Classification with Ensembles and Active Learners”. We
further thank Elson Serrao, who made the basic components of opinion stream
mining available under https://github.com/elrasp/osm.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Asim</surname>
            ,
            <given-names>M.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wasim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ali</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rehman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Comparison of feature selection methods in text classification on highly skewed datasets</article-title>
          .
          <source>In: 1st Int. Conf. on Latest Trends in Electrical Engineering and Computing Technologies (INTELLECT)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barddal</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomes</surname>
            ,
            <given-names>H.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Enembreck</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfahringer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A survey on feature drift adaptation: Definition, benchmark, challenges and future directions</article-title>
          .
          <source>Journal of Systems and Software</source>
          <volume>127</volume>
          ,
          <fpage>278</fpage>
          -
          <lpage>294</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Basu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murthy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Effective text classification by a supervised feature selection approach</article-title>
          .
          <source>In: 12th IEEE ICDM Workshops Volume</source>
          . pp.
          <fpage>918</fpage>
          -
          <lpage>925</lpage>
          . IEEE (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>Supervised feature selection with a stratified feature weighting method</article-title>
          .
          <source>IEEE Access 6</source>
          ,
          <fpage>15087</fpage>
          -
          <lpage>15098</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Contardo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artières</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Sequential cost-sensitive feature acquisition</article-title>
          .
          <source>In: Int. Symp. on Intelligent Data Analysis</source>
          . pp.
          <fpage>284</fpage>
          -
          <lpage>294</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Fattah</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>A novel statistical feature selection approach for text categorization</article-title>
          .
          <source>Journal of Information Processing Systems</source>
          <volume>13</volume>
          (
          <issue>5</issue>
          ) (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gama</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Medas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castillo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Learning with drift detection</article-title>
          .
          <source>In: Brazilian Symposium on Artificial Intelligence</source>
          . pp.
          <fpage>286</fpage>
          -
          <lpage>295</lpage>
          . Springer (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sugiyama</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Niu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Active feature acquisition with supervised matrix completion</article-title>
          .
          <source>In: 24th ACM SIGKDD Int. Conf. on Knowledge Discovery &amp; Data Mining</source>
          . pp.
          <fpage>1571</fpage>
          -
          <lpage>1579</lpage>
          . ACM (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>McAuley</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Targett</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Den Hengel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Image-based recommendations on styles and substitutes</article-title>
          .
          <source>In: 38th Int ACM SIGIR Conf on Research and Development in Information Retrieval</source>
          . pp.
          <fpage>43</fpage>
          -
          <lpage>52</lpage>
          . ACM (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ogura</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amano</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kondo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Comparison of metrics for feature selection in imbalanced text classification</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>38</volume>
          (
          <issue>5</issue>
          ),
          <fpage>4978</fpage>
          -
          <lpage>4989</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Serrao</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spiliopoulou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Active stream learning with an oracle of unknown availability for sentiment prediction</article-title>
          .
          <source>In: IAL@ECML PKDD</source>
          . pp.
          <fpage>36</fpage>
          -
          <lpage>47</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Shim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hwang</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Joint active feature acquisition and classification with variable-size set encoding</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          . pp.
          <fpage>1368</fpage>
          -
          <lpage>1378</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Shivakumaraswamy</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Active learning over text streams</article-title>
          .
          <source>Tech. rep.</source>
          , Otto-von-Guericke-University Magdeburg, Department of Computer Science (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Uysal</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          :
          <article-title>An improved global feature selection scheme for text classification</article-title>
          .
          <source>Expert systems with Applications</source>
          <volume>43</volume>
          ,
          <fpage>82</fpage>
          -
          <lpage>92</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>