<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Ensemble Learning-based model for Classification of Insincere Question</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhongyuan Han</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiaming Gao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Huilin Sun</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruifeng Liu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chengzhe Huang</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Leilei Kong</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haoliang Qi</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hanzhongyuan</institution>
          ,
          <addr-line>gaojiaming24,sunhuilin24,liuruifeng812,huangchengz</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Harbin Engineering University Harbin</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Heilongjiang Institute of Technology Harbin</institution>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our method for the Classification of Insincere Question (CIQ) task at FIRE 2019. In this evaluation, we use an ensemble learning method to combine multiple classification models, including logistic regression, support vector machine, naive Bayes, decision tree, k-nearest neighbor, and random forest. The results show that our classifier achieves a 67.32% accuracy rate (ranked first) on the test dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Classification</kwd>
        <kwd>Insincere Question</kwd>
        <kwd>Ensemble Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Models</title>
      <p>
        In this task, we use an ensemble learning method to solve the multi-class classification
task. Ensemble learning is a way to combine multiple learning algorithms for better
performance. The ensemble learning[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] method includes two stages. Step 1: learn the first-level
classifiers. We select several base classifiers, such as LR (logistic regression),
SVM (support vector machine), NB (naive Bayes), DT (decision tree), KNN (k-nearest
neighbor), and RF (random forest), and train each classifier independently. Step 2: learn a
second-level meta-classifier. We use the outputs of the first-level classifiers as new
features, and then use these new features to train the second-level meta-classifier. The model
structure is shown in Fig. 1.
      </p>
      <p>
        Logistic regression[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a binary classification model. The input of the model is the
K-dimensional feature vector of a sample, and the output of the model is the probability of
the positive or negative class. For a given dataset $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where
$x_i \in \mathbb{R}^K$ and $y_i \in \{0, 1\}$, the hypothesis function of the model is:
$$h_\theta(x) = P(y = 1 \mid x; \theta) = \sigma(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}} \quad (1)$$
where $w$ represents the model parameters, i.e. the weight of each feature, $b$ represents
the bias, and $\sigma$ represents the sigmoid function. To obtain a multi-class
classifier, the softmax function is used to generalize logistic regression[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
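      <p>As an illustration of the two-stage procedure above, the following sketch builds a stacking model with scikit-learn's StackingClassifier, using the six base classifiers and a logistic regression meta-classifier. The toy texts, labels, and settings are assumptions for illustration only; the actual evaluation uses the brew toolkit described in the experiments section.</p>
      <preformat>
# Minimal stacking sketch: first-level base classifiers, second-level LR meta-classifier.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy data standing in for the Quora questions (illustrative only).
texts = ["why do people ask this", "is water wet", "how do plants grow",
         "why would anyone believe that", "what is the capital of france",
         "is this a loaded question"]
labels = [1, 1, 0, 1, 0, 0]

# Step 1: first-level (base) classifiers.
base_learners = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC()),
    ("nb", MultinomialNB()),
    ("knn", KNeighborsClassifier(n_neighbors=2)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("rf", RandomForestClassifier(n_estimators=10, max_depth=3)),
]

# Step 2: the outputs of the base classifiers become new features
# for a second-level logistic regression meta-classifier.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(), cv=2)

model = make_pipeline(TfidfVectorizer(), stack)
model.fit(texts, labels)
print(model.predict(["is this question sincere"]))
      </preformat>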
      <sec id="sec-2-1">
        <title>Support Vector Machine</title>
        <p>
          The support vector machine[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] is a binary classification model. Its basic form
is the linear classifier with the largest margin in the feature space. For a given
training dataset $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, $x_i \in \mathbb{R}^K$, $y_i \in \{-1, +1\}$, the learning
goal of the SVM is to find a separating hyperplane in the feature space,
$$w^{*} \cdot x + b^{*} = 0 \quad (2)$$
which divides the feature space into two parts. We use the linearly separable support vector machine[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The
classification decision function is:
$$f(x) = \mathrm{sign}(w^{*} \cdot x + b^{*}) \quad (3)$$
where $x$ represents the input feature vector, $w^{*}$ represents the model weights, and $b^{*}$ represents
the bias. For the multi-class classification problem, the one-against-one[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] method can be
used: a binary classification boundary is constructed between the i-th and j-th classes, and
a binary SVM is trained for each pair of classes to solve the multi-class problem.
        </p>
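        <p>A minimal sketch of the one-against-one strategy, assuming scikit-learn's SVC with a linear kernel and toy two-dimensional data (not the evaluation setup): SVC trains one binary SVM per pair of classes and combines their votes.</p>
        <preformat>
# One-against-one multi-class SVM sketch on toy data.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0], [2.0, 0.1], [2.1, 0.0]])
y = np.array([0, 0, 1, 1, 2, 2])  # three toy classes

# SVC builds a binary classifier for every pair of classes (i, j).
clf = SVC(kernel="linear", decision_function_shape="ovo")
clf.fit(X, y)

# With 3 classes there are 3 * (3 - 1) / 2 = 3 pairwise decision values per sample.
print(clf.decision_function([[1.0, 1.0]]).shape)  # (1, 3)
print(clf.predict([[1.0, 1.0]]))
        </preformat>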
      </sec>
      <sec id="sec-2-2">
        <title>Naïve Bayes Classifier</title>
        <p>
          Naïve Bayes[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is a classification method based on Bayes' theorem and the assumption
of conditional independence between features. For a given set of training data, the joint probability
distribution $P(X, Y)$ over the dataset is learned. Naive Bayes makes a conditional
independence assumption on the conditional probability distribution, specifically:
$$P(X = x \mid Y = c_k) = \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)$$
From this, the joint probability distribution $P(X, Y)$ can be obtained. When classifying with naive Bayes,
for an input $x$, the posterior probability distribution $P(Y = c_k \mid X = x)$ is calculated by the model, and the class with the
largest posterior probability is output as the category of $x$.</p>
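        <p>The following sketch computes this posterior score directly for a toy problem with two hand-picked binary features; the priors and conditional probabilities are assumed values for illustration, not estimates from the evaluation data.</p>
        <preformat>
# Naive Bayes score: P(Y=c_k) * prod_j P(X^(j)=x^(j) | Y=c_k).
import numpy as np

# Assumed class priors P(Y = c_k).
prior = {"sincere": 0.7, "insincere": 0.3}

# Assumed conditionals P(X^(j) = x^(j) | Y = c_k) for two binary features.
cond = {
    "sincere":   [{"yes": 0.2, "no": 0.8}, {"yes": 0.1, "no": 0.9}],
    "insincere": [{"yes": 0.6, "no": 0.4}, {"yes": 0.7, "no": 0.3}],
}

def posterior_scores(x):
    """Score each class by its prior times the product of per-feature conditionals."""
    scores = {}
    for c_k, p_y in prior.items():
        likelihood = np.prod([cond[c_k][j][value] for j, value in enumerate(x)])
        scores[c_k] = p_y * likelihood
    return scores

scores = posterior_scores(["yes", "no"])
print(scores, "->", max(scores, key=scores.get))  # class with the largest posterior
        </preformat>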
      </sec>
      <sec id="sec-2-3">
        <title>K-Nearest Neighbor</title>
        <p>
          K-nearest neighbor[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is a basic classification and regression method. Given a training
dataset, for a new input sample, we find the $k$ training samples closest to it, and the new
sample is assigned to the class to which the majority of those $k$ samples belong.
For a given training dataset $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$, where $x_i$ is
the feature vector of a sample and $y_i \in \{c_1, c_2, \dots, c_K\}$ is its category,
$i = 1, 2, \dots, N$, and for an input feature vector $x$, the output $y$ is the category to which the
sample belongs:
$$y = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j), \quad i = 1, 2, \dots, N; \; j = 1, 2, \dots, K \quad (4)$$
where $I$ is the indicator function, i.e. $I$ is 1 when $y_i = c_j$ and 0 otherwise, and $N_k(x)$ is the neighborhood containing the $k$ nearest neighbors of $x$.</p>
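        <p>A short sketch of the vote in Eq. (4), assuming Euclidean distance and toy two-dimensional training data (illustrative only):</p>
        <preformat>
# k-nearest-neighbor majority vote implementing Eq. (4).
import numpy as np
from collections import Counter

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
y_train = np.array([0, 0, 1, 1, 1])

def knn_predict(x, k=3):
    # N_k(x): indices of the k training samples closest to x.
    dists = np.linalg.norm(X_train - x, axis=1)
    neighbors = np.argsort(dists)[:k]
    # argmax over classes c_j of the vote count sum_i I(y_i = c_j).
    votes = Counter(y_train[neighbors])
    return votes.most_common(1)[0][0]

print(knn_predict(np.array([0.95, 1.0])))  # expected: class 1
        </preformat>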
      </sec>
      <sec id="sec-2-4">
        <title>Decision Tree</title>
        <p>
          The decision tree model[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is a tree-structured classification model. A decision tree
consists of nodes and directed edges. There are two types of nodes: internal nodes and leaf
nodes. An internal node represents a feature or attribute, and a leaf node represents a class.
To classify with a decision tree, start from the root node, test one feature of the
sample, and assign the sample to a child node according to the test result. The sample
is tested and assigned recursively until a leaf node is reached. Finally, the sample is
assigned to the class of that leaf node.
        </p>
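        <p>As a small illustration of this root-to-leaf procedure, the sketch below trains a depth-3 scikit-learn decision tree on the Iris dataset (a stand-in dataset, not the evaluation data) and prints the feature test performed at each internal node.</p>
        <preformat>
# A depth-3 decision tree: internal nodes test one feature, leaves assign a class.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=1, criterion="gini")
tree.fit(data.data, data.target)

# Print the sequence of feature tests from the root down to each leaf.
print(export_text(tree, feature_names=list(data.feature_names)))
        </preformat>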
      </sec>
      <sec id="sec-2-5">
        <title>Random Forest</title>
        <p>
          Random forests[
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] use a random approach to combine many decision trees into a
forest, and at classification time each decision tree votes to determine the final category of the
test sample. First, the bootstrap method is used to generate m training sets;
then, for each training set, a decision tree is constructed. When a node searches for a feature
to split on, a subset of features is randomly drawn from all features, the
optimal split among the drawn features is found and applied to the node, and the trees are grown
in this way until a multi-class classifier is obtained.
        </p>
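        <p>The sketch below mirrors this procedure on a stand-in dataset: draw m bootstrap training sets, grow one depth-limited tree per set with a random feature subset considered at each split, and take a majority vote (m=10 and max_depth=3 follow Table 2; the rest is illustrative).</p>
        <preformat>
# Bootstrap m training sets, grow one tree per set, classify by majority vote.
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)
m = 10  # number of bootstrap samples / trees

trees = []
for _ in range(m):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample with replacement
    tree = DecisionTreeClassifier(max_depth=3, max_features="sqrt")  # random feature subset per split
    trees.append(tree.fit(X[idx], y[idx]))

def forest_predict(x):
    votes = Counter(int(t.predict([x])[0]) for t in trees)
    return votes.most_common(1)[0][0]  # each tree votes; the majority class wins

print(forest_predict(X[0]), "vs true label", y[0])
        </preformat>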
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <sec id="sec-3-1">
        <title>Datasets</title>
        <p>The evaluation organizers provide an enhanced subset of Quora questions that contains
the fine-grained category labels for the previously defined insincere questions. The
evaluation dataset includes 898 training samples and 101 test samples. Each sample contains qid
as an identifier, question text showing the content of the question, and target indicating
the category. The tag value ranges from 0 to 5, representing 6 categories: the "0" tag indicates
sincere questions, the "1" tag indicates rhetorical questions, the "2" tag indicates
hypothetical questions, the "3" tag indicates hate speech questions, the "4" tag indicates
sexually explicit questions, and the "5" tag indicates other. The number of training samples per tag
is shown in Table 1.</p>
        <table-wrap id="table1">
          <label>Table 1</label>
          <caption><p>Number of training samples per category.</p></caption>
          <table>
            <thead>
              <tr><th>Tag</th><th>Category</th><th>Quantity</th></tr>
            </thead>
            <tbody>
              <tr><td>0</td><td>Sincere</td><td>488</td></tr>
              <tr><td>1</td><td>Rhetorical</td><td>216</td></tr>
              <tr><td>2</td><td>Hypothetical</td><td>98</td></tr>
              <tr><td>3</td><td>Hate speech</td><td>38</td></tr>
              <tr><td>4</td><td>Sexually explicit</td><td>38</td></tr>
              <tr><td>5</td><td>Other</td><td>20</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-2">
        <title>Experiments Setting</title>
        <p>
          Our evaluation method consists of three parts: data processing, model training, and
ensemble learning. In the data processing stage, we remove stop words, remove
punctuation, and perform stemming. In the model training stage, the text is converted into a TF-IDF vector
using the TfidfVectorizer in scikit-learn [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], which is used as the training feature for the logistic
regression, support vector machine, and other models. We obtain the best performance from
each model by tuning its hyperparameters. Finally, we apply the ensemble learning
method, which further improves the prediction accuracy.
        </p>
        <p>During the data preprocessing stage, we removed the extra spaces in the text, used
the stop-word list provided by Stanford1 to remove stop words, used the NLTK
toolkit2 for stemming, and removed the punctuation from each sentence. Next,
we used the 898 training samples to directly train the ensemble learning model and the 101 test
samples for testing.</p>
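        <p>A minimal sketch of this preprocessing, assuming NLTK's English stop-word list as a stand-in for the Stanford list and NLTK's Porter stemmer (the exact resources and order of operations in the evaluation may differ):</p>
        <preformat>
# Preprocessing sketch: strip punctuation, collapse extra spaces, remove stop words, stem.
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)
stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text):
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    tokens = text.lower().split()                                     # also removes extra spaces
    tokens = [t for t in tokens if t not in stop_words]               # remove stop words
    return " ".join(stemmer.stem(t) for t in tokens)                  # stemming

print(preprocess("Why   do people keep asking insincere questions?"))
        </preformat>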
        <p>In the model training stage, we select logistic regression as the meta-classifier to
learn the second-level classifier. Before applying ensemble learning, we need to set the
hyperparameters of each classifier. We use the training data and validation data to train each
independent classifier, adjusting the hyperparameters so that each independent classifier
achieves its best performance on the validation set.</p>
        <p>We use scikit-learn3 to perform feature extraction and model training. The
TfidfVectorizer tool provided by scikit-learn converts the text data into TF-IDF feature
vectors, and the logistic regression, support vector machine, naive Bayes, k-nearest
neighbor, decision tree, and random forest models provided by the scikit-learn toolkit are used for
training. Ensemble learning uses the brew toolkit4 for model fusion. Brew uses the
output of each classifier as a new feature value, uses a logistic regression model to learn
the weight of each classifier, and then outputs the classification results. Brew uses the
liblinear library for parameter solving, internally using coordinate descent
optimization combined with L2 regularization to iteratively optimize the loss function. The model
parameter settings are shown in Table 2.
1 https://github.com/stanfordnlp
2 http://www.nltk.org/
3 https://scikit-learn.org/
4 https://pypi.org/project/brew/0.1.3/</p>
        <table-wrap id="table2">
          <label>Table 2</label>
          <caption><p>Model parameter settings.</p></caption>
          <table>
            <thead>
              <tr><th>Model</th><th>Parameters</th></tr>
            </thead>
            <tbody>
              <tr><td>Naive Bayes</td><td>alpha=0.01</td></tr>
              <tr><td>K-nearest neighbor</td><td>n_neighbors=10</td></tr>
              <tr><td>Decision tree</td><td>max_depth=3, min_samples_leaf=1, criterion=gini</td></tr>
              <tr><td>Random forest</td><td>n_estimators=10, max_depth=3, criterion=gini</td></tr>
              <tr><td>Ensemble</td><td>layer1=[LR, SVM, NB, KNN, DT, RF], layer2=[LR]</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The final ranking of this evaluation task is shown in Table 3.</p>
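        <p>For reference, the sketch below instantiates the base classifiers with the Table 2 settings on TF-IDF features from scikit-learn. The placeholder texts stand in for the 898 training questions, the brew fusion step is omitted, and the hyperparameters for LR and SVM are not listed in Table 2, so defaults are assumed.</p>
        <preformat>
# Base classifiers with the Table 2 settings, trained on TF-IDF features.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

train_texts = ["placeholder question one", "placeholder question two"]  # stands in for the 898 questions
train_labels = [0, 1]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

models = {
    "LR": LogisticRegression(),   # defaults assumed; not listed in Table 2
    "SVM": SVC(),                 # defaults assumed; not listed in Table 2
    "NB": MultinomialNB(alpha=0.01),
    "KNN": KNeighborsClassifier(n_neighbors=10),
    "DT": DecisionTreeClassifier(max_depth=3, min_samples_leaf=1, criterion="gini"),
    "RF": RandomForestClassifier(n_estimators=10, max_depth=3, criterion="gini"),
}

for name, model in models.items():
    model.fit(X_train, train_labels)  # fit each first-level classifier independently
        </preformat>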
        <p>It can be seen from the table that ensemble learning achieves the best
performance, and that among the single classifiers the decision tree performs best.
Different classifiers can learn different data features, and ensemble learning
can integrate the features learned by each classifier and combine the advantages of the
classifiers. In addition, through experiments, we found that the performance of
logistic regression and the support vector machine is stable, and their classification
performance is not obviously different. Decision trees and random forests are sensitive to the
training and test data. The ensemble learning model improves performance by
combining the advantages of each classifier; it is more stable, does not suffer
performance degradation when the data is replaced, and does not show large performance
fluctuations. At the same time, we also see that the serious imbalance of the data has a
large impact on the individual classifiers, which in turn affects the performance of the
ensemble model. How to reduce the impact of data imbalance and generate a
more reasonable feature space is the direction of our future research.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>In this evaluation, we used an ensemble learning approach that incorporates multiple
text classification models to solve the Insincere Question Classification task. Using TF-IDF
vectors as input features, the scikit-learn toolkit was used to train logistic regression,
support vector machine, naive Bayes, and other models, and the models were merged
using the brew toolkit. While participating in this evaluation, we also tried word
embeddings as input features. Other machine learning models and deep learning
models were tested as well, but their experimental results were not satisfactory. Finally,
TF-IDF features combined with ensemble learning were selected as the method for the
final submission.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This research was supported by the National Social Science Fund of China
(No.18BYY125).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Smyth</surname>
            , Padhraic, and
            <given-names>David</given-names>
          </string-name>
          <string-name>
            <surname>Wolpert</surname>
          </string-name>
          .
          <article-title>"Linearly combining density estimators via stacking</article-title>
          .
          <source>" Machine Learning 36.1-2</source>
          (
          <year>1999</year>
          ):
          <fpage>59</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cox</surname>
          </string-name>
          ,
          <string-name>
            <surname>David R</surname>
          </string-name>
          .
          <article-title>"The regression analysis of binary sequences</article-title>
          .
          <source>" Journal of the Royal Statistical Society: Series B (Methodological) 20.2</source>
          (
          <year>1958</year>
          ):
          <fpage>215</fpage>
          -
          <lpage>232</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <surname>Mingyang</surname>
          </string-name>
          , et al.
          <article-title>"Text classification based on deep belief network and softmax regression</article-title>
          .
          <source>" Neural Computing and Applications</source>
          <volume>29</volume>
          .1 (
          <year>2018</year>
          ):
          <fpage>61</fpage>
          -
          <lpage>70</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cortes</surname>
            , Corinna, and
            <given-names>Vladimir</given-names>
          </string-name>
          <string-name>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>"Support-vector networks</article-title>
          .
          <source>" Machine learning 20.3</source>
          (
          <year>1995</year>
          ):
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Scholkopf</surname>
            , Bernhard, and
            <given-names>Alexander J.</given-names>
          </string-name>
          <string-name>
            <surname>Smola</surname>
          </string-name>
          .
          <article-title>Learning with kernels: support vector machines, regularization, optimization, and beyond</article-title>
          . MIT Press,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Chih-Wei</surname>
          </string-name>
          , and
          <string-name>
            <surname>Chih-Jen Lin</surname>
          </string-name>
          .
          <article-title>"A comparison of methods for multiclass support vector machines</article-title>
          .
          <source>" IEEE transactions on Neural Networks 13.2</source>
          (
          <year>2002</year>
          ):
          <fpage>415</fpage>
          -
          <lpage>425</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ng</surname>
            , Andrew Y., and
            <given-names>Michael I. Jordan.</given-names>
          </string-name>
          <article-title>"On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes."</article-title>
          <source>Advances in neural information processing systems</source>
          .
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Weinberger</surname>
            ,
            <given-names>Kilian Q.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lawrence</surname>
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Saul</surname>
          </string-name>
          .
          <article-title>"Distance metric learning for large margin nearest neighbor classification</article-title>
          .
          <source>" Journal of Machine Learning Research</source>
          <volume>10</volume>
          . Feb
          (
          <year>2009</year>
          ):
          <fpage>207</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Quinlan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Ross</surname>
          </string-name>
          .
          <article-title>"Induction of decision trees</article-title>
          .
          <source>" Machine learning 1.1</source>
          (
          <year>1986</year>
          ):
          <fpage>81</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Liaw</surname>
            , Andy, and
            <given-names>Matthew</given-names>
          </string-name>
          <string-name>
            <surname>Wiener</surname>
          </string-name>
          .
          <article-title>"Classification and regression by randomForest."</article-title>
          <source>R news 2.3</source>
          (
          <year>2002</year>
          ):
          <fpage>18</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <surname>Fabian</surname>
          </string-name>
          , et al.
          <article-title>"Scikit-learn: Machine learning in</article-title>
          <source>Python." Journal of machine learning research 12</source>
          .
          <string-name>
            <surname>Oct</surname>
          </string-name>
          (
          <year>2011</year>
          ):
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>