<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Studo Jobs: Enriching Data With Predicted Job Labels</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Markus Reiter-Haas</string-name>
          <email>markus.reiter-haas@studo.co</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentin Slawicek</string-name>
          <email>valentin.slawicek@studo.co</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emanuel Lacic</string-name>
          <email>elacic@know-center.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Know-Center</institution>
          ,
          <addr-line>Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Moshbit GmbH</institution>
          ,
          <addr-line>Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present the Studo Jobs platform in which we tackle the problem of automatically assigning labels to new job advertisements. For that purpose we perform an exhaustive comparison study of state-of-the-art classifiers to be used for label prediction in the job domain. Our findings suggest that in most cases an SVM based approach using stochastic gradient descent performs best on the textual content of job advertisements in terms of Accuracy, F1-measure and AUC. Consequently, we plan to use the best performing classifier for each label which is relevant to the Studo Jobs platform in order to automatically enrich the job advertisement data. We believe that our work is of interest for both researchers and practitioners in the area of automatic labeling and enriching text-based data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
The job market is highly competitive, and finding a
new job is not an easy decision as it usually depends on many factors
such as salary, job description or geographical location. This has led to
the recent rise of business-oriented social networks like LinkedIn1 or
XING2. Users of such networks organize and look after their profile
by describing their skills, interests and previous work experiences.
Yet finding relevant jobs for users based on such carefully structured
content is a non-trivial task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Tackling the
same problem gets even more difficult for university students, as
they normally have only some or no relevant work experience at all.
This has become a real issue for students as they get more aware that
having a degree does not automatically guarantee them their desired
job after graduation. For instance, the recent study of [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] reports that
one third of graduates in the U.S. were employed in positions that
do not require a university degree. Moreover, the authors report that
23.5% of employed graduates in 2013 were not only underemployed
but also work in positions with a below-than-average salary.
      </p>
      <p>In 2016 we launched the Studo3 mobile application with the initial
aim to provide constant guidance and support to Austrian students
in their everyday life. As seen in Figure 1a, Studo integrates several
university-relevant services (e.g. course management, mail, calendar)
but also enriches the student’s daily life by providing relevant news
articles. With such a feature-set a student is not only better informed
but also encouraged to connect and collaborate with other peers
from the same community. Moreover, the ever increasing popularity
of the application at Austrian universities4 has shown that students
clearly need additional guidance throughout their studies. Thus, one
of the main goals of Studo is to better prepare the students for the
job market they need to face after graduation.
(Footnotes: 1 http://linkedin.com; 2 http://xing.com; 3 https://studo.co/;
4 as of June 2017, the 30,000 monthly active users have on average 100
application starts per month.)</p>
      <p>
        Current work. In this paper, we present our work-in-progress of
the newest extension to Studo - the Studo Jobs platform. As seen in
Figure 1b, students can browse related job advertisements in order
to gather relevant working experience even before they graduate. In
our case, these job advertisements typically describe the candidate’s
job role, required skills, the expected educational background as
well as the company description, but only in an unstructured
freetext format. In the context of job recommendations, having such
unstructured data can be problematic as students already struggle
with having little job experience [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. To overcome this limitation,
in this work we focus on enriching the job advertisement data by
automatically generating and assigning labels (i.e., categories) to
which an advertised job belongs. The benefits of such data
enrichment are twofold. First, students can more easily navigate through
the available job offers (e.g., by filtering out irrelevant categories).
Second, by correctly enriching the job advertisement data we hope
to increase the performance of future job recommendations (e.g., by
performing clustering like in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). As such, we perform an extensive
algorithmic comparison study on how to predict suitable labels for a
particular job advertisement. We believe that our findings can
support both developers and researchers on how to enrich their data and
potentially improve the recommendation performance.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
Most work related to assigning labels to job advertisements
comes from research on multi-label classification, an emerging
machine learning paradigm which tries to assign a set of labels to
a document [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In an extensive literature review [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], previous
multi-label learning has been divided into two main groups, i.e.,
algorithm adaptation and problem transformation methods.
Algorithm adaptation methods adapt popular learning techniques to deal
with multi-label data directly, while problem transformation
methods transform the multi-label classification problem into either one
or more single-label classification problems. In our work we build
upon the latter, i.e., we explore how to construct several single-label
classifiers in order to assign relevant labels to job advertisements. We
base this decision on other work which has shown that binary relevance
is a suitable method to tackle the problem of multi-label classification
(e.g., [
        <xref ref-type="bibr" rid="ref16">16</xref>
]). Moreover, the authors of [
        <xref ref-type="bibr" rid="ref18">18</xref>
] have shown that the task of
predicting the job sector (i.e., category or label, as in our case) can
be done more accurately than predicting its title or education requirement.
      </p>
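      <p>To make the binary relevance transformation concrete, the following sketch (a hypothetical helper, not code from our platform) turns a multi-label job dataset into one binary dataset per label:

```python
def binary_relevance_split(docs):
    """Problem transformation via binary relevance: turn a multi-label
    dataset into one binary (label present / absent) dataset per label.

    docs: list of (text, set_of_labels) pairs.
    Returns: dict mapping each label to a list of (text, 0/1) pairs.
    """
    all_labels = set()
    for _, labels in docs:
        all_labels |= labels
    return {
        label: [(text, int(label in labels)) for text, labels in docs]
        for label in sorted(all_labels)
    }
```

A separate single-label classifier can then be trained on each of these per-label datasets.</p>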
    </sec>
    <sec id="sec-3">
      <title>METHODOLOGY</title>
      <p>
        Similar to the work of [
        <xref ref-type="bibr" rid="ref18">18</xref>
], we train multiple binary classifiers on the
basis of features contained in the text (i.e., terms). Thus, for each label
l we define a parameter vector for class c: θc = {θc1, θc2, ..., θcn},
where n is the size of the vocabulary in the corresponding training
set. The values of vector θc are the calculated TF-IDF values as
denoted by Equation 1 and 2, where TF(t,j) is the term count within
the job advertisement and DF(t) is the number of job advertisements
in which that particular term occurs.
      </p>
      <p>TF-IDF(t,j) = TF(t,j) × IDF(t)   (1)</p>
      <p>IDF(t) = log((1 + n) / (1 + DF(t))) + 1,   (2)</p>
      <p>where n in Equation 2 denotes the total number of job advertisements in the training set.</p>
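      <p>Equations 1 and 2 use the smoothed IDF variant; a minimal sketch of the computation, assuming raw term counts and document frequencies are already available:

```python
import math

def tf_idf(term_counts, doc_freq, n_docs):
    """TF-IDF per Equations 1 and 2: TF-IDF(t,j) = TF(t,j) * IDF(t),
    with the smoothed IDF(t) = log((1 + n) / (1 + DF(t))) + 1."""
    return {
        t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
        for t, tf in term_counts.items()
    }
```
</p>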
      <p>
        For our comparison study we performed experiments on different
job labels using several classification algorithms. As a baseline,
we first explored three well-known algorithms from the literature.
Specifically, we looked into: (1) the Naive Bayes algorithm which
assumes pairwise independence of the input features [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], (2) a
Classification And Regression Tree (CART), where at each node
one input is tested and depending on the results the left or right
subbranch is traversed [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and, (3) a Random Forest ensemble approach,
where each tree votes for a particular class [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
Next we experimented with AdaBoost, a boosting algorithm
which adaptively adjusts the weights of incorrectly classified
instances [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Another approach was a linear model using Logistic
Regression which assumes that the posterior probability of a class is
equal to a logistic sigmoid function acting on a linear function [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Support Vector Machines (SVM) are another algorithm that has been
shown to perform well for text classification. As such, we used
two SVM methods: (1) a Linear SVM which tries to fit a hyperplane
with the maximum soft margin, thus allowing for a minimal number
of errors [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and, (2) an SVM-SGD approach, where the stochastic
gradient descent optimization method is applied to the SVM [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
      </p>
      <p>Listing 1: Example of two crawled job advertisements (text was
shortened for readability) in JSON format. In our experiments
we only used the text of a given job in order to predict the best
suited labels.</p>
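      <p>As a sketch of how such an SVM-SGD classifier can be assembled with Scikit-learn (the toy data and hyperparameters below are illustrative, not our production setup):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# TF-IDF features (Equations 1 and 2) feeding a linear SVM trained
# with stochastic gradient descent (hinge loss = soft-margin SVM).
svm_sgd = make_pipeline(
    TfidfVectorizer(),
    SGDClassifier(loss="hinge", random_state=42),
)

# Toy binary task for one label (1 = Software, 0 = not Software).
texts = [
    "java developer backend software engineering",
    "cook kitchen restaurant catering service",
    "python software engineer web development",
    "waiter bar gastronomy catering service",
]
labels = [1, 0, 1, 0]
svm_sgd.fit(texts, labels)
```
</p>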
      <p>
        Finally, we experimented with three different neural network
approaches. We first trained a Multilayer Perceptron (MLP) which
consists of an input layer that is provided with the input vector θc ,
followed by one hidden layer with a size of 1024 units and two
smaller hidden layers with 128 units each. Each of the hidden layers
is followed by a batch normalization layer. The next two models
are based on the work of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As such, we used a convolutional
neural network (CNN) with an embedding layer of 200 units. The
embedding layer is followed by a 1-dimensional convolutional layer
with 128 filters and a kernel size of 3. This was followed by a
global max pooling layer and a dense layer with 128 units. The
third model was a multichannel CNN (M-CNN) that connects the
embedding layer with three convolutional layers in parallel, each
one having 128 filters and a kernel size of 3, 4 and 5 respectively.
Every convolutional layer is followed by a max pooling layer which
outputs are then afterwards merged together. In all three networks we
used rectified linear units as the activation function (i.e., Equation 3).
The output layer has always two units and uses a standard softmax
activation function (i.e., Equation 4). Each of the hidden layers uses
dropout with a rate of 0.2 for regularization. The models also use the
Adam optimizer [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] with a learning rate of 0.001. It also needs to
be noted that the last two CNN approaches do not utilize a TF-IDF
based input vector like the rest. Their input is generated by transforming
the textual content of a job advertisement into a sequence of word
indices. Given a maximum sequence length of 1,000, shorter texts
were padded with a default zero index.
      </p>
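      <p>The word-index transformation for the CNN input can be sketched as follows (a simplified stand-in for the tokenizer we used; index 0 is reserved for padding):

```python
def texts_to_padded_sequences(texts, max_len=1000):
    """Map each text to a fixed-length sequence of word indices,
    truncating long texts and padding shorter ones with the zero index."""
    vocab = {}
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # 0 is the padding index
            seq.append(vocab[word])
        seq = seq[:max_len]
        sequences.append(seq + [0] * (max_len - len(seq)))
    return sequences, vocab
```
</p>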
      <p>f(x) = x⁺ = max(0, x)   (3)</p>
      <p>σ(z)_j = e^{z_j} / Σ_{k=1}^{K} e^{z_k}   (4)</p>
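      <p>Equations 3 and 4 can be implemented directly; the max-shift in the softmax below is a standard numerical-stability detail not spelled out in the equation:

```python
import math

def relu(x):
    """Rectified linear unit (Equation 3): f(x) = max(0, x)."""
    return max(0.0, x)

def softmax(z):
    """Softmax activation (Equation 4): sigma(z)_j = e^{z_j} / sum_k e^{z_k}."""
    m = max(z)  # shift for numerical stability; result is unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]
```
</p>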
    </sec>
    <sec id="sec-4">
      <title>EXPERIMENTAL SETUP</title>
      <p>In order to perform a comparative study we first constructed a
training and test set by crawling job advertisement data from leading</p>
      <p>Austrian job platforms, i.e., stepstone.at, karriere.at and monster.at.
We utilized an incremental crawler which was given a manually
constructed list of URLs for each job type. If a particular job
advertisement was not in the database a new entry was added, otherwise
it was enriched with a new label. The crawler then iterated over a
finite number of pages. As seen in Listing 1, an extracted job entry
consists of an id, a title and description in plain text and a list of
labels which denote the type of the job.</p>
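      <p>The insert-or-enrich bookkeeping of the crawler can be sketched as follows (here db is assumed to be a plain dict; the real database differs):

```python
def upsert_job(db, job_id, title, description, label):
    """Add a new job entry (schema as in Listing 1: id, title,
    description, labels) or enrich an existing one with a new label."""
    entry = db.get(job_id)
    if entry is None:
        db[job_id] = {"id": job_id, "title": title,
                      "description": description, "labels": [label]}
    elif label not in entry["labels"]:
        entry["labels"].append(label)
    return db[job_id]
```
</p>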
      <p>Dataset. We crawled 5,602 job advertisements in total. On
average, a posted job had 1.05 labels assigned to it. In our experiments
we focused on four different labels which are mainly used in the
Studo Jobs platform, namely: Software, Catering, Technology and
Business. As the Studo-specific label Business could not be directly
crawled, we derived it by combining job advertisements from the
types Management and Marketing.</p>
      <p>
        Evaluation. As the crawled dataset is clearly imbalanced (e.g., as
seen in Figure 2 the Software label dominates), we further
constructed subdatasets for each label to experiment on. Thus, for each
label we used a subset containing all the jobs carrying that
particular label as well as a random sample of the same size carrying
other labels. Therefore the resulting subdataset had 50% of job
advertisements with the evaluated label and 50% without it.
The evaluation was performed using Scikit-learn [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for the Naive
Bayes, CART, Random Forest, AdaBoost, Logistic Regression,
Linear SVM and the SVM-SGD approach. For the neural networks we
utilized Keras [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. These models were trained and evaluated using a
10-fold stratified cross-validation on each subdataset respectively. In
order to finally quantify the prediction performance, we used a set
of well-known information retrieval metrics. In particular, we report
the prediction accuracy by means of Accuracy, the F1-measure and
the Area Under the ROC curve (AUC) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
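      <p>The construction of a 50/50 subdataset per label can be sketched as (a hypothetical helper; jobs are dicts with a labels list as in Listing 1):

```python
import random

def balanced_subset(jobs, label, seed=42):
    """All jobs carrying the label plus an equally sized random
    sample of jobs without it (50% positive, 50% negative)."""
    rng = random.Random(seed)
    positives = [j for j in jobs if label in j["labels"]]
    negatives = [j for j in jobs if label not in j["labels"]]
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    return positives + sampled
```
</p>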
    </sec>
    <sec id="sec-5">
      <title>RESULTS</title>
      <p>
        The overall results of the algorithmic comparison can be seen in
Table 1. As each subdataset had a different size of the test set (i.e.,
due to the imbalanced nature of the original dataset), for each
approach we report a weighted average in terms of Accuracy,
F1-measure and AUC. For example, the F1-measure would be
calculated as: F1 = α F1(Business) + β F1(Technology) + γ F1(Catering)
+ δ F1(Software), where the weight values contain the percentage
of the corresponding label in the dataset (i.e., α = 0.237, β = 0.197,
γ = 0.068 and δ = 0.497). In general, we found strong accuracy
performance in all models (e.g., even the worst performing Random
Forest reached an Accuracy of 0.8017 and an F1 of 0.7932). The best
performing approaches were the SVM based ones, where the linear
approach had the best AUC and the one using stochastic gradient
descent had the best Accuracy and F1. Interestingly enough, due to the recent
popularization of deep learning approaches our first assumption was
that the models based on CNN would perform much better than the
SVM based ones. Although still competitive, we assume they reached
a lower accuracy because the hyperparameters were not sufficiently
tuned and early stopping was not used to cope with overfitting. As
such, we hypothesize that there
is still much to gain from such approaches by learning these
parameters beforehand (e.g., using a nested cross-validation like in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]) and
incorporating a validation set to stop the model training at the most
optimal time.
      </p>
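      <p>The weighted averaging above amounts to the following small illustrative helper (weights as given for α..δ):

```python
def weighted_average(per_label_scores, label_share):
    """Weighted average of a per-label metric (e.g. F1), weighting each
    label's score by its share of the dataset, as in the formula above."""
    return sum(label_share[l] * s for l, s in per_label_scores.items())
```
</p>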
      <p>The individual label results in terms of the F1-measure can be seen
in Figure 3. We show only the F1-measure due to space restrictions
and the fact that the corresponding Accuracy values are almost
identical. The best performance was achieved on the Software label
using the SVM-SGD approach with an Accuracy of 0.9193 and F1 of
0.9196. A contributing factor to such performance is possibly the size
of the training set which was by far the largest for the Software label.
An interesting finding is that all of the approaches that were utilized
on the Catering label, which had the least training data, performed
much better than on the Business and Technology label. Looking at
the data, we think that the reason for such a performance difference
lies in the broader definition of these label terms. Moreover, the
reason that the Business label had the worst performance could lie
in the fact that its data come from a combination of the crawled
Management and Marketing labels. It should also be noted that the
MLP approach outperformed all others when applied to the Catering
label. This suggests that when the right hyperparameters are picked,
an increase in performance could still be gained.</p>
      <p>Figure 3: Classification algorithms compared in terms of the
F1-measure for all labels relevant to the Studo Jobs platform.</p>
      <p>Overall, the SVM-SGD performed best. However, the MLP
approach outperformed others for the Catering label and the much
simpler Naive Bayes had almost the same performance as
SVM-SGD for the Technology label. This suggests that a diversified model
combination could lead to even better performance.
</p>
    </sec>
    <sec id="sec-6">
      <title>CONCLUSION</title>
      <p>In this work we presented the Studo Jobs platform and showed how
we plan to tackle the problem of automatically assigning labels to
new job advertisements. For that purpose we performed an extensive
comparative study between several state-of-the-art text-classification
algorithms. Our findings suggest that by utilizing an SVM approach
using stochastic gradient descent we can achieve the best
performance in terms of Accuracy, F1-measure and AUC. However, our
results revealed that deep learning approaches can also improve the
prediction performance, especially with the right hyperparameter setup.
As such, for our Studo Jobs platform we will use a combination of
those binary classifiers which showed the best performance results.
Limitation and Future Work. As already mentioned, one
limitation of our work is that we did not extensively explore the impact of
choosing the right hyperparameters for the deep learning approaches.
Therefore, we plan to extend the study by finding the optimal
hyperparameters for each label that is relevant to the Studo Jobs platform
(e.g., by setting up a nested cross-validation). In addition, we also
plan to extend our comparison study by including other features
besides the textual terms and incorporating methods that adapt
algorithms to directly perform multi-label classification. Building on the
data enrichment of job advertisements, we further plan to integrate
the generated labels and assess their impact on perceived usefulness
and navigability to users in a live setting (e.g., by letting users define
and store filters to narrow down the search for relevant jobs). Finally,
we plan to extend the Studo Jobs platform with personalized
recommendations which leverage the automatically generated job labels.
We not only want to investigate which approaches (e.g.,
content-based, collaborative filtering, etc.) benefit the most from such data,
but also how to incorporate recent label filters as additional
time-dependent contextual cues in order to predict the current job interest.
For this we also plan to investigate the recently popularized deep
learning approaches (e.g., recurrent neural networks) to see if we
can predict the future shift in interest of a job type.</p>
      <p>Acknowledgments. This work is supported by the Know-Center
and ISDS Institute from Graz University of Technology. The authors
would also like to thank the AVL company, especially Dr. Markus
Tomaschitz, for the support at setting up this research project and
giving insights about the job market.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Abel</surname>
          </string-name>
          .
          <article-title>We know where you should work next summer: job recommendations</article-title>
          .
          <source>In Proceedings of the 9th ACM Conference on Recommender Systems</source>
          , pages
          <fpage>230</fpage>
          -
          <lpage>230</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Berk</surname>
          </string-name>
          .
          <article-title>Classification and Regression Trees</article-title>
          .
          <article-title>Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery</article-title>
          ,
          <string-name>
            <surname>Use</surname>
            <given-names>R</given-names>
          </string-name>
          , (November):
          <fpage>36</fpage>
          -
          <lpage>350</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          .
          <source>Pattern Recognition and Machine Learning</source>
          , volume
          <volume>53</volume>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Chollet</surname>
          </string-name>
          and Others. https://github.com/fchollet/keras.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          .
          <article-title>Support Vector Networks</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ):
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Freund</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Schapire</surname>
          </string-name>
          .
          <article-title>A desicion-theoretic generalization of on-line learning and an application to boosting</article-title>
          .
          <volume>139</volume>
          :
          <fpage>23</fpage>
          -
          <lpage>37</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>I.</given-names>
            <surname>Guyon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R. S. A.</given-names>
            <surname>Alamdari</surname>
          </string-name>
          , G. Dror, and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Buhmann</surname>
          </string-name>
          .
          <article-title>Performance prediction challenge</article-title>
          .
          <source>In Neural Networks</source>
          ,
          <year>2006</year>
          . IJCNN'06. International Joint Conference on, pages
          <fpage>1649</fpage>
          -
          <lpage>1656</lpage>
          . IEEE,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Shi</surname>
          </string-name>
          .
          <article-title>A job recommender system based on user clustering</article-title>
          .
          <source>Journal of Computers</source>
          ,
          <volume>8</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1960</fpage>
          -
          <lpage>1967</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmitt</surname>
          </string-name>
          , et al.
          <article-title>A college degree is no guarantee</article-title>
          .
          <source>Technical report, Center for Economic and Policy Research (CEPR)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>arXiv preprint arXiv:1408.5882</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          . pages
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ouyang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Rong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xiong</surname>
          </string-name>
          .
          <source>Computational Science and Its Applications - ICCSA</source>
          <year>2016</year>
          .
          <volume>9788</volume>
          :
          <fpage>453</fpage>
          -
          <lpage>467</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>12</volume>
          :
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Powers</surname>
          </string-name>
          .
          <article-title>Evaluation: from precision, recall and f-measure to roc, informedness, markedness and correlation</article-title>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Read</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holmes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          .
          <article-title>Classifier chains for multi-label classification</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>85</volume>
          (
          <issue>3</issue>
          ):
          <fpage>333</fpage>
          -
          <lpage>359</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Tsoumakas</surname>
          </string-name>
          and
          <string-name>
            <given-names>I.</given-names>
            <surname>Katakis</surname>
          </string-name>
          .
          <article-title>Multi-Label Classification: An Overview</article-title>
          .
          <source>Int J Data Warehousing and Mining</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zavrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Berck</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Lavrijssen</surname>
          </string-name>
          .
          <article-title>Information Extraction by Text Classification: Corpus Mining for Features</article-title>
          .
          <source>Proceedings of the Second International Conference on Language Resources and Evaluation LREC00</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>The Optimality of Naive Bayes</article-title>
          .
          <source>Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS)</source>
          ,
          <volume>1</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M. L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z. H.</given-names>
            <surname>Zhou</surname>
          </string-name>
          .
          <article-title>A review on multi-label learning algorithms</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <volume>26</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1819</fpage>
          -
          <lpage>1837</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Solving large scale linear prediction problems using stochastic gradient descent algorithms</article-title>
          .
          <source>Proceedings of the twenty-first international conference on Machine learning</source>
          ,
          <volume>6</volume>
          :
          <fpage>116</fpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>