<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Design of Negative Sampling Strategies for Distantly Supervised Skill Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jens-Joris Decorte</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeroen Van Hautte</string-name>
          <email>jeroen@techwolf.ai</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Deleu</string-name>
          <email>johannes.deleu@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chris Develder</string-name>
          <email>chris.develder@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Demeester</string-name>
          <email>thomas.demeester@ugent.be</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>( Jeroen Van Hautte)</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ghent University - imec</institution>
          ,
          <addr-line>9052 Gent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TechWolf</institution>
          ,
          <addr-line>9000 Gent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <fpage>18</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>Skills play a central role in the job market and many human resources (HR) processes. In the wake of other digital experiences, today's online job market has candidates expecting to see the right opportunities based on their skill set. Similarly, enterprises increasingly need to use data to guarantee that the skills within their workforce remain future-proof. However, structured information about skills is often missing, and processes building on self- or manager-assessment have been shown to struggle with issues around adoption, completeness, and freshness of the resulting data. These challenges can be tackled using automated techniques for skill extraction. Extracting skills is a highly challenging task, given the many thousands of possible skill labels mentioned either explicitly or merely described implicitly and the lack of finely annotated training corpora. Previous work on skill extraction overly simplifies the task to an explicit entity detection task or builds on manually annotated training data that would be infeasible if applied to a complete vocabulary of skills. We propose an end-to-end system for skill extraction, based on distant supervision through literal matching. We propose and evaluate several negative sampling strategies, tuned on a small validation dataset, to improve the generalization of skill extraction towards implicitly mentioned skills, despite the lack of such implicit skills in the distantly supervised data. We observe that using the ESCO taxonomy to select negative examples from related skills yields the biggest improvements, and combining three different strategies in one model further increases the performance, up to 8 percentage points in RP@5. We introduce a manually annotated evaluation benchmark for skill extraction based on the ESCO taxonomy, on which we validate our models. We release the benchmark dataset for research purposes to stimulate further research on the task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Skill extraction is an information extraction task that is essential for many HR applications, such as resume screening and job recommendation systems. A comparative survey on skill extraction indicates that research interest has steadily grown over the last decade [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Traditionally,
skill extraction has been approached as finding and
disambiguating entities in texts. These methods typically rely
on a named entity recognition (NER) component based
      </p>
      <sec id="sec-1-1">
        <title>However, skills are often present implicitly as longer sequences of words (which we refer to as spans) or full sentences rather than being mentioned explicitly: over 85%</title>
        <sec id="sec-1-1-1">
          <title>RecSys in HR’22: The 2nd Workshop on Recommender Systems for</title>
        </sec>
        <sec id="sec-1-1-2">
          <title>Human Resources, in conjunction with the 16th ACM Conference on</title>
        </sec>
      </sec>
      <sec id="sec-1-2">
        <p>Some work avoids this labeling difficulty completely by using readily available labeled datasets. For example, a model was trained based on a corpus of job ads with attached skills provided by an online job ads platform. However, the authors reported that for that corpus, at least 40% of the vacancies missed 20% of explicitly stated skills in their labels. Recent work [<xref ref-type="bibr" rid="ref10">10</xref>] successfully reconstructed the BERT-XLMC approach on Dutch vacancy texts using the Dutch RobBERT model [11]. The training dataset used for this work is, however, based on the output of an existing commercial skill extraction solution.</p>
        <p>Figure 1: High-level overview of the method (components: job posting corpus, ESCO, distant supervision, distantly supervised training corpus, data sampling, binary training data, RoBERTa (fixed), binary classifiers).</p>
        <p>We propose a new end-to-end approach to fine-grained skill extraction that does not rely on a large hand-labeled training corpus. Instead, we ease the requirements on the training data such that it can be automatically collected through distant supervision. We cast the multi-label skill classification task into independent binary classification problems, with skills labeled on the sentence level, to encompass both explicit and implicit skill descriptions. To the best of our knowledge, our work is the first one to tackle fine-grained skill extraction in such a flexible distant supervision setup. Our distant supervision training set contains few false positives, due to the literal matching of known skills, which is a task with low ambiguity. However, we expect many false negatives, for skills not literally mentioned. This is quantified in Section 4.1. We investigate to what extent the distantly supervised training set can be leveraged at maximum effectiveness to train a fine-grained skill extraction system. To that end, we design a number of negative sampling strategies that can be used to tune the extraction model training process on a small annotated development set, covering only a fraction of all potential skills (0.2%, to be precise, in our experimental setting). Finally, in order to stimulate research on automated skill extraction, and to facilitate the comparison of future models with our results, we release<sup>1</sup> our development and test data, which is constructed on top of the “SkillSpan” dataset [<xref ref-type="bibr" rid="ref6">6</xref>], adding annotations with the ESCO [12] skill labels.</p>
        <p>2. Related Work</p>
        <p>Multi-label classification datasets often have a skewed label distribution, with many labels occurring only a few times or even being completely absent in the training data. Some works have focused on improving the few-shot and zero-shot classification performance of multi-label text classification on these rare or unseen labels. Typically, the information in structured label graphs (such as label descriptions or relations) or word embeddings are used as an input to the system in order to generalize to unseen labels [13, 14]. However, these methods still rely on a large labeled training dataset to work. In the absence of any supervision, [15] uses a novel self-supervision training objective to train a dense sentence representation model that is used to assign labels based on cosine similarity in the learned space. Yin et al. [16] propose an entailment approach to zero-shot text classification, where the input text is called the premise, and a hypothesis is constructed for each label using the template “the text is about label”. The premise and hypothesis are concatenated before being presented to a BERT-based model for prediction, making this method slow at inference for large label spaces.</p>
        <p>Multi-label classification datasets not only suffer from the rare label problem; many labels are also simply missing, since the datasets are usually only partially labeled: instances without labels thus may either be truly negative, or positive but not identified as such during labeling. The “Single Positive Labels” scenario is an extreme case of missing labels, where only one positive label is available for each training instance [17]. Research on this topic is limited, and typically focuses on designing custom loss functions [18] or online estimation of the missing labels during training [17]. This line of work is closely related to “Positive-Unlabeled” (PU) binary classification, which is typically also tackled using custom loss functions [19].</p>
      </sec>
      <sec id="sec-1-3">
        <title><sup>1</sup> https://github.com/jensjorisdecorte/Skill-Extraction-benchmark</title>
        <p>Typically in a distant supervision setup, the labeling function is followed by a filtering step that aims to reduce the number of false positives in the labels [20]. However, we find that the number of false positives produced by the distant supervision step is low in our case of literal skill mentions. This has been shown previously by [21], where literal skill mentions have been successfully used as distant supervision for the task of job title representation learning. Rather than focusing on a filtering step, we draw inspiration from the idea of “hard negative examples” in representation learning to improve the learning process. In contrastive learning, hard negative examples refer to samples that are difficult to distinguish from an anchor point [22]. This approach improves the discriminative abilities and downstream performance of unsupervised representation learning methods. We adapt this idea to the multi-label classification setup by oversampling negative examples from related labels. More details on this approach are contained in the following section.</p>
        <p>3. Skill Extraction Approach</p>
        <p>We approach the task of skill extraction as a sentence-level multi-label classification task. A high-level overview of the method is shown in Fig. 1. Our method uses distant supervision based on the ESCO skill taxonomy to automatically assign (partial) skill labels for a given set of sentences from the HR domain (in particular, mined from vacancies). Negative sampling strategies are used to combine ‘positive sentences’ for a given skill (i.e., sentences labeled with that skill during the distant supervision step) with sentences not containing that skill (referred to as ‘negative sentences’). Finally, a binary classifier <italic>f<sub>s</sub></italic> is trained for each skill <italic>s</italic>, based on the constructed positive and negative sentences for that skill. It consists of a logistic regression classifier on top of a (frozen) representation for the sentences, as described in more detail below.</p>
        <p>Distantly supervised training set: Given a set <italic>S</italic> of skills and a background corpus of sentences <italic>D</italic>, for each skill <italic>s</italic> ∈ <italic>S</italic>, a set <italic>D<sub>s</sub></italic> of positive sentences is collected from <italic>D</italic> through distant supervision. In particular, we use the ESCO [12] skills taxonomy as the set of classification labels. The set <italic>D<sub>s</sub></italic> of positive sentences for each skill <italic>s</italic> consists of those sentences in <italic>D</italic> that literally mention the skill <italic>s</italic> or any of its alternative forms, as provided in the taxonomy. This assumes that there are no ambiguous skill names, which holds in most cases as skill names tend to be specific. The positive labels are very precise, due to the distant supervision process based on literal matches with the highly specific ESCO skill names. However, this means potentially many skills remain unlabeled, i.e., the training data is prone to false negatives. After the distant supervision step, on average 365 sentences were labeled per skill (for the set of 13,891 ESCO skills). This dataset follows a long tail distribution, with 75.1% of skills occurring in only ten or fewer sentences.</p>
        <p>Skill extraction model: The model architecture is depicted in Fig. 1. We use a frozen pre-trained RoBERTa [23] model with mean pooling to transform input sentences into fixed-length contextual representations, before presenting them for classification. The classification is performed by separate binary text classification models <italic>f<sub>s</sub></italic>, each generating an independent prediction value for their respective skill label <italic>s</italic>. In contrast to a typical multi-label model, we optimize each classification model separately on a different corresponding dataset, instead of training all weights together.</p>
        <p>Training with negative sampling: <italic>D<sub>s</sub></italic> serves as positive training data for classifier <italic>f<sub>s</sub></italic>, and negative examples are sampled from the union of all positive sentence datasets of all other skills. The basic mechanism for sampling negatives is uniform sampling from this union. However, following the ideas in representation learning [22], we hypothesize that sentences from related skills are more informative, harder to distinguish from the positive sentences (i.e., closer to the decision boundary), and could thus improve the learning process. As such, a fraction of the negative examples are sampled specifically from sentences that are labeled with a related (but different) label to skill <italic>s</italic>. We refer to these sentences as “hard” negative samples. Our negative sampling strategy is thus defined by two important factors. First, the fraction of uniformly sampled negatives versus the hard negative samples is important. Secondly, how we define whether two skills are related is crucial to the learning process. We introduce three different strategies for selecting the related skills in Section 3.1.</p>
        <p>Inference and evaluation: The final model is used to rank the relevance of all skills for a given sentence. Similar to [14], we use the macro-averaged R-Precision@K (RP@K) metric to evaluate the performance of the method. Since predictions are made on a sentence basis, we restrict the evaluation to low values of <italic>K</italic>. RP@K is defined in (1), where the quantity Rel(<italic>n</italic>, <italic>k</italic>) is a binary indicator of whether the <italic>k</italic>-th ranked label is a correct label for data sample <italic>n</italic>, and <italic>R<sub>n</sub></italic> is the number of gold labels for sample <italic>n</italic>. In addition, we use the mean reciprocal rank (MRR) of the highest ranked correct label as an indicator of the ranking quality. More information on the evaluation is presented in Section 4.1.</p>
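        <p>The two ranking metrics described above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code behind the reported numbers: the function names and the toy rankings are hypothetical, and every sample is assumed to have at least one gold label.</p>

```python
from typing import Sequence, Set


def r_precision_at_k(ranked: Sequence[str], gold: Set[str], k: int) -> float:
    """RP@K for one sample: hits in the top k, divided by min(k, number of gold labels)."""
    hits = sum(1 for label in ranked[:k] if label in gold)
    # Assumption: gold is non-empty, otherwise the denominator would be zero.
    return hits / min(k, len(gold))


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """1 / rank of the highest-ranked correct label, or 0.0 if no correct label is ranked."""
    for rank, label in enumerate(ranked, start=1):
        if label in gold:
            return 1.0 / rank
    return 0.0


def macro_average(values: Sequence[float]) -> float:
    """Average the per-sample scores over all evaluated samples."""
    return sum(values) / len(values)


# Toy rankings (hypothetical data, not taken from the benchmark).
samples = [
    (["python", "java", "sql"], {"python", "sql"}),
    (["teamwork", "leadership", "sales"], {"leadership"}),
]
rp_at_1 = macro_average([r_precision_at_k(r, g, 1) for r, g in samples])
rp_at_2 = macro_average([r_precision_at_k(r, g, 2) for r, g in samples])
mrr = macro_average([reciprocal_rank(r, g) for r, g in samples])
```

        <p>Dividing by min(K, number of gold labels) means that a sentence with a single gold label can still reach a perfect score at K = 5, which is why this variant suits sentence-level predictions with few labels per sample.</p>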
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>\mathrm{RP@}K = \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} \frac{\mathrm{Rel}(n, k)}{\min(K, R_n)}</tex-math>
          </disp-formula>
        </p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>Example ESCO skills with their related labels according to each negative sampling strategy.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Skill</th>
                <th>Siblings</th>
                <th>Levenshtein</th>
                <th>Embedding</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>disarm land mine</td>
                <td>ensure flock safety; protect important clients; signal for explosion; deal with challenging people</td>
                <td>find land mines; search for land mines; identify land mines; dismantle machines</td>
                <td>repair mine machinery; handle mining plant waste; management of mine ventilation; construct road base</td>
              </tr>
              <tr>
                <td>Haskell</td>
                <td>DevOps; XQuery; Windows Phone; SPARK</td>
                <td>add smell; upsell; sink wells; speak well</td>
                <td>PostgreSQL; Erlang; JavaScript; C++</td>
              </tr>
              <tr>
                <td>manage musical staff</td>
                <td>discharge employees; manage volunteers; supervise nursing staff; guide staff</td>
                <td>manage agricultural staff; manage staff; manage dental staff; manage educational staff</td>
                <td>manage musical groups; manage musical events; manage musicians; manage educational staff</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The SkillSpan dataset contains job posting sentences annotated with skill spans. We manually annotate each span in SkillSpan with its corresponding ESCO skill (if it exists). This span-based multi-class annotation is less complex than annotating complete sentences with multiple labels. The process is performed on the test sets of the publicly released subsets TECH and HOUSE. Details on the annotation guidelines can be found in Appendix A. The annotation effort results in fine-grained ESCO skill labels for 64.5% of the spans. We split this dataset into a validation and test set using a 20%/80% split. The validation set contains 165 unique skill labels, and over 80% of the unique skill labels in the test set never occur in the validation set. A more detailed breakdown of the number of spans and annotations is shown in Table 2.</p>
        <p>3.1. Negative Sampling Strategies</p>
        <p>Rather than randomly sampling negative examples for training each binary skill classifier, we assume that sampling more informative negatives will likely lead to a more efficient training procedure. Instead of sampling hard negative sentences directly, we first identify related (yet different) skills, and then sample sentences with those labels. We introduce three different strategies for identifying such related skills, which we analyze through the experiments defined in Section 4. The considered sets of related skills, given a particular skill <italic>s</italic>, are obtained as follows:</p>
        <p>• Siblings: all skills that share a parent concept with <italic>s</italic>, as indicated by the “broader concepts” field in ESCO.</p>
        <p>• Levenshtein: the top 100 skills closest to <italic>s</italic>, according to their Levenshtein distance.</p>
        <p>• Embedding: the top 100 skills closest to <italic>s</italic> in terms of cosine similarity between their mean-pooled RoBERTa-encoded skill name representations.</p>
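        <p>Two of the three strategies above need nothing beyond the skill names and the taxonomy, so they can be illustrated with a short, self-contained sketch. The helper names, the toy vocabulary and the toy parent mapping are hypothetical; the embedding strategy is left out because it requires a pre-trained encoder.</p>

```python
from typing import Dict, List, Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def closest_by_levenshtein(skill: str, vocabulary: List[str], top_n: int = 100) -> List[str]:
    """The top_n other skill names with the smallest edit distance to the given skill."""
    others = [name for name in vocabulary if name != skill]
    # Ties are broken alphabetically to keep the result deterministic (an assumption).
    return sorted(others, key=lambda name: (levenshtein(skill, name), name))[:top_n]


def siblings(skill: str, broader: Dict[str, Set[str]]) -> List[str]:
    """All other skills that share at least one parent concept with the given skill."""
    own_parents = broader.get(skill, set())
    return sorted(
        other
        for other, parents in broader.items()
        if other != skill and not own_parents.isdisjoint(parents)
    )


# Toy data (hypothetical; the real vocabulary would be the ESCO skill list).
vocabulary = ["Haskell", "upsell", "Erlang", "manage staff"]
broader = {
    "Haskell": {"computer programming"},
    "Erlang": {"computer programming"},
    "manage staff": {"supervising people"},
}
nearest = closest_by_levenshtein("Haskell", vocabulary, top_n=1)
related_by_taxonomy = siblings("Haskell", broader)
```

        <p>On this toy vocabulary the string-based neighbour of “Haskell” is “upsell”, while the taxonomy-based neighbour is “Erlang”, which mirrors the contrast between the Levenshtein and Siblings columns of Table 1.</p>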
      </sec>
      <sec id="sec-1-4">
        <title>For each of the negative sampling strategies, some example ESCO skills with their related labels according to the strategy are shown in table 1.</title>
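        <p>The distant supervision step of Section 3 assigns a skill label whenever the skill name or one of its alternative forms is literally mentioned. A minimal sketch of such a labeling function is given below. Case-insensitive matching and the rule that a match must be delimited by non-word characters are assumptions made for this illustration, and the skill forms and sentences are toy data.</p>

```python
import re
from typing import Dict, List


def build_matchers(skill_forms: Dict[str, List[str]]) -> Dict[str, re.Pattern]:
    """Compile one pattern per skill from its name and alternative forms."""
    matchers = {}
    for skill, forms in skill_forms.items():
        alternatives = "|".join(re.escape(form) for form in forms)
        # Require a non-word character (or the sentence edge) on both sides,
        # so that "Java" does not fire inside "JavaScript".
        matchers[skill] = re.compile(
            r"(?:^|\W)(?:" + alternatives + r")(?:\W|$)", re.IGNORECASE
        )
    return matchers


def label_sentences(
    sentences: List[str], skill_forms: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Return, for each skill, the sentences that literally mention it (its positive set)."""
    matchers = build_matchers(skill_forms)
    positives: Dict[str, List[str]] = {skill: [] for skill in skill_forms}
    for sentence in sentences:
        for skill, pattern in matchers.items():
            if pattern.search(sentence):
                positives[skill].append(sentence)
    return positives


# Toy data (hypothetical skill forms and sentences).
skill_forms = {
    "Java": ["Java"],
    "manage staff": ["manage staff", "manage personnel"],
}
sentences = [
    "Experience with Java and SQL is required.",
    "You will manage personnel across two sites.",
    "Strong JavaScript skills are a plus.",
    "You lead a team of five engineers.",
]
positives = label_sentences(sentences, skill_forms)
```

        <p>The last toy sentence describes staff management implicitly and receives no label, which is the kind of false negative that literal matching leaves in the training data.</p>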
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4. Experimental setup</title>
      <p>
        4.1. Evaluation
      </p>
      <p>
        While hand-labeling a training dataset for skill extraction is infeasible (given the huge number of skills, e.g., over 13k in ESCO), we argue that with reasonable manual work, it is possible to construct a benchmark that can be used to compare the performance of different models. We build upon the test set of the SkillSpan dataset from [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>Table 2: Number of sentences, spans, and spans with an ESCO label in the validation and test splits of the TECH and HOUSE subsets.</p>
      <p>In order to verify our hypothesis that the distant supervision labeling leads to quite precise positive labels, at the cost of many false negatives, we validated the distant supervision labeling of the test set against the manual annotations. The automatically assigned labels are indeed rather precise (overall precision of 79%), but at the cost of low coverage (i.e., a recall of 14.6%).</p>
      <p>4.2. Experiments</p>
      <p>The sentences used for training are collected from a large proprietary corpus of public job postings. This dataset has been collected from different public job boards and contains a large number of English job postings. ESCO is used for the distant supervision step: a skill label is assigned when the skill itself, or one of its alternative forms provided by ESCO, is literally mentioned in a sentence. For each skill classifier <italic>f<sub>s</sub></italic>, a maximum of one thousand positive sentences is retained. The number of negative examples per positive example is set to 10. We train a baseline classifier without hard negative sampling. In this case, all negative examples are sampled uniformly from the other positive corpora. To investigate the optimal hard negative sampling procedure, we conduct a hyper-parameter search for the fraction of negatives sampled using the three strategies (sibling, levenshtein, embedding) versus uniform sampling. Based on the performance on the validation sets, we decide on an optimal value for this percentage. Finally, we report the contribution of each of the negative sampling strategies when combined. This is reported based on performance on the unseen test set, and contributions of the strategies are shown through ablations, by leaving one strategy out at a time. We refer to Appendix B for more details on the training procedure.</p>
      <sec id="sec-2-1">
        <title>5. Results and Discussion</title>
        <p>The results of the hyper-parameter search for each of the negative sampling strategies are shown in Fig. 2. From these results, it is clear that the different strategies have different effects on the model performance. Most notably, we find that the optimal fraction of hard negative sampling is no higher than 5% for any strategy. This is in line with previous findings on hard negative sampling [22]. Sampling large amounts of hard negatives even has a large negative impact on the performance of the model. Secondly, the “levenshtein” strategy brings the least improvements out of all three strategies.</p>
        <p>Figure 2: Results of the hyper-parameter search for each of the negative sampling strategies. Panels: (a) TECH, (b) HOUSE.</p>
        <p>Finally, we trained a model that combines all strategies. Based on the results of the above hyper-parameter search, we chose 5% as an optimal value for the fraction of negatives sampled through the combined hard negative strategies. To assess the impact of each of the strategies within this combination, we trained three more models in which each of the three strategies is left out respectively. The performance of these final models is shown in Table 3. The combination of all three strategies yielded the overall best model. This model has large performance gains across the MRR and RP@K metrics for both the TECH and HOUSE dataset.</p>
        <p>Leaving out the “Levenshtein” strategy has a relatively low impact on the performance. This might be understood by looking at the examples in Table 1: string similarity surfaces unrelated skills, for example for proper nouns such as Haskell. This could partially explain the relatively low utility of this negative sampling strategy. On the other hand, leaving out the “siblings” strategy takes away the largest part of the performance improvements. This strategy makes use of the hierarchy defined in the ESCO taxonomy, and thus is a reliable method for selecting informative hard negatives. The effect of the “embedding” strategy is comparable to the “siblings” strategy and thus proves a good alternative in case a hierarchy such as the one in ESCO is not available.</p>
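        <p>To make the sampling configuration discussed in Sections 4.2 and 5 concrete, the sketch below draws negatives for one skill with 10 negatives per positive sentence and a 5% share of hard negatives. It is an illustration under stated assumptions, not the training code: the function name is hypothetical, sampling is done with replacement, sentences that are positive for the target skill are excluded from both pools, and the set of related skills is assumed to be the union of the strategies in use.</p>

```python
import random
from typing import Dict, List, Set


def sample_negatives(
    skill: str,
    positives: Dict[str, List[str]],
    related: Dict[str, Set[str]],
    negatives_per_positive: int = 10,
    hard_fraction: float = 0.05,
    seed: int = 0,
) -> List[str]:
    """Draw negatives for one skill: a share from related skills, the rest uniformly."""
    rng = random.Random(seed)
    own = set(positives[skill])
    # Hard pool: positive sentences of related skills (assumed to exclude own positives).
    hard_pool = sorted(
        {s for other in related.get(skill, set())
         for s in positives.get(other, []) if s not in own}
    )
    # Uniform pool: positive sentences of all other skills.
    uniform_pool = sorted(
        {s for other, sents in positives.items() if other != skill
         for s in sents if s not in own}
    )
    total = negatives_per_positive * len(positives[skill])
    n_hard = round(hard_fraction * total) if hard_pool else 0
    n_uniform = total - n_hard if uniform_pool else 0
    # Sampling with replacement keeps the requested counts even for small pools.
    hard = rng.choices(hard_pool, k=n_hard) if n_hard else []
    uniform = rng.choices(uniform_pool, k=n_uniform) if n_uniform else []
    return hard + uniform


# Toy data (hypothetical sentence identifiers instead of real sentences).
positives = {"Haskell": ["h1", "h2"], "Erlang": ["e1", "e2"], "manage staff": ["m1"]}
related = {"Haskell": {"Erlang"}}
negatives = sample_negatives("Haskell", positives, related)
```

        <p>With two positive sentences this yields 20 negatives, one of which comes from the related skill; setting hard_fraction to 0 recovers the uniformly sampled baseline.</p>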
      </sec>
    </sec>
    <sec id="sec-3">
      <title>6. Conclusion and Future Work</title>
      <sec id="sec-3-1">
        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Performance of the final models on the TECH and HOUSE test sets.</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th rowspan="2">Model</th>
                <th colspan="2">TECH</th>
                <th colspan="2">HOUSE</th>
              </tr>
              <tr>
                <th>MRR</th>
                <th>RP@K</th>
                <th>MRR</th>
                <th>RP@K</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Baseline classifier</td>
                <td>23.65</td>
                <td>33.71</td>
                <td>26.66</td>
                <td>34.19</td>
              </tr>
              <tr>
                <td>Classifier<sub>neg</sub></td>
                <td>31.71</td>
                <td>39.09</td>
                <td>30.82</td>
                <td>38.69</td>
              </tr>
              <tr>
                <td>Classifier<sub>neg</sub> without embeddings</td>
                <td>31.43</td>
                <td>39.19</td>
                <td>29.09</td>
                <td>37.70</td>
              </tr>
              <tr>
                <td>Classifier<sub>neg</sub> without Levenshtein</td>
                <td>31.11</td>
                <td>38.55</td>
                <td>30.14</td>
                <td>37.22</td>
              </tr>
              <tr>
                <td>Classifier<sub>neg</sub> without siblings</td>
                <td>30.57</td>
                <td>37.07</td>
                <td>29.20</td>
                <td>35.91</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We propose an end-to-end approach to skill extraction using distant supervision. The method is able to make fine-grained skill predictions (using 13,891 skills from ESCO) for a given input sentence. We introduce the idea of hard negative sampling through related labels in a multi-label classification setup and propose three different strategies to select these related labels. We investigate the impact of each of the strategies, and find that all three strategies combined yield the highest increase on top of a baseline model without hard negative sampling. Both the distant supervision and the hard negative sampling are designed to work well without manual labeling, which makes the whole method very flexible. To the best of our knowledge, we are the first to design such a system for skill extraction, and we improve on prior work by providing methods that have relaxed the requirements from ground-truth data and that have the ability to make very fine-grained skill predictions. Finally, we release our hand-labeled test and validation dataset for skill extraction to stimulate further research on the task.</p>
        <p>Future work could entail a more extensive investigation of other hyper-parameters, such as the number of negatives per positive sentence, which was fixed to 10 in this work. Secondly, more performance gains could be made if the RoBERTa weights were fine-tuned during training, but this requires changes in the training setup which should be carefully investigated. Lastly, it could be interesting to investigate how limited manual labor can maximally improve the performance of the method even further with techniques such as active learning.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <sec id="sec-4-1">
        <title>We thank the anonymous reviewers for their valuable feedback. This project was funded by the Flemish Government, through Flanders Innovation &amp; Entrepreneurship (VLAIO, project HBC.2020.2893).</title>
        <p>S. Mesbah, Using RobBERT and eXtreme multi-label classification to extract implicit and explicit skills from Dutch job descriptions (2022).</p>
        <p>[11] P. Delobelle, T. Winters, B. Berendt, RobBERT: a Dutch Roberta-based language model, arXiv preprint arXiv:2001.06286 (2020).</p>
        <p>[12] ESCO, European skills, competences, qualifications and occupations, EC Directorate E (2017).</p>
        <p>[13] J. Lu, L. Du, M. Liu, J. Dipnall, Multi-label few/zero-shot learning with knowledge aggregated from multiple label graphs, arXiv preprint arXiv:2010.07459 (2020).</p>
        <p>[14] I. Chalkidis, M. Fergadiotis, P. Malakasiotis, N. Aletras, I. Androutsopoulos, Extreme multi-label legal text classification: A case study in EU legislation, arXiv preprint arXiv:1905.10892 (2019).</p>
        <p>[15] Y. Xiong, W.-C. Chang, C.-J. Hsieh, H.-F. Yu, I. Dhillon, Extreme zero-shot learning for extreme text classification, arXiv preprint arXiv:2112.08652 (2021).</p>
        <p>[16] W. Yin, J. Hay, D. Roth, Benchmarking zero-shot text classification: Datasets, evaluation and entailment approach, arXiv preprint arXiv:1909.00161 (2019).</p>
        <p>[17] E. Cole, O. Mac Aodha, T. Lorieul, P. Perona, D. Morris, N. Jojic, Multi-label learning from single positive labels, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 933–942.</p>
        <p>[18] D. Zhou, P. Chen, Q. Wang, G. Chen, P.-A. Heng, Acknowledging the unknown for multi-label learning with single positive labels, arXiv preprint arXiv:2203.16219 (2022).</p>
        <p>[19] M. C. Du Plessis, G. Niu, M. Sugiyama, Analysis of learning from positive and unlabeled data, Advances in neural information processing systems 27 (2014).</p>
        <p>[20] L. Sterckx, T. Demeester, J. Deleu, C. Develder, Knowledge base population using semantic label propagation, Knowledge-Based Systems 108 (2016) 79–91.</p>
        <p>[21] J.-J. Decorte, J. Van Hautte, T. Demeester, C. Develder, Jobbert: Understanding job titles through skills, arXiv preprint arXiv:2109.09605 (2021).</p>
        <p>[22] J. Robinson, C.-Y. Chuang, S. Sra, S. Jegelka, Contrastive learning with hard negative samples, arXiv preprint arXiv:2010.04592 (2020).</p>
        <p>[23] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, Roberta: A robustly optimized bert pretraining approach, arXiv preprint arXiv:1907.11692 (2019).</p>
        <p>[24] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.</p>
        <p>[25] N. Reimers, I. Gurevych, Sentence-bert: Sentence embeddings using siamese bert-networks, arXiv preprint arXiv:1908.10084 (2019).</p>
        <p>A. Annotation guidelines</p>
        <p>Each item that needs to be annotated is a span, thus a part of a longer job posting sentence. Both the span and the complete sentence are shown to provide the right context for annotation. When a span is ambiguous, the full sentence must be read to understand the meaning of the span.</p>
        <p>The task is to annotate the correct and most specific skill that is mentioned or implied by the span. The place of the candidate labels within the shortlist has no importance during annotation. In the case that no correct skill is found in the shortlist, you may search for the correct skill using the ESCO interface [12]. If you still cannot find a correct label, select LABEL NOT PRESENT. If you find that the span can generally not be interpreted as a skill, select UNDERSPECIFIED.</p>
        <p>A.1. Examples</p>
        <p>• Given the span “partner continuously with your many stakeholders” and the candidate labels “Communicate With Stakeholders”, “Negotiate With Stakeholders” and “Liaise With Shareholders”, only the first two labels are considered correct. “Communicate With Stakeholders” is most specific with regards to the span, so this label should be selected.</p>
        <p>• Spans such as “apply your depth of knowledge” or “apply your expertise” are classified as UNDERSPECIFIED.</p>
        <p>B. Training details</p>
        <p>The separate classifiers are implemented as a simple logistic regression model, using the popular scikit-learn toolkit [24]. All parameters are set to their default values, except for the inverse regularization strength parameter <italic>C</italic>, which is set to 0.1 for stronger regularization. The RoBERTa model and the mean pooling operation are implemented using the Sentence-BERT library [25].</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>I.</given-names>
            <surname>Khaouja</surname>
          </string-name>
          , I. Kassou,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghogho</surname>
          </string-name>
          ,
          <article-title>A survey on skill identification from online job ads</article-title>
          ,
          <source>IEEE Access 9</source>
          (
          <year>2021</year>
          )
          <fpage>118134</fpage>
          -
          <lpage>118153</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Javed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jacob</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McNair</surname>
          </string-name>
          ,
          <article-title>Skill: A system for skill identification and normalization</article-title>
          ,
          <source>in: Twenty-Seventh IAAI Conference</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sayfullina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Malmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kannala</surname>
          </string-name>
          ,
          <article-title>Learning representations for soft skill matching</article-title>
          ,
          <source>in: International conference on analysis of images, social networks and texts</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <article-title>Representation of job-skill in artificial intelligence with knowledge graph analysis</article-title>
          ,
          <source>in: 2018 IEEE symposium on product compliance engineering-Asia (ISPCE-CN)</source>
          , IEEE,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Halder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <article-title>Retrieving skills from job descriptions: A language model based extreme multi-label classification framework</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Computational Linguistics</source>
          , International Committee on Computational Linguistics, Barcelona, Spain (Online),
          <year>2020</year>
          , pp.
          <fpage>5832</fpage>
          -
          <lpage>5842</lpage>
          . URL: https://aclanthology.org/2020.coling-main.513. doi:10.18653/v1/2020.coling-main.513.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Sonniks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          ,
          <article-title>Skillspan: Hard and soft skill extraction from English job postings</article-title>
          ,
          <source>arXiv preprint arXiv:2204.12811</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          ,
          <article-title>Kompetencer: Fine-grained skill classification in Danish job postings via distant supervision and transfer learning</article-title>
          ,
          <source>arXiv preprint arXiv:2205.01381</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Beauchemin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Laumonier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. L.</given-names>
            <surname>Ster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yassine</surname>
          </string-name>
          ,
          <article-title>“FIJO”: a French insurance soft skill detection dataset</article-title>
          ,
          <source>arXiv preprint arXiv:2204.05208</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Tamburri</surname>
          </string-name>
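          ,
          <string-name>
            <given-names>W.-J.</given-names>
            <surname>Van Den Heuvel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Garriga</surname>
          </string-name>
          ,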
          <article-title>DataOps for societal intelligence: A data pipeline for labor market skills extraction and matching</article-title>
          ,
          <source>in: 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>391</fpage>
          -
          <lpage>394</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Vermeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Provatorova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Graus</surname>
          </string-name>
          , T. Rajapakse,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>