=Paper=
{{Paper
|id=Vol-3218/paper10
|storemode=property
|title=Skill Extraction from Job Postings using Weak Supervision
|pdfUrl=https://ceur-ws.org/Vol-3218/RecSysHR2022-paper_10.pdf
|volume=Vol-3218
|authors=Mike Zhang,Kristian Nørgaard Jensen,Rob van der Goot,Barbara Plank
|dblpUrl=https://dblp.org/rec/conf/hr-recsys/ZhangJGP22
}}
==Skill Extraction from Job Postings using Weak Supervision==
Mike Zhang¹*, Kristian Nørgaard Jensen¹, Rob van der Goot¹ and Barbara Plank¹,²
¹ IT University of Copenhagen, Rued Langgaards Vej 7, 2300 Copenhagen, Denmark
² Ludwig Maximilian University of Munich, Akademiestraße 7, 80799 Munich, Germany
* Corresponding author: mikz@itu.dk (M. Zhang)
Contact: krnj@itu.dk (K. N. Jensen); robv@itu.dk (R. van der Goot); b.plank@lmu.de (B. Plank)
Web: https://jjzha.github.io/ (M. Zhang); http://kris927b.github.io/ (K. N. Jensen); http://robvanderg.github.io/ (R. van der Goot); http://bplank.github.io/ (B. Plank)
Abstract
Aggregated data obtained from job postings provide powerful insights into labor market demands and emerging skills, and can aid job matching. However, most extraction approaches are supervised and thus need costly and time-consuming annotation. To overcome this, we propose Skill Extraction with Weak Supervision. We leverage the European Skills, Competences, Qualifications and Occupations (ESCO) taxonomy to find similar skills in job ads via latent representations. The method shows a strong positive signal, outperforming baselines based on token-level and syntactic patterns.
Keywords
Skill Extraction, Weak Supervision, Information Extraction, Job Postings, Skill Taxonomy, ESCO
1. Introduction

The labor market is under constant development—often due to changes in technology, migration, and digitization—and so are the skill sets required [1, 2]. Consequently, large quantities of job vacancy data are emerging on a variety of platforms. Insights from this data on labor market skill set demands could aid, for instance, job matching [3]. The task of automatic skill extraction (SE) is to extract the competences necessary for any occupation from unstructured text.

Previous work on supervised SE frames it as a sequence labeling task (e.g., [4, 5, 6, 7, 8, 9, 10]) or as multi-label classification [11]. Annotation is a costly and time-consuming process with few annotation guidelines to work with. This could be alleviated by using predefined skill inventories.

In this work, we approach span-level SE with weak supervision: We leverage the European Skills, Competences, Qualifications and Occupations (ESCO; [12]) taxonomy and find similar spans that relate to ESCO skills in embedding space (Figure 1). The advantages are twofold: First, labeling skills becomes obsolete, which mitigates the cumbersome process of annotation. Second, by extracting skill phrases, this could possibly enrich skill inventories (e.g., ESCO) by finding paraphrases of existing skills. We seek to answer: How viable is Weak Supervision in the context of SE? We contribute: (1) a novel weakly supervised method for SE; (2) a linguistic analysis of ESCO skills and their presence in job postings; (3) an empirical analysis of different embedding pooling methods for SE on two skill-based datasets.¹

¹ https://github.com/jjzha/skill-extraction-weak-supervision

Figure 1: Weakly Supervised Skill Extraction. All ESCO skills and n-grams are extracted and embedded through a language model, e.g., RoBERTa [13], to get representations. We label spans from job postings close in vector space to the ESCO skill.

RecSys in HR’22: The 2nd Workshop on Recommender Systems for Human Resources, in conjunction with the 16th ACM Conference on Recommender Systems, September 18–23, 2022, Seattle, USA.
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

2. Methodology

Formally, we consider a set of job postings 𝒟, where 𝑑 ∈ 𝒟 is a set of sequences (e.g., job posting sentences) with the 𝑖-th input sequence 𝒯_𝑑^𝑖 = [𝑡_1, 𝑡_2, ..., 𝑡_𝑛] and a target sequence of BIO labels 𝒴_𝑑^𝑖 = [𝑦_1, 𝑦_2, ..., 𝑦_𝑛] (e.g., "B-SKILL", "I-SKILL", "O").² The goal is to use an algorithm that predicts skill spans by assigning an output label sequence 𝒴_𝑑^𝑖 to each token sequence 𝒯_𝑑^𝑖 from a job posting, based on the representational similarity of a span to any skill in ESCO.

² Definition of labels can be found in [8].
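To make the sequence-labeling format above concrete, here is a minimal sketch with a made-up job posting sentence; the tokens and labels are illustrative only and are not taken from the datasets:

```python
# Hypothetical example of one token sequence T and its BIO label sequence Y.
tokens = ["Experience", "in", "Python", "and", "project", "management", "is", "a", "plus"]
labels = ["O", "O", "B-SKILL", "O", "B-SKILL", "I-SKILL", "O", "O", "O"]

assert len(tokens) == len(labels)  # T and Y are aligned token by token
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```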
Figure 2: Surface-level Statistics of ESCO. We show various statistics of ESCO. (A) ESCO skills token length, the mode
is three tokens. (B) Most frequent unigrams of ESCO skills. (C) Most frequent bigrams of ESCO skills. (D) Most frequent
trigrams of ESCO skills. (E) Most frequent POS sequences of ESCO skills. Last, we show the POS sequences of unique skills in
both train sets of Sayfullina and SkillSpan (F-G).
Table 1: Statistics of Datasets. Indicated is each dataset and their respective number of sentences, tokens, skill spans, and the average length of skills in tokens.

          Statistics          Sayfullina    SkillSpan
  Train   # Sentences         3,703         5,866
          # Tokens            53,095        122,608
          # Skill Spans       3,703         3,325
  Dev.    # Sentences         1,856         3,992
          # Tokens            26,519        52,084
          # Skill Spans       1,856         2,697
  Test    # Sentences         1,848         4,680
          # Tokens            26,569        57,528
          # Skill Spans       1,848         3,093
          Avg. Len. Skills    1.77          2.92

2.1. Data

We use the datasets from [8] (SkillSpan) and a modification of [4] (Sayfullina).³ In Table 1, we show the statistics of both. SkillSpan contains nested labels for skill and knowledge components [12]. To make it fit our weak supervision approach, we simplify their dataset by considering both skill and knowledge labels as one label (i.e., B-KNOWLEDGE becomes B-SKILL).

ESCO Statistics: We use ESCO as a weak supervision signal for discovering skills in job postings. There are 13,890 ESCO skills.⁴ In Figure 2, we show statistics of the taxonomy: (A) on average, most skills are three tokens long. In (B-D), we show n-gram frequencies with range [1; 3]. We can see that the most frequent uni- and bigrams are verbs, while the most frequent trigrams consist of nouns.

Additionally, we show an analysis of ESCO skills from a linguistic perspective. We tag the training data using the publicly available MaChAmp v0.2 model [14] trained on all Universal Dependencies 2.7 treebanks [15].⁵ Then, we count the most frequent Part-of-Speech (POS) tags in all sources of data (E-G). ESCO's most frequent tag sequences are VERB-NOUN; these are not as frequent in Sayfullina nor SkillSpan. Sayfullina mostly consists of adjectives, which is attributed to its categorization of soft skills. SkillSpan mostly consists of NOUN sequences. Overall, we observe that most skills consist of verb and noun phrases.

³ In contrast to SkillSpan, Sayfullina has a skill in every sentence, where they focus on categorizing sentences for soft skills.
⁴ Per 25-03-2022, taking ESCO v1.0.9.
⁵ A Udify-based [16] multi-task model for POS, lemmatization, and dependency parsing, built on top of the transformers library [17], specifically using mBERT [18].
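Surface statistics of the kind shown in Figure 2 can be approximated with a short script. The sketch below is illustrative only: it assumes a plain-text file with one ESCO skill label per line (the filename esco_skills.txt is hypothetical) and uses spaCy's English model as a stand-in for the MaChAmp tagger used in the paper.

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")  # stand-in tagger; the paper uses MaChAmp v0.2

length_dist, ngram_dist, pos_dist = Counter(), Counter(), Counter()

with open("esco_skills.txt", encoding="utf-8") as f:  # hypothetical: one skill per line
    skills = [line.strip() for line in f if line.strip()]

for doc in nlp.pipe(skills):
    tokens = [t.text.lower() for t in doc]
    length_dist[len(tokens)] += 1          # token-length distribution (Figure 2, A)
    for n in range(1, 4):                  # n-gram frequencies for n in [1; 3] (B-D)
        for i in range(len(tokens) - n + 1):
            ngram_dist[" ".join(tokens[i:i + n])] += 1
    pos_dist[" ".join(t.pos_ for t in doc)] += 1  # POS sequence of the skill phrase (E)

print(length_dist.most_common(5))
print(ngram_dist.most_common(10))
print(pos_dist.most_common(10))
```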
Figure 3: Skill Representations. We show different methods to embed ESCO skill phrases. The approaches are inspired
by Litschko et al. [19]. We embed a skill by encoding it directly without surrounding context (left). We aggregate different
contextual representations of the same skill term (middle). Last, we encode the skill phrase via a weighted sum of embeddings
with each token’s inverse document frequency as weight (right). For the middle and right methods, 𝒮 is the number of
sentences where the ESCO skill appears.
2.2. Baselines

As our approach is to find similar n-grams based on ESCO skills, we choose an n-gram range of [1; 4] (where 4 is the median) derived from Figure 2 (A). For higher matching probability, we apply an additional pre-processing step to the ESCO skills by removing non-tokens (e.g., brackets) and words between brackets (e.g., “Java (programming)” becomes “Java”). We have three baselines:

Exact Match: We do exact substring matching with ESCO and the sentences in both datasets.

Lemmatized Match: ESCO skills are written in the infinitive form. We take the same approach as exact match on the training sets, now with the lemmatized data of both. The data is lemmatized with MaChAmp v0.2 [14].

POS Sequence Match: Motivated by the observation that certain POS sequences often overlap between sources (Figure 2, E-G), we attempt to match POS sequences within ESCO with the POS sequences in the datasets. For example, NOUN-NOUN, NOUN, VERB-NOUN, and ADJ-NOUN sequences are commonly occurring in all three sources.
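As an illustration of the pre-processing and the Exact Match baseline, the minimal sketch below strips bracketed material from ESCO skill labels and looks them up as substrings. The skill strings and the sentence are made-up examples, and this is a sketch rather than the authors' released code.

```python
import re

def preprocess_skill(skill: str) -> str:
    """Remove bracketed parts, e.g. "Java (programming)" -> "Java"."""
    return re.sub(r"\s*\([^)]*\)", "", skill).strip()

def exact_match(sentence: str, esco_skills: list[str]) -> list[str]:
    """Exact Match baseline: return ESCO skills occurring verbatim in the sentence."""
    sentence_lower = sentence.lower()
    return [skill for skill in esco_skills
            if preprocess_skill(skill).lower() in sentence_lower]

# Hypothetical usage:
skills = ["Java (computer programming)", "Python", "team management"]
print(exact_match("Experience in Java and Python is a plus", skills))
# -> ['Java (computer programming)', 'Python']
```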
Algorithm 1: Weakly Supervised Skill Extraction
Require: M ∈ {RoBERTa, JobBERT}
Require: E ∈ {ISO, AOC, WSE}
Require: τ ∈ [0, 1]
  P ← 𝒟            ▷ A set of sentences from job postings
  S ← S_E           ▷ ESCO skill embeddings of type E
  L ← ∅
  for p ∈ P do
      θ ← 0
      for n ∈ p do                    ▷ Each n-gram n of size 1-4
          e ← M(n)
          Θ ← CosSim(S, e)
          if max(Θ) > τ ∧ max(Θ) > θ then
              θ ← max(Θ)
          end if
      end for
      L ← [L, θ]
  end for
  return L
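A minimal Python sketch of the matching loop in Algorithm 1 is given below. It assumes pre-computed ESCO skill embeddings stacked in a matrix (of type ISO, AOC, or WSE), uses mean-pooled RoBERTa subword embeddings for candidate n-grams, and keeps, per sentence, the highest-scoring n-gram above the threshold τ. Function and variable names are ours, not taken from the authors' implementation.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")
model.eval()

def embed_span(text: str) -> torch.Tensor:
    """Mean-pool the subword embeddings of a span (ISO-style encoding)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc).last_hidden_state[0]      # (num_subwords, hidden)
    return out.mean(dim=0)

def ngrams(tokens, max_n=4):
    """All n-grams of size 1-4, following the range chosen from Figure 2 (A)."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])

def extract_skills(sentences, esco_embeddings, tau=0.8):
    """For each sentence, return the n-gram most similar to any ESCO skill,
    or None if no candidate exceeds the CosSim threshold tau."""
    esco = torch.nn.functional.normalize(esco_embeddings, dim=-1)
    predictions = []
    for sentence in sentences:
        best_score, best_span = 0.0, None
        for gram in ngrams(sentence.split()):
            g = torch.nn.functional.normalize(embed_span(gram), dim=-1)
            score = (esco @ g).max().item()          # highest CosSim to any ESCO skill
            if score > tau and score > best_score:
                best_score, best_span = score, gram
        predictions.append((best_span, best_score))
    return predictions

# Hypothetical usage with two ESCO skills encoded in isolation:
esco_matrix = torch.stack([embed_span(s) for s in ["Python", "team management"]])
print(extract_skills(["Experience in Python is a plus"], esco_matrix, tau=0.8))
```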
2.3. Skill Representations

We investigate several encoding strategies to match n-gram representations to embedded ESCO skills; the approaches are inspired by Litschko et al. [19], who applied them to Information Retrieval. The language models (LMs) used to encode the data are RoBERTa [13] and the domain-specific JobBERT [8]. All obtained vector representations of skill phrases with the three encoding methods are compared pairwise with each n-gram created from Sayfullina and SkillSpan. An explanation of the methods (see Figure 3):

Span in Isolation (ISO): We encode skill phrases t from ESCO in isolation using the aforementioned LMs, without surrounding context.

Average over Contexts (AOC): We leverage the surrounding context of a skill phrase t by collecting all the sentences containing t. We use all available sentences in the job postings dataset (excluding Test). For a given job posting sentence, we encode t using one of the previously mentioned LMs. We average the embeddings of its constituent subwords to obtain the final embedding of t.

Weighted Span Embedding (WSE): We obtain the inverse document frequency (idf) value of each token t_i via

    \mathrm{idf}(t_i) = -\log \frac{n_{t_i}}{N},

where n_{t_i} is the number of occurrences of t_i and N is the total number of tokens in our dataset. We encode the input sentence and compute the weighted sum of the embeddings (s⃗_j) of the specific skill phrase in the sentence, where each t_i's idf score is used as weight. Again, we only use the first subword token for each tokenized word. Formally, this is

    \vec{s}_j = \sum_{i} \left(-\log \frac{n_{t_i}}{N}\right) \cdot \vec{t}_i .

Matching: We rank pairs of ESCO embeddings t⃗ and encoded candidate n-grams g⃗ in decreasing order of cosine similarity (CosSim), calculated as

    \mathrm{CosSim}(\vec{t}, \vec{g}) = \frac{\vec{t}^{\,T}\vec{g}}{\lVert\vec{t}\rVert \, \lVert\vec{g}\rVert} .

We show the pseudocode of the matching algorithm in Algorithm 1. Note that for SkillSpan we have to set a threshold for CosSim, as there are sentences with no skills. A threshold allows us to have a “no skill” option. As seen in Figure 5 (Appendix A), the threshold sensitivity on SkillSpan differs for JobBERT: performance fluctuates compared to RoBERTa. Precision goes up with a higher threshold, while recall goes down. For RoBERTa, it stays similar until CosSim = 0.9. We use CosSim = 0.8, as over 2 LMs and 3 methods it provides the best cutoff.
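The three pooling strategies can be sketched as follows. The sketch abstracts the LM behind a hypothetical helper encode(tokens) that returns one vector per token (e.g., mean-pooled subword embeddings from RoBERTa or JobBERT); it therefore simplifies the paper's subword handling (averaging subwords for AOC, first subword with idf weights for WSE), and all names are ours.

```python
import math
import torch

def find_subsequence(sentence_tokens, span_tokens):
    """Return the start index of span_tokens inside sentence_tokens (assumed present)."""
    for i in range(len(sentence_tokens) - len(span_tokens) + 1):
        if sentence_tokens[i:i + len(span_tokens)] == span_tokens:
            return i
    raise ValueError("span not found in sentence")

def iso_embedding(skill_tokens, encode):
    """ISO: encode the ESCO skill phrase in isolation and average its token vectors."""
    return torch.stack(encode(skill_tokens)).mean(dim=0)

def aoc_embedding(skill_tokens, context_sentences, encode):
    """AOC: average the skill-span vectors over all sentences containing the skill."""
    span_vectors = []
    for sentence in context_sentences:          # each sentence is a list of tokens
        vectors = encode(sentence)
        i = find_subsequence(sentence, skill_tokens)
        span_vectors.append(torch.stack(vectors[i:i + len(skill_tokens)]).mean(dim=0))
    return torch.stack(span_vectors).mean(dim=0)

def wse_embedding(skill_tokens, sentence, encode, token_counts, total_tokens):
    """WSE: weight each token vector of the skill span by idf = -log(n_t / N).
    token_counts maps a token to its corpus frequency n_t; total_tokens is N."""
    vectors = encode(sentence)
    i = find_subsequence(sentence, skill_tokens)
    weighted = [
        -math.log(token_counts[tok] / total_tokens) * vectors[i + j]
        for j, tok in enumerate(skill_tokens)
    ]
    return torch.stack(weighted).sum(dim=0)
```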
Figure 4: Results of Methods. Results on Sayfullina and SkillSpan are indicated by “Baseline” showing performance of Exact,
Lemmatized (Lemma), and Part-of-Speech (POS). The performance of ISO, AOC, and WSE are separated by model, indicated
by “RoBERTa” and “JobBERT”. The performance of RoBERTa and JobBERT on SkillSpan is determined by the best performing
CosSim threshold (0.8).
Table 2: Qualitative Examples of Predicted Spans. We show the gold versus predicted spans of the best performing model on both datasets. The first 5 qualitative examples are from Sayfullina (RoBERTa with WSE), the last 5 are from SkillSpan. Yellow indicates the gold span and pink indicates the predicted span. The examples show many partial overlaps with the gold spans (but also incorrect ones), hence the high loose-F1.

Sayfullina (Gold | Predicted):
  ...a dynamic customer focused person to join... | ...a dynamic customer focused person to join...
  ...strong leadership and team management skills... | ...strong leadership and team management skills...
  ...speak and written english skills... | ...speak and written english skills...
  ...a team environment and working independently skills... | ...a team environment and working independently skills...
  ...tangible business benefit extremely articulate and... | ...tangible business benefit extremely articulate and...

SkillSpan (Gold | Predicted):
  ...researcher within machine learning and sensory system design... | ...researcher within machine learning and sensory system design...
  ...standards and procedures accessing and updating records... | ...standards and procedures accessing and updating records...
  ...with a passion for education to... | ...with a passion for education to...
  ...understands Agile as a mindset... | ...understands Agile as a mindset...
  ...experience with AWS GCP Microsoft Azure... | ...experience with AWS GCP Microsoft...
3. Analysis of Results

Results: Our main results (Figure 4) show the baselines against ISO, AOC, and WSE on both datasets. We evaluate with two types of F1, following van der Goot et al. [20]: strict-F1 and loose-F1. For full model fine-tuning, RoBERTa achieves 91.31 and 98.55 strict and loose F1 on Sayfullina, respectively. For SkillSpan, this is 23.21 and 44.72 strict and loose F1 (on the available subsets of SkillSpan). JobBERT achieves 90.18 and 98.19 strict and loose F1 on Sayfullina, and 49.44 and 74.41 strict and loose F1 on SkillSpan. The large difference between results is most likely due to the lack of negatives in Sayfullina, i.e., all sentences contain a skill, which makes the task easier. These results highlight the difficulty of SE on SkillSpan, where there are negatives as well (sentences with no skills).
The exact match baseline on SkillSpan is higher than on Sayfullina. We attribute this to SkillSpan also containing “hard skills” (e.g., “Python”), which are easier to match as substrings than “soft skills”.⁶

For the performance of the skill representations on Sayfullina, RoBERTa and JobBERT outperform the Exact and Lemmatized baselines on strict-F1. For the POS baseline, only the ISO method of both models is slightly better. JobBERT performs better than RoBERTa in strict-F1 on both datasets.

There is a substantial difference between strict and loose-F1 on both datasets. This indicates that there is partial overlap among the predicted and gold spans. RoBERTa performs best for Sayfullina, achieving 59.61 loose-F1 with WSE. In addition, the best performing method for JobBERT is also WSE (52.69 loose-F1). For SkillSpan we see a drop; JobBERT outperforms RoBERTa with AOC (32.30 vs. 26.10 loose-F1) given a threshold of CosSim = 0.8. We hypothesize that this drop in performance compared to Sayfullina could again be attributed to SkillSpan containing negative examples as well (i.e., sentences with no skill).

Qualitative Analysis: A qualitative analysis (Table 2) reveals that there is strong partial overlap between gold and predicted spans on both datasets, e.g., “...strong leadership and team management skills...” vs. “...strong leadership and team management skills...”, indicating the viability of this method.

4. Conclusion

We investigate whether the ESCO skill taxonomy is suitable as a weak supervision signal for Skill Extraction. We apply several skill representation methods based on previous work. We show that using representations of ESCO skills can aid us in this task. We achieve high loose-F1, indicating there is partial overlap between the predicted and gold spans, but we need refined offset methods to get the correct span out (e.g., human post-editing or automatic methods such as candidate filtering). Nevertheless, we see this approach as a strong alternative for supervised Skill Extraction from job postings.

Future work could include moving towards multilingual Skill Extraction; as ESCO is available in 27 languages, exact matching should be trivial. For the other methods, several considerations need to be taken into account, e.g., a POS tagger and/or lemmatizer for the other language and a language-specific model.

⁶ The exact numbers (+ precision and recall) are in Table 3, Appendix A, including the definition of strict and loose-F1.

Acknowledgments

We thank the NLPnorth group for feedback on an earlier version of this paper—in particular, Elisa Bassignana and Max Müller-Eberstein for insightful discussions. We would also like to thank the anonymous reviewers for their comments to improve this paper. Last, we also thank NVIDIA and the ITU High-performance Computing cluster for computing resources. This research is supported by the Independent Research Fund Denmark (DFF) grant 9131-00019B.

References

[1] E. Brynjolfsson, A. McAfee, Race against the machine: How the digital revolution is accelerating innovation, driving productivity, and irreversibly transforming employment and the economy, Brynjolfsson and McAfee, 2011.
[2] E. Brynjolfsson, A. McAfee, The second machine age: Work, progress, and prosperity in a time of brilliant technologies, WW Norton & Company, 2014.
[3] K. Balog, Y. Fang, M. De Rijke, P. Serdyukov, L. Si, Expertise retrieval, Foundations and Trends in Information Retrieval 6 (2012) 127–256.
[4] L. Sayfullina, E. Malmi, J. Kannala, Learning representations for soft skill matching, in: International Conference on Analysis of Images, Social Networks and Texts, 2018, pp. 141–152.
[5] D. A. Tamburri, W.-J. Van Den Heuvel, M. Garriga, Dataops for societal intelligence: a data pipeline for labor market skills extraction and matching, in: 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), IEEE, 2020, pp. 391–394.
[6] M. Chernova, Occupational skills extraction with FinBERT, Master’s Thesis (2020).
[7] M. Zhang, K. N. Jensen, B. Plank, Kompetencer: Fine-grained skill classification in Danish job postings via distant supervision and transfer learning, Under Review, LREC 2022 (2022).
[8] M. Zhang, K. N. Jensen, S. Sonniks, B. Plank, SkillSpan: Hard and soft skill extraction from English job postings, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 4962–4984.
[9] T. Green, D. Maynard, C. Lin, Development of a benchmark corpus to support entity recognition in job descriptions, in: Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 1201–1208. URL: https://aclanthology.org/2022.lrec-1.128.
[10] A.-S. Gnehm, E. Bühlmann, S. Clematide, Evaluation of transfer learning and domain adaptation for analyzing German-speaking job advertisements, in: Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 3892–3901. URL: https://aclanthology.org/2022.lrec-1.414.
[11] A. Bhola, K. Halder, A. Prasad, M.-Y. Kan, Retrieving skills from job descriptions: A language model based extreme multi-label classification framework, in: Proceedings of the 28th International Conference on Computational Linguistics, International Committee on Computational Linguistics, Barcelona, Spain (Online), 2020, pp. 5832–5842.
[12] M. le Vrang, A. Papantoniou, E. Pauwels, P. Fannes, D. Vandensteen, J. De Smedt, ESCO: Boosting job matching in Europe with semantic interoperability, Computer 47 (2014) 57–64.
[13] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).
[14] R. van der Goot, A. Üstün, A. Ramponi, I. Sharaf, B. Plank, Massive choice, ample tasks (MaChAmp): A toolkit for multi-task learning in NLP, in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Online, 2021, pp. 176–197.
[15] D. Zeman, J. Nivre, M. Abrams, et al., Universal Dependencies 2.8.1, 2021. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
[16] D. Kondratyuk, M. Straka, 75 languages, 1 model: Parsing Universal Dependencies universally, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 2779–2795.
[17] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.
[18] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
[19] R. Litschko, I. Vulić, S. P. Ponzetto, G. Glavaš, On cross-lingual retrieval with multilingual text encoders, Information Retrieval Journal (2022) 1–35.
[20] R. van der Goot, I. Sharaf, A. Imankulova, A. Üstün, M. Stepanovic, A. Ramponi, S. O. Khairunnisa, M. Komachi, B. Plank, From masked-language modeling to translation: Non-English auxiliary tasks improve zero-shot spoken language understanding, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Mexico City, Mexico, 2021.
Table 3: We show the exact numbers of the performance of the methods.

Dataset →                     Sayfullina                                      SkillSpan
↓ Method, Metric →            Strict (P | R | F1)     Loose (P | R | F1)      Strict (P | R | F1)    Loose (P | R | F1)
Baseline   Exact              9.27 | 1.30 | 2.28      25.48 | 3.95 | 6.84     23.82 | 3.21 | 5.62    43.68 | 8.27 | 13.79
           Lemmatized         8.49 | 1.19 | 2.09      25.87 | 4.00 | 6.93     23.90 | 2.97 | 5.21    41.09 | 7.49 | 12.52
           POS                5.99 | 5.95 | 5.97      36.55 | 34.51 | 35.50   5.97 | 7.88 | 6.79     19.34 | 34.71 | 24.80
RoBERTa    ISO                6.26 | 6.25 | 6.26      26.90 | 28.98 | 27.90   2.90 | 4.24 | 3.43     12.69 | 28.61 | 17.56
           AOC                3.24 | 3.24 | 3.24      64.04 | 55.53 | 59.48   2.23 | 2.93 | 2.53     20.08 | 37.56 | 26.10
           WSE                3.67 | 3.67 | 3.67      64.64 | 55.32 | 59.61   2.29 | 2.93 | 2.57     20.90 | 37.79 | 26.85
JobBERT    ISO                7.71 | 7.72 | 7.71      27.76 | 29.95 | 28.82   4.17 | 4.65 | 4.39     17.07 | 29.48 | 21.61
           AOC                4.04 | 4.05 | 4.05      56.50 | 48.41 | 52.14   4.44 | 2.96 | 3.54     33.64 | 31.28 | 32.30
           WSE                4.15 | 4.16 | 4.15      56.98 | 49.00 | 52.69   4.78 | 3.08 | 3.74     34.01 | 30.33 | 31.95
Figure 5: Results of Methods. Results of the baselines are in (X), the performance of ISO, AOC, and WSE on Sayfullina
in (A-C), and the same performance on SkillSpan in (D-I) based on the model (RoBERTa or JobBERT). In D–F, we show the
precision (P), recall (R), and F1 differences when taking an increasing CosSim.
A. Exact Results

Definition F1: As mentioned, we evaluate with two types of F1-scores, following van der Goot et al. [20]. The first type is the commonly used span-F1, where only the correct span and label are counted towards true positives. This is called strict-F1. In the second variant, we look for partial matches, i.e., overlap between the predicted and gold span including the correct label, which counts towards true positives for precision and recall. This is called loose-F1. We consider the loose variant as well, because we want to analyze whether the span is “almost correct”.
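A minimal sketch of the two metrics is shown below, assuming gold and predicted spans are given as (start, end, label) tuples per sentence; this is our reading of the definitions above, not the evaluation script of van der Goot et al. [20].

```python
def span_f1(gold_spans, pred_spans, loose=False):
    """Compute span-F1 over lists of (start, end, label) tuples.
    strict: a prediction counts only if span boundaries and label match exactly.
    loose:  a prediction counts if it overlaps a gold span with the same label."""
    def match(pred, gold):
        if loose:
            return pred[2] == gold[2] and pred[0] < gold[1] and gold[0] < pred[1]
        return pred == gold

    tp_pred = sum(any(match(p, g) for g in gold_spans) for p in pred_spans)
    tp_gold = sum(any(match(p, g) for p in pred_spans) for g in gold_spans)
    precision = tp_pred / len(pred_spans) if pred_spans else 0.0
    recall = tp_gold / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: one gold skill span, one overlapping prediction.
gold = [(2, 5, "SKILL")]
pred = [(3, 5, "SKILL")]
print(span_f1(gold, pred, loose=False))  # 0.0  (boundaries differ)
print(span_f1(gold, pred, loose=True))   # 1.0  (partial overlap, same label)
```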
Exact Numbers Results: We show the exact numbers of Figure 4 in Table 3 and more detailed results in Figure 5. Results show that there is high precision among the baseline approaches compared to recall. This is balanced using the representation methods for Sayfullina. However, we observe that for SkillSpan recall is much higher than precision.