<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Recommender Systems, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Skill Extraction from Job Postings using Weak Supervision</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mike Zhang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kristian Nørgaard Jensen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rob van der Goot</string-name>
          <email>robv@itu.dk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Barbara Plank</string-name>
          <email>b.plank@lmu.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IT University of Copenhagen</institution>
          ,
          <addr-line>Rued Langgaards Vej 7, 2300, Copenhagen</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ludwig Maximilian University of Munich</institution>
          ,
          <addr-line>Akademiestraße 7, 80799, Munich</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Skill Extraction, Weak Supervision, Information Extraction</institution>
          ,
          <addr-line>Job Postings, Skill Taxonomy, ESCO</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>23</lpage>
      <abstract>
        <p>Aggregated data obtained from job postings provide powerful insights into labor market demands and emerging skills, and aid job matching. However, most extraction approaches are supervised and thus need costly and time-consuming annotation. To overcome this, we propose Skill Extraction with Weak Supervision. We leverage the European Skills, Competences, Qualifications and Occupations taxonomy to find similar skills in job ads via latent representations. The method shows a strong positive signal, outperforming baselines based on token-level and syntactic patterns.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        The labor market is under constant development, often due to changes in technology, migration, and digitization, and so are the skill sets required [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Consequently, large quantities of job vacancy data are emerging on a variety of platforms. Insights from this data on labor market skill set demands could aid, for instance, job matching [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The task of automatic skill extraction (SE) is to extract the competences necessary for any occupation from unstructured text.
      </p>
      <p>
        Previous work on supervised SE frames it as a sequence labeling task (e.g., [
        <xref ref-type="bibr" rid="ref10 ref4 ref5 ref6 ref7 ref8 ref9">4, 5, 6, 7, 8, 9, 10</xref>
        ]) or as multi-label classification [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Annotation is costly and time-consuming. This could be alleviated by using predefined skill inventories.
      </p>
      <p>
        In this work, we approach span-level SE with weak supervision: we leverage the European Skills, Competences, Qualifications and Occupations (ESCO; [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]) taxonomy and find spans that are similar to ESCO skills in embedding space (Figure 1). The advantages are twofold. First, labeling skills becomes obsolete, which mitigates the cumbersome process of annotation. Second, extracting skill phrases could enrich skill inventories (e.g., ESCO) by finding paraphrases of existing skills. We seek to answer: How viable is Weak Supervision in the context of SE? We contribute: (1) a novel weakly supervised method for SE; (2) a linguistic analysis of ESCO skills and their presence in job postings; (3) an empirical analysis of different embedding pooling methods for SE for two skill-based datasets.<sup>1</sup>
      </p>
      <p>RecSys in HR’22: The 2nd Workshop on Recommender Systems for Human Resources, in conjunction with the 16th ACM Conference on Recommender Systems. ∗Corresponding author. http://bplank.github.io/ (B. Plank)</p>
      <p>Figure 1: ESCO skills and n-grams are extracted and embedded through a language model. We label spans from job postings that are close in vector space to the ESCO skill.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Methodology</title>
      <p>Formally, we consider a set of job postings <inline-formula><tex-math>\mathcal{D}</tex-math></inline-formula>, where <inline-formula><tex-math>d \in \mathcal{D}</tex-math></inline-formula> is a set of sequences (e.g., job posting sentences) with the <inline-formula><tex-math>i</tex-math></inline-formula>-th input sequence <inline-formula><tex-math>\mathcal{X}_d^i = [x_1, x_2, \ldots, x_n]</tex-math></inline-formula> and a target sequence of BIO-labels <inline-formula><tex-math>\mathcal{Y}_d^i = [y_1, y_2, \ldots, y_n]</tex-math></inline-formula> (e.g., “B-SKILL”, “I-SKILL”, “O”).<sup>2</sup> The goal is to use an algorithm that predicts skill spans by assigning an output label sequence <inline-formula><tex-math>\mathcal{Y}_d^i</tex-math></inline-formula> to each token sequence <inline-formula><tex-math>\mathcal{X}_d^i</tex-math></inline-formula> from a job posting, based on the representational similarity of a span to any skill in ESCO.</p>
      <p>Figure 2 (panel titles and axis labels recovered from the figure): Distribution of ESCO Skills (x-axis: Token Length; y-axis: Frequency); Unigram; Bigram; Trigram.</p>
      <p>ESCO Statistics. We use ESCO as a weak supervision signal for discovering skills in job postings. There are 13,890 ESCO skills.<sup>4</sup> In Figure 2, we show statistics of the taxonomy: (A) on average, most skills are 3 tokens long. In (C-D), we show n-gram frequencies with range [1; 3]. The most frequent uni- and bigrams are verbs, while the most frequent trigrams consist of nouns.</p>
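      <p>The length and n-gram statistics described above can be sketched as follows. This is a minimal illustration, not the authors' code: the whitespace tokenization, the function names, and the toy skill phrases are assumptions made for the example and are not taken from ESCO.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def skill_statistics(skills, max_n=3):
    """Count token lengths and n-gram frequencies (n = 1..max_n) over skill phrases."""
    length_counts = Counter()
    ngram_counts = {n: Counter() for n in range(1, max_n + 1)}
    for skill in skills:
        tokens = skill.lower().split()  # whitespace tokenization is an assumption
        length_counts[len(tokens)] += 1
        for n in ngram_counts:
            ngram_counts[n].update(ngrams(tokens, n))
    return length_counts, ngram_counts


# Hypothetical skill phrases, not taken from ESCO.
toy_skills = ["manage staff", "manage budgets", "develop software", "use spreadsheet software"]
lengths, counts = skill_statistics(toy_skills)
print(lengths.most_common())
print(counts[1].most_common(3))
```
]]></preformat>
      <p>Running the same counts over the full list of ESCO skill labels would give the kind of token-length and n-gram distributions that the text attributes to Figure 2.</p>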
      <p>
        Additionally, we analyze ESCO skills from a linguistic perspective. We tag the training data using the publicly available MaChAmp v0.2 model [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] trained on all Universal Dependencies 2.7 treebanks [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].<sup>5</sup> Then, we count the most frequent Part-of-Speech (POS) tags in all sources of data (E-G). ESCO’s most frequent tag sequence is VERB-NOUN, which is not as frequent in Sayfullina or SkillSpan. Sayfullina mostly consists of adjectives, which is attributed to the categorization of soft skills. SkillSpan mostly consists of NOUN sequences.
      </p>
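      <p>Counting tag sequences per phrase, as in the analysis above, could look like the sketch below. It assumes the phrases are already tagged; the hand-tagged examples and the function name are hypothetical, and a real run would take the tags from a tagger such as the MaChAmp model mentioned in the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def pos_sequence_counts(tagged_phrases):
    """Count POS tag sequences, one hyphen-joined sequence per phrase."""
    return Counter("-".join(tag for _, tag in phrase) for phrase in tagged_phrases)


# Hypothetical, hand-tagged phrases (UPOS tags), used only for illustration.
tagged = [
    [("manage", "VERB"), ("staff", "NOUN")],
    [("develop", "VERB"), ("software", "NOUN")],
    [("teamwork", "NOUN")],
]
counts = pos_sequence_counts(tagged)
print(counts.most_common())
```
]]></preformat>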
      <p>Overall, we observe that most skills consist of verb and noun phrases.</p>
      <p>2.1. Data</p>
      <p>
        We use the datasets from [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] (SkillSpan) and a modification of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (Sayfullina).<sup>3</sup> In Table 1, we show the statistics of both. SkillSpan contains nested labels for skill and knowledge components [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. To make it fit our weak supervision approach, we simplify the dataset by considering both skill and knowledge labels as one label (i.e., B-KNOWLEDGE becomes B-SKILL).
      </p>
      <p><sup>3</sup> In contrast to SkillSpan, Sayfullina has a skill in every sentence, where they focus on categorizing sentences for soft skills.</p>
      <p><sup>4</sup> Per 25-03-2022, taking ESCO v1.0.9.</p>
      <p><sup>5</sup> A Udify-based [16] multi-task model for POS, lemmatization, and dependency parsing, built on top of the transformers library [17], and specifically using mBERT [18].</p>
      <p>Figure 3 (panel titles): Spans in Isolation (ISO), Average over Contexts (AOC), Weighted Span Embedding (WSE). [...] with each token’s inverse document frequency as weight (right). For the middle and right methods, the figure also indicates the number of sentences where the ESCO skill appears.</p>
      <p>Algorithm 1: Weakly Supervised Skill Extraction. Require: a language model from {RoBERTa, JobBERT}; an encoding method from {ISO, AOC, WSE}; a threshold in [0, 1]. Inputs: a set of sentences from job postings, and the ESCO skill embeddings of the chosen type. [...]</p>
      <p>2.2. Baselines</p>
      <p>As our approach is to find similar n-grams based on ESCO skills, we choose an n-gram range of [1; 4] (where 4 is the median), derived from Figure 2 (A). For higher matching probability, we apply an additional pre-processing step to the ESCO skills by removing non-tokens (e.g., brackets) and words between brackets (e.g., “Java (programming)” becomes “Java”). We have three baselines:</p>
      <p>Exact Match: We perform exact substring matching between ESCO skills and the sentences in both datasets.</p>
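      <p>A minimal sketch of such an exact-match baseline is given below. It is an illustration under stated assumptions rather than the authors' implementation: matching is done on lowercased whitespace tokens instead of raw substrings so that the output is a BIO label sequence, longer skills are matched first, and the function names and example sentence are hypothetical. The bracket removal follows the pre-processing described for the ESCO skills.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re


def preprocess_skill(skill):
    """Drop parenthesised text, e.g. 'Java (programming)' -> 'java'."""
    return re.sub(r"\s*\([^)]*\)", "", skill).strip().lower()


def exact_match_bio(tokens, skills):
    """Label tokens with BIO tags wherever a skill's token sequence occurs."""
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    # Longer skills first, so that a longer match is not blocked by a shorter one.
    skill_tokens = sorted((preprocess_skill(s).split() for s in skills), key=len, reverse=True)
    for skill in skill_tokens:
        n = len(skill)
        if n == 0:
            continue
        for i in range(len(lowered) - n + 1):
            span_free = all(label == "O" for label in labels[i:i + n])
            if lowered[i:i + n] == skill and span_free:
                labels[i] = "B-SKILL"
                labels[i + 1:i + n] = ["I-SKILL"] * (n - 1)
    return labels


# Hypothetical sentence and skills, used only for illustration.
sentence = "Experience with Java and project management".split()
print(exact_match_bio(sentence, ["Java (programming)", "project management"]))
```
]]></preformat>
      <p>Token-level matching cannot find skills that appear in an inflected form, which is the gap the lemmatized variant is meant to address.</p>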
      <p>
        Lemmatized Match: ESCO skills are written in the
infinitive form. We take the same approach as exact
match on the training sets, now with the lemmatized
data of both. The data is lemmatized with MaChAmp
v0.2 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        POS Sequence Match: Motivated by the observation that certain POS sequences often overlap between sources (Figure 2, E-G), we attempt to match POS sequences within ESCO with the POS sequences in the datasets. For example, NOUN-NOUN, NOUN, VERB-NOUN, and ADJ-NOUN sequences commonly occur in all three sources.
      </p>
      <p>2.3. Skill Representations</p>
      <p>
        We investigate several encoding strategies to match n-gram representations to embedded ESCO skills. The approaches are inspired by Litschko et al. [19], who applied them to Information Retrieval. The language models (LMs) used to encode the data are RoBERTa [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and the domain-specific JobBERT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. All obtained vector representations of skill phrases with the three previous encoding methods are compared pairwise with each [...]. The methods are as follows (see Figure 3):
      </p>
      <p>Span in Isolation (ISO): We encode skill phrases from ESCO in isolation using the aforementioned LMs, without surrounding contexts.</p>
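      <p>Average over Contexts (AOC): We leverage the surrounding context of a skill phrase by collecting all the [...]</p>
      <p>Weighted Span Embedding (WSE): [...] where <inline-formula><tex-math>n_{t_i}</tex-math></inline-formula> is the number of occurrences of <inline-formula><tex-math>t_i</tex-math></inline-formula> and <inline-formula><tex-math>N</tex-math></inline-formula> the total number of tokens in our dataset. We encode the [...]
        <disp-formula><tex-math>\vec{s} = \sum_{i} \left(-\log \frac{n_{t_i}}{N}\right) \cdot \vec{t}_i .</tex-math></disp-formula>
      </p>
      <p>Figure 4 (legend and axis labels recovered from the figure): Exact, Lemma, POS, ISO, AOC, WSE; SkillSpan; Strict-F1, Loose-F1; CosSim threshold (0.8).</p>
      <p>[...] 2 LMs and 3 methods it provides the best cutoff.</p>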
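      <p>The IDF-weighted span embedding and the cosine-similarity threshold can be illustrated with the sketch below. It is a toy example under explicit assumptions: the two-dimensional token vectors stand in for LM embeddings, the corpus and function names are hypothetical, and treating a similarity equal to the threshold as a match is a choice made here, since the text only states the threshold value (0.8).</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def idf_weights(corpus_tokens):
    """Map each token t to -log(n_t / N), with n_t its count and N the total token count."""
    total = len(corpus_tokens)
    counts = {}
    for tok in corpus_tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return {tok: -math.log(n / total) for tok, n in counts.items()}


def weighted_span_embedding(span_tokens, token_vectors, idf):
    """Sum the span's token vectors, each scaled by its token's IDF weight."""
    dim = len(next(iter(token_vectors.values())))
    out = [0.0] * dim
    for tok in span_tokens:
        for k, value in enumerate(token_vectors[tok]):
            out[k] += idf[tok] * value
    return out


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def is_skill(span_vector, skill_vectors, threshold=0.8):
    """Label a span as a skill if any skill embedding reaches the threshold (assumed inclusive)."""
    return any(cosine(span_vector, s) >= threshold for s in skill_vectors)


# Hypothetical corpus and static toy vectors; a real run would use contextual LM embeddings.
corpus = "team management and team work and and python".split()
idf = idf_weights(corpus)
vectors = {"team": [1.0, 0.0], "management": [0.0, 1.0]}
span = weighted_span_embedding(["team", "management"], vectors, idf)
print(is_skill(span, [[1.0, 1.5]]))
print(is_skill(span, [[1.0, 0.0]]))
```
]]></preformat>
      <p>Spans for which no ESCO skill reaches the threshold receive no skill label, which is how a threshold provides a “no skill” option.</p>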
    </sec>
    <sec id="sec-4">
      <title>3. Analysis of Results</title>
      <p>[...] skills. A threshold allows us to have a “no skill” option. As seen in Figure 5, Appendix A, the threshold sensitivity [...]</p>
      <p>Results. Our main results (Figure 4) compare the baselines against ISO, AOC, and WSE on both datasets. We evaluate with two types of F1, following van der Goot et al. [20]: strict-F1 and loose-F1. For full model fine-tuning, RoBERTa achieves 91.31 and 98.55 strict and loose F1 on Sayfullina, respectively. For SkillSpan, this is 23.21 and 44.72 strict and loose F1 (on the available subsets of SkillSpan). JobBERT achieves 90.18 and 98.19 strict and loose F1 on Sayfullina, and 49.44 and 74.41 strict and loose F1 on SkillSpan. The large difference between results is most likely due to the lack of negatives in Sayfullina, i.e., all sentences contain a skill, which makes the task easier. These results highlight the difficulty of SE on SkillSpan, where there are negatives as well (sentences with no skills).</p>
      <p>The exact match baseline on SkillSpan is higher than on Sayfullina. We attribute this to SkillSpan also containing “hard skills” (e.g., “Python”), which are easier to match as substrings than “soft skills”.<sup>6</sup></p>
      <p>For the performance of the skill representations on Sayfullina, RoBERTa and JobBERT outperform the Exact and Lemmatized baselines on strict-F1. For the POS baseline, only the ISO method of both models is slightly better. JobBERT performs better than RoBERTa in strict-F1 on both datasets.</p>
      <p>There is a substantial difference between strict and loose-F1 on both datasets. This indicates that there is partial overlap among the predicted and gold spans. RoBERTa performs best for Sayfullina, achieving 59.61 loose-F1 with WSE. In addition, the best performing method for JobBERT is also WSE (52.69 loose-F1). For SkillSpan we see a drop: JobBERT outperforms RoBERTa with AOC (32.30 vs. 26.10 loose-F1) given a threshold of CosSim = 0.8. We hypothesize that this drop in performance compared to Sayfullina could again be attributed to SkillSpan containing negative examples as well (i.e., sentences with no skill).</p>
      <p>Qualitative Analysis. A qualitative analysis (Table 2) reveals that there is strong partial overlap between gold and predicted spans on both datasets, e.g., “...strong leadership and team management skills...” vs. “...strong leadership and team management skills...”, indicating the viability of this method.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We thank the NLPnorth group for feedback on an earlier version of this paper, in particular Elisa Bassignana and Max Müller-Eberstein for insightful discussions. We would also like to thank the anonymous reviewers for their comments to improve this paper. Last, we also thank NVIDIA and the ITU High-performance Computing cluster for computing resources. This research is supported by the Independent Research Fund Denmark (DFF) grant 9131-00019B.</p>
    </sec>
    <sec id="sec-6">
      <title>4. Conclusion</title>
      <p>We investigate whether the ESCO skill taxonomy is suitable as a weak supervision signal for Skill Extraction. We apply several skill representation methods based on previous work. We show that using representations of ESCO skills can aid us in this task. We achieve high loose-F1, indicating there is partial overlap between the predicted and gold spans, but we need refined offset methods to get the correct span out (e.g., human post-editing or automatic methods such as candidate filtering). Nevertheless, we see this approach as a strong alternative to supervised Skill Extraction from job postings.</p>
      <p>Future work could include going towards multilingual Skill Extraction. As ESCO covers 27 languages, exact matching should be trivial. For the other methods, several considerations need to be taken into account, e.g., a POS tagger and/or lemmatizer for another language and a language-specific model.</p>
      <p><sup>6</sup> The exact numbers (including precision and recall) are in Table 3, Appendix A, together with the definition of strict and loose-F1.</p>
      <p>E. Davidson, M.-C. de Marneffe, V. de Paiva, M. O.</p>
      <p>Derin, E. de Souza, A. Diaz de Ilarraza, C.
Dickerson, A. Dinakaramani, E. Di Nuovo, B. Dione,
P. Dirix, K. Dobrovoljc, T. Dozat, K. Droganova,
P. Dwivedi, H. Eckhoff, S. Eiche, M. Eli, A. Elkahky,
B. Ephrem, O. Erina, T. Erjavec, A. Etienne, W.
Evelyn, S. Facundes, R. Farkas, M. Fernanda, H.
Fernandez Alcalde, J. Foster, C. Freitas, K. Fujita, K.
Gajdošová, D. Galbraith, M. Garcia, M. Gärdenfors,
S. Garza, F. F. Gerardi, K. Gerdes, F. Ginter, G. Godoy,
I. Goenaga, K. Gojenola, M. Gökırmak, Y. Goldberg,
X. Gómez Guinovart, B. González Saavedra, B.
Griciūtė, M. Grioni, L. Grobol, N. Grūzītis, B.
Guillaume, C. Guillot-Barbance, T. Güngör, N. Habash,
H. Hafsteinsson, J. Hajič, J. Hajič jr., M. Hämäläinen,
L. Hà Mỹ, N.-R. Han, M. Y. Hanifmuti, S.
Hardwick, K. Harris, D. Haug, J. Heinecke, O.
Hellwig, F. Hennig, B. Hladká, J. Hlaváčová, F.
Hociung, P. Hohle, E. Huber, J. Hwang, T. Ikeda,
A. K. Ingason, R. Ion, E. Irimia, Ọ. Ishola, K. Ito,
T. Jelínek, A. Jha, A. Johannsen, H. Jónsdóttir, F.
Jørgensen, M. Juutinen, S. K, H. Kaşıkara, A. Kaasen,
N. Kabaeva, S. Kahane, H. Kanayama, J.
Kanerva, N. Kara, B. Katz, T. Kayadelen, J.
Kenney, V. Kettnerová, J. Kirchner, E. Klementieva,
A. Köhn, A. Köksal, K. Kopacewicz, T.
Korkiakangas, N. Kotsyba, J. Kovalevskaitė, S. Krek, P.
Krishnamurthy, O. Kuyrukçu, A. Kuzgun, S. Kwak,
V. Laippala, L. Lam, L. Lambertino, T. Lando,
S. D. Larasati, A. Lavrentiev, J. Lee, P. Lê Hồng,
A. Lenci, S. Lertpradit, H. Leung, M. Levina, C. Y.</p>
      <p>Li, J. Li, K. Li, Y. Li, K. Lim, B. Lima Padovani,
K. Lindén, N. Ljubešić, O. Loginova, A. Luthfi,
M. Luukko, O. Lyashevskaya, T. Lynn, V.
Macketanz, A. Makazhanov, M. Mandl, C. Manning,
R. Manurung, B. Marşan, C. Mărănduc, D. Mareček,
K. Marheinecke, H. Martínez Alonso, A. Martins,
J. Mašek, H. Matsuda, Y. Matsumoto, A. Mazzei,
R. McDonald, S. McGuinness, G. Mendonça,
N. Miekka, K. Mischenkova, M. Misirpashayeva,
A. Missilä, C. Mititelu, M. Mitrofan, Y. Miyao, A.
Mojiri Foroushani, J. Molnár, A. Moloodi, S.
Montemagni, A. More, L. Moreno Romero, G. Moretti, K. S.</p>
      <p>Mori, S. Mori, T. Morioka, S. Moro, B. Mortensen,
B. Moskalevskyi, K. Muischnek, R. Munro, Y.
Murawaki, K. Müürisep, P. Nainwani, M. Nakhlé, J. I.</p>
      <p>Navarro Horñiacek, A. Nedoluzhko, G.
Nešpore-Bērzkalne, M. Nevaci, L. Nguyễn Thị, H. Nguyễn
Thị Minh, Y. Nikaido, V. Nikolaev, R. Nitisaroj,
A. Nourian, H. Nurmi, S. Ojala, A. K. Ojha,
A. Olúòkun, M. Omura, E. Onwuegbuzia, P.
Osenova, R. Östling, L. Øvrelid, Ş. B. Özateş, M.
Özçelik, A. Özgür, B. Öztürk Başaran, H. H. Park,
N. Partanen, E. Pascual, M. Passarotti, A.
Patejuk, G. Paulino-Passos, A. Peljak-Łapińska, S. Peng,
C.-A. Perez, N. Perkova, G. Perrier, S. Petrov, D. Petrova, J. Phelan, J. Piitulainen, T. A. Pirinen, E. Pitler, B. Plank, T. Poibeau, L. Ponomareva, M. Popel, L. Pretkalniņa, S. Prévost, P. Prokopidis, A. Przepiórkowski, T. Puolakainen, S. Pyysalo, P. Qi, A. Rääbis, A. Rademaker, T. Rama, L. Ramasamy, C. Ramisch, F. Rashel, M. S. Rasooli, V. Ravishankar, L. Real, P. Rebeja, S. Reddy, G. Rehm, I. Riabov, M. Rießler, E. Rimkutė, L. Rinaldi, L. Rituma, L. Rocha, E. Rögnvaldsson, M. Romanenko, R. Rosa, V. Roșca, D. Rovati, O. Rudina, J. Rueter, K. Rúnarsson, S. Sadde, P. Safari, B. Sagot, A. Sahala, S. Saleh, A. Salomoni, T. Samardžić, S. Samson, M. Sanguinetti, E. Sanıyar, D. Särg, B. Saulīte, Y. Sawanakunanon, S. Saxena, K. Scannell, S. Scarlata, N. Schneider, S. Schuster, L. Schwartz, D. Seddah, W. Seeker, M. Seraji, M. Shen, A. Shimada, H. Shirasu, Y. Shishkina, M. Shohibussirri, D. Sichinava, J. Siewert, E. F. Sigurðsson, A. Silveira, N. Silveira, M. Simi, R. Simionescu, K. Simkó, M. Šimková, K. Simov, M. Skachedubova, A. Smith, I. Soares-Bastos, C. Spadine, R. Sprugnoli, S. Steingrímsson, A. Stella, M. Straka, E. Strickland, J. Strnadová, A. Suhr, Y. L. Sulestio, U. Sulubacak, S. Suzuki, Z. Szántó, D. Taji, Y. Takahashi, F. Tamburini, M. A. C. Tan, T. Tanaka, S. Tella, I. Tellier, M. Testori, G. Thomas, L. Torga, M. Toska, T. Trosterud, A. Trukhina, R. Tsarfaty, U. Türk, F. Tyers, S. Uematsu, R. Untilov, Z. Urešová, L. Uria, H. Uszkoreit, A. Utka, S. Vajjala, R. van der Goot, M. Vanhove, D. van Niekerk, G. van Noord, V. Varga, E. Villemonte de la Clergerie, V. Vincze, N. Vlasova, A. Wakasa, J. C. Wallenberg, L. Wallin, A. Walsh, J. X. Wang, J. N. Washington, M. Wendt, P. Widmer, S. Williams, M. Wirén, C. Wittern, T. Woldemariam, T.-s. Wong, A. Wróblewska, M. Yako, K. Yamashita, N. Yamazaki, C. Yan, K. Yasuoka, M. M. Yavrumyan, A. B. Yenice, O. T. Yıldız, Z. Yu, Z. Žabokrtský, S. Zahra, A. Zeldes, H. Zhu, A. Zhuravleva, R. Ziane, Universal dependencies 2.8.1, 2021. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.</p>
      <p>[16] D. Kondratyuk, M. Straka, 75 languages, 1 model: Parsing universal dependencies universally, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 2779–2795.</p>
      <p>[17] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.</p>
      <p>[18] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.</p>
      <p>[19] R. Litschko, I. Vulić, S. P. Ponzetto, G. Glavaš, On cross-lingual retrieval with multilingual text encoders, Information Retrieval Journal (2022) 1–35.</p>
      <p>[20] R. van der Goot, I. Sharaf, A. Imankulova, A. Üstün, M. Stepanovic, A. Ramponi, S. O. Khairunnisa, M. Komachi, B. Plank, From masked-language modeling to translation: Non-English auxiliary tasks improve zero-shot spoken language understanding, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Mexico City, Mexico, 2021.</p>
      <p>Recovered table column headers: Method; Strict (P | R | F1); Loose (P | R | F1); Strict (P | R | F1); Loose (P | R | F1). Recovered figure axis label: F1-score.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Exact Results</title>
      <p>Definition F1. As mentioned, we evaluate with two types of F1-scores, following van der Goot et al. [20]. The first type is the commonly used span-F1, where only the correct span and label are counted towards true positives. This is called strict-F1. In the second variant, we look for partial matches, i.e., overlap between the predicted and gold span including the correct label, which counts towards true positives for precision and recall. This is called loose-F1. We consider the loose variant as well, because we want to analyze whether the span is “almost [...]</p>
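      <p>The two variants can be sketched as follows. This is an illustrative reading of the definitions above, not the evaluation script used for the reported numbers: spans are assumed to be (start, end, label) tuples with an exclusive end offset, and in the loose variant a predicted span counts towards precision if it overlaps any gold span with the same label, and a gold span counts towards recall if any such prediction overlaps it.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overlaps(a, b):
    """True if two (start, end, label) spans share a label and at least one token."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def span_f1(gold, pred, loose=False):
    """Strict F1 needs identical span and label; loose F1 accepts partial overlap."""
    if loose:
        tp_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        tp_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    else:
        tp_pred = tp_gold = len(set(gold) & set(pred))
    precision = tp_pred / len(pred) if pred else 0.0
    recall = tp_gold / len(gold) if gold else 0.0
    return f1(precision, recall)


# Hypothetical spans: the first prediction covers only part of the first gold span.
gold = [(2, 5, "SKILL"), (7, 8, "SKILL")]
pred = [(3, 5, "SKILL"), (7, 8, "SKILL")]
print(span_f1(gold, pred), span_f1(gold, pred, loose=True))
```
]]></preformat>
      <p>In this toy case the partially overlapping prediction is an error under the strict variant but a hit under the loose one, which is the gap between the two scores that the analysis discusses.</p>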
      <p>Exact Numbers Results. We show the exact numbers of Figure 4 in Table 3 and more detailed results in Figure 5. The baseline approaches show high precision compared to recall. This is balanced using the representation methods for Sayfullina. However, we observe much higher recall than precision for SkillSpan.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Brynjolfsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McAfee</surname>
          </string-name>
          ,
          <article-title>Race against the machine: How the digital revolution is accelerating innovation, driving productivity, and irreversibly transforming employment and the economy</article-title>
          ,
          <source>Brynjolfsson and McAfee</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Brynjolfsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McAfee</surname>
          </string-name>
          ,
          <article-title>The second machine age: Work, progress, and prosperity in a time of brilliant technologies</article-title>
          ,
          <source>WW Norton &amp; Company</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Balog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
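          <string-name>
            <given-names>M.</given-names>
            <surname>De Rijke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Serdyukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <article-title>Expertise retrieval</article-title>
          ,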
          <source>Foundations and Trends in Information Retrieval</source>
          <volume>6</volume>
          (
          <year>2012</year>
          )
          <fpage>127</fpage>
          -
          <lpage>256</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Sayfullina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Malmi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kannala</surname>
          </string-name>
          ,
          <article-title>Learning representations for soft skill matching</article-title>
          ,
          <source>in: International Conference on Analysis of Images, Social Networks and Texts</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>141</fpage>
          -
          <lpage>152</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Tamburri</surname>
          </string-name>
          , W.-J. Van Den Heuvel, M. Garriga,
          <article-title>Dataops for societal intelligence: a data pipeline for labor market skills extraction and matching</article-title>
          ,
          <source>in: 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>391</fpage>
          -
          <lpage>394</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chernova</surname>
          </string-name>
          ,
          <article-title>Occupational skills extraction with FinBERT</article-title>
          ,
          <source>Master's Thesis</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          ,
          <article-title>Kompetencer: Fine-grained skill classification in Danish job postings via distant supervision and transfer learning</article-title>
          , Under Review,
          <source>LREC 2022</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sonniks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          ,
          <article-title>SkillSpan: Hard and soft skill extraction from English job postings</article-title>
          ,
          <source>in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Association for Computational Linguistics, Seattle, United States,
          <year>2022</year>
          , pp.
          <fpage>4962</fpage>
          -
          <lpage>4984</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Green</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <article-title>Development of a benchmark corpus to support entity recognition in job descriptions</article-title>
          ,
          <source>in: Proceedings of the Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>1201</fpage>
          -
          <lpage>1208</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.128.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.-S.</given-names>
            <surname>Gnehm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bühlmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Clematide</surname>
          </string-name>
          ,
          <article-title>Evaluation of transfer learning and domain adaptation for analyzing German-speaking job advertisements</article-title>
          ,
          <source>in: Proceedings of the Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>3892</fpage>
          -
          <lpage>3901</lpage>
          . URL: https://aclanthology.org/2022.lrec-1.414.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bhola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Halder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-Y.</given-names>
            <surname>Kan</surname>
          </string-name>
          ,
          <article-title>Retrieving skills from job descriptions: A language model based extreme multi-label classification framework</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Computational Linguistics</source>
          , International Committee on Computational Linguistics, Barcelona, Spain (Online)
          ,
          <year>2020</year>
          , pp.
          <fpage>5832</fpage>
          -
          <lpage>5842</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>le Vrang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Papantoniou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Pauwels</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fannes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vandensteen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>De Smedt</surname>
          </string-name>
          ,
          <article-title>ESCO: Boosting job matching in Europe with semantic interoperability</article-title>
          ,
          <source>Computer</source>
          <volume>47</volume>
          (
          <year>2014</year>
          )
          <fpage>57</fpage>
          -
          <lpage>64</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , arXiv preprint arXiv:1907.11692 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>van der Goot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Üstün</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramponi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sharaf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          ,
          <article-title>Massive choice, ample tasks (MaChAmp): A toolkit for multi-task learning in NLP</article-title>
          ,
          <source>in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>176</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zeman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nivre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Abrams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ackermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Aepli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Aghaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ž.</given-names>
            <surname>Agić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ahrenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Ajede</surname>
          </string-name>
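          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Aleksandravičiūtė</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Alfina</surname>
          </string-name>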
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antonsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Aplonova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aquino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Aragon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Aranzabe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. N.</given-names>
            <surname>Arıcan</surname>
          </string-name>
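          ,
          <string-name>
            <given-names>Þ.</given-names>
            <surname>Arnardóttir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Arutie</surname>
          </string-name>
          ,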
          <string-name>
            <given-names>J. N.</given-names>
            <surname>Arwidarasti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Asahara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Aslan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ateyah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Atmaca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Attia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Atutxa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Augustinus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Badmaeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Balasubramani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ballesteros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Barbu Mititelu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barkarson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Basmov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Batchelor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Bedir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bengoetxea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Berk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Berzak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Bhat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Bhat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Biagetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bielinskienė</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bjarnadóttir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Blokland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bobicev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Boizou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Borges Völker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Börstell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bosco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bouma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bowman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Boyd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Braggaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Brokaitė</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burchardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Candito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Caron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Caron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cassidy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Cavalcanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Cebiroğlu Eryiğit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Cecchini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. G. A.</given-names>
            <surname>Celano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Čéplö</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cesur</surname>
          </string-name>
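          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cetin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ö.</given-names>
            <surname>Çetinoğlu</surname>
          </string-name>
          ,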
          <string-name>
            <given-names>F.</given-names>
            <surname>Chalub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Cignarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cinková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Collomb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Çöltekin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Courtin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cristescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Daniel</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>