
Recommender Systems, September

Job Postings using Weak Supervision

Mike Zhang


Kristian Nørgaard Jensen


Rob van der Goot

robv@itu.dk

Barbara Plank

b.plank@lmu.de

IT University of Copenhagen, Rued Langgaards Vej 7, 2300, Copenhagen, Denmark

Ludwig Maximilian University of Munich, Akademiestraße 7, 80799, Munich, Germany

Keywords: Skill Extraction, Weak Supervision, Information Extraction, Job Postings, Skill Taxonomy, ESCO

2022

1 8 23

Aggregated data obtained from job postings provide powerful insights into labor market demands and emerging skills, and aid job matching. However, most extraction approaches are supervised and thus need costly and time-consuming annotation. To overcome this, we propose Skill Extraction with Weak Supervision. We leverage the European Skills, Competences, Qualifications and Occupations taxonomy to find similar skills in job ads via latent representations. The method shows a strong positive signal, outperforming baselines based on token-level and syntactic patterns.

1. Introduction

The labor market is under constant development, often due to changes in technology, migration, and digitization, and so are the skill sets required [1, 2]. Consequently, large quantities of job vacancy data are emerging on a variety of platforms. Insights from this data on labor market skill set demands could aid, for instance, job matching [3]. The task of automatic skill extraction (SE) is to extract the competences necessary for any occupation from unstructured text.

Previous work on supervised SE frames it as a sequence labeling task (e.g., [4, 5, 6, 7, 8, 9, 10]) or multi-label classification [11]. Annotation is a costly and time-consuming process. This could be alleviated by using predefined skill inventories.

In this work, we approach span-level SE with weak supervision: We leverage the European Skills, Competences, Qualifications and Occupations (ESCO; [12]) taxonomy and find spans that are similar to ESCO skills in embedding space (Figure 1). The advantages are twofold: First, manual labeling of skills becomes unnecessary, which avoids the cumbersome process of annotation. Second, extracting skill phrases could enrich skill inventories (e.g., ESCO) by finding paraphrases of existing skills. We seek to answer: How viable is Weak Supervision in the context of SE? We contribute: (1) a novel weakly supervised method for SE; (2) a linguistic analysis of ESCO skills and their presence in job postings; (3) an empirical analysis of different embedding pooling methods for SE for two skill-based datasets.1

RecSys in HR’22: The 2nd Workshop on Recommender Systems for Human Resources, in conjunction with the 16th ACM Conference on Recommender Systems.
∗Corresponding author. http://bplank.github.io/ (B. Plank)

Figure 1: ESCO skills and n-grams are extracted and embedded through a language model. We label spans from job postings close in vector space to the ESCO skill.

2. Methodology

[Figure 2: Statistics of ESCO skills. Recoverable panel labels: skill length in tokens (axis: Token Length) and the frequencies of the most frequent unigrams, bigrams, and trigrams (e.g., “Distribution of ESCO Skills (Trigrams)”; axis: Frequency).]

Formally, we consider a set of job postings $\mathcal{D}$, where $d \in \mathcal{D}$ is a set of sequences (e.g., job posting sentences) with the $i$-th input sequence $\mathcal{X}^i_d = [x_1, x_2, ..., x_T]$ and a target sequence of BIO-labels $\mathcal{Y}^i_d = [y_1, y_2, ..., y_T]$ (e.g., “B-SKILL”, “I-SKILL”, “O”).2 The goal is to use an algorithm that predicts skill spans by assigning an output label sequence $\mathcal{Y}^i_d$ to each token sequence $\mathcal{X}^i_d$ from a job posting, based on the representational similarity of a span to any skill in ESCO.
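The BIO encoding used in the task definition above can be illustrated with a minimal sketch. The helper name `spans_to_bio` and the half-open `(start, end)` token offsets are assumptions made for this example; they are not taken from the paper's code.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int]], label: str = "SKILL") -> List[str]:
    """Convert half-open token spans [start, end) into a BIO label sequence."""
    labels = ["O"] * num_tokens
    for start, end in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        labels[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):       # remaining tokens of the span
            labels[i] = f"I-{label}"
    return labels


# Hypothetical sentence with two skill spans: "team management" and "Python".
tokens = ["Experience", "with", "team", "management", "and", "Python"]
labels = spans_to_bio(len(tokens), [(2, 4), (5, 6)])
print(list(zip(tokens, labels)))
```

Each predicted span then corresponds to one "B-SKILL" token followed by zero or more "I-SKILL" tokens, which is the form the evaluation compares against the gold labels.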

ESCO Statistics We use ESCO as a weak supervision signal for discovering skills in job postings. There are 13,890 ESCO skills.4 In Figure 2, we show statistics of the taxonomy: (A) On average, most skills are 3 tokens long. In (C-D), we show n-gram frequencies with range [1; 3]. We can see that the most frequent uni- and bigrams are verbs, while the most frequent trigrams consist of nouns.

Additionally, we show an analysis of ESCO skills from a linguistic perspective. We tag the training data using the publicly available MaChAmp v0.2 model [14] trained on all Universal Dependencies 2.7 treebanks [15].5 Then, we count the most frequent Part-of-Speech (POS) tags in all sources of data (E-G). ESCO’s most frequent tag sequences are VERB-NOUN; these are not as frequent in Sayfullina or SkillSpan. Sayfullina mostly consists of adjectives, which is attributed to the categorization of soft skills. SkillSpan mostly consists of NOUN sequences.

Overall, we observe that most skills consist of verb and noun phrases.

2.1. Data

We use the datasets from [8] (SkillSpan) and a modification of [4] (Sayfullina).3 In Table 1, we show the statistics of both. SkillSpan contains nested labels for skill and knowledge components [12]. To make it fit for our weak supervision approach, we simplify their dataset by considering both skills and knowledge labels as one label (i.e., B-KNOWLEDGE becomes B-SKILL).

3 In contrast to SkillSpan, Sayfullina has a skill in every sentence, where they focus on categorizing sentences for soft skills.
4 Per 25-03-2022, taking ESCO v1.0.9.
5 A Udify-based [16] multi-task model for POS, lemmatization, dependency parsing, built on top of the transformers library [17], and specifically using mBERT [18].

Figure 3: Skill representation methods: Spans in Isolation (ISO, left), Average over Contexts (AOC, middle), and Weighted Span Embedding (WSE, right) with each token’s inverse document frequency as weight. For the middle and right methods, [...] is the number of sentences where the ESCO skill appears.

Algorithm 1 Weakly Supervised Skill Extraction
Require: language model ∈ {RoBERTa, JobBERT}
Require: encoding method ∈ {ISO, AOC, WSE}
Require: threshold ∈ [0, 1]
Input: a set of sentences from job postings
Input: ESCO skill embeddings of the chosen type
[...]

2.2. Baselines

As our approach is to find similar n-grams based on ESCO skills, we choose an n-gram range of [1; 4] (where 4 is the median) derived from Figure 2 (A). For higher matching probability, we apply an additional pre-processing step to the ESCO skills by removing non-tokens (e.g., brackets) and words between brackets (e.g., “Java (programming)” becomes “Java”). We have three baselines:

Exact Match: We do exact substring matching with ESCO and the sentences in both datasets.

Lemmatized Match: ESCO skills are written in the infinitive form. We take the same approach as exact match on the training sets, now with the lemmatized data of both. The data is lemmatized with MaChAmp v0.2 [ 14 ].
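The two string-matching baselines above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: matching is done over token n-grams of length 1 to 4 rather than raw substrings, the function names are hypothetical, and the tiny `toy_lemmas` dictionary only stands in for a real lemmatizer such as MaChAmp.

```python
import re
from typing import Callable, Iterable, List, Tuple


def clean_esco_skill(skill: str) -> str:
    """Drop bracketed words and stray brackets: 'Java (programming)' -> 'Java'."""
    skill = re.sub(r"\([^)]*\)", " ", skill)    # remove "(...)" together with its content
    skill = re.sub(r"[()\[\]{}]", " ", skill)   # remove any unmatched brackets
    return " ".join(skill.split())


def ngram_offsets(num_tokens: int, max_n: int = 4) -> Iterable[Tuple[int, int]]:
    """Yield half-open (start, end) offsets for all n-grams with 1 <= n <= max_n."""
    for n in range(1, max_n + 1):
        for start in range(num_tokens - n + 1):
            yield start, start + n


def match_baseline(
    tokens: List[str],
    skills: List[str],
    normalize: Callable[[str], str] = str.lower,
) -> List[Tuple[int, int]]:
    """Return offsets of n-grams that equal a cleaned, normalized skill string."""
    inventory = {
        " ".join(normalize(w) for w in clean_esco_skill(s).split()) for s in skills
    }
    inventory.discard("")
    norm = [normalize(t) for t in tokens]
    return [(s, e) for s, e in ngram_offsets(len(norm)) if " ".join(norm[s:e]) in inventory]


# Hypothetical inputs for illustration only.
skills = ["Java (programming)", "manage teams"]
tokens = "We need Java and someone managing teams".split()

# Exact match: only lowercasing is applied.
exact = match_baseline(tokens, skills)

# Lemmatized match: a toy lookup table stands in for a real lemmatizer.
toy_lemmas = {"managing": "manage"}
lemma = match_baseline(tokens, skills, normalize=lambda w: toy_lemmas.get(w.lower(), w.lower()))

print(exact, lemma)
```

In this toy case the exact variant only finds "Java", while the lemmatized variant also recovers "managing teams" because the skill inventory is written in the infinitive form.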

POS Sequence Match: Motivated by the observation that certain POS sequences often overlap between sources (Figure 2, E-G), we attempt to match POS sequences within ESCO with the POS sequences in the datasets. For example, NOUN-NOUN, NOUN, VERB-NOUN, and ADJ-NOUN sequences commonly occur in all three sources.

2.3. Skill Representations

We investigate several encoding strategies to match n-gram representations to embedded ESCO skills. The approaches are inspired by Litschko et al. [19], who applied them to Information Retrieval. The language models (LMs) used to encode the data are RoBERTa [13] and the domain-specific JobBERT [8]. All obtained vector representations of skill phrases with the three encoding methods are compared pairwise with each n-gram [...]. The methods are as follows (see Figure 3):

Span in Isolation (ISO): We encode skill phrases from ESCO in isolation using the aforementioned LMs, without surrounding contexts.

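Average over Contexts (AOC): We leverage the surrounding context of a skill phrase by collecting all the [...]

Weighted Span Embedding (WSE): [...] where $n_t$ is the number of occurrences of token $t$ and $N$ the total number of tokens in our dataset. We encode the [...] Formally,

$$\vec{s} = \sum_{t} \left(-\log \frac{n_t}{N}\right) \cdot \vec{t}.$$

[Figure 4: Results per method (Exact, Lemma, POS, and ISO, AOC, WSE for each of the two LMs) in Strict-F1 and Loose-F1; recoverable panel label: SkillSpan.]

[...] CosSim threshold (0.8). [...] 2 LMs and 3 methods it provides the best cutoff.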
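The idf-weighted span embedding and the thresholded cosine matching can be sketched together as below. This is a minimal sketch, assuming toy two-dimensional token vectors in place of RoBERTa or JobBERT embeddings; all function names, the toy corpus, and the skill vectors are hypothetical and only illustrate the weighting $-\log(n_t / N)$ and the "no skill" option given by the threshold.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def idf_weights(corpus_tokens: Sequence[str]) -> Dict[str, float]:
    """idf(t) = -log(n_t / N), with n_t the count of t and N the corpus size in tokens."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    return {tok: -math.log(n / total) for tok, n in counts.items()}


def weighted_span_embedding(
    span_tokens: Sequence[str], token_vectors: Sequence[Vector], idf: Dict[str, float]
) -> Vector:
    """Sum the token vectors of a span, each weighted by its token's idf."""
    out = [0.0] * len(token_vectors[0])
    for tok, vec in zip(span_tokens, token_vectors):
        weight = idf.get(tok, 0.0)  # unseen tokens get zero weight in this sketch
        for k, value in enumerate(vec):
            out[k] += weight * value
    return out


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_skill(span_vec: Vector, skill_vecs: Dict[str, Vector], threshold: float = 0.8) -> Optional[str]:
    """Return the most similar skill at or above the threshold, else None ("no skill")."""
    best_name, best_sim = None, threshold
    for name, vec in skill_vecs.items():
        sim = cosine(span_vec, vec)
        if sim >= best_sim:
            best_name, best_sim = name, sim
    return best_name


# Toy data: in practice the vectors would come from a language model.
corpus = ["manage", "teams", "and", "manage", "budgets", "and", "and", "python"]
idf = idf_weights(corpus)
span_vec = weighted_span_embedding(["manage", "teams"], [[1.0, 0.0], [0.0, 1.0]], idf)
skill_vecs = {"manage a team": [0.5, 1.0], "use Python": [1.0, -1.0]}

print(best_skill(span_vec, skill_vecs))                   # span labeled with the closest skill
print(best_skill(span_vec, skill_vecs, threshold=0.999))  # stricter cutoff: no skill
```

Raising the threshold trades recall for precision: fewer spans clear the cutoff, and those that do are closer to an ESCO skill in vector space.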

[...] skills. A threshold allows us to have a “no skill” option. As seen in Figure 5, Appendix A, the threshold sensitivity [...]

3. Analysis of Results

Results Our main results (Figure 4) show the baselines against ISO, AOC, and WSE on both datasets. We evaluate with two types of F1, following van der Goot et al. [20]: strict-F1 and loose-F1. For full model fine-tuning, RoBERTa achieves 91.31 and 98.55 strict and loose F1 on Sayfullina, respectively. For SkillSpan, this is 23.21 and 44.72 strict and loose F1 (on the available subsets of SkillSpan). JobBERT achieves 90.18 and 98.19 strict and loose F1 on Sayfullina, and 49.44 and 74.41 strict and loose F1 on SkillSpan. The large difference between results is most likely due to the lack of negatives in Sayfullina, i.e., all sentences contain a skill, which makes the task easier. These results highlight the difficulty of SE on SkillSpan, where there are negatives as well (sentences with no skills).

The exact match baseline on SkillSpan is higher than on Sayfullina. We attribute this to SkillSpan also containing “hard skills” (e.g., “Python”), which are easier to match as substrings than “soft skills”.6

For the performance of the skill representations on Sayfullina, RoBERTa and JobBERT outperform the Exact and Lemmatized baselines on strict-F1. For the POS baseline, only the ISO method of both models is slightly better. JobBERT performs better than RoBERTa in strict-F1 on both datasets.

There is a substantial difference between strict and loose-F1 on both datasets. This indicates that there is partial overlap among the predicted and gold spans. RoBERTa performs best for Sayfullina, achieving 59.61 loose-F1 with WSE. In addition, the best performing method for JobBERT is also WSE (52.69 loose-F1). For SkillSpan we see a drop; JobBERT outperforms RoBERTa with AOC (32.30 vs. 26.10 loose-F1) given a threshold of CosSim = 0.8. We hypothesize that this drop in performance compared to Sayfullina could again be attributed to SkillSpan containing negative examples as well (i.e., sentences with no skill).

6 The exact numbers (+precision and recall) are in Table 3, Appendix A, including the definition of strict and loose-F1.

Qualitative Analysis A qualitative analysis (Table 2) reveals there is strong partial overlap with gold vs. predicted spans on both datasets, e.g., “...strong leadership and team management skills...” vs. “...strong leadership and team management skills...”, indicating the viability of this method.

4. Conclusion

We investigate whether the ESCO skill taxonomy is suitable as a weak supervision signal for Skill Extraction. We apply several skill representation methods based on previous work. We show that using representations of ESCO skills can aid us in this task. We achieve high loose-F1, indicating there is partial overlap between the predicted and gold spans, but refined offset methods are needed to get the correct span out (e.g., human post-editing or automatic methods such as candidate filtering). Nevertheless, we see this approach as a strong alternative for supervised Skill Extraction from job postings.

Future work could include going towards multilingual Skill Extraction: as ESCO covers 27 languages, exact matching should be trivial. For the other methods, several considerations need to be taken into account, e.g., a POS tagger and/or lemmatizer for another language and a language-specific model.

Acknowledgments

We thank the NLPnorth group for feedback on an earlier version of this paper, in particular Elisa Bassignana and Max Müller-Eberstein for insightful discussions. We would also like to thank the anonymous reviewers for their comments to improve this paper. Last, we also thank NVIDIA and the ITU High-performance Computing cluster for computing resources. This research is supported by the Independent Research Fund Denmark (DFF) grant 9131-00019B.

References

E. Davidson, M.-C. de Marneffe, V. de Paiva, M. O.

Derin, E. de Souza, A. Diaz de Ilarraza, C. Dickerson, A. Dinakaramani, E. Di Nuovo, B. Dione, P. Dirix, K. Dobrovoljc, T. Dozat, K. Droganova, P. Dwivedi, H. Eckhoff, S. Eiche, M. Eli, A. Elkahky, B. Ephrem, O. Erina, T. Erjavec, A. Etienne, W. Evelyn, S. Facundes, R. Farkas, M. Fernanda, H. Fernandez Alcalde, J. Foster, C. Freitas, K. Fujita, K. Gajdošová, D. Galbraith, M. Garcia, M. Gärdenfors, S. Garza, F. F. Gerardi, K. Gerdes, F. Ginter, G. Godoy, I. Goenaga, K. Gojenola, M. Gökırmak, Y. Goldberg, X. Gómez Guinovart, B. González Saavedra, B. Griciūtė, M. Grioni, L. Grobol, N. Grūzītis, B. Guillaume, C. Guillot-Barbance, T. Güngör, N. Habash, H. Hafsteinsson, J. Hajič, J. Hajič jr., M. Hämäläinen, L. Hà Mỹ, N.-R. Han, M. Y. Hanifmuti, S. Hardwick, K. Harris, D. Haug, J. Heinecke, O. Hellwig, F. Hennig, B. Hladká, J. Hlaváčová, F. Hociung, P. Hohle, E. Huber, J. Hwang, T. Ikeda, A. K. Ingason, R. Ion, E. Irimia, Ọ. Ishola, K. Ito, T. Jelínek, A. Jha, A. Johannsen, H. Jónsdóttir, F. Jørgensen, M. Juutinen, S. K, H. Kaşıkara, A. Kaasen, N. Kabaeva, S. Kahane, H. Kanayama, J. Kanerva, N. Kara, B. Katz, T. Kayadelen, J. Kenney, V. Kettnerová, J. Kirchner, E. Klementieva, A. Köhn, A. Köksal, K. Kopacewicz, T. Korkiakangas, N. Kotsyba, J. Kovalevskaitė, S. Krek, P. Krishnamurthy, O. Kuyrukçu, A. Kuzgun, S. Kwak, V. Laippala, L. Lam, L. Lambertino, T. Lando, S. D. Larasati, A. Lavrentiev, J. Lee, P. Lê Hồng, A. Lenci, S. Lertpradit, H. Leung, M. Levina, C. Y. Li, J. Li, K. Li, Y. Li, K. Lim, B. Lima Padovani, K. Lindén, N. Ljubešić, O. Loginova, A. Luthfi, M. Luukko, O. Lyashevskaya, T. Lynn, V. Macketanz, A. Makazhanov, M. Mandl, C. Manning, R. Manurung, B. Marşan, C. Mărănduc, D. Mareček, K. Marheinecke, H. Martínez Alonso, A. Martins, J. Mašek, H. Matsuda, Y. Matsumoto, A. Mazzei, R. McDonald, S. McGuinness, G. Mendonça, N. Miekka, K. Mischenkova, M. Misirpashayeva, A. Missilä, C. Mititelu, M. Mitrofan, Y. Miyao, A. Mojiri Foroushani, J. Molnár, A. Moloodi, S. Montemagni, A. More, L. Moreno Romero, G. Moretti, K. S. Mori, S. Mori, T. Morioka, S. Moro, B. Mortensen, B. Moskalevskyi, K. Muischnek, R. Munro, Y. Murawaki, K. Müürisep, P. Nainwani, M. Nakhlé, J. I. Navarro Horñiacek, A. Nedoluzhko, G. Nešpore-Bērzkalne, M. Nevaci, L. Nguyễn Thị, H. Nguyễn Thị Minh, Y. Nikaido, V. Nikolaev, R. Nitisaroj, A. Nourian, H. Nurmi, S. Ojala, A. K. Ojha, A. Olúòkun, M. Omura, E. Onwuegbuzia, P. Osenova, R. Östling, L. Øvrelid, Ş. B. Özateş, M. Özçelik, A. Özgür, B. Öztürk Başaran, H. H. Park, N. Partanen, E. Pascual, M. Passarotti, A. Patejuk, G. Paulino-Passos, A. Peljak-Łapińska, S. Peng, C.-A. Perez, N. Perkova, G. Perrier, S. Petrov, D. Petrova, J. Phelan, J. Piitulainen, T. A. Pirinen, E. Pitler, B. Plank, T. Poibeau, L. Ponomareva, M. Popel, L. Pretkalniņa, S. Prévost, P. Prokopidis, A. Przepiórkowski, T. Puolakainen, S. Pyysalo, P. Qi, A. Rääbis, A. Rademaker, T. Rama, L. Ramasamy, C. Ramisch, F. Rashel, M. S. Rasooli, V. Ravishankar, L. Real, P. Rebeja, S. Reddy, G. Rehm, I. Riabov, M. Rießler, E. Rimkutė, L. Rinaldi, L. Rituma, L. Rocha, E. Rögnvaldsson, M. Romanenko, R. Rosa, V. Roșca, D. Rovati, O. Rudina, J. Rueter, K. Rúnarsson, S. Sadde, P. Safari, B. Sagot, A. Sahala, S. Saleh, A. Salomoni, T. Samardžić, S. Samson, M. Sanguinetti, E. Sanıyar, D. Särg, B. Saulīte, Y. Sawanakunanon, S. Saxena, K. Scannell, S. Scarlata, N. Schneider, S. Schuster, L. Schwartz, D. Seddah, W. Seeker, M. Seraji, M. Shen, A. Shimada, H. Shirasu, Y. Shishkina, M. Shohibussirri, D. Sichinava, J. Siewert, E. F. Sigurðsson, A. Silveira, N. Silveira, M. Simi, R. Simionescu, K. Simkó, M. Šimková, K. Simov, M. Skachedubova, A. Smith, I. Soares-Bastos, C. Spadine, R. Sprugnoli, S. Steingrímsson, A. Stella, M. Straka, E. Strickland, J. Strnadová, A. Suhr, Y. L. Sulestio, U. Sulubacak, S. Suzuki, Z. Szántó, D. Taji, Y. Takahashi, F. Tamburini, M. A. C. Tan, T. Tanaka, S. Tella, I. Tellier, M. Testori, G. Thomas, L. Torga, M. Toska, T. Trosterud, A. Trukhina, R. Tsarfaty, U. Türk, F. Tyers, S. Uematsu, R. Untilov, Z. Urešová, L. Uria, H. Uszkoreit, A. Utka, S. Vajjala, R. van der Goot, M. Vanhove, D. van Niekerk, G. van Noord, V. Varga, E. Villemonte de la Clergerie, V. Vincze, N. Vlasova, A. Wakasa, J. C. Wallenberg, L. Wallin, A. Walsh, J. X. Wang, J. N. Washington, M. Wendt, P. Widmer, S. Williams, M. Wirén, C. Wittern, T. Woldemariam, T.-s. Wong, A. Wróblewska, M. Yako, K. Yamashita, N. Yamazaki, C. Yan, K. Yasuoka, M. M. Yavrumyan, A. B. Yenice, O. T. Yıldız, Z. Yu, Z. Žabokrtský, S. Zahra, A. Zeldes, H. Zhu, A. Zhuravleva, R. Ziane, Universal dependencies 2.8.1, 2021. LINDAT/CLARIAH-CZ digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

[16] D. Kondratyuk, M. Straka, 75 languages, 1 model: Parsing universal dependencies universally, in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 2779–2795.

[17] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.

[18] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.

[19] R. Litschko, I. Vulić, S. P. Ponzetto, G. Glavaš, On cross-lingual retrieval with multilingual text encoders, Information Retrieval Journal (2022) 1–35.

[20] R. van der Goot, I. Sharaf, A. Imankulova, A. Üstün, M. Stepanovic, A. Ramponi, S. O. Khairunnisa, M. Komachi, B. Plank, From masked-language modeling to translation: Non-English auxiliary tasks improve zero-shot spoken language understanding, in: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Mexico City, Mexico, 2021.

A. Exact Results

[Table 3: Exact results. Rows: method; columns: Strict (P | R | F1) and Loose (P | R | F1) for each of the two datasets. The table body is not recoverable.]

[Figure 5 (threshold sensitivity): panel plots with F1-score axes; the plot content is not recoverable.]

Definition F1 As mentioned, we evaluate with two types of F1-scores, following van der Goot et al. [20]. The first type is the commonly used span-F1, where only the correct span and label are counted towards true positives. This is called strict-F1. In the second variant, we seek partial matches, i.e., overlap between the predicted and gold span including the correct label, which counts towards true positives for precision and recall. This is called loose-F1. We consider the loose variant as well, because we want to analyze whether the span is “almost [...]

Exact Numbers Results We show the exact numbers of Figure 4 in Table 3 and more detailed results in Figure 5. Results show that there is high precision among the baseline approaches compared to recall. This is balanced using the representation methods for Sayfullina. However, we observe that there is much higher recall for SkillSpan than precision.
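The difference between the strict and loose variants defined in this appendix can be made concrete with a small sketch. The counting convention below (a predicted span counts as a loose true positive for precision if it overlaps any gold span with the same label, and symmetrically for recall) is an assumption made for illustration; the exact convention of van der Goot et al. [20] may differ in detail, and the function names are hypothetical.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def _f1(tp_pred: int, n_pred: int, tp_gold: int, n_gold: int) -> float:
    precision = tp_pred / n_pred if n_pred else 0.0
    recall = tp_gold / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def overlaps(a: Span, b: Span) -> bool:
    """True if both spans share a label and at least one token position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def strict_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Only identical span boundaries and labels count as true positives."""
    tp = len(gold & pred)
    return _f1(tp, len(pred), tp, len(gold))


def loose_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Partial overlap with a matching label counts as a true positive."""
    tp_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
    tp_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    return _f1(tp_pred, len(pred), tp_gold, len(gold))


# Hypothetical example: the first prediction misses the first token of the gold span.
gold = {(2, 6, "SKILL"), (8, 9, "SKILL")}
pred = {(3, 6, "SKILL"), (8, 9, "SKILL")}
strict = strict_f1(gold, pred)
loose = loose_f1(gold, pred)
print(strict, loose)
```

A large gap between the two scores, as in this toy case, signals that predictions land on the right region of the sentence but with imprecise boundaries.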

[1] Brynjolfsson, McAfee, Race against the machine: How the digital revolution is accelerating innovation, driving productivity, and irreversibly transforming employment and the economy, Brynjolfsson and McAfee, 2011.

[2] Brynjolfsson, A. McAfee, The second machine age: Work, progress, and prosperity in a time of brilliant technologies, WW Norton & Company, 2014.

[3] Balog, Fang, M. De Rijke, P. Serdyukov, L. Si, Expertise retrieval, Foundations and Trends in Information Retrieval 6 (2012) 127–256.

[4] Sayfullina, Malmi, Kannala, Learning representations for soft skill matching, in: International Conference on Analysis of Images, Social Networks and Texts, 2018, pp. 141–152.

[5] D. A. Tamburri, W.-J. Van Den Heuvel, M. Garriga, Dataops for societal intelligence: a data pipeline for labor market skills extraction and matching, in: 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI), IEEE, 2020, pp. 391–394.

[6] Chernova, Occupational skills extraction with FinBERT, Master's Thesis (2020).

[7] Zhang, K. N. Jensen, Plank, Kompetencer: Fine-grained skill classification in danish job postings via distant supervision and transfer learning, Under Review, LREC 2022 (2022).

[8] Zhang, K. N. Jensen, Sonniks, B. Plank, SkillSpan: Hard and soft skill extraction from English job postings, in: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Seattle, United States, 2022, pp. 4962–4984.

[9] Green, Maynard, Lin, Development of a benchmark corpus to support entity recognition in job descriptions, in: Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 1201–1208. URL: https://aclanthology.org/2022.lrec-1.128.

[10] A.-S. Gnehm, E. Bühlmann, S. Clematide, Evaluation of transfer learning and domain adaptation for analyzing german-speaking job advertisements, in: Proceedings of the Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022, pp. 3892–3901. URL: https://aclanthology.org/2022.lrec-1.414.

[11] Bhola, Halder, Prasad, M.-Y. Kan, Retrieving skills from job descriptions: A language model based extreme multi-label classification framework, in: Proceedings of the 28th International Conference on Computational Linguistics, International Committee on Computational Linguistics, Barcelona, Spain (Online), 2020, pp. 5832–5842.

[12] M. le Vrang, A. Papantoniou, E. Pauwels, P. Fannes, D. Vandensteen, J. De Smedt, Esco: Boosting job matching in europe with semantic interoperability, Computer 47 (2014) 57–64.

[13] Liu, Ott, Goyal, Du, Joshi, Chen, Levy, Lewis, Zettlemoyer, Stoyanov, Roberta: A robustly optimized bert pretraining approach, arXiv preprint arXiv:1907.11692 (2019).

[14] R. van der Goot, Üstün, Ramponi, Sharaf, Plank, Massive choice, ample tasks (MaChAmp): A toolkit for multi-task learning in NLP, in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations, Association for Computational Linguistics, Online, 2021, pp. 176–197.

[15] Zeman, Nivre, Abrams, Ackermann, Aepli, Aghaei, Ž. Agić, Ahmadi, Ahrenberg, C. K. Ajede, G. Aleksandravičiūtė, I. Alfina, Antonsen, Aplonova, Aquino, Aragon, M. J. Aranzabe, B. N. Arıcan, Þ. Arnardóttir, G. Arutie, J. N. Arwidarasti, Asahara, D. B. Aslan, Ateyah, Atmaca, Attia, Atutxa, Augustinus, Badmaeva, Balasubramani, Ballesteros, Banerjee, Bank, Barbu Mititelu, Barkarson, Basmov, Batchelor, Bauer, S. T. Bedir, Bengoetxea, G. Berk, Berzak, I. A. Bhat, R. A. Bhat, Biagetti, Bick, Bielinskienė, Bjarnadóttir, Blokland, Bobicev, Boizou, E. Borges Völker, Börstell, Bosco, G. Bouma, Bowman, Boyd, Braggaar, Brokaitė, Burchardt, Candito, Caron, Cassidy, Cavalcanti, G. Cebiroğlu Eryiğit, F. M. Cecchini, G. G. A. Celano, Čéplö, Cesur, S. Cetin, Ö. Çetinoğlu, Chalub, Chauhan, E. Chi, Chika, Cho, Choi, Chun, A. T. Cignarella, Cinková, Collomb, Ç. Çöltekin, Connor, Courtin, Cristescu, P. Daniel,