<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Skills-Hunter: Adapting Large Language Models to the Labour Market for Skills Extraction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio Serino</string-name>
          <email>a.serino3@campus.unimib.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Economics, Management and Statistics, University of Milan-Bicocca</institution>
          ,
          <addr-line>Piazza dell'Ateneo Nuovo, 1, 20126, Milan, Italy</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, natural language processing (NLP) technologies have made significant contributions to a number of labour market tasks. One of the most interesting open challenges is the automatic extraction of competences from unstructured text. This paper presents an automated approach that exploits the latest NLP technologies, in particular Large Language Models (LLMs), to overcome the data-labelling bottleneck in the task of extracting skills from job advertisements.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Skill Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>In the global economy, the labour market is the domain where employer demand and applicant supply meet, forming a dense and intense system for the exchange of human resources. Over time, a series of social dynamics has pushed this area ever further into the infosphere: the advent of job search platforms such as LinkedIn has enabled the digitisation of the entire labour market, thus facilitating the emergence of Labour Market Information. Digital processing of data, including job adverts and CVs, makes it possible to collect, analyse and interpret a wide range of labour market data, including the skills in demand. The analysis of large amounts of textual data allows the extraction of structured information from unstructured sources. Skill extraction is still an open challenge that plays a key role in the labour market. The task carries substantial importance, as solving it would enable valuable analyses for various stakeholders in the sector. For instance, the effective extraction of skills from CVs could enhance the efficiency of selection processes. Furthermore, analysing the trends of the skills most in demand in job advertisements could provide valuable insights for training institutions seeking to better tailor their curricula.</p>
      <p>To tackle the tasks described above, we need a tool that represents and organises skills and competences in a structured manner, providing a lingua franca for the labour market. ESCO, the multilingual European classification of Skills, Competences and Occupations, aims to create a more knowledgeable and better-aligned European labour market: it improves the match between labour demand and supply, facilitates labour mobility and human resource management at a transnational level, provides useful information to training providers, and is kept up to date through data-driven approaches.</p>
      <sec id="sec-2-1">
        <title>1.1. Research Objective and Question</title>
        <p>
          Recent work in the literature divides into approaches that train classification models on annotated data, treating the task as a Named Entity Recognition (NER) problem, and approaches that use Large Language Models (LLMs), treating skill extraction as a text summarisation task. In the past year, LLMs have demonstrated remarkable flexibility, allowing the seamless execution of several tasks from the same pre-trained model without sacrificing performance [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>Our objective is to effectively adapt Large Language Models (LLMs) to the domain of the labour market. This adaptation enables us to address critical tasks, such as skill extraction, while harnessing the pre-existing knowledge encapsulated within LLMs. The approach eliminates the need for resource-intensive and costly retraining of the whole model, while still achieving state-of-the-art (SOTA) performance.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>2. Background and Related Work</title>
      <p>In this section, we will initially provide an exposition on two foundational technologies that
underpin the proposed methodologies for skills extraction: Transformer models and Large
Language Models (LLMs). Following this introduction, we will proceed to present relevant prior
research related to automatic skills extraction.</p>
      <sec id="sec-3-1">
        <title>2.1. Transformer Models</title>
        <p>
          Transformer models [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] are a deep learning architecture that has driven significant advances across machine learning, especially in Natural Language Processing (NLP). Their key innovation is the self-attention mechanism, which captures dependencies between different words or elements in a sequence regardless of their distance from each other. When processing each word, self-attention weighs the importance of every other element in the sequence, capturing contextual information effectively; this, among other things, lets the model generate vector representations of words contextualised in the sentences they belong to. Because it handles long-range dependencies so well, the Transformer has largely replaced recurrent neural networks (RNNs) and convolutional neural networks (CNNs) in many NLP tasks.
        </p>
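        <p>For illustration only, scaled dot-product self-attention can be sketched in a few lines of NumPy. This is a minimal single-head sketch, not the exact formulation of any particular model; the projection matrices Wq, Wk, Wv stand in for learned parameters:</p>

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention: every token attends to every token."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise relevance, at any distance
    weights = softmax(scores, axis=-1)       # each row is a probability distribution
    return weights @ V                       # contextualized token representations

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                 # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 16)
```

        <p>Note that each output row mixes information from all five input tokens at once, which is what makes the computation parallelisable across the sequence.</p>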
        <p>Beyond enabling multiple tasks through deep textual understanding, Transformer architectures can parallelise the computation of self-attention, since the architecture imposes no sequential dependencies between tokens. This has enabled training on huge amounts of data.</p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Large Language Models</title>
        <p>Over the past year, the entire NLP field has been revolutionised by the advent of Large Language Models, i.e. language models based on Transformer architectures with an exceptionally large number of parameters.</p>
        <p>
          LLMs are built on transformer architectures: given an input text represented as a token sequence X = (x1, ..., xi-1), an LLM is trained to predict the most probable next token xi. To do so, it again exploits the self-attention mechanism, but with many more layers, as it has been shown that increasing the parameter scale increases the model's capabilities [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>The increasing comprehension capabilities of these models make them capable of generating text. Text generation by Large Language Models is a process based on conditional probability: the model uses an initial context, generally a sequence of tokens, to predict the probability of the next token in the sequence. This prediction uses the softmax function, which converts the scores associated with each token into a valid probability distribution. The model then selects the next token based on the predicted probabilities, emitting the one with the highest probability, and continues the generation process until it reaches a stop condition.</p>
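        <p>This decoding loop can be sketched as follows. The toy vocabulary and the toy_logits function are placeholders for a real model's scoring step; greedy selection (always the highest-probability token) is only one of several possible decoding strategies:</p>

```python
import numpy as np

def softmax(z):
    # Converts raw scores into a valid probability distribution.
    e = np.exp(z - z.max())
    return e / e.sum()

vocab = ["skills", "in", "data", "analysis", "<eos>"]

def toy_logits(context):
    # Stand-in for the model's score for each vocabulary token given the context.
    rng = np.random.default_rng(len(context))
    return rng.normal(size=len(vocab))

context = ["extract"]
# Generate until a stop condition: an end-of-sequence token or a length limit.
while context[-1] != "<eos>" and len(context) < 10:
    probs = softmax(toy_logits(context))            # scores -> probabilities
    context.append(vocab[int(np.argmax(probs))])    # greedy: most probable token
print(" ".join(context))
```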
        <p>
          The ability to generate texts from a natural language input prompt, coupled with the capacity
for deep understanding of texts, has enabled the emergence of conversational agents capable of
performing certain tasks simply by using their pre-trained knowledge [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Related Works</title>
        <p>As indicated above, various approaches have been explored in the recent literature to address the skill recognition task. One approach uses Transformer-based models to address competence recognition in a supervised learning context [5, 6]. However, these efforts have revealed a significant limitation, namely the scarcity of adequate annotated data. Often, overly limited data samples are used, which do not capture the full range of existing variation in skills. Furthermore, annotated data requires constant maintenance and updating in order to reflect emerging skills in the labour market. Another strategy consists of adapting masked language models to the labour market context, using the ESCO taxonomy [7] as a reference, to perform various tasks including skill recognition. A further attempt exploited the knowledge base of LLMs to tackle skill recognition through summarisation and embedding strategies [8]. However, this methodology can lead to a loss of information, especially since the summary is treated as a single embedding vector, thus compromising the complete representation of the skills.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Proposed Method</title>
      <p>As described in Sec. 2.3, one of the main limitations of current skill extraction techniques is the need for huge amounts of labelled data. To alleviate this problem, the first step of the proposed methodology generates realistic job vacancies labelled with the related occupation and the skills they contain. Preliminary experiments have shown us that LLMs can generate Online Job Advertisements (OJAs) very close to real ones when given the ESCO occupation and the ESCO skills that must appear in the OJA, producing a key-value data structure that, for each generated advertisement, keeps track of the job title, the OJA text and the skills present in the text. To improve the quality of such manufactured OJAs, we will fine-tune the generator based on the output of a discriminator network trained to distinguish real from manufactured data, in a GAN fashion.</p>
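      <p>The key-value structure described above can be sketched as a simple record; the field names and the ESCO labels shown are illustrative, not the actual schema used:</p>

```python
# Hypothetical record for one generated (manufactured) OJA.
manufactured_oja = {
    "job_title": "data analyst",  # ESCO occupation used for generation
    "oja_text": (
        "We are looking for a data analyst to join our team. You will "
        "perform data analysis and report analysis results to stakeholders."
    ),
    "skills": ["perform data analysis", "report analysis results"],  # gold labels
}

# The gold standard is then a list of such records, later split for
# fine-tuning and evaluation.
dataset = [manufactured_oja]
print(sorted(manufactured_oja.keys()))  # ['job_title', 'oja_text', 'skills']
```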
      <p>In the second step, we want to smartly merge the manufactured data with ESCO concepts, creating a data structure with which to fine-tune the model, fitting its weights to the labour market domain for two specific tasks: recognising the job occupation described in an OJA and extracting the explicit and implicit skills expressed in it. However, full fine-tuning has limitations: (i) it can lead to so-called catastrophic forgetting [9], where a model trained on one task or dataset forgets its ability to perform well on a previously learned task or dataset when trained on a new and different one, and (ii) it requires massive computational capabilities, especially with transformer models. We therefore resort to Parameter-Efficient Fine-Tuning (PEFT), a family of techniques designed to fine-tune models while minimising resource needs and costs [10]. One way of performing PEFT is Low-Rank Adaptation (LoRA), which freezes the pre-trained model weights and injects trainable rank-decomposition matrices into each layer of the architecture, greatly reducing the number of trainable parameters [11]. We will divide the manufactured OJAs into two parts, using the first to fine-tune the model and the second to evaluate it.</p>
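      <p>The LoRA idea can be sketched on a single linear layer in NumPy. This is a minimal illustration of the rank-decomposition trick, not the training setup we use; real implementations inject such matrices into the transformer's projection layers:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                        # layer width and low rank, with r << d

W = rng.normal(size=(d, d))         # pre-trained weight: frozen, never updated
A = rng.normal(size=(d, r)) * 0.01  # trainable down-projection
B = np.zeros((r, d))                # trainable up-projection, zero-initialized

def lora_forward(x):
    # Output = frozen pre-trained path + low-rank trainable correction (A @ B).
    return x @ W + x @ A @ B

full_params = d * d                 # what full fine-tuning would update
lora_params = d * r + r * d         # what LoRA updates instead
print(f"trainable: {lora_params} vs {full_params} "
      f"({100 * lora_params / full_params:.1f}%)")
```

      <p>Because B starts at zero, the adapted layer initially reproduces the frozen model exactly, and training only moves it away from that behaviour through the small matrices A and B.</p>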
      <p>We will create a first prompt that takes an OJA as input and asks the fine-tuned model to process it and extract the recognised occupation label from the OJA text. The recognised occupation label will be used as a query against a vector store containing the representations of all ESCO occupation labels, returning the most similar ESCO occupation, which we will call OCC. We will then create a second prompt, again containing the input OJA, which also specifies the OJA's job occupation. This second prompt asks the fine-tuned model to parse the OJA and extract the explicit and implicit skill labels it contains that are relevant to the given ESCO job occupation, adapting the skills to the semantic context described by the OJA in order to obtain a 'precise' extraction, and to return the list of skills.</p>
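      <p>The occupation-matching step amounts to a nearest-neighbour lookup over embedded ESCO labels. In this sketch, the embed function and the three occupation labels are placeholders; a real system would use a sentence-embedding model and the full ESCO taxonomy:</p>

```python
import numpy as np

def embed(text):
    # Toy deterministic embedding; stands in for a real sentence encoder.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=32)
    return v / np.linalg.norm(v)    # unit norm, so dot product = cosine similarity

esco_occupations = ["data analyst", "data scientist", "software developer"]
store = {label: embed(label) for label in esco_occupations}  # the vector store

def most_similar_occupation(predicted_label):
    q = embed(predicted_label)
    # Return the ESCO label whose embedding is closest to the query.
    return max(store, key=lambda label: float(store[label] @ q))

occ = most_similar_occupation("data analyst")
print(occ)
```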
      <p>Having obtained the list of skills S = (s1, ..., sn), for each of its elements we will perform a semantic comparison with the previously extracted ESCO occupation. We expect that, within a multidimensional vector space, the representations of skills relevant and pertinent to the input OJA lie "close" to the representation of the job occupation described by the OJA. This allows us to validate the semantic quality of the skills extracted by the model: all skills that exceed a similarity threshold are kept. This choice is also motivated by the desire to mitigate possible hallucinations [12], whereby the model might extract elements that are not semantically consistent with the OJA.</p>
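      <p>A minimal sketch of this threshold filter, with three-dimensional toy vectors in place of real embeddings; the threshold value and the skill labels are illustrative:</p>

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def filter_skills(skill_vecs, occ_vec, threshold=0.3):
    """Keep only skills whose embedding is close enough to the occupation's."""
    return [s for s, v in skill_vecs.items() if cosine(v, occ_vec) >= threshold]

occ_vec = np.array([1.0, 0.0, 0.0])                    # the ESCO occupation OCC
skill_vecs = {
    "perform data analysis": np.array([0.9, 0.1, 0.0]),  # semantically close
    "operate forklift":      np.array([0.0, 0.0, 1.0]),  # hallucinated, far away
}
print(filter_skills(skill_vecs, occ_vec))  # ['perform data analysis']
```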
      <p>Having obtained the filtered list of skills S' = (s1, ..., sm) with m &#8804; n, each of its elements will be used as a query against a second vector store containing the representation of each ESCO skill, which returns the top-1 ESCO skill closest to the query. At the end of this process, given an OJA, we expect to obtain the list of ESCO competences present within it. Since for each manufactured OJA we have the list of ESCO skills used for its generation, we will evaluate the methodology by comparing the ESCO skills found against the ESCO skills effectively present, computing the typical information retrieval metrics of precision and recall. Preliminary experiments suggest that the approach overcomes the limitations present in recent SOTA work.</p>
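      <p>Per OJA, the evaluation reduces to set-based precision and recall between found and gold ESCO skills; the labels in this sketch are hypothetical:</p>

```python
def precision_recall(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)                        # correctly found ESCO skills
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = ["perform data analysis", "report analysis results", "use spreadsheets"]
predicted = ["perform data analysis", "use spreadsheets", "manage staff"]
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```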
    </sec>
    <sec id="sec-5">
      <title>4. Expected Contributions to the Community</title>
      <p>We proposed a new method for extracting and standardising skills based on the ESCO taxonomy. The idea is to adapt a model for the extraction of explicit and implicit skills. Current techniques in the area of Skills Intelligence suffer from the scarcity of labelled data. In this research we will study the automatic creation of a gold standard for the training and validation of skill extraction methods. We then proposed a skill extraction methodology using LLMs. In addition, another interesting application could be a prompter for new or alternative skill labels, useful for enriching the ESCO taxonomy. A future development could be a tool for filtering OJAs, with the aim of reducing noise and optimising computation time by keeping only those parts of an OJA relevant for competence extraction.
</p>
      <p>[5] A. Nigam, S. Tyagi, K. Tyagi, A. Saxena, SkillBERT: "Skilling" the BERT to classify skills!, 2020.
[6] M. Chernova, Occupational skills extraction with FinBERT, 2020.
[7] M. Zhang, R. van der Goot, B. Plank, ESCOXLM-R: Multilingual taxonomy-driven pre-training for the job market domain, arXiv (2023).
[8] N. Li, B. Kang, T. De Bie, SkillGPT: A RESTful API service for skill extraction and standardization using a large language model, arXiv (2023).
[9] V. V. Ramasesh, A. Lewkowycz, E. Dyer, Effect of scale on catastrophic forgetting in neural networks, in: International Conference on Learning Representations, 2021.
[10] Z. Hu, Y. Lan, L. Wang, W. Xu, E.-P. Lim, R. K.-W. Lee, L. Bing, X. Xu, S. Poria, LLM-Adapters: An adapter family for parameter-efficient fine-tuning of large language models, 2023.
[11] E. J. Hu, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, W. Chen, et al., LoRA: Low-rank adaptation of large language models, in: International Conference on Learning Representations, 2021.
[12] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. J. Bang, A. Madotto, P. Fung, Survey of hallucination in natural language generation, ACM Computing Surveys 55 (2023) 1&#8211;38. doi:10.1145/3571730.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          , et al.,
          <article-title>Language models are unsupervised multitask learners</article-title>
          ,
          <source>OpenAI blog 1</source>
          (
          <year>2019</year>
          )
          <fpage>9</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Shazeer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Parmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Gomez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          ,
          <article-title>Attention is all you need</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCandlish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Henighan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chess</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Scaling laws for neural language models</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Min</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-Y.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>A survey of large language models</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>