<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Ontology Learning for ESCO: Leveraging LLMs to Navigate Labor Dynamics</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jarno</forename><surname>Vrolijk</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of Amsterdam</orgName>
								<address>
									<addrLine>Plantage Muidergracht 12</addrLine>
									<postCode>1018TV</postCode>
									<settlement>Amsterdam</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<address>
									<addrLine>Randstad, Diemermere 25</addrLine>
									<postCode>1112TC</postCode>
									<settlement>Diemen</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Victor</forename><surname>Poslavsky</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>Randstad, Diemermere 25</addrLine>
									<postCode>1112TC</postCode>
									<settlement>Diemen</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Thijmen</forename><surname>Bijl</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>Randstad, Diemermere 25</addrLine>
									<postCode>1112TC</postCode>
									<settlement>Diemen</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Maksim</forename><surname>Popov</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>Randstad, Diemermere 25</addrLine>
									<postCode>1112TC</postCode>
									<settlement>Diemen</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rana</forename><surname>Mahdavi</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>Randstad, Diemermere 25</addrLine>
									<postCode>1112TC</postCode>
									<settlement>Diemen</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mohammad</forename><surname>Shokri</surname></persName>
							<affiliation key="aff1">
								<address>
									<addrLine>Randstad, Diemermere 25</addrLine>
									<postCode>1112TC</postCode>
									<settlement>Diemen</settlement>
									<country key="NL">Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Ontology Learning for ESCO: Leveraging LLMs to Navigate Labor Dynamics</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">ADFDAF85A8528DBAAEAC220B12462A35</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T19:21+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Knowledge Graph</term>
					<term>Natural Language Processing</term>
					<term>Ontology Learning</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The labor market is a dynamic environment that supports numerous knowledge-driven applications through ontologies, such as ESCO and O*NET. Maintaining the relevance and accuracy of information within these ontologies and taxonomies is both resource-intensive and time-consuming. In this paper, we propose an ontology learning system that utilizes self-supervised learning, retrieval-augmented generation, and autoregressive language models to identify, classify, and link labor market mentions and entities from raw job postings. Additionally, we demonstrate the language model's ability to discover "alternative labels" and "preferred labels", and perform relation classification.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Labor market ontologies enable the organization of information about jobs, skills, and qualifications, facilitating communication between job seekers and employers <ref type="bibr" target="#b0">[1]</ref>. However, the labor market is a constantly evolving environment influenced by technological advancements, increasing individual choices, and shifting demographics. Consequently, educators, job seekers, and lifelong learners struggle to identify the relevant knowledge, skills, abilities, and competencies needed to distinguish themselves, each with unique objectives. Keeping these individuals and organizations informed about labor market developments in a timely and accurate manner is challenging and requires significant time and resources.</p><p>While many knowledge-driven applications, such as ESCO <ref type="bibr" target="#b1">[2]</ref> and O*NET <ref type="bibr" target="#b2">[3]</ref>, have proven valuable in addressing some of the challenges within the labor market, they struggle to keep the information in their ontologies and taxonomies relevant and up-to-date <ref type="bibr" target="#b3">[4]</ref>. These ontologies provide information about occupations, knowledge, skills, competences, and qualifications. Constructing these systems is complex, and current approaches are inadequate in handling the incomplete and dynamic nature of real-world knowledge graphs (KGs) <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b5">6]</ref>. These approaches often fail to represent unseen entities, overlook the abundant textual information in ontologies and taxonomies, and are frequently based on ontological commitments that render them task-specific <ref type="bibr" target="#b4">[5]</ref>. 
Extensive research has been conducted on the (semi-)automated identification of terms, types, relations, and potential axioms from text, a process known as Ontology Learning (OL) <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>. Traditional methods for semi-automated extraction rely on lexico-syntactic pattern mining and clustering. However, when considering the individual stages of OL, namely i) mention extraction (ME) and term typing (TT), ii) the discovery of hierarchical relationships, and iii) the discovery of non-taxonomic relationships, recent advancements in large language models (LLMs) offer a cost-effective and scalable solution to OL.</p><p>LLMs enable the development of general-purpose and adaptable language models that can be tailored to various natural language processing (NLP) tasks such as classification, generation, and sequence labeling. Adapting LLMs to specific NLP tasks involves two phases: the pre-training phase, which produces pre-trained language models (PLMs) and is typically formalized as a cloze-style task (i.e., sequential and/or masked language modeling), and the downstream phase, which involves fine-tuning the model or prompt tuning <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b12">13]</ref>. In the downstream phase, KGs are considered in recent research to adapt PLMs to tasks such as Named Entity Recognition (NER), Relation Extraction (RE), Open Information Extraction, Entity Linking (EL), and Relation Linking <ref type="bibr" target="#b13">[14]</ref>. To perform these tasks, the PLM is guided by the KG provided by the ontology (i.e., concepts, relations, domain/range constraints) and a set of sentences.</p><p>Despite these significant accomplishments, using PLMs remains challenging and error-prone, irrespective of their size. 
Firstly, the absence of a grounding mechanism complicates the fact-checking of answers, particularly for tasks with an extractive nature, which are prone to hallucination risks. Moreover, many business automation workflows demand a high level of accuracy and thus often incorporate human-in-the-loop interactions for auditing and correcting predictions. This process necessitates knowledge about the precise location of the extracted mentions in the text. In addition, disambiguation of the actual terms requires extra domain-specific knowledge (such as soft skills <ref type="bibr" target="#b14">[15]</ref>), making these processes as tedious and error-prone as their predecessor equivalents (i.e., knowledge-driven applications using ESCO <ref type="bibr" target="#b1">[2]</ref> and O*NET <ref type="bibr" target="#b2">[3]</ref>).</p><p>To tackle the aforementioned issues, in this paper we propose an OL-based framework to extend and maintain ESCO. The primary contribution of this paper is the development and implementation of a system capable of processing online job postings to extract skills, occupation entities, and their corresponding relationships. Furthermore, the system can identify "new entities" that are not yet included in ESCO, flagging them for further examination by a knowledge or ontology engineer. Our methodology addresses multiple core aspects of OL to answer the following research questions:</p><p>• RQ1: How effective is the proposed system in automated skill mention extraction from online job postings? • RQ2: How effective is the proposed system at classifying non-taxonomic relations between skill and occupation types? • RQ3: Is the proposed system capable of finding existing and/or new entities that can extend ESCO?</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Works</head><p>OL addresses the challenges of knowledge acquisition and representation across various domains <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b6">7]</ref>. OL can be subdivided into several sub-processes, including the automatic identification and extraction of terms, types, relations, and axioms from text. The study by <ref type="bibr" target="#b16">[17]</ref> introduces LLMs for OL using prompt-based learning. This approach leverages PLMs and cloze-style language prompts to achieve promising results in various NLP tasks, such as sentiment classification, knowledge probing, and natural language inference <ref type="bibr" target="#b17">[18]</ref>.</p><p>In their evaluation, <ref type="bibr" target="#b13">[14]</ref> assessed two open-source LLMs (Vicuna-13B and Alpaca-LoRA-13B with in-context learning) and a sentence transformer model (SBERT T5-XXL) using the benchmark Text2KGBench <ref type="bibr" target="#b13">[14]</ref>. The results indicate high ontological conformance for both Wikidata-TekGen and DBpedia-WebNLG corpora. However, these off-the-shelf LLMs performed poorly on fact extraction, which the authors attribute to a lack of fine-tuning <ref type="bibr" target="#b13">[14]</ref>. To bridge the gap between semantic labelling tasks and text generation models for NER, <ref type="bibr" target="#b18">[19]</ref> proposed GPT-NER. This method transforms the NER task into a text-generation task and includes a self-verification strategy to mitigate the excessive confidence of LLMs <ref type="bibr" target="#b18">[19]</ref>. The results show performance comparable to fully supervised baselines based on BERT.</p><p>Additionally, <ref type="bibr" target="#b19">[20]</ref> introduced LMDX, a methodology for using LLMs in information extraction, particularly from visually rich documents. 
Their approach achieved a new state of the art on publicly available benchmarks such as CORD and VRDU <ref type="bibr" target="#b19">[20]</ref>. Despite these notable results, the authors primarily focused on extracting mentions from text while grounding their predictions.</p><p>In the labor market, many researchers leverage occupation ontologies to extract relevant information from job posts. The work in <ref type="bibr" target="#b20">[21]</ref> adapts a language model from ESCO to extract and classify skill requirements from German-speaking job descriptions. The work in <ref type="bibr" target="#b21">[22]</ref> detects skills that are literally or implicitly mentioned in job ads and links them to ESCO. In addition, <ref type="bibr" target="#b22">[23]</ref> fine-tuned the Llama model for extracting skills from job advertisements and user profiles. The authors of <ref type="bibr" target="#b23">[24]</ref> investigate a zero-shot approach for extracting skills based on ESCO. In contrast to the aforementioned work, the work in <ref type="bibr" target="#b24">[25]</ref> pre-trains a skill-aware language model usable for domain-specific downstream tasks, such as job classification or skill extraction.</p><p>There have been several works addressing OL in the labor market domain. The work by <ref type="bibr" target="#b25">[26]</ref> proposes NEO, a framework using approximately 2 million online job vacancies for the enrichment of ESCO occupations. NEO identified 49 novel occupations, of which 43 were validated by an expert panel <ref type="bibr" target="#b25">[26]</ref>. Furthermore, <ref type="bibr" target="#b7">[8]</ref> proposed OntoJob, a cost-efficient unsupervised framework that identifies and extracts knowledge, skill, ability, and competence mentions and their corresponding relations using the C-value method and smoothed point-wise mutual information (SPMI). 
<ref type="bibr" target="#b26">[27]</ref> investigate the use of large language models for skill extraction, leveraging in-context learning to test two different prompting strategies. While there are parallels to our work, <ref type="bibr" target="#b26">[27]</ref> does not focus on knowledge discovery and relation classification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>To ensure the labor market ontology remains current and accurate, we propose a three-layered framework: the knowledge layer, the back-end layer, and the front-end layer. The knowledge layer houses the existing knowledge graph, initially based on established labor market graphs such as ESCO. The back-end layer processes online job postings over specified intervals (e.g., weekly or monthly). This layer recommends new labor market mentions and entities, including occupations and skills. The front-end layer serves as the user interface (UI), enabling a human-in-the-loop mechanism via human annotators for updating the knowledge graph. Figure <ref type="figure" target="#fig_1">1</ref> illustrates the role of each layer in updating the labor market's knowledge graph. Our back-end architecture is divided into three stages: (i) preprocessing, (ii) extraction, and (iii) postprocessing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">The preprocessing stage</head><p>To address input size limitations in LLMs, we split incoming data into manageable segments, allowing the models to better process and understand context. Given document lengths and PLM context window limitations, each job description text is divided into chunks. Prompt construction is then used to format the language model's inputs and to specify how the output should be generated. Carefully designed prompts play an essential role in specifying exactly what sort of task and output format is expected. For the prompt generation stage, we largely adhere to the recommendations by <ref type="bibr" target="#b19">[20]</ref>. We take the full set of documents and apply a prompt template to format each document, beginning with "Document:". Next, we append the task description and the schema representation containing the entities to be extracted.</p><p>The task description includes hard-coded instructions to guide the PLM in formatting the output according to the schema. We provide the PLM with the following instruction: "Extract [ENTITY TYPE] from the following document and format the output as a JSON with the following structure: [SCHEMA]." In line with <ref type="bibr" target="#b19">[20]</ref>, the schema representation is a structured JSON object, where the keys are the entity types to be extracted, and the values indicate their occurrence (e.g., "" for a single instance and [] for multiple instances). For example, "occupation": "", "skill": [] instructs the PLM to extract a single mention of the entity type "occupation" and multiple mentions of the entity type "skill." In this context, a prompt template is a predefined format used to structure input for the PLM, and the schema is a blueprint describing the structure and organization of the data elements that need to be extracted.</p></div>
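As a concrete illustration, the prompt construction described above can be sketched in a few lines of Python. The helper name and exact template wording are our own assumptions for exposition, not the verbatim strings used in the production system.

```python
import json

def build_prompt(document: str, entity_types: list[str], schema: dict) -> str:
    """Assemble an extraction prompt from a document, a task description,
    and a JSON schema (illustrative template; exact wording may differ)."""
    types = ", ".join(entity_types)
    return (
        f"Document: {document}\n"
        f"Extract {types} from the following document and format the output "
        f"as a JSON with the following structure: {json.dumps(schema)}"
    )

# "" marks a single-instance entity type, [] a multi-instance one.
schema = {"occupation": "", "skill": []}
prompt = build_prompt(
    "We are hiring a data engineer with strong SQL and Python skills.",
    ["occupation", "skill"],
    schema,
)
```

Chunked documents are each passed through this template, so every prompt carries both the text and the expected output structure.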
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">The extraction stage</head><p>Once the input documents are processed in the preprocessing stage, we can begin extracting the labor market mentions specified by ESCO. Specifically, we focus on identifying ESCO skill entities, which involves locating skill mentions within the documents. However, our method can be extended with other mention types, such as occupations or other entity types that might occur in job postings.</p><p>We define the mention extraction task to be two-fold, namely: (i) identification and extraction of (sub-)strings in a given job posting, and (ii) indicating the term type of the given mention (i.e., whether it is a skill or occupation mention). In short, given the set of ESCO types 𝒯 = {𝑠𝑘𝑖𝑙𝑙, 𝑜𝑐𝑐𝑢𝑝𝑎𝑡𝑖𝑜𝑛}, we aim to find the mentions 𝑚 of the types 𝒯 in the total set of job posting documents 𝒟.</p><p>Similar to the design by <ref type="bibr" target="#b19">[20]</ref>, we use a decoder-only model for the extraction. A decoder-only model predicts the next word to generate based on the previous words in the sequence. We use the prompt instruction together with the desired output schema to instruct the PLM on which entity mentions we want it to extract. We instruction-tune task-specific parameters in addition to the pre-trained Mistral parameters on a data mixture containing a variety of (document, schema, extraction) tuples <ref type="bibr" target="#b27">[28,</ref><ref type="bibr" target="#b19">20]</ref>. In contrast to the work by <ref type="bibr" target="#b19">[20]</ref>, we have a fixed schema during our instruction tuning phase (i.e., we only train on a domain-specific data mixture). Furthermore, all our documents are online job postings.</p><p>Since we use a decoder-only model for "completing" the "output" with the correct mentions and types found in the actual job postings, extraction and typing happen at the same time. 
However, we do want to separate the two tasks, since it is quite possible for the model to extract mentions that are suitable for extraction but place them in the wrong category, which leads to a mention typing mistake.</p><p>Following the earlier instructions on the prompt construction and, consequently, the task description and the schema representation, the prompt for the model for the mention extraction and entity linking tasks is as follows:</p><p>"&lt;s&gt;[INST] Extract {types} from the following document and format the output as a json with the following structure: {format} Document: {document} [/INST] {output}&lt;/s&gt;" Then, we collect the mentions that were not passed on from the previous step and report them to the human annotators as metadata that supports their annotation. Afterwards, given the set of extracted mentions ℳ, and the set of ESCO concepts 𝒞, we aim to map the extracted mentions ℳ to the closest entity in the ESCO taxonomy. To achieve this, we define entity linking to provide a many-to-one mapping for all mentions 𝑚 ∈ ℳ to their corresponding entity 𝑐 ∈ 𝒞 as 𝑓 : ℳ → 𝒞. To create the mapping 𝑓(·), we propose leveraging a retriever, following the retrieval augmented generation design first proposed by <ref type="bibr" target="#b28">[29]</ref>, where for each extracted mention, the top five closest entities are chosen. Given the extracted mention, we retrieve the approximate nearest ESCO entities using the retriever. Note that the index is entity-specific, meaning that the dense vector representations for ESCO skill entities are in a different index than those for ESCO occupation entities. To map extracted skill mentions to ESCO skills, we make use of both the "PreferredLabel" and the "AlternativeLabels" provided by ESCO for each skill to make it easier for the retriever to retrieve the right skills, if they exist in the taxonomy. 
Next, we leverage the results from the retriever and combine them with instruction tuning <ref type="bibr" target="#b29">[30]</ref>. We use the following prompt for the familiar entity check task:</p><p>"&lt;s&gt;[INST] Given a skill and options, select the best option that is a semantically exact synonym for the skill. If none of the options is a semantically exact synonym, select 'No Match'. Skill: {skill} Options: {options} Simply answer with the correct option with no explanation. [/INST] {output}&lt;/s&gt;" In contrast to the work by <ref type="bibr" target="#b30">[31]</ref>, we also consider the "No Match" option to indicate that none of the ESCO entities is a good match for the given mention.</p><p>Essentially, the PLM is tasked with flagging the entity as "undiscovered" or mapping it to one of the given ESCO entities provided by the retriever. The retriever acts as a filter that limits the solution space of the matching to the most likely candidates, thus reducing an |ESCO skills|-way classification to a 6-class classification problem.</p><p>After classifying the found mentions in the entity linking, we use mentions that were marked as "undiscovered" and occur frequently to propose new entities to the human annotators. When the frequency of a mention exceeds a set threshold, we propose it as a new entity to the human annotators, who can decide whether to add it as a new entity, add it as a synonym to an existing entity, or not add it to the taxonomy at all if it is irrelevant.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">The postprocessing stage</head><p>Conceptualization of the identified and extracted mentions is fourfold, namely: i) mention extraction (identifying and extracting relevant terms), ii) entity linking (mapping the identified and extracted mentions to their corresponding entities), iii) relationship extraction (identifying and extracting the relations between the identified and extracted mentions), and lastly iv) relationship classification (mapping the identified and extracted relationships to their domain-specific equivalent). We primarily focus our attention on the non-taxonomic relations, in particular: i) relationship(s) between mentions 𝑚 ∈ ℳ and entities 𝑐 ∈ 𝒞 such that for 𝑣 ∈ 𝒱, where 𝒱 is the total set of relation types in our knowledge graph, we look for the triplet relation 𝑟 = (𝑚, 𝑣, 𝑐) ∈ ℛ, and ii) relations between entities such that (𝑐ᵢ, 𝑣, 𝑐ⱼ) ∈ ℛ, where 𝑐ᵢ ≠ 𝑐ⱼ. We refer to relations i) as classifying whether a mention is an alternative label (or synonym) for the entity in question, whereas we refer to ii) as determining whether a skill entity is essential, optional, or unrelated to an occupation entity.</p><p>Since our research focuses on a subset of the entities in ESCO, we also solely focus on the possible links between these entities. As such, we mainly look at the "IsOptionalFor" and "IsEssentialFor" labels. In a similar fashion to the work by <ref type="bibr" target="#b30">[31]</ref>, we construct a dataset using the relations found in ESCO. Given that there are only three potential options, namely 𝒮 = {isOptionalFor, isEssentialFor, notRelated}, we opt for a similar approach to the entity linking discussed earlier, but without a retriever (since there is no need to reduce the number of classes, as was the case with the entity linking task). 
As such, we task the PLM to select the relation between a given skill and occupation entity from the set of relations 𝒮. We use the following prompt for the model for the relation classification task:</p><p>"&lt;s&gt;[INST] Given a skill and an occupation, tell me how important the skill is for the occupation choosing from the following three options: essential, optional or not important. Skill: {skill} Occupation: {occupation} Simply answer with the correct option with no explanation. [/INST] {output}&lt;/s&gt;" Instruction tuning of the Mistral 7B model was done by leveraging the "AlternativeLabel" and "PreferredLabel" data from ESCO to generate a train and test set. We used a true "AlternativeLabel" related to the actual "PreferredLabel" as a positive example and randomly sampled 𝐾 − 1 non-related "AlternativeLabels" as negative examples. We then used this dataset and the instruction to train task-specific parameters in addition to the pre-trained Mistral parameters for the entity linking task <ref type="bibr" target="#b27">[28]</ref>. For more information on the evaluation and implementation details, we refer to Section 4.</p><p>In the current implementation, we only take into consideration two relations, namely: i) "isOptionalSkillFor" and ii) "isEssentialSkillFor". Therefore, the check is relatively straightforward: we verify whether the incoming entity pair (𝑠𝑘𝑖𝑙𝑙, 𝑜𝑐𝑐𝑢𝑝𝑎𝑡𝑖𝑜𝑛) is related via either i) or ii), or we deem that the skill entity is not important to the occupation at all. If the relationship between the (𝑠𝑘𝑖𝑙𝑙, 𝑜𝑐𝑐𝑢𝑝𝑎𝑡𝑖𝑜𝑛) pair does not yet exist, we propose a new relation with the given prediction.</p><p>Similar to how we suggest new entities to human annotators, we also identify and propose new relations between different entities in the taxonomy. By classifying relations between skill entities and the occupation of the posting, we analyze currently non-existent relations. 
If a frequently occurring new relation is found, we propose it to human annotators, who can then decide whether to add it to the taxonomy.</p></div>
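The frequency-based proposal step described above can be sketched as follows; the threshold value and data shapes are assumptions chosen for illustration.

```python
from collections import Counter

def propose_new_relations(predicted_triples, known_relations, threshold=10):
    """Count predicted (skill, relation, occupation) triples that are absent
    from the taxonomy and surface the frequent ones for human review.
    `threshold` is an assumed tunable parameter."""
    counts = Counter(
        triple for triple in predicted_triples
        if triple[1] != "notRelated" and triple not in known_relations
    )
    return [triple for triple, n in counts.items() if n >= threshold]
```

A human annotator then accepts or rejects each proposed triple, mirroring the entity-proposal flow of the entity linking stage.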
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>This paper aims to leverage ESCO to cost-efficiently optimize and fine-tune PLMs for i) mention extraction and term typing, ii) entity linking and knowledge discovery, and iii) relationship classification. These fine-tuned PLMs, in turn, will help in the construction and maintenance of the ontology and taxonomy. As described, we propose an evaluation of the full system and of the individual PLMs' performance on each of these three tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Dataset</head><p>In order to answer our research questions, we propose the following three experiments. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Experiment 1: Mention Extraction</head><p>In order to extract skill mentions, we make use of the SkillSpan benchmark dataset, which was provided by <ref type="bibr" target="#b31">[32]</ref>. In particular, we employ the publicly available HOUSE and TECH data annotations, which contain, respectively, 90 and 110 job postings. In addition to the SkillSpan dataset, we also make use of approximately 495 manually annotated job postings. This proprietary dataset contains 1,058 chunks, which in total comprise 24,760 skills. As the data from the SkillSpan dataset and the proprietary dataset used internally were in the BIO-tagging format, it was necessary to transform this data into a list of raw skill mentions, as seen in the texts. It was essential to obtain the full raw skill mentions, as this is the type of output that our extractor is trained to produce. As our objective is to extract both hard and soft skills, we employ both the "skill" and "knowledge" mentions as annotated in the SkillSpan dataset. Furthermore, while the original data was segmented on a sentence level, we utilize the provided vacancy index to convert the sentences back to their original job posting format.</p><p>Subsequently, the postings were divided into sections of 384 tokens to ensure that the complete prompt, along with the document and the expected output, would always fit within the context window of the model. To assess the performance of both the base and fine-tuned models, we conducted evaluations using all 58 job postings obtained from the SkillSpan test dataset. To ensure reproducibility, no additional job postings were incorporated into the evaluation set. As the input, we provided the model with complete job postings. 
Subsequently, the generated output, which consists of the extracted raw skill mentions, was utilized to determine the F1 score of the model in categorizing each token in the vacancy text as either a skill (1) or a non-skill (0) token. The F1 score for the test set is presented in Table <ref type="table" target="#tab_2">2</ref>. For the mention extraction, we train a LoRA of the Mistral-7B model. The model is trained on the job posting chunks for four epochs with a batch size of eight. We use an Adam optimizer quantized to 8-bit precision, with a learning rate of 5 × 10⁻⁵, 50 warmup steps, and a weight decay of 0.01.</p></div>
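The token-level scoring above can be sketched as follows. The exact alignment between generated mention strings and vacancy tokens is an assumption on our part (here, simple case-insensitive token membership); the paper's evaluation may align spans differently.

```python
def token_labels(tokens: list[str], mentions: list[str]) -> list[int]:
    """Mark each vacancy token as skill (1) or non-skill (0) depending on
    whether it occurs inside any extracted skill mention."""
    mention_tokens = {tok.lower() for m in mentions for tok in m.split()}
    return [1 if t.lower() in mention_tokens else 0 for t in tokens]

def token_f1(gold: list[int], pred: list[int]) -> float:
    """Binary F1 over token labels, as used to score the extractor."""
    tp = sum(g == p == 1 for g, p in zip(gold, pred))
    fp = sum(g == 0 and p == 1 for g, p in zip(gold, pred))
    fn = sum(g == 1 and p == 0 for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

This converts free-form generated mentions back into per-token binary labels so that generative and token-classification baselines are scored on the same footing.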
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Baselines</head><p>For the mention extraction, we will utilize the most effective model from the study by <ref type="bibr" target="#b26">[27]</ref> as a baseline. This decision is primarily motivated by the fact that both our study and the study by <ref type="bibr" target="#b26">[27]</ref> use the evaluation data provided by <ref type="bibr" target="#b31">[32]</ref>. Furthermore, <ref type="bibr" target="#b26">[27]</ref> considers an extracted entity correct even if it only partially overlaps with the gold span from the annotation. This aligns with our metric, thus facilitating comparison. Additionally, we will compare the results of our model to those discussed in the blog by <ref type="bibr" target="#b32">[33]</ref>. The Skills Extractor Library, developed by <ref type="bibr" target="#b32">[33]</ref>, employs a NER model based on spaCy's architecture. This model maps the extracted skills to existing taxonomies using semantic similarity. Conducting this comparison will allow us to evaluate the performance of our model relative to their established skill extraction framework. Our last baseline for the mention extraction experiment will be the models developed in the SkillSpan paper <ref type="bibr" target="#b31">[32]</ref>. The two models, one for "knowledge" extraction and one for "skill" extraction, are BERT-based token classification models. We combine the results of both the "knowledge" extractor and the "skill" extractor and consider tokens as either not a skill token or a skill token without making any distinction between "knowledge" or "skill" or the beginning tokens (B) and inside (I) tokens.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Experiment 2: Relation Classification</head><p>In the second experiment, the objective is to classify relations between skill and occupation pairs. To construct the datasets for this experiment, we make use of all known relations between skills and occupations in the ESCO database. For each skill/occupation relation, up to six variations are included, utilizing the available alternative skill labels in ESCO. This encompasses the optional and essential relations. Furthermore, for each of the existing relations, five random skill-occupation combinations are sampled from the taxonomy. These random combinations serve as the "not important" relation samples. Table <ref type="table" target="#tab_1">1</ref> provides an overview of the dataset distribution. We evaluate the system by computing the F1-score on the test set detailed in Table <ref type="table" target="#tab_1">1</ref>.</p><p>The training setup is as follows: a LoRA of the Mistral-7B model was trained for 1500 steps with a batch size of 32 and 20 warmup steps. The Adam optimizer, quantized to 8 bits, was employed with a learning rate of 5 × 10⁻⁵ and a weight decay of 0.01.</p></div>
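Dataset construction for this experiment can be sketched as below. The helper structures (`alt_labels`, the cap of six label variants, five negatives per relation) follow the description above, with illustrative names for anything not stated in the paper.

```python
import random

def build_relation_dataset(relations, alt_labels, skills, occupations, seed=0):
    """For each known (skill, occupation, relation) triple, emit up to six
    label variants plus five random pairs labelled 'not important'."""
    rng = random.Random(seed)
    examples = []
    for skill, occupation, relation in relations:
        variants = [skill] + alt_labels.get(skill, [])[:5]   # up to 6 labels
        examples += [(v, occupation, relation) for v in variants]
        for _ in range(5):                                   # random negatives
            examples.append(
                (rng.choice(skills), rng.choice(occupations), "not important"))
    return examples
```

Each example is then rendered into the relation-classification prompt, with the relation string as the expected completion.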
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Experiment 3: Knowledge Discovery</head><p>To create the dataset for the entity classification task, we again make use of the ESCO alternative labels for the skills. For each skill, we use up to 4 alternative labels to generate data points. Each data point consists of the alternative label, treated as the raw skill mention, the preferred label as the correct answer, and the top 5 closest skills from the retriever. With this information, we can fill in the prompt and expected output to train the model. For 40% of the data points shown to the model, we do not include the correct skill as an option; instead, we show 5 incorrect options for which the target response is "No Match", allowing the model to discover new skills or to discard mentions that should not be treated as skills.</p><p>To gain insight into the performance of the proposed decoder-only model for i) linking extracted skill mentions to ESCO skill entities, and ii) discovering new potential ESCO skill entities, we propose the following experimental setup. First, we test the performance of the model on the entity linking task by evaluating the F1 score on our test set.</p><p>In addition, to assess how the model performs on the discovery of new potential ESCO skill entities, we manually annotate 1,237 skill mentions extracted by our earlier stages, in two stages. In the first stage, we annotate whether the skill mention matches one of the five suggestions proposed by the retriever (assigning it the number of the matching suggestion) or assign it a 6 if it matches none of the suggestions. 
Next, we filter out all the skill mentions annotated with a 6 and check them for the following cases: i) the extracted mention itself does not describe a skill (e.g., an occupation or organization name), ii) the mention maps to one (or more) existing ESCO skills, but these options were not among the 5 suggestions, iii) the model selected one or more good options from the list, but the mention also includes additional skills that are not in the top 5, iv) the mention is a proper skill mention, but ESCO does not contain any suitable skills in the current taxonomy, and v) there is not enough context in the extracted "mention" to judge (e.g., the mention just states "development").</p><p>We train a LoRA of the Mistral-7B model for 300 steps with a batch size of 2 and 50 warmup steps. We use a quantized Adam 8-bit optimizer with a learning rate of 2.5 × 10⁻⁵ and a weight decay of 0.01. Retrieval of the five "closest" skills is done with dense retrieval. For embedding the skills, we used MixedBread without any additional fine-tuning.</p></div>
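The dense retrieval of the five closest skills can be illustrated with a toy sketch (plain cosine similarity over placeholder vectors stands in for the MixedBread embeddings used in the paper):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_skills(mention_vec, skill_vecs, k=5):
    """Return the k skill labels whose embeddings are closest (by cosine
    similarity) to the mention embedding. Toy stand-in for the retriever."""
    ranked = sorted(skill_vecs.items(),
                    key=lambda kv: cosine(mention_vec, kv[1]),
                    reverse=True)
    return [label for label, _ in ranked[:k]]
```

In practice one would embed all ESCO skill labels once and use an approximate nearest-neighbour index rather than this exhaustive scan, but the ranking logic is the same.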
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.1.">Baselines</head><p>To evaluate the performance of our mapper, we compare its results to those of the mapper introduced by <ref type="bibr" target="#b32">[33]</ref>. The Skills Extractor Library developed by <ref type="bibr" target="#b32">[33]</ref> utilizes MiniLM to encode skills and map them to the closest ESCO entity. To assess the effectiveness of our approach, we use the same dataset with the ESCO alternative labels but without "No Match" data points. We employ the evaluation method from <ref type="bibr" target="#b32">[33]</ref> and present the F1 score in Table <ref type="table" target="#tab_2">2</ref> for comparative analysis. It is worth noting that, in this case, the mapper from <ref type="bibr" target="#b32">[33]</ref> solves a simpler problem than ours, since it does not consider "No Match".</p></div>
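The construction of entity-linking data points with "No Match" negatives, as described in Experiment 3, can be sketched as follows (the prompt wording and helper name are hypothetical, not taken from the paper):

```python
import random

def build_el_datapoint(alt_label, preferred, suggestions, rng):
    """One entity-linking training example: the ESCO alternative label acts
    as the raw mention, the preferred label as the answer, and the
    retriever's closest skills as options. With 40% probability the correct
    option is withheld and the target becomes "No Match"."""
    if rng.random() < 0.4:
        # withhold the correct skill; only incorrect options remain
        options = [s for s in suggestions if s != preferred][:5]
        target = "No Match"
    else:
        options = ([preferred] + [s for s in suggestions if s != preferred])[:5]
        rng.shuffle(options)  # avoid the answer always appearing first
        target = preferred
    prompt = f'Mention: "{alt_label}"\nOptions: ' + "; ".join(options)
    return prompt, target
```

This is a sketch only: the paper always presents five options, while the toy call below uses fewer for brevity.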
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Experiment 1: Mention Extraction</head><p>Table <ref type="table" target="#tab_2">2</ref> demonstrates that the Mistral 7B model (the Base model) has an F1 score of 0.28 for skill mention extraction across 58 job postings. Following the instruction tuning of a Low-Rank Adaptation (LoRA) model on 635 manually annotated job postings (denoted as Base + ME), evaluation on the same set of 58 job postings results in an F1 score of 0.54. This indicates that instruction tuning on self-supervised labor market ontology data enhances performance by approximately 0.26. We also conducted a comprehensive comparison by evaluating three existing methods on the same dataset. Specifically, the model proposed by <ref type="bibr" target="#b26">[27]</ref> yielded an F1 score of 0.46, while the baseline model discussed in <ref type="bibr" target="#b32">[33]</ref> achieved an F1 score of 0.27. Notably, the model developed by <ref type="bibr" target="#b31">[32]</ref> outperformed our Base + ME model, achieving an F1 score of 0.80 on the evaluation dataset. These results suggest that our model's performance is comparable to, but not in all cases superior to, other existing methodologies on this task.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Experiment 2: Relation Classification</head><p>Results from Table <ref type="table" target="#tab_2">2</ref> show that the Base model scored an F1 of 0.54. The instruction-finetuned LoRA model, the so-called "Base + RC", scored an F1 of 0.66. These results include the incorporation of negative examples as described in Table <ref type="table" target="#tab_1">1</ref> and the shuffling of the options into random order.</p><p>In total, we see that instruction tuning leads to an approximate performance increase of 0.12.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3.">Experiment 3: Knowledge Discovery</head><p>The third experiment was essentially twofold. Firstly, we report the results of evaluating the parsed skill mentions from the job posting test set. Table <ref type="table" target="#tab_2">2</ref> shows that the "Base + EL" model scored an F1 score of 0.67, while the Base model only scored an F1 score of 0.30. As such, instruction tuning the Mistral Base leads to an increased performance of approximately 0.37. In addition, we compared our method with the baseline model described in <ref type="bibr" target="#b32">[33]</ref>, which yielded an F1 score of 0.57. In this case, the "Base + EL" model showed a better result, despite the fact that it solved a more complex problem, not only identifying the most fitting ESCO skills but also indicating potentially new ones.</p><p>Results for the manual annotation of the mentions marked as "new entities" showed an F1 score of 0.41. In the cases where the extracted mention is not a valid skill mention or there is not enough context, the model shows an F1 score of 0.42. Lastly, in the case where the extracted mention can map to an existing ESCO skill but this ESCO skill was not included as one of the 5 options that the model could select from, the model obtains an F1 score of only 0.16.</p><p>Manual annotation of the 1,237 extracted skill mentions showed that 704 of those mentions could not be linked to one of the five provided suggestions of the retriever. Careful examination of these 704 skill mentions showed that 94 had an existing ESCO entity, but the retriever failed to offer the appropriate entity as a suggestion. Of the 704 skill mentions, 282 were manually annotated as a potential "New Skill". Additionally, 136 were not actual skill types, and 160 lacked the context to make a valid prediction. 
In total, the model extracted 253 skill mentions that were annotated as potential new skills to be reviewed by human annotators for addition to the ESCO taxonomy.</p><p>A few examples of these mentions are: "ReactJS", "AWS", and "Docker".</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Discussion</head><p>Our paper explored three different research questions. To address RQ1, we instruction-tuned a LoRA model on top of the Mistral Base using a total of 635 job postings. According to Table <ref type="table" target="#tab_2">2</ref>, the "Base + ME" model outperforms the "Base" model in extracting the skill mentions, scoring an F1 of 0.54 compared to 0.28. Accordingly, although the "Base" model struggles with the extraction of the skill mentions, decoder-only architectures can be instruction-tuned to extract and format skill mentions from raw job posting texts. Moreover, on the ME task, the "Base + ME" model outperforms the best-performing model from <ref type="bibr" target="#b26">[27]</ref> by 0.08. While it is difficult to credit this difference solely to the proposed system (i.e., this would require more details on the actual differences between gpt-3.5 turbo and the Mistral Base model), we believe that it demonstrates the competitiveness of our proposed system with the state-of-the-art. We still see that the BERT-based models from <ref type="bibr" target="#b31">[32]</ref> outperform our proposed method. However, their system has a major drawback: it requires token-level BIO-tagged data for training, which is labour-intensive to obtain compared to the simple list of mentions per text that our approach requires.</p><p>To answer RQ2, we performed the relation classification, which determines whether a skill entity is "optional", "essential", or "not important" to be added to ESCO. The results indicate that the autoregressive model is capable of learning the relation classification task via self-supervised instruction tuning from ESCO. However, we only trained the Base + RC model on 32,000 examples due to time constraints. 
Thus, there exist opportunities to improve the current model's performance by training with more examples.</p><p>To the best of our knowledge, there is no other study that looks at the relation classification between ESCO skill and occupation entities using autoregressive models as proposed in this work. However, the work by <ref type="bibr" target="#b30">[31]</ref> can be regarded as very similar. <ref type="bibr" target="#b30">[31]</ref> performs entity classification and relation classification at once; therefore, we cannot use their F1 scores for direct comparison. Having said that, the "Base + RC" model's F1 of 0.66 compares favourably with their reported F1 score of 0.51, exceeding it by 0.15 on this slightly different task. <ref type="bibr" target="#b30">[31]</ref> has a model that predicts both the types of the subject and object and the predicate, while constraining the possible labels to a predefined set (i.e., choosing between skill and occupation for the entity type). On the other hand, the "Base + RC" model only predicts the predicate.</p><p>Lastly, to answer RQ3, in the knowledge discovery experiment, we consider two different experiments. Results from the first experiment help us determine whether instruction tuning the "Base" model with self-supervised data from ESCO increases the performance on the entity-linking task. The "Base + EL" model scores approximately 0.37 above the "Base" model, demonstrating the effectiveness of self-supervised instruction tuning using ESCO. Additionally, despite solving a more complex task, which includes, in addition to entity linking, the indication of unfamiliar entities, the "Base + EL" model outperforms the method proposed in <ref type="bibr" target="#b32">[33]</ref> by 0.10. 
This demonstrates that our approach is highly competitive with other methods in entity linking.</p><p>The second experiment grants us insight into the ability of the decoder-only model to augment and enrich ESCO with skill mention suggestions for human annotators. The "Base + EL" model suggested 1,054 skill mentions with no matching ESCO entity, whereas manual annotation by 6 human domain experts revealed only 704. However, the 1,054 mentions predicted as "No Match" did not receive further manual annotation, and neither did the quality of the suggestions provided by the retriever. We believe that, similarly to the 704 annotated mentions, there will be a proportion for which the retriever's suggestions were wrong (i.e., 94 of the 704 manually annotated mentions had existing ESCO skill entities that were not suggested as an option by the retriever).</p><p>Overall, the "Base + EL" models are capable of selecting mentions that have no match in ESCO. However, post-processing steps are needed to filter out "false positives", inter alia: mentions that are not skills, that lack the context needed for a prediction, or that were wrongly classified as "No Match" because the retriever missed the right suggestion. For this study, post-processing was done manually, leading to flagging 253 extracted skill mentions as potential additions to ESCO.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Implications and Limitations</head><p>The results from this paper demonstrate the effectiveness and adaptability of using LoRA and decoder-only architectures for ontology learning in the labor market setting. Results indicate that the generation of self-supervised datasets, combined with instruction tuning, leads to impressive performance gains on skill mention extraction, relation classification (i.e., on skill and occupation entity pairs), and lastly, discovery of skill mentions that could potentially extend the labor market ontology/taxonomy. We believe the unification of LLMs and ontologies can help compensate for the limited ability of LLMs to persist knowledge. Since the labor market is a constantly changing environment, editing knowledge without re-training the whole LLM is of utmost importance. The proposed system provides an intuitive way to enrich and maintain ontologies (not just labor market specific ones), while at the same time leveraging the knowledge of the ontology to keep the models up-to-date by creating self-supervised training sets.</p><p>Furthermore, we believe that our results demonstrated the potential strength of using the proposed system to augment and enrich existing labor market ontologies and/or taxonomies (i.e., ESCO, O*NET, etc.). In particular, our results show the pivotal role of the retriever in easing the construction of self-supervised data that the decoder-only model easily leverages via instruction tuning. Additionally, our models demonstrate utility in alleviating the time and resource constraints of human annotation by training "smaller" language models to assist (i.e., models that fit on a single GPU).</p><p>The current study has some limitations to be considered. Firstly, for entity linking, we tried leveraging the skill attribute "AlternativeLabel" as provided by ESCO. 
However, we did so under the assumption that the listed alternative labels are synonyms of the "PreferredLabel". This assumption does not necessarily hold in reality. Secondly, the current study did not experiment with hyperparameter tuning during model training and evaluation. There is considerable room for improvement of the models via hyperparameter tuning.</p><p>Thirdly, another limitation in the entity-linking experiment is that we do not consider the full context of the job posting when linking the entity. For example, the extracted mention "engineering" gets five valid suggestions, including "software engineering", "packaging engineering", and "power engineering". Since there is no context, none of the five options is more valid than the others. Finally, our current implementation does not incorporate any knowledge-grounding methodologies. During the study, we experimented with the incorporation of index values to ground the extractions, similar to the work by <ref type="bibr" target="#b19">[20]</ref>. However, Mistral 7B seemed to have trouble with the provided index values.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.">Conclusion</head><p>We introduce an OL system to upgrade existing labor market ontologies. We analyzed job postings to extract new entities such as occupations and skills, and we propose a framework to recommend entity linking between skills and occupations. To evaluate the performance, we designed multiple experiments to address our research questions about the performance of skill extraction, non-taxonomical relationship retrieval, and knowledge discovery.</p><p>As future work, adding "post-processing" filters to better distinguish the different types of non-matches would be valuable. This could save the human annotator from sifting through mentions that are, amongst other things, not skill types, too vague, and/or already present in ESCO. Furthermore, there is considerable room for improvement in the hyperparameter settings used to train the models in this paper; we consider this one of the easiest avenues for improving the results. Lastly, we would be very interested in testing the current system on a variety of entity types that are currently not part of ESCO, for example, wage information, educational requirements, benefits, requirements about work experience, etc.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>KBC-LM'24: Knowledge Base Construction from Pre-trained Language Models workshop at ISWC 2024 0000-0003-0409-4924 (J. Vrolijk); 0009-0002-4535-1413 (V. Poslavsky); 0009-0000-9550-2502 (T. Bijl); 0009-0000-1667-3216 (M. Popov); 0009-0003-2330-8470 (R. Mahdavi); 0000-0001-9250-6743 (M. Shokri)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Overview of the ontology learning system for the labor market. The solid lines are used during both the training and serving stages. However, the dashed lines are only used for training.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Statistics of the different datasets used in this study. Where we use the following abbreviation for tasks:</figDesc><table><row><cell>Dataset</cell><cell></cell><cell>Train</cell><cell></cell><cell></cell><cell>Dev</cell><cell></cell><cell></cell><cell>Test</cell><cell></cell></row><row><cell></cell><cell>ME</cell><cell>EL</cell><cell>RC</cell><cell>ME</cell><cell>EL</cell><cell>RC</cell><cell>ME</cell><cell>EL</cell><cell>RC</cell></row><row><cell># total</cell><cell>635</cell><cell cols="2">40,409 1,459,630</cell><cell>-</cell><cell cols="2">5,051 250</cell><cell>58</cell><cell cols="2">5,052 2,000</cell></row><row><cell># occupations</cell><cell>-</cell><cell>-</cell><cell>3,005</cell><cell>-</cell><cell>-</cell><cell>236</cell><cell>-</cell><cell>-</cell><cell>1,392</cell></row><row><cell># skills</cell><cell cols="2">29,837 10,617</cell><cell>13,078</cell><cell>-</cell><cell cols="2">1,339 231</cell><cell cols="3">2,142 1,332 1,496</cell></row><row><cell># essential</cell><cell>-</cell><cell>-</cell><cell>448,363</cell><cell>-</cell><cell>-</cell><cell>70</cell><cell>-</cell><cell>-</cell><cell>609</cell></row><row><cell># optional</cell><cell>-</cell><cell>-</cell><cell>393,495</cell><cell>-</cell><cell>-</cell><cell>74</cell><cell>-</cell><cell>-</cell><cell>509</cell></row><row><cell># negative</cell><cell>-</cell><cell>-</cell><cell>617,772</cell><cell>-</cell><cell>-</cell><cell>106</cell><cell>-</cell><cell>-</cell><cell>882</cell></row></table><note>i) ME, ii) EL, and iii) relation classification (RC).</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2</head><label>2</label><figDesc>F1 scores for three experiments evaluating the effect of fine-tuning LoRA weights on task-specific domain data, in comparison to existing methods. Column ME shows the performance on the mention extraction and term typing task (Experiment 1). Column RC compares the performance on the relation classification task (Experiment 2). Column EL presents the results for Experiment 3, which measures how well the models contribute to knowledge discovery through entity linking.</figDesc><table><row><cell>Model ↓ / Task →</cell><cell>ME</cell><cell>RC</cell><cell>EL</cell></row><row><cell>[27] *</cell><cell>0.46</cell><cell>-</cell><cell>-</cell></row><row><cell>[33] *</cell><cell>0.27</cell><cell>-</cell><cell>0.57</cell></row><row><cell>[32] *</cell><cell>0.80</cell><cell>-</cell><cell>-</cell></row><row><cell>Base</cell><cell>0.28</cell><cell>0.54</cell><cell>0.30</cell></row><row><cell>Base + ME</cell><cell>0.54</cell><cell>-</cell><cell>-</cell></row><row><cell>Base + RC</cell><cell>-</cell><cell>0.66</cell><cell>-</cell></row><row><cell>Base + EL</cell><cell>-</cell><cell>-</cell><cell>0.67</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Smedt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vrang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Papantoniou</surname></persName>
		</author>
		<title level="m">Esco: Towards a semantic web for the european labor market</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note>LDOW@WWW</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><surname>Commission</surname></persName>
		</author>
		<title level="m">Esco handbook european skills, competences, qualifications and occupations</title>
				<imprint>
			<publisher>Publications Office of the EU</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m">O*net online</title>
				<imprint>
			<date type="published" when="2024-04">2024. 4-June-2024</date>
		</imprint>
		<respStmt>
			<orgName>National Center for O*NET Development</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">An open and data-driven taxonomy of skills extracted from online job adverts</title>
		<author>
			<persName><forename type="first">J</forename><surname>Djumalieva</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sleeman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Developing skills in a changing world of work</title>
				<imprint>
			<publisher>Rainer Hampp Verlag</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="425" to="454" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Unifying large language models and knowledge graphs: A roadmap</title>
		<author>
			<persName><forename type="first">S</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Knowledge and Data Engineering</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Ontology-guided job market demand analysis: a cross-sectional study for the data science field</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Sibarani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Scerri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Morales</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Collarana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 13th international conference on semantic systems</title>
				<meeting>the 13th international conference on semantic systems</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="25" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Buitelaar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Cimiano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Magnini</surname></persName>
		</author>
		<title level="m">Ontology learning from text: methods, evaluation and applications</title>
				<imprint>
			<publisher>IOS press</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">123</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Ontojob: Automated ontology learning from labor market data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Vrolijk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">T</forename><surname>Mol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Weber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Tavakoli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kismihók</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Pelucchi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 16th International Conference on Semantic Computing (ICSC), IEEE</title>
				<imprint>
			<date type="published" when="2022">2022. 2022</date>
			<biblScope unit="page" from="195" to="200" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Automatic recognition of multi-word terms:. the c-value/nc-value method</title>
		<author>
			<persName><forename type="first">K</forename><surname>Frantzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ananiadou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Mima</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International journal on digital libraries</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="115" to="130" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Roller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kiela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Nickel</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1806.03191</idno>
		<title level="m">Hearst patterns revisited: Automatic hypernym detection from large text corpora</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Harvesting domain specific ontologies from text</title>
		<author>
			<persName><forename type="first">H</forename><surname>Mousavi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kerr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Iseli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zaniolo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2014 IEEE International Conference on Semantic Computing</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="211" to="218" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Herbert-Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krueger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Henighan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ziegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sigler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Litwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Berner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Mccandlish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Ranzato</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Hadsell</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Balcan</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Lin</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc.</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1877" to="1901" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">It&apos;s not just size that matters: Small language models are also few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Schick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Rumshisky</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Hakkani-Tur</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">I</forename><surname>Beltagy</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Bethard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Cotterell</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Chakraborty</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</editor>
		<meeting>the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
			<biblScope unit="page" from="2339" to="2352" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Text2KGBench: A benchmark for ontology-driven knowledge graph generation from text</title>
		<author>
			<persName><forename type="first">N</forename><surname>Mihindukulasooriya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tiwari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">F</forename><surname>Enguix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lata</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="247" to="265" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">fijo&quot;: a french insurance soft skill detection dataset</title>
		<author>
			<persName><forename type="first">D</forename><surname>Beauchemin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Laumonier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Le Ster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yassine</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2204.05208</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Knowledge repository of ontology learning tools from text</title>
		<author>
			<persName><forename type="first">A</forename><surname>Konys</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Knowledge-Based and Intelligent Information &amp; Engineering Systems: Proceedings of the 23rd International Conference KES2019</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">159</biblScope>
			<biblScope unit="page" from="1614" to="1628" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">LLMs4OL: Large language models for ontology learning</title>
		<author>
			<persName><forename type="first">H</forename><surname>Babaei Giglou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Souza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="408" to="427" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Han</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Xie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-T</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H.-G</forename><surname>Kim</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2108.10604</idno>
		<title level="m">Promptlearning for fine-grained entity typing</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.10428</idno>
		<title level="m">GPT-NER: Named entity recognition via large language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">V</forename><surname>Perot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Luisier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Boppana</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Mu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Hua</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2309.10952</idno>
		<title level="m">LMDX: Language model-based document information extraction and localization</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Fine-grained extraction and classification of skill requirements in German-speaking job ads</title>
		<author>
			<persName><forename type="first">A.-S</forename><surname>Gnehm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bühlmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Buchs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Clematide</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/2022.nlpcss-1.2</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), Association for Computational Linguistics</title>
				<editor>
			<persName><forename type="first">D</forename><surname>Bamman</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Hovy</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Jurgens</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Keith</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>O'Connor</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Volkova</surname></persName>
		</editor>
		<meeting>the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS), Association for Computational Linguistics<address><addrLine>Abu Dhabi, UAE</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="14" to="24" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<author>
			<persName><forename type="first">J.-J</forename><surname>Decorte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Verlinden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Van Hautte</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Deleu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Develder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Demeester</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.10778</idno>
		<title level="m">Extreme multi-label skill extraction training using large language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>De Bie</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.11060</idno>
		<title level="m">SkillGPT: a RESTful API service for skill extraction and standardization using a large language model</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Large language models as batteries-included zero-shot ESCO skills matchers</title>
		<author>
			<persName><forename type="first">B</forename><surname>Clavié</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Soulié</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.03539</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">RecruitPro: A pretrained language model with skill-aware prompt learning for intelligent recruitment</title>
		<author>
			<persName><forename type="first">C</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Yao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Zhuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Xiong</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD &apos;23</title>
				<meeting>the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD &apos;23</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">NEO: A tool for taxonomy enrichment with new emerging occupations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Giabelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Malandri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Mercorio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mezzanzanica</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Seveso</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Semantic Web Conference</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="568" to="584" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Montariol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bosselut</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2402.03832</idno>
		<title level="m">Rethinking skill extraction in the job market domain using large language models</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b27">
	<monogr>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">J</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Shen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Wallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Allen-Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2106.09685</idno>
		<title level="m">LoRA: Low-rank adaptation of large language models</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Retrieval-augmented generation for knowledge-intensive NLP tasks</title>
		<author>
			<persName><forename type="first">P</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Perez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Piktus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Petroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Karpukhin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Küttler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W.-T</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rocktäschel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="9459" to="9474" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Guu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">W</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Lester</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M</forename><surname>Dai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2109.01652</idno>
		<title level="m">Finetuned language models are zero-shot learners</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Vrolijk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Graus</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.16770</idno>
		<title level="m">Enhancing PLM performance on labour market tasks via instruction-based finetuning and prompt-tuning with rules</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b31">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">N</forename><surname>Jensen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">D</forename><surname>Sonniks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Plank</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2204.12811</idno>
		<title level="m">SkillSpan: Hard and soft skill extraction from English job postings</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">The skills extractor library</title>
		<author>
			<persName><forename type="first">E</forename><surname>Gallagher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Kerle</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Sleeman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vines</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
