<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Projects in the Swedish Consultancy Market with</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diogo Buarque Franzosi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kristijan Capovski</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maycel Isaac</string-name>
          <email>mi@synteda.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Byttner</string-name>
          <email>stefan.byttner@hh.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Halmstad University</institution>, Halmstad, Sweden
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Synteda AB</institution>, Gothenburg, Sweden
        </aff>
      </contrib-group>
      <abstract>
        <p>This article presents the recommendation system of Personas, a microservice-based platform designed to assist Human Resources (HR) teams in streamlining the recommendation and presentation of candidates to clients based on posted project descriptions. Personas offers functionalities for recommendation, automatic generation of tailored curricula and motivation letters, and conversational support through client- and consultant-facing chatbots.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Recruitment processes are becoming increasingly complex, involving large volumes of candidate
profiles, job postings, and client-specific requirements. Traditional systems struggle to scale
efficiently with this complexity while providing high-quality matches. In recent years, artificial
intelligence (AI) and machine learning (ML) have been increasingly adopted to support and
automate parts of the recruitment pipeline, such as resume screening, job recommendation,
and candidate ranking. In particular, the rise of Large Language Models (LLMs) has opened
new possibilities for semantic matching and deep content analysis in Human Resources (HR)
tools [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>This paper presents the recommendation system of Personas, a microservice-based platform
that supports HR teams in streamlining the recommendation and presentation of candidates to
different clients according to posted project descriptions. Personas includes tools for candidate
recommendation, automatic generation of personalized curricula and motivation letters, and
conversational support via consultant and client-facing chatbots. At the core of this platform
lies a recommendation system that automatically suggests relevant client-requested projects to
each candidate, typically on a daily basis.</p>
      <p>The system leverages both structured and unstructured data sources, including web-collected
projects posted by clients, user-uploaded resumés, and documents curated or generated by
LLMs. All documents are embedded into a vector database using pre-trained sentence
embedding models, allowing efficient semantic comparisons through cosine similarity and enabling
Retrieval-Augmented Generation (RAG) techniques.</p>
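      <p>The semantic comparison described above can be sketched in a few lines. This is an illustrative stand-in, not the production code: the toy three-dimensional vectors below replace the 1536-dimensional text-embedding-ada-002 embeddings, and the function name is ours.</p>

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance: 0 for identically oriented vectors, 2 for opposite ones."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-in embeddings; the real system embeds full documents.
cv_vec = np.array([0.9, 0.1, 0.0])
similar_project = np.array([0.8, 0.2, 0.1])
unrelated_project = np.array([-0.9, 0.0, 0.1])

# Semantically closer documents get a smaller cosine distance.
assert cosine_distance(cv_vec, similar_project) < cosine_distance(cv_vec, unrelated_project)
```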
      <p>To handle the large volume of incoming client requests, Personas uses a two-stage
recommendation pipeline. First, a pre-selection stage quickly filters and ranks client requests using
lightweight similarity-based models. Then, an in-depth analysis stage applies LLM-based
scoring methods to assess candidate-assignment compatibility with greater nuance. Interestingly,
these in-depth LLM analyses also serve a dual role as soft labels for evaluating the quality of
pre-selection models—supporting a continuous improvement cycle for the recommendation
system.</p>
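      <p>The two-stage pipeline above can be sketched as follows; all names are illustrative, and the in-depth stage is stubbed out since it corresponds to an LLM call in the real system:</p>

```python
import numpy as np

def preselect(candidate_vec: np.ndarray, assignment_vecs: np.ndarray, k: int = 15):
    """Stage 1: rank assignments by cosine similarity to the candidate, keep top k."""
    sims = assignment_vecs @ candidate_vec / (
        np.linalg.norm(assignment_vecs, axis=1) * np.linalg.norm(candidate_vec))
    return np.argsort(-sims)[:k]

def in_depth_score(candidate_doc: str, assignment_doc: str) -> float:
    """Stage 2 placeholder: the production system prompts an LLM with both
    documents and parses a 0-100 score; a constant stands in here."""
    return 50.0

def recommend(candidate_vec, candidate_doc, assignment_vecs, assignment_docs, k=15):
    """Run the cheap filter first, then score only the shortlist in depth."""
    shortlist = preselect(candidate_vec, assignment_vecs, k)
    return [(int(i), in_depth_score(candidate_doc, assignment_docs[i])) for i in shortlist]
```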
      <p>This paper provides a detailed description of each component of the recommendation pipeline,
including document collection and curation, embedding and structuring, pre-selection, in-depth
LLM analysis, and evaluation procedures. We support this discussion with experimental results
based on a large corpus of over 35,000 assignment projects requested by clients and a curated
sample of candidate CVs. The results highlight the effectiveness of in-depth LLM models and
their utility as proxies for human evaluation.</p>
      <p>Despite overlapping with broader recruitment practices, the consultancy market—particularly
within regional environments—poses distinct challenges. These include high turnover rates,
rapid project-based hiring cycles, and the need for precise skill-client alignment under tight
deadlines. Furthermore, assignment descriptions and candidate resumes are often maintained
in multiple languages depending on client and candidate backgrounds. In the case of our study,
we focus on a Swedish-English dual-language context, which introduces additional semantic
and syntactic complexity. Standard online job recommendation systems often overlook these
localized and multilingual nuances. This work aims to address these gaps by adapting
LLM-based scoring to the consultancy domain and by demonstrating robust performance across
linguistic variations.</p>
      <p>The rest of the paper is organized as follows: Section 2 describes related work. Section 3
introduces the methodology and technical design of the Personas recommendation system.
Section 4 presents the experimental setup and quantitative results. Section 5 discusses the
implications of our findings and outlines limitations. Finally, Section 6 concludes with future
directions.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Recent work has explored the application of Large Language Models (LLMs) to job
recommendation systems from various perspectives. For instance, Kavas et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] illustrate a multilingual,
hybrid system that combines LLMs and recruiter input for better CV-job matching. Alonso et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] experiment with extracting, matching, and ranking skills between CVs and job profiles. Zheng et al.
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and Wu et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] examine generative and graph-based approaches using LLMs
for candidate-job alignment. Other studies investigate multilingual and zero-shot matching
techniques [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], as well as hybrid recommendation pipelines [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. A broader review of LLM
applications in recommendation tasks is provided in surveys by Wu et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Hou et al.
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], highlighting the rapid convergence of recommendation systems and large-scale language
modeling. A summary of how the concept presented in this paper relates to previously
presented work is shown in Table 1.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Methodology</title>
      <p>The goal of this article is to present the design and evaluation of the Personas recommendation
system, with a particular focus on how lightweight pre-selection models, supported by
LLM-based in-depth analyses, can effectively streamline candidate–assignment matching at scale.</p>
      <p>This section describes the architecture, data processing steps, and algorithms used in the
Personas recommendation system. The system is composed of several modular components
responsible for document ingestion, structuring, pre-selection of client assignments for each
candidate, and in-depth analysis of candidate-assignment pairs using LLMs. Figure 1 provides an
overview of the pipeline. We use the terms client request, project, and assignment
interchangeably to refer to job descriptions posted by clients, which outline specific tasks to be completed
within a defined time frame.</p>
      <sec id="sec-4-1">
        <title>3.1. Personas Recommendation System</title>
        <sec id="sec-4-1-1">
          <title>3.1.1. Document Collection</title>
          <p>Every day, the system collects hundreds of new client request descriptions from various sources,
including public job boards, company websites, and internal client submissions. Data collected
are cleaned, parsed, and stored in a standardized text format. Additionally, candidate CVs are
uploaded directly by users or HR consultants through the Personas platform. Each document,
whether a CV or an assignment description, enters the same downstream pipeline for semantic
processing.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>3.1.2. Document Structuring and Summarization</title>
          <p>All collected documents are transformed into high-dimensional semantic vectors using OpenAI’s
text-embedding-ada-002 model. This model encodes natural language into 1536-dimensional
dense embeddings that preserve contextual semantics, enabling fine-grained comparison
between documents and document components.</p>
          <p>To enhance interpretability and facilitate targeted similarity matching, raw documents are
restructured into standardized JSON schemas using prompting techniques with LLMs. The
structuring prompt processes heterogeneous file formats (PDFs, Word documents, plain text)
and outputs a normalized representation. For candidate CVs, we extract fields such as:
• Personal Information
• Short Description
• Experiences
We use two additional LLM prompts to separately extract:
• Candidate Keywords – terms related to competencies, tools, and domains of expertise.
• Candidate Roles – role titles or job functions expressed in the document.</p>
          <p>For client request descriptions, a parallel extraction process retrieves analogous fields, including the required skills and the assignment title used by the matching models in Section 3.1.4.</p>
          <p>These summarizations and document structuring are achieved using services provided by
Personas, which are connected to an Agents RESTful API, introduced in [11]. It contains simple
prompt chains as well as RAG chains and ReAct agents.</p>
          <p>These structured fields are used in subsequent matching stages. Prompt templates and
formatting instructions are available in supplemental material.</p>
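          <p>A minimal sketch of the JSON structuring step is shown below. The prompt text and field names are illustrative (the actual templates are in the supplemental material), and the LLM call is mocked with a canned reply:</p>

```python
import json

# Hypothetical structuring prompt; the production templates differ.
STRUCTURING_PROMPT = """Extract the following fields from the CV below and
reply with JSON only, using the keys: personal_information, short_description,
experiences.

CV:
{cv_text}
"""

def parse_structured_cv(llm_response: str) -> dict:
    """Parse the LLM reply and ensure every expected key is present."""
    data = json.loads(llm_response)
    for key in ("personal_information", "short_description", "experiences"):
        data.setdefault(key, None)
    return data

# Mocked LLM reply for illustration:
reply = ('{"personal_information": {"name": "A. B."}, '
         '"short_description": "Backend developer", "experiences": []}')
cv = parse_structured_cv(reply)
```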
        </sec>
        <sec id="sec-4-1-3">
          <title>3.1.3. Document Curation</title>
          <p>The Personas system incorporates a human-in-the-loop approach for document curation,
involving both candidates and sales consultants. Once documents—such as candidate summaries,
generated CVs, or motivation letters—are automatically produced, they can be reviewed, edited,
or enriched through a dedicated user interface before being presented to clients. This curation
process ensures higher-quality, context-aware content, better aligned with client expectations
and market standards.</p>
          <p>Beyond presentation, curated documents play an important role in improving the overall
recommendation system. By refining or correcting the automatically extracted or generated
information, the system benefits from more accurate and relevant data in subsequent matching
steps. Personas includes a frontend platform specifically designed to support this interactive
workflow, allowing users to annotate, validate, or update content with minimal friction.</p>
        </sec>
        <sec id="sec-4-1-4">
          <title>3.1.4. Pre-Selection</title>
          <p>The pre-selection stage aims to efficiently reduce the candidate search space from thousands
of potential assignments to a manageable shortlist suitable for more intensive analysis. This
is achieved through lightweight semantic similarity models that rank assignments for each
candidate. In our daily pipeline, these methods are run over the assignments collected the
same morning, but the endpoints provided by the tool also allow a more flexible assignment
pool based on the collection date.</p>
          <p>In this study, we evaluate three pre-selection models:</p>
        </sec>
        <sec id="sec-4-1-5">
          <title>PS1: Skill-to-Skill Matching</title>
          <p>This model evaluates how well a candidate’s skills match the requirements of a given
assignment by comparing their respective skill sets using semantic embeddings. Specifically, for each
assignment-candidate pair, we represent the required skills of the assignment as a set of text
embeddings {a_i}, and the candidate’s skills as another set {c_j}, where both sets are derived from
the summarization process described in Section 3.1.2. The embeddings are generated using the
text-embedding-ada-002 model.</p>
          <p>To measure similarity, for each assignment skill embedding a_i, we find the closest matching
candidate skill embedding c_j based on cosine distance:
d_i = min_j distance(a_i, c_j), (1)
where cosine distance is defined as
distance(a, b) = 1 − cos(a, b), (2)
with values ranging from 0 (identical vectors) to 2 (completely dissimilar). The final matching
score is computed by averaging the minimal distances across all required skills, then scaling the
result to a 0–100 range:
s = (2 − mean_i(d_i)) / 2 × 100. (3)</p>
          <p>This approach generalizes simple keyword matching by capturing semantic similarity between
skills. For example, a perfect match between all assignment and candidate skills yields d_i = 0
for all i, resulting in a score of 100.</p>
          <p>One drawback of this algorithm is that its cost grows as O(N × M × K) with the number of
candidates (N), the number of assignments (M), and the number of required skills per
assignment (K).</p>
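          <p>Under the stated definitions, Eqs. (1)-(3) can be implemented directly; the sketch below uses NumPy, assumes non-zero skill embeddings, and the function name is ours:</p>

```python
import numpy as np

def ps1_score(assignment_skills: np.ndarray, candidate_skills: np.ndarray) -> float:
    """Skill-to-skill score: per assignment skill, take the minimal cosine
    distance to any candidate skill (Eqs. 1-2), average, and rescale so that
    distance 0 maps to 100 and distance 2 maps to 0 (Eq. 3)."""
    a = assignment_skills / np.linalg.norm(assignment_skills, axis=1, keepdims=True)
    c = candidate_skills / np.linalg.norm(candidate_skills, axis=1, keepdims=True)
    pairwise_dist = 1.0 - a @ c.T          # cosine distances for all skill pairs
    d = pairwise_dist.min(axis=1)          # closest candidate skill per assignment skill
    return float((2.0 - d.mean()) / 2.0 * 100.0)

# A perfect skill match yields a score of 100:
skills = np.array([[1.0, 0.0], [0.0, 1.0]])
assert abs(ps1_score(skills, skills) - 100.0) < 1e-9
```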
        </sec>
        <sec id="sec-4-1-6">
          <title>PS2: Keyword-to-Assignment Matching</title>
          <p>This model assesses the overall relevance of a candidate’s competency profile to a given
assignment by leveraging a vector similarity search engine—specifically, the built-in nearest
neighbor (NN) search provided by Chroma. Unlike the previous model (PS1), which computes
pairwise distances between individual skill embeddings, this approach compares aggregated
representations of candidate and assignment data.</p>
          <p>The candidate’s profile is represented as a single embedding vector derived from the
embeddings of their extracted keywords. The assignment is similarly represented by a single
embedding computed from its full textual description. These embeddings are generated using
the same text-embedding-ada-002 model described earlier.</p>
          <p>To compute the similarity, the model uses Chroma’s internal similarity search mechanism,
which indexes the assignment embeddings and allows for efficient retrieval of the most relevant
assignments for a given candidate query embedding. The matching score is determined by the
cosine similarity between the candidate’s keyword embedding c and the assignment embedding
a:
s = (2 − distance(c, a)) / 2 × 100, (4)
where distance is given in Eq. 2. This results in a score between 0 and 100, with higher scores
indicating stronger overall alignment between the candidate’s profile and the assignment
content.</p>
          <p>By comparing entire profiles rather than individual skills, this model captures broader
semantic alignment, making it suitable for assessing general fit or potential suitability across loosely
defined tasks.</p>
          <p>This model avoids the O(N × M × K) growth by using NN search in Chroma, reducing the complexity to O(N).</p>
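          <p>For illustration, the PS2 scoring of Eq. (4) is sketched below with a brute-force nearest-neighbor search standing in for Chroma's index; the names and top-k interface are ours:</p>

```python
import numpy as np

def ps2_scores(candidate_keywords_vec: np.ndarray,
               assignment_vecs: np.ndarray, k: int = 15):
    """Score every assignment against the aggregated candidate embedding
    (Eq. 4) and return the indices and scores of the top k."""
    c = candidate_keywords_vec / np.linalg.norm(candidate_keywords_vec)
    a = assignment_vecs / np.linalg.norm(assignment_vecs, axis=1, keepdims=True)
    dist = 1.0 - a @ c                     # cosine distance, Eq. (2)
    scores = (2.0 - dist) / 2.0 * 100.0    # rescale to 0-100, Eq. (4)
    top = np.argsort(-scores)[:k]
    return top, scores[top]
```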
        </sec>
        <sec id="sec-4-1-7">
          <title>PS3: Role-to-Title Matching</title>
          <p>This model follows the same similarity search approach as PS2 but operates on different
inputs. Instead of using candidate keywords and full assignment descriptions, it compares the
embedding of the candidate’s identified roles with the embedding of the assignment title. This
captures how closely the candidate’s professional identity or career path aligns with the nature
of the role being offered.</p>
        </sec>
        <sec id="sec-4-1-8">
          <title>3.1.5. In-Depth Analysis with Large Language Models</title>
          <p>The client requests that pass the pre-selection phase are further evaluated using more
computationally expensive but semantically rich LLM-based models. These models read both the
candidate and client request documents in detail and produce a matching score from 0 to 100
based on nuanced semantic understanding.</p>
          <p>We consider two in-depth LLM-based models:</p>
        </sec>
        <sec id="sec-4-1-9">
          <title>M1: Generic Fit Scorer</title>
          <p>
            This prompt reads the full CV and client request description and outputs a score indicating
the candidate’s suitability for the role. The score is based on inferred relevance, skill overlap,
and contextual cues. No training is involved—this is a prompt-based model using zero-shot
capabilities of the LLM. Zero-shot LLM capabilities have shown good results in job matching [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ].
The prompt template receives the full text of the candidate’s resume and the full description of the
client request, and outputs a score from 0–100 and a textual analysis.
          </p>
        </sec>
        <sec id="sec-4-1-10">
          <title>M2: Role-Contextual Fit Scorer</title>
          <p>This model also leverages the zero-shot capabilities of the LLM, but applies prompt
engineering to decompose the evaluation into multiple targeted dimensions. The prompt instructs the
model to assess the candidate’s fit by considering several key factors individually, resulting
in a more structured and explainable score. The evaluation is broken down as follows: Skill
Matching (0–40 points), Role Alignment (0–30 points), Strengths and Weaknesses Analysis
(0–20 points), and Additional Considerations (0–10 points). This formulation emphasizes recent
experience and contextual relevance, encouraging the model to focus not only on content
overlap but also on transferable experience and strategic fit.</p>
          <p>These prompts are designed to be interpretable, meaning they output not only a score but
also a natural language justification, which can be logged for future audits or included in client
reports. Both M1 and M2 prompt templates are available in the supplemental material.</p>
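          <p>The M2 rubric implies a simple aggregation of per-dimension scores; the sketch below clamps each dimension to its cap and sums them (the key names are illustrative, not the production schema):</p>

```python
# Maximum points per M2 evaluation dimension, as described above.
M2_RUBRIC = {
    "skill_matching": 40,
    "role_alignment": 30,
    "strengths_weaknesses": 20,
    "additional_considerations": 10,
}

def m2_total(dimension_scores: dict) -> int:
    """Clamp each dimension to [0, cap] and sum to a 0-100 total."""
    return sum(min(max(dimension_scores.get(key, 0), 0), cap)
               for key, cap in M2_RUBRIC.items())

assert m2_total({"skill_matching": 35, "role_alignment": 25,
                 "strengths_weaknesses": 15, "additional_considerations": 5}) == 80
```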
        </sec>
        <sec id="sec-4-1-11">
          <title>3.1.6. Model Evaluation</title>
          <p>The outputs of the in-depth models are used both as final recommendations and as proxy labels
for evaluating the performance of pre-selection models.</p>
          <p>We also include a subset of assignments that were manually tagged by HR experts, allowing
for comparison between automated evaluations and human judgment. This triangulation
enables us to quantify how well each automated stage replicates expert decisions.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Results</title>
      <p>In this section, we present a first evaluation of the Personas recommendation system. We begin
with a description of the dataset used in our experiments, followed by a qualitative assessment
of the in-depth LLM analyses compared to the pre-selection models. We then analyze the performance
of the pre-selection models and conclude with a quantitative analysis comparing automated scores
to human evaluation labels.</p>
      <sec id="sec-5-1">
        <title>4.1. Data Description</title>
        <p>Our dataset consists of 35,679 client-requested project descriptions collected from May 2024
to February 2025. These client requests span a wide variety of industries, roles, and technical
domains. Among them, 4,328 assignments have received human annotations, indicating whether
they were deemed relevant for a particular candidate by HR professionals.</p>
        <p>To evaluate the system’s ability to model candidate relevance, we selected 20 candidate
CVs representative of diverse experience levels and domains, ranging from junior software
developers to senior project managers.</p>
        <p>Each candidate was evaluated against a pool of assignments using both pre-selection and
in-depth models, with correlations computed between the various scoring mechanisms and
available human judgments.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Qualitative Assessment of LLM Analyses and Pre-Selection Models</title>
        <p>The in-depth LLM analyses (M1 and M2) demonstrated the ability to produce nuanced
evaluations, often surfacing insights that keyword-based models overlooked. For instance, the
models could infer transferable skills—such as familiarity with Agile methodologies in project
management—even when explicit technologies or terms were not mentioned.</p>
        <p>Moreover, the LLMs exhibited sensitivity to temporal factors (e.g., recency of experience),
job seniority, and domain-specific terminology. The generated justifications were coherent
and often aligned closely with human reasoning, making them well-suited for explainable AI
applications.</p>
        <p>Crucially, the LLM analyses provided detailed, interpretable justifications for individual
candidate-assignment matches. This level of granularity enables qualitative assessments of
model behavior, which is not possible with pre-selection models that output only a numerical
score. By analyzing these justifications, we gain confidence in the LLM-generated evaluations
and can therefore use their scores as a reliable benchmark for assessing the performance of
pre-selection models.</p>
        <p>This analysis was conducted in collaboration with HR experts, who reviewed various reports
daily over several months.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Evaluation of Pre-Selection Models</title>
        <p>To evaluate the quality of the pre-selection stage, we compared the scores produced by each
pre-selection model (PS1–PS3) against those generated by the in-depth LLM-based scoring models
M1 and M2. The models were applied to assignment-candidate pairs from assignments collected during March
2025 and 13 candidates, each presenting one or two CVs in either English or Swedish (a total
of 20 CVs). During this period, 2,547 assignments were collected. To manage computational
complexity—particularly due to the pairwise comparisons required by PS1—we limited our
evaluation to this subset rather than using the full database of assignments. Each pre-selection
model suggests 15 assignments from the pool for each CV, amounting to 15 or 30 assignments
per candidate. Each of these assignments is then analyzed by the LLM-based models M1 and
M2. Figure 2 summarizes the results using two boxplots—one for M1 and one for M2—showing
the distribution of scores across candidates selected by each pre-selection method.</p>
        <p>We found no clear preference for any specific pre-selection model, indicating that lighter
models based on semantic searches can perform as well as skill-to-skill comparisons.</p>
      </sec>
      <sec id="sec-5-4">
        <title>4.4. Comparison with Human Label</title>
        <p>We also assessed the alignment between in-depth LLM scores and human expert labels across
seven candidate profiles, denoted as C1, C2, C3, C4, C5, C6 and C7. The profiles span expertise
ranging from IT programming to project management. The human labels were tags (Good, Maybe
or Bad) indicating whether each project was relevant for the candidate.</p>
        <p>[Figure: M1-score distribution per candidate by pre-selection model (PS1, PS2, PS3).]</p>
        <p>Many candidates consider location as a strong component, but this is not evaluated by the
models.</p>
        <p>Similarly, Figure 4 presents the recall of the binary classification. We consider a prediction
positive when the score is larger than 60. A true positive (TP) is therefore when the human labeled
Good and the in-depth model returned a score larger than 60, while a false negative is when the
human labeled Good but the model returned a score below 60. We consider recall
to be the most important metric of the confusion matrix, since we want to ensure that every
assignment labeled Good by a human is also considered good by the LLM model.</p>
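        <p>With the score-above-60 convention described here, recall over the human Good labels reduces to a few lines (a sketch with illustrative names):</p>

```python
def recall_at_threshold(labels, scores, threshold=60):
    """Recall with human 'Good' as the positive class and
    score > threshold as a positive prediction."""
    tp = sum(1 for lab, s in zip(labels, scores) if lab == "Good" and s > threshold)
    fn = sum(1 for lab, s in zip(labels, scores) if lab == "Good" and s <= threshold)
    return tp / (tp + fn) if (tp + fn) else float("nan")

# One of two 'Good' assignments scored above 60 -> recall 0.5.
assert recall_at_threshold(["Good", "Good", "Bad"], [80, 50, 90]) == 0.5
```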
        <p>The results indicate moderate positive correlations between LLM-based scores and human
judgments, especially when evaluations are constrained to relevant geographic or domain
contexts. When including all client requests, including those outside the candidate’s preferred
locations or industries, the correlation drops, reflecting the models’ limitations in capturing
implicit preferences not expressed in text.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Discussion</title>
      <p>The experimental results presented in the previous section offer several important insights into
the effectiveness and limitations of the Personas recommendation system.</p>
      <p>First, while our results show that pre-selection models correlate to some extent with in-depth
LLM scores, we did not observe a clear performance advantage for any specific pre-selection
strategy. However, this does not necessarily imply that the models are equally effective at
identifying the best matches overall. A more definitive evaluation would require running the
in-depth LLM analyses exhaustively across all candidate–assignment pairs, which was beyond
the scope of this study. Without such comprehensive scoring, it remains difficult to assess how
well the pre-selection models truly prioritize the most suitable assignments from the entire
pool.</p>
      <p>Second, the qualitative and quantitative assessments of the LLM-based models show that large
language models are capable of nuanced judgment in candidate-assignment matching. Their
ability to consider contextual fit, infer latent skills, and synthesize complex job requirements
makes them valuable tools for augmenting HR workflows. However, the reliance on textual
descriptions means they are inherently limited by what is explicitly stated in the documents.
This was particularly evident in the human comparison experiments, where factors such as
geographic preference, salary expectations, or client-specific cultural fit played a key role in
expert evaluations but were often absent from the CVs and client request descriptions.</p>
      <p>These findings underscore an important trade-off: while LLMs bring rich semantic
understanding, they cannot reason beyond the provided inputs. This suggests two avenues for
improving future iterations of the system. First, incorporating structured preference data (e.g.,
preferred locations, target roles, availability) directly into the matching process may help bridge
the gap between textual analysis and real-world candidate intent. Second, training domain-specific
LLMs or fine-tuning existing models on annotated HR datasets could help models better
internalize implicit selection criteria. It is also worth noting that the system can process much
richer, multi-page CVs, whereas the human evaluation relied on shorter CVs due to practical
limitations—humans need to read through multiple CVs quickly, making longer documents
impractical.</p>
      <p>Another noteworthy point is the use of LLM scoring as a source of soft labels. This enables
a continuous learning pipeline, where lightweight pre-selection models can be evaluated and
improved without relying on scarce human-labeled data. Over time, this setup has the potential
to create a virtuous cycle of feedback, where pre-selection models improve in alignment with
human-like preferences—even in the absence of direct human supervision.</p>
      <p>One key limitation of this study is the absence of a traditional pre-selection analysis, which
could serve as a baseline for comparison. Traditional pre-selection methods often rely on
rule-based systems, keyword matching, or straightforward criteria such as years of experience
or educational background. While these approaches are widely used in HR systems, they tend
to be less flexible and context-sensitive than modern embedding-based models. Without this
baseline, it is difficult to assess whether the embedding-based pre-selection models outperform
or simply offer a more nuanced approach to candidate-job matching. Future work should
consider incorporating a traditional pre-selection model to directly compare the performance of
the Personas system against these more established methods, providing a clearer understanding
of the strengths and weaknesses of embedding-based pre-selection in a recruitment context.</p>
      <p>Finally, the human evaluation itself is a potential source of bias. Tags such as “good match” or
“not relevant” are subject to individual consultant preferences, which may vary widely. These
judgments also frequently consider external factors not modeled in this study, such as location
of the assignment, team composition, communication style, or organizational fit. Future work
should aim to incorporate multi-dimensional human assessments, possibly through structured
annotation schemes or post-recommendation feedback loops.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions</title>
      <p>This study presents an in-depth exploration of the Personas recommendation system, a hybrid
pipeline that combines lightweight semantic filtering with powerful LLM-based analysis to
support HR teams in the task of candidate-to-assignment matching. Through a combination
of structured document processing, semantic embedding, and prompt-driven evaluation, the
system is able to generate daily recommendations at scale while maintaining relevance and
interpretability.</p>
      <p>Our findings show that simple embedding-based models provide good performance as pre-selection
filters and correlate well with more computationally expensive LLM-based evaluations.
These in-depth analyses offer nuanced and context-aware assessments of fit, acting not only as
scoring mechanisms but also as a source of soft supervision for continuous improvement of
upstream components.</p>
      <p>Importantly, we observed moderate alignment between LLM scores and human expert tags,
especially in constrained settings. However, the divergence in broader contexts highlights the
need to explicitly model candidate preferences and non-textual factors—such as geography,
compensation expectations, and cultural fit—that are crucial in human decision-making.</p>
      <p>This work contributes to the growing body of research on the application of large language
models in HR and recommendation systems. It highlights both the promise and limitations
of current AI technologies in replicating complex human judgment and suggests practical
pathways for system refinement.</p>
      <p>Future work will focus on expanding the candidate dataset, incorporating explicit preference
modeling, expanding the knowledge base of each candidate, exploring ways to better access
specific parts of the documents, and exploring fine-tuned LLMs trained on HR-specific tasks.
We also plan to deepen our integration of feedback loops from real-world usage, enabling more
adaptive and personalized recommendations over time.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This project is supported by Vinnova (T.A.R.G.E.T. (2024-00242)), Kunskapsstiftelsen (KKS)
(SERT research profile (2018-01-22)), and our research partner Synteda.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT for grammar and spelling
checking and for paraphrasing and rewording. After using this tool/service, the authors reviewed
and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dessí</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meloni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Reforgiato</given-names>
            <surname>Recupero</surname>
          </string-name>
          ,
          <article-title>A novel approach for job matching and skill recommendation using transformers and the O*NET database</article-title>
          ,
          <source>Big Data Research</source>
          <volume>39</volume>
          (
          <year>2025</year>
          )
          <fpage>100509</fpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S2214579625000048. doi:10.1016/j.bdr.2025.100509.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kavas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Serra-Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wanner</surname>
          </string-name>
          ,
          <article-title>Using large language models and recruiter expertise for optimized multilingual job offer - applicant CV matching</article-title>
          ,
          <source>in: International Joint Conference on Artificial Intelligence</source>
          ,
          <year>2024</year>
          . URL: https://api.semanticscholar.org/CorpusID:271494727.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Generative job recommendations with large language model</article-title>
          ,
          <source>ArXiv abs/2307.02157</source>
          (
          <year>2023</year>
          ). URL: https://api.semanticscholar.org/CorpusID:259342592.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Exploring large language model for graph data understanding in online job recommendations</article-title>
          ,
          <source>ArXiv abs/2307.05722</source>
          (
          <year>2023</year>
          ). URL: https://api.semanticscholar.org/CorpusID:259836967.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kurek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Latkowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bukowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Świderski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Łępicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Baranik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Nowak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zakowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Dobrakowski</surname>
          </string-name>
          ,
          <article-title>Zero-shot recommendation AI models for efficient job-candidate matching in recruitment process</article-title>
          ,
          <source>Applied Sciences</source>
          (
          <year>2024</year>
          ). URL: https://api.semanticscholar.org/CorpusID:268564829.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Sileo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Vossen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Raymaekers</surname>
          </string-name>
          ,
          <article-title>Zero-shot recommendation as language modeling</article-title>
          ,
          <source>in: European Conference on Information Retrieval</source>
          ,
          <year>2021</year>
          . URL: https://api.semanticscholar.org/CorpusID:244954768.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Singla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <article-title>A hybrid approach for job recommendation systems</article-title>
          ,
          <source>2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT)</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:273840570.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Prasad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srividya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. N.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Chandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. S.</given-names>
            <surname>Dil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. V.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <article-title>An advanced real-time job recommendation system and resume analyser</article-title>
          ,
          <source>2023 International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS)</source>
          (
          <year>2023</year>
          )
          <fpage>1039</fpage>
          -
          <lpage>1045</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:265935829.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>A survey on large language models for recommendation</article-title>
          ,
          <source>ArXiv abs/2305.19860</source>
          (
          <year>2023</year>
          ). URL: https://api.semanticscholar.org/CorpusID:258987581.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>McAuley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Large language models are zero-shot rankers for recommender systems</article-title>
          ,
          <source>ArXiv abs/2305.08845</source>
          (
          <year>2023</year>
          ). URL: https://api.semanticscholar.org/CorpusID:258686540.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Franzosi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Alégroth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Isaac</surname>
          </string-name>
          ,
          <article-title>LLM-based labelling of recorded automated GUI-based test cases</article-title>
          ,
          <source>in: Proc. ICST</source>
          , IEEE,
          <year>2025</year>
          , pp.
          <fpage>453</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>