<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Skill matching at scale: freelancer-project alignment for efficient multilingual candidate retrieval</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Warren Jouanneau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc Palyart</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Emma Jouffroy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Malt</institution>
          ,
          <addr-line>33000 Bordeaux</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Finding the perfect match between a job proposal and a set of freelancers is not an easy task to perform at scale, especially across multiple languages. In this paper, we propose a novel neural retriever architecture that tackles this problem in a multilingual setting. Our method encodes project descriptions and freelancer profiles by leveraging pre-trained multilingual language models. The latter are used as the backbone of a custom transformer architecture designed to preserve the structure of profiles and projects. This model is trained with a contrastive loss on historical data. Through several experiments, we show that this approach effectively captures skill matching similarity and facilitates efficient matching, outperforming traditional methods.</p>
      </abstract>
      <kwd-group>
<kwd>Matching</kwd>
        <kwd>Recommender system</kwd>
        <kwd>Information retrieval</kwd>
        <kwd>Contrastive learning</kwd>
        <kwd>Natural language processing</kwd>
        <kwd>Language model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With more than 700,000 registered freelancers, Malt is the
leading freelancing platform in Europe. On the platform,
users have the option to search for freelancers on their
own or to post a project that will be handled fully by our
recommender system. Like other players in the human
resources field [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] we have been using machine learning
for several years to help match projects to freelancers. Our
recommender system takes a project description as input
and automatically contacts a set of relevant freelancers to
see if they are interested in the project.
      </p>
      <p>At the time, this system suffered from three limitations.
First, it could not scale well, as each new project needed to
be scored against every freelancer to build the global
ranking. Second, the system relied on only partial information
contained within the rich freelancer profiles. For
example, job title and skills were taken into account, but long
text sections such as the profile description or experience
descriptions were under-exploited. Finally, as Malt is active
across Europe, language management was painful, since skill
matching models were monolingual and one had to be
developed and maintained for each supported language. In
addition, cross-lingual matching was badly handled. In summary,
we needed a multilingual approach that could scale and use
richer information than the legacy system.</p>
      <p>
        Our legacy system had just the filtering and ranking
phases. To map to a traditional recommender architecture
that can scale, we decided to add a retrieving phase to
generate relevant candidates. Traditionally, two approaches
are used [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: one based on lexical matching (bag-of-words
models), and another, called semantic matching, based on
representational learning (neural network models).
However, the former suffers from a lack of semantic and contextual
interpretation, an issue that the latter is supposed to solve.
      </p>
      <p>Indeed, in our case, there are fundamental differences in
the nature and form of the information disclosed by freelancers
and project proposals. Freelancers and companies may
employ different vocabularies, potentially due to varying levels
of expertise. Finally, projects typically require only a subset of
a freelancer’s skills, resulting in less specific information
being communicated.</p>
      <p>
        Therefore, our goal is to build upon existing work, such
as SentenceBERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and conSultantBERT, within a
multilingual setting [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], to develop an efficient freelancer
retrieval phase. This retrieval model should reflect our
historical data, as illustrated in Fig. 1a, and enable the
selection of candidates based on skill-matching similarity, as
shown in Fig. 1b. The proposed method relies exclusively
on skills, as business-related factors such as experience or
location are managed separately within our pipeline, either
through the ranker or via hard filtering rules. The retrieval
approach, along with the associated experimental results,
are presented in this article as follows:
• Part 2 examines prior related research and discusses
their limitations within the context of our work.
• Part 3 provides an in-depth explanation of our
approach, which consists of an architecture that
leverages a multilingual backbone while effectively
handling documents and their structures, as well as a
training loss and configuration designed to facilitate
retrieval.
• Part 4 outlines the experimental protocol and the
models tested. In addition, new evaluation metrics
are proposed.
• Finally, Part 5 provides insights into the results
obtained after deploying the proposed model in
production.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        The proposed approach, while similar to traditional job
board methods, adopts a reversed recommendation
strategy: it recommends freelancers to employers rather than
job posts to candidates. Despite this perspective shift, we
can benefit from current approaches in the human resources
(HR) domain. Indeed, in recent years, the data available in HR
have made it possible to enhance person-job fit algorithms. By
improving how candidates are matched with job roles, these
advancements have made hiring processes more efficient
and effective, ensuring that the right talent meets the right
opportunity. Various approaches, including content-based
filtering [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7, 8, 9</xref>
        ], collaborative filtering [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
        ], and
hybrid strategies, have emerged [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ].
      </p>
      <p>
        Since their principal modality is documents containing
text, most of these methods utilize embeddings of the input
data to measure similarity, a critical aspect of evaluating
a candidate’s fit for a particular role. The richness of
embeddings is explored in multiple studies [
        <xref ref-type="bibr" rid="ref16 ref17 ref18">16, 17, 18</xref>
        ], which
demonstrates that integrating these features can enhance
job matching accuracy.
      </p>
      <p>
        Approaches vary in the types of textual information embedded
and in the techniques used, such as Word2Vec and Doc2Vec
[
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ], the exploration of graph-based representations [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
the use of deep learning techniques, such as
Convolutional Neural Networks (CNNs), Recurrent Neural Networks
(RNNs), and Long Short-Term Memory (LSTM) networks
[
        <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
        ]. More recently, methods based on the attention
mechanism have emerged, such as PJFCANN [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] or
conSultantBERT [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Additionally, these approaches can be
adapted to handle multiple languages through distillation
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To account for the inherent structure of documents,
whether résumés or job proposals, all sections can be
concatenated [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], processed at the sentence level [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], or
processed at the paragraph level [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. However, to our
knowledge, no existing method processes each section while
preserving its specificity. This results in the loss of the section’s
intrinsic type information.
      </p>
      <p>
        The aforementioned models transform textual data
through multiple non-linear transformations, resulting in
more abstract representations. They create a framework for
manipulating the representational space, referred to as the
latent space. The emerging field of representation learning
[
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] focuses on using deep learning to acquire meaningful
representations within this latent space, improving
predictions for higher-level tasks such as classification, regression
or clustering. The choice of representation learning
technique depends on the input modality and is tailored to the
downstream task [
        <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
        ].
      </p>
      <p>
        Regarding the retrieval process within HR applications, a
representation that naturally clusters similar entities is
highly desirable. Most approaches rely on contrastive representation
learning [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. These methods aim to minimize
the embedding distance for similar entities, while
maximizing it for dissimilar ones. Person-job fit systems often use
these approaches but face challenges such as sparse or
unstructured data and large pools of freelancers and clients,
leading to significant computational costs.
      </p>
      <p>
        To improve contrastive learning, researchers have
focused on enhancing settings, relational data, and loss
functions. For settings, models like CrossCLR [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] explore
intramodality relations using matrices, which is relevant as
profiles and project proposals can be seen as different modalities.
In terms of relational data, the number of counterexamples
significantly impacts performance. Some approaches also
dynamically select hard examples at the batch level during
training [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ], called online batch mining. There are various
loss functions and advancements available for training
models in a contrastive manner. Entities can be presented to the
models in pairs, using either a classification or regression
loss [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], as was explored in the human resource context
with conSultantBERT [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Alternatively, the input can be
structured as triplets, consisting of one anchor entity, one
relatively positive entity, and one negative entity. Initially
proposed in computer vision [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ], the triplet loss has also
been applied to text models, such as SentenceBERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In
the literature, using triplets instead of pairs has often been
shown to yield better results. Based on this observation,
several attempts have focused on incorporating multiple
negatives and/or multiple positives simultaneously. One
such approach is the InfoNCE loss [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. The latter is
promising due to its automatic hard example mining. Models like
CDC [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ] maximize the mutual information of future data
in latent space using recurrent neural networks. This
approach has been adapted for image and unsupervised patch
representation and can potentially be generalized to other
representation types. Moreover, the Supervised CDC model
[
        <xref ref-type="bibr" rid="ref34">34</xref>
        ] reintroduces a supervised setting, allowing
generalization to any number of positive examples. However, this
approach has yet to be evaluated on the human resource
domain.
      </p>
      <p>The following section describes our proposed model, which
handles document structure in different languages:
a neural encoder based on contrastive learning using
InfoNCE.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>In the following, we denote project proposals as d_p for all
projects p ∈ P, where P is the set of all possible projects.
Similarly, freelancers’ profiles are denoted as d_f for all
freelancers f ∈ F, where F is the set of all possible freelancers.
This section aims to describe two-tower encoder models,
denoted as E_P and E_F, which encode and enable the
retrieval of freelancers based on submitted projects.
Generic documents (i.e. either a proposal or a profile) are
denoted as d_e, for all projects or freelancers e ∈ {F ∪ P}.
Each document consists of different sections, denoted as
s_(t,e), for all t ∈ T and e ∈ {F ∪ P}. Here, T is the set of
section types and t represents the type of a given section. Each section
is plain text, and thus each document is a set of section
texts, denoted as:</p>
      <p>d_e = { s_(t,e) | t ∈ T }.</p>
      <p>The role of the models is to encode these documents into
embedding vectors. Specifically, we have:
E_P(d_p) = v_p
and
E_F(d_f) = v_f.</p>
      <p>However, these embeddings must lie in the same
representation space, where the distance between them is
semantically meaningful in terms of skill matching. Projecting
a project into this space allows for retrieval by proximity.
To achieve this, we leverage past interactions between
freelancers and projects using a contrastive loss function.
As illustrated in Fig. 1, for past interactions, we consider a
recommended freelancer as a negative match for a project
if either the company or the freelancer himself declares that
the freelancer does not have the skills. Such cases are
considered negative project-freelancer pairs. If the freelancer
replies that they are interested in the project, then the pair
is considered positive. If no feedback is provided, the pair is
discarded from the dataset.</p>
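As an illustration, the feedback-to-label rule above can be sketched in Python; the feedback values and the function name are hypothetical, not the platform's actual schema:

```python
# Hedged sketch of the labeling rule described above; field values are
# illustrative assumptions, not Malt's real feedback schema.
def interaction_label(freelancer_feedback, company_feedback):
    """Return +1 (positive pair), -1 (negative pair), or None (discarded)."""
    # Company or freelancer declares a skill mismatch -> negative pair.
    if "missing_skills" in (freelancer_feedback, company_feedback):
        return -1
    # Freelancer expresses interest in the project -> positive pair.
    if freelancer_feedback == "interested":
        return 1
    # No usable feedback -> the pair is discarded from the dataset.
    return None
```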
      <p>In the subsequent sections, we first present the architecture
before describing the loss used for training the models and
obtaining a semantically meaningful representation space.</p>
      <sec id="sec-3-1">
        <title>3.1. Architecture</title>
        <p>For both proposal and profile document types, the same
architecture A is used for the two models (E_P, E_F) of the
two-tower approach. Therefore, the two models
E_P(d_p) = A(d_p, θ_P) and
E_F(d_f) = A(d_f, θ_F)
depend on the trained weights θ_P and θ_F respectively.
For a generic document d_e, whether a profile or a
proposal, the following detailed architecture A is designed to
process the input, resulting in its corresponding embedding
vector A(d_e, θ) = v_e. This embedding vector will
subsequently be utilized for retrieval purposes.</p>
        <p>The proposed architecture is designed to process any
document type consisting of text, with the objective of leveraging
its inherent structure while keeping language alignment.
It is depicted in Fig. 2 and can be summarized as follows:
a pretrained multilingual backbone is used to capture
local context, specifically at the section level, while handling
different languages; it is described in section 3.1.1 (
Section encoding in Fig. 2). A positional encoding is added
so that the model carries an indication of the section type, as
explained in section 3.1.2 (Categorical encoding in Fig. 2). In
complement, global context processing at the document
level is introduced through a dedicated transformer head,
which is detailed in section 3.1.3 (Document context in Fig.
2). Both components ensure a comprehensive
understanding of local and global contexts within the document, while
the use of a frozen pre-trained multilingual backbone
ensures language alignment. Finally, section 3.1.4 explains
how the embedded sections are balanced to obtain the final
document representation (Documents embedding in Fig. 2).</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Leveraging a pre-trained multilingual model: section-level context</title>
          <p>By considering a document as a sequence of sections, and
each section as a sequence of tokens, it becomes possible to
process these sections independently using state-of-the-art
pre-trained language models.</p>
          <p>
            To encode the sequence of tokens of length n_t
within a section s_(t,e) = { w_i | i = 1, . . . , n_t }, we utilize a
sequence-to-sequence transformer [
            <xref ref-type="bibr" rid="ref35">35</xref>
            ], which serves as the
backbone of our architecture, represented as follows:
Backbone(s_(t,e)) = E_(token,t) = { e_(token,t,i) | i = 1 . . . n_t }.
(1)
By processing all sections using the same model, we ensure
that each token embedding e_(token,t,i) ∈ E_(token,t) captures
context at the section level. This is achieved through the
transformer model’s architecture and its attention
mechanism, which focuses on the relationships within the section’s
sequence of tokens.
          </p>
          <p>
            The choice of a multilingual similarity backbone has been
made considering the need for a latent space that is both
organized based on semantic similarity [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ], and possesses
language alignment [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. Such models are considered strong
semantic encoders, but they encode the full sequence
of tokens as a single embedding vector using a pooling
layer. Moreover, the resulting embedding is often fed into
a projection layer, which participates in the latent space
organization.
          </p>
          <p>Hence, using a model without a projection layer after
pooling allows semantic similarity and language alignment
to be directly reflected in the token embeddings. In other
words, the token embeddings e_(token,t,i) ∈ E_(token,t) of a
section s_(t,e) will inherently maintain semantic alignment across
different languages. Therefore, we assume that using such
a model as a sequence-to-sequence backbone (i.e. with no
final projection layer), omitting the final pooling layer and
freezing its weights, will preserve the
semantic space organization and the alignment across
languages.</p>
          <p>The following sections describe the adaptation of this
backbone and the specification of the whole architecture to the
freelancer-project domain.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Differentiating between sections: positional encoding</title>
          <p>By concatenating the encoded token sequences from each
section, E_(token,t), we can construct a single sequence for the
entire document, resulting in E′_token.</p>
          <p>However, concatenating the sequences leads to the loss of
information regarding the original section from which each
token is derived. While each token in the newly formed
sequence retains its semantic meaning, section-level
context, and positional information, it no longer carries any
indication of its section type.</p>
          <p>To alleviate this issue, we propose the use of a categorical
encoding to incorporate section type information at the
token level, similar to how positional encoding provides
positional information to the model. To achieve this, we
utilize learned embeddings c_t for each section type
label t ∈ T, and add these learned weights to the token
embeddings. This process is formalized as follows:</p>
          <p>E′_token = ⨀_(t ∈ T) { e_(token,t,i) + c_t | i = 1 . . . n_t }
= { e′_(token,i) | i = 1 . . . n },
(2)
where the length n of the sequence is the sum of the lengths
of all sections in the document, given by n = ∑_(t ∈ T) n_t,
and ⨀ denotes the concatenation operation.</p>
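As an illustration, eq. (2) can be sketched with NumPy; the section types, the embedding dimension, and the random initialization below are assumptions made for the example (in practice the categorical embeddings are learned):

```python
import numpy as np

# Hedged sketch of eq. (2): add a per-section-type embedding to each token
# embedding, then concatenate the sections into one document sequence.
SECTION_TYPES = ["title", "description", "experience"]  # illustrative types
DIM = 4
rng = np.random.default_rng(0)
# One categorical embedding per section type (learned during training).
categorical = {t: rng.normal(size=DIM) for t in SECTION_TYPES}

def concat_with_categorical(section_tokens):
    """section_tokens: dict mapping section type -> (n_t, DIM) token embeddings."""
    parts = [section_tokens[t] + categorical[t]        # broadcast add of c_t
             for t in SECTION_TYPES if t in section_tokens]
    return np.concatenate(parts, axis=0)               # length n = sum of n_t

tokens = {"title": np.zeros((2, DIM)), "description": np.zeros((3, DIM))}
doc_seq = concat_with_categorical(tokens)              # shape (5, DIM)
```

With zero token embeddings, each output row is exactly the categorical embedding of its section, which makes the type information carried by each token easy to inspect.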
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Introducing a transformer head: document-level context</title>
          <p>
            At this stage, each token represents section-level context
and remains language-agnostic due to the use of the frozen
backbone model. To incorporate document-level context,
we process the concatenated sequence E′_token using a
BERT-like sequence-to-sequence transformer [
            <xref ref-type="bibr" rid="ref36">36</xref>
            ] as an
architecture head. This trained model head leverages the attention
mechanism to enrich each token’s embedding with context
from tokens across different sections.
          </p>
          <p>Given the frozen backbone and the previous processing
steps, this context integration is grounded in semantic
meaning and section type, and aligned across languages.
For each document d_e, a new sequence E″_token is obtained
from E′_token, such that:</p>
          <p>Head(E′_token) = E″_token = { e″_(token,i) | i = 1 . . . n }.
(3)
This ensures that each token’s representation is both
comprehensive and contextually aware, reflecting the document
as a whole.</p>
          <p>Finally, empirical tests have demonstrated that
incorporating a skip connection between the output of the frozen
backbone embeddings E_(token,t), for all t ∈ T, and the
embeddings produced by the head E″_token facilitated the
training of the head. We hypothesize that this
improvement arises from the head’s ability to better adapt to the
pretrained backbone by learning the difference between the
generic domain and our specific domain.</p>
        </sec>
        <sec id="sec-3-1-4">
          <title>3.1.4. From token embeddings to a vector representation</title>
          <p>Once the embedded tokens e″_(token,i) ∈ E″_token are obtained for
a document d_e, the final goal is to derive a single
representation vector, denoted as v_e. To achieve this, a final pooling
layer is applied.</p>
          <p>Our experiments indicate that using a weighted average
pooling method yields better results than simple average
pooling, which assigns equal importance to each token in
the document sequence. This improvement is due to the
fact that different sections within a document may vary
significantly in length and relevance, making some tokens,
such as those from a lengthy description, less significant
than others, like those from a job title. To mitigate this issue,
we set the pooling weights inversely proportional to the
section length. Taking into account the skip connection, the
final representation vector is computed as follows:
v_e = (1 / |T|) ∑_(t ∈ T) (1 / n_t) ∑_(i = 1 . . . n_t) ( e″_(token,t,i) + e_(token,t,i) ).
(4)
This operation effectively performs pooling in two stages:
first within each section, and then across all sections,
resulting in a more balanced and representative final document
representation vector.</p>
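The two-stage pooling of eq. (4) can be sketched with NumPy; shapes and section names are illustrative assumptions:

```python
import numpy as np

# Hedged sketch of eq. (4): head and backbone token embeddings are summed
# (skip connection), averaged within each section, and the per-section
# vectors are then averaged, so a long section does not outweigh a short one.
def pool_document(head_out, backbone_out):
    """Both args: dict mapping section type -> (n_t, dim) token embeddings."""
    section_vecs = []
    for t in head_out:
        summed = head_out[t] + backbone_out[t]    # skip connection
        section_vecs.append(summed.mean(axis=0))  # within-section average
    return np.mean(section_vecs, axis=0)          # across-section average

head = {"title": np.ones((2, 3)), "description": np.full((10, 3), 3.0)}
skip = {"title": np.zeros((2, 3)), "description": np.zeros((10, 3))}
v = pool_document(head, skip)  # both sections weigh 1/2 despite length gap
```

Here the ten-token description contributes no more than the two-token title, which is exactly the length rebalancing the weighted pooling is meant to provide.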
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Training objective</title>
        <p>For structuring the latent space based on similarity
between profiles and project proposals, a contrastive learning
paradigm is adopted. Hence, a two-tower approach is used,
with one tower for profiles and the other for project
proposals. Finally, a contrastive loss function is applied for
optimization.</p>
        <p>These kinds of losses leverage the relationship between
positive and negative pairs of elements. During training, the
model can receive input in the form of pairs, corresponding
to an anchor and a positive or an anchor and a negative
element; triplets, corresponding to an anchor, a positive,
and a negative element; or n-uplets with multiple positive
and negative elements.</p>
        <p>Implementing the loss function for n-uplets allows for a
more flexible and generalized approach, better aligned with
the structure of our historical data.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. The contrastive loss approach</title>
          <p>
            The training objectives in a contrastive setting aim to
optimize the similarity or distance between document
embeddings by maximizing alignment for positive relations and
minimizing it for negative ones. This framework can be
applied to both supervised and unsupervised contexts.
Among these methods, the triplet loss [
            <xref ref-type="bibr" rid="ref31">31</xref>
            ], originally
derived from the computer vision paradigm, operates on
triplets of entities denoted as (a, p⁺, p⁻). Here, a serves as
the anchor, p⁺ is the positive example relative to the anchor,
and p⁻ is the negative example. Given a distance function
dist and the document embeddings v_a, v_(p⁺), and v_(p⁻), the
triplet loss is formulated as:
ℒ_triplet(a, p⁺, p⁻) = max( dist(v_a, v_(p⁺)) − dist(v_a, v_(p⁻)) + ε, 0 ),
(5)
where ε represents a specific margin. This loss function
encourages the model to ensure that the positive example is
closer to the anchor than the negative example by a margin
of at least ε, refining the embedding space. Lately, a more
commonly used loss function is the supervised contrastive
InfoNCE loss [
            <xref ref-type="bibr" rid="ref32">32</xref>
            ]. For an entity a, this loss function can be
generalized to work with multiple positive examples P_a and
negative examples N_a. It is expressed as:
          </p>
          <p>ℒ_InfoNCE(a, P_a, N_a) = −(1 / |P_a|) · ∑_(p′ ∈ P_a) log [ exp(v_a · v_(p′) / τ) / ∑_(p″ ∈ P_a ∪ N_a) exp(v_a · v_(p″) / τ) ],
(6)
with a temperature parameter τ.
Both aforementioned losses can be generalized to handle
n-uplets, constructed by considering the relationships
between multiple elements, as detailed in the following
sections.</p>
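As an illustration, the margin-based triplet loss of eq. (5) can be sketched as follows; the Euclidean distance and the margin value are assumptions made for the example:

```python
import numpy as np

# Hedged sketch of eq. (5): the margin-based triplet loss on embedding
# vectors, using Euclidean distance and an illustrative margin.
def triplet_loss(v_anchor, v_pos, v_neg, margin=0.5):
    dist = lambda x, y: float(np.linalg.norm(x - y))
    return max(dist(v_anchor, v_pos) - dist(v_anchor, v_neg) + margin, 0.0)

anchor = np.array([0.0, 0.0])
# Positive much closer than negative: the margin is satisfied, loss is zero.
satisfied = triplet_loss(anchor, np.array([0.1, 0.0]), np.array([1.0, 0.0]))
# Positive farther than negative: the loss penalizes the violation.
violated = triplet_loss(anchor, np.array([1.0, 0.0]), np.array([0.1, 0.0]))
```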
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Generalizing to n-uplets</title>
          <p>The relationships between elements can be modeled as
a bipartite graph, as shown in Fig.</p>
          <p>In this
figure, the nodes represent the entities—freelancers,
denoted as f_1, f_1⁺, f_1⁻, f_2, f_2⁺, f_2⁻, and projects
p_1, p_2—while the edges depict similarity relations based on
historical data, such as skill matches (+) or contrasting ones (−). In
this setting, the adjacency matrix A is a useful
representation of the project proposal-profile relationship
for tensor and matrix computation. To computationally
filter or weight document pairs in the loss function, we
represent a positive skill match with 1, a negative match with
−1, and an unknown relation with 0. This matrix can be
preprocessed based on skill set agreements in historical data,
as described in Section 3.</p>
          <p>Furthermore, to help structure the latent space, a
freelancer-to-freelancer skill similarity relation using a project as a
pivot can be computed. Specifically, if two freelancers are
both a positive skill match for a project, it indicates that
they share a subset of skills that qualify them to undertake
the project. Conversely, if one freelancer is a positive match
while the other is a negative match, it suggests that the
required skills are possessed by one but not the other. The
model is then expected to learn these similarities and
differences.</p>
          <p>In this setting, when training a model at the batch level, a
portion of the adjacency matrix can be extracted:
a subset of projects P_B ⊂ P and the corresponding set of
selected freelancers F_B ⊂ F can be sampled, yielding the relevant
project-freelancer submatrix</p>
          <p>A_(project-freelancer) = [A_(p,f)]_(p ∈ P_B, f ∈ F_B),
(7)</p>
          <p>for computational purposes.</p>
          <p>Using this submatrix, we can compute the freelancer
similarity relation and, by extension, derive the
transitive freelancer-to-freelancer adjacency matrix A_freelancer ∈
{−1, 0, 1}^(|F_B|×|F_B|) in the following manner:
A_freelancer = (A_(project-freelancer)ᵀ · A_(project-freelancer)) ⊙ 1_(f &gt; f′) ⊙ [1 − ((A &lt; 0)ᵀ · (A &lt; 0))].
(8)
This operation retains only the upper triangular part (1_(f &gt; f′),
f, f′ ∈ F_B) and ensures that two freelancers who
are both negatively related to a project do not become positively
related (1 − ((A &lt; 0)ᵀ · (A &lt; 0))).</p>
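As an illustration, eq. (8) can be sketched with NumPy; the input matrix and the exact masking conventions below are assumptions for the example:

```python
import numpy as np

# Hedged sketch of eq. (8): derive the transitive freelancer-to-freelancer
# adjacency from a project-freelancer submatrix A with entries in {-1, 0, 1},
# keeping only the upper triangle and zeroing pairs whose only link is a
# shared negative (two mismatches for a project imply no similarity).
def freelancer_adjacency(A):
    sim = A.T @ A                                     # pivot through projects
    both_neg = (A < 0).astype(int).T @ (A < 0).astype(int)
    sim = sim * (1 - np.minimum(both_neg, 1))         # drop shared-negative pairs
    return np.triu(np.sign(sim), k=1)                 # upper triangle only

# One project: f0 and f1 are positive matches, f2 is a negative match.
A = np.array([[1, 1, -1]])
M = freelancer_adjacency(A)
```

Here f0 and f1, both positive for the project, become similar (+1), while each of them becomes dissimilar (−1) to the rejected f2.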
          <p>During our experiments, we did not resort to a
project-to-project similarity relation. Such a relation is more
challenging to determine and is not directly obtainable from our
current settings. Indeed, a freelancer being capable of
solving two different projects does not necessarily imply that
the projects are similar, as they could require different sets
of the freelancer’s skills. In contrast, freelancer
relationships can be directly derived and even supplemented from
various data sources.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Adding weak negatives</title>
          <p>While learning a latent space organization in a contrastive
setting, negative examples are crucial as they serve as
counterexamples, helping to refine both broad scenarios and edge
cases. However, if we rely solely on the skill match relation
in historical data, we are limited to addressing only edge
cases. In practice, freelancers are contacted based on their
perceived relevance to a project, and if negative feedback is
received, it indicates that the project is not suitable for the
freelancer at a more granular level (i.e. on finer details, not
visible at first glance).</p>
          <p>To introduce more trivial counterexamples, which we refer
to as weak negatives, we utilize a categorical feature of the
profiles. Specifically, when creating a profile, a freelancer
f ∈ F must select a job category c_f. By
defining an arbitrary negative similarity relation between two
freelancers, W[f, f′] = −1, when they belong to different
categories c_f ≠ c_(f′), we can incorporate weak negative
examples, since these freelancers are expected to have different
skill sets. The transitive freelancer-to-freelancer adjacency
then becomes:</p>
          <p>A_freelancer = (A_(project-freelancer)ᵀ · A_(project-freelancer) + W) ⊙ 1_(f &gt; f′).
(9)</p>
          <p>These negative examples are intended to assist in the
initial training iterations, aid convergence, and improve the
organization of the latent space at a coarse granularity level.</p>
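The weak-negative matrix W used in eq. (9) can be sketched as follows; the category labels are illustrative assumptions:

```python
import numpy as np

# Hedged sketch of the weak-negative matrix W of eq. (9):
# W[f, f'] = -1 whenever two freelancers declare different job categories,
# 0 otherwise. Category labels here are illustrative.
def weak_negatives(categories):
    cats = np.array(categories)
    return -(cats[:, None] != cats[None, :]).astype(int)

W = weak_negatives(["developer", "designer", "developer"])
```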
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. Adjacency matrix based contrastive losses</title>
          <p>We can reformulate the different contrastive loss functions
using the previously mentioned adjacency matrix within our
framework. Our goal is to obtain a loss function ℒ(P_B, F_B, A)
that takes a set of proposals P_B, a set of freelancers F_B, and
an adjacency matrix A as input.</p>
          <p>First, we introduce a triplet loss that is calculated for all
possible document triplets, based on the distances between
two sets of entities D and D′. Afterward, we filter the results
using the adjacency matrix to retain only the valid triplets.
This loss function is referred to as ℒ_A-triplets and is computed
as follows:</p>
          <p>ℒ_A-triplets(D, D′, A) = ∑_(a ∈ D) ∑_(p′ ∈ D′) ∑_(p″ ∈ D′) ℒ_triplet(a, p′, p″) · 1_(A_(a,p′) &gt; 0) · 1_(A_(a,p″) &lt; 0).
(10)</p>
          <p>For tensor manipulation, the triplet loss \mathcal{L}_{\text{A-triplets}} can
be efficiently computed by first calculating the loss for all
possible document triplets, producing a tensor in
\mathbb{R}^{|D| \times |D'| \times |D'|}. This tensor is then filtered using an
element-wise product with a mask tensor. The mask
tensor is derived from the relevant submatrix of the adjacency
matrix between D and D', and it retains only
the valid triplets: the second dimension
of the tensor represents the positive example relative to the
anchor, and the third dimension represents the negative
example relative to the anchor.</p>
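<p>The masked tensor computation described above can be sketched as follows (a numpy illustration with Euclidean distances and an assumed margin; this is not the paper's actual implementation):</p>

```python
import numpy as np

def a_triplet_loss(D, Dp, A, margin=1.0):
    """Masked triplet loss over all |D| x |D'| x |D'| triplets, following
    the tensor formulation above (a sketch under assumed conventions).
    D: anchor embeddings (n, k); Dp: candidate embeddings (m, k);
    A: adjacency submatrix (n, m) with >0 positives and <0 negatives."""
    # pairwise Euclidean distances between anchors and candidates
    dist = np.linalg.norm(D[:, None, :] - Dp[None, :, :], axis=-1)        # (n, m)
    # loss for every possible (anchor, positive, negative) triplet
    losses = np.maximum(0.0, dist[:, :, None] - dist[:, None, :] + margin)  # (n, m, m)
    # mask keeping only valid triplets: dim 1 = positive, dim 2 = negative
    mask = (A[:, :, None] > 0) & (A[:, None, :] < 0)
    return float((losses * mask).sum())
```

<p>The element-wise product with the boolean mask zeroes out every triplet whose positive or negative role is not confirmed by the adjacency matrix.</p>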
          <p>The final triplet loss used in our experimentation is
computed by summing two A-triplets losses. In the first case, the set
of project proposals P \subset \mathcal{P} is considered as the anchor,
and in the second case, the set of profiles F \subset \mathcal{F} serves
as the anchor. In both scenarios, the positive and negative
elements are drawn from the set of profiles F \subset \mathcal{F}. The
resulting loss is defined as follows:
\mathcal{L}_{\text{dual A-triplets}}(P, F, A) = \mathcal{L}_{\text{A-triplets}}(P, F, A_{\text{project-freelancer}}) + \mathcal{L}_{\text{A-triplets}}(F, F, A_{\text{freelancer}}). \tag{11}</p>
          <p>In our case, we want the InfoNCE loss to be fully
supervised, meaning we want to control exactly which documents
are treated as negatives in the loss. To do so, we can leverage the
adjacency matrix:</p>
          <p>\mathcal{L}_{\text{A-InfoNCE}}(D, D', A) = \frac{1}{|A_{>0}|} \sum_{d \in D} \sum_{d' \in D'} -\log\big(\sigma(d, d', A)\big), \tag{12}
with
\sigma(d, d', A) = \frac{\exp(d \cdot d' / \tau) \cdot \mathbb{1}_{A_{d,d'} > 0}}{\sum_{d'' \in D \cup D'} \exp(d \cdot d'' / \tau) \cdot \mathbb{1}_{A_{d,d''} \neq 0}}. \tag{13}</p>
          <p>From this formulation, the tensor computation can be
done in two parts: first computing the dot products between
all possible pairs, then applying a masked softmax and
filtering on positive pairs. Then, considering both project-freelancer
pairs (P \subset \mathcal{P}, F \subset \mathcal{F}) and freelancer-only pairs (F \subset \mathcal{F}),
and adding the weak negatives computed from
categories, we get:
\mathcal{L}_{\text{dual A-InfoNCE}}(P, F, A) = \mathcal{L}_{\text{A-InfoNCE}}(P, F, A_{\text{project-freelancer}}) + \mathcal{L}_{\text{A-InfoNCE}}\big(F, F, A_{\text{freelancer}} + \left[ -\mathbb{1}_{c_f \neq c_{f'}} \right]_{f \in F,\, f' \in F}\big). \tag{14}
We note that for the A-InfoNCE loss, the cosine similarity is
better suited than a distance function to reflect
semantic similarity, as the loss implies maximization over a cartesian
product of document pairs.</p>
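<p>A masked-softmax reading of eqs. (12)-(13) can be sketched as follows (a numpy illustration; for brevity the softmax support is restricted to the candidate set D' rather than D ∪ D', and the temperature value is an assumption):</p>

```python
import numpy as np

def a_infonce(D, Dp, A, tau=0.05):
    """Adjacency-masked InfoNCE (a sketch, not the paper's code).
    D: anchors (n, k); Dp: candidates (m, k), both assumed L2-normalized
    so dot products are cosine similarities.
    A: (n, m) adjacency; >0 positive, <0 negative, 0 excluded entirely."""
    logits = D @ Dp.T / tau                   # (n, m) scaled similarities
    keep = A != 0                             # positives and negatives enter the softmax
    exp = np.exp(logits) * keep               # masked softmax numerator terms
    denom = exp.sum(axis=1, keepdims=True)    # per-anchor masked denominator
    log_probs = logits - np.log(denom)        # log softmax over kept pairs
    pos = A > 0                               # average -log prob over positives only
    return float(-(log_probs * pos).sum() / pos.sum())
```

<p>Entries with a zero adjacency value contribute neither as positives nor as in-batch negatives, which is what makes the loss fully supervised.</p>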
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <p>With our experiments, we want to evaluate our choices on
how to leverage both pretrained models and our document
data, while also comparing our approach to a reference
model and to models based on state-of-the-art approaches.
The different model variations tested here are meant to reflect
the impact of modifying the weights of pretrained models
on the semantic organization and the language alignment of
the latent space, as well as the influence of our solutions for
exploiting document content and structure.</p>
      <sec id="sec-4-1">
        <title>4.1. Models</title>
        <p>
          In the following, we compare our approach presented in
section 3 against one reference model and several models and
architectures we tested while building our final approach:
• The first model, used as reference, is PaLM 2,
specifically its Gecko version (textembedding-gecko-001
from May 2023). This is a closed commercial
model from Google, based on a decoder-style
transformer architecture with over 10 billion parameters.
It is fifty to one hundred times bigger than the other
models evaluated here.
• The second model, later referred to as S-BERT, is
a multilingual model from the Sentence BERT [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
family which is finetuned on our data. This approach
is fairly similar to what has been proposed with
conSultantBERT [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. However, we use a different
backbone and training loss.
• The third model is also based on a Sentence BERT [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ],
but applied per section of text. This is followed by an
average pooling of the resulting section embedding
vectors to produce a single vector per document. For
clarity, we refer to this model as Section-S-BERT.
Concerning the first two models (Gecko and S-BERT), both
take project proposals and profiles as plain text inputs. To
obtain only one piece of text, all sections are concatenated,
while Section-S-BERT and our proposed method process
the sections separately, without concatenation. S-BERT and
Section-S-BERT are trained using the same multilingual
SentenceBERT [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] backbone (distiluse-base-multilingual-cased-v1), which is aligned
across languages through distillation [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Our approach instead leverages a multilingual
E5 [
          <xref ref-type="bibr" rid="ref37 ref6">37, 6</xref>
          ] backbone (multilingual-e5-small). This choice is significant because,
unlike SentenceBERT, the E5 model does not project the
final layer, which is crucial for our method, as discussed in
section 3.1.1.
        </p>
        <p>Some models were evaluated out-of-the-box in a zero-shot
manner, while others were trained on our dataset, as
described in the following sections.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Dataset</title>
        <p>To avoid data contamination, the data used for the following
experiments and testing were split in a temporal manner.
Specifically, the test set consists of ten weeks of past
interactions starting from March 2023, comprising approximately
61k interactions, 4k project proposals, and 32k freelancer
profiles. For the training and validation data, which were
split in an 80/20 ratio, all interactions that occurred prior to
the aforementioned date were considered. Furthermore, all
profiles were recomputed to reflect their state at the time of
the interactions.</p>
        <p>Concerning the freelancer profiles, various pieces of declared
information are used as sections, including a free-text job title
and description, a job family and category selected from
our taxonomy, and a set of skills, which may be provided
in free text or selected from those previously declared by
other freelancers. For the project proposals, the retained
information includes the mission title, its associated job title
and description as free text, a job family and category from
our taxonomy, as well as mandatory and bonus skills chosen
from the list of freelancer-declared skills, as illustrated in 1.</p>
        <p>We consider empty sections, referring to sections not
filled by the freelancers or the companies, as sequences
containing only the tokens [CLS] and [END]. This data
modeling approach offers two key benefits: (1) it enables
the use of static computational graphs for optimization, and
(2) it provides the models with explicit information about
the emptiness of the section.</p>
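<p>This empty-section convention can be illustrated with a toy tokenizer (the token ids and maximum length below are illustrative, not the actual vocabulary):</p>

```python
# Hypothetical section-to-token sketch: every profile keeps a fixed set of
# sections; an unfilled section becomes the two-token sequence [CLS] [END],
# so batch tensors keep a static shape. Token ids are made up.
CLS, END, PAD = 101, 102, 0
MAX_LEN = 6

def encode_section(token_ids, max_len=MAX_LEN):
    """Wrap a (possibly empty) section in [CLS]/[END] and pad to max_len."""
    seq = [CLS] + list(token_ids) + [END]
    attention = [1] * len(seq) + [0] * (max_len - len(seq))
    return seq + [PAD] * (max_len - len(seq)), attention

ids, mask = encode_section([])   # empty section: just [CLS], [END], then padding
```

<p>Because every section, empty or not, maps to the same tensor shape, the computational graph never needs to branch on missing data.</p>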
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Training setting</title>
        <p>In this experiment, Gecko and Section-S-BERT were
evaluated in a zero-shot setting, while all other models were
trained on our dataset, either with full retraining or with
a frozen backbone. Specifically, we evaluated both S-BERT
and Section-S-BERT with fully retrained backbones,
starting from their initial pretrained states. Additionally, we
trained a Section-S-BERT with a projection layer and our
final proposed approach, both with their pretrained
backbone weights frozen. In the case of Section-S-BERT with a
projection layer, it should be noted that only the projection
layer was trained.</p>
        <p>
          Regarding the loss functions, almost all models were
optimized using a triplet loss [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Our approach, as well as the
Section-S-BERT with a projection layer, were trained using
the InfoNCE loss [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. The trainings using the triplet loss
were conducted over ten epochs with a batch size of two,
i.e., two project proposals per batch. Based on these project
proposals, one negative and two positive freelancers were
sampled, enabling the creation of two freelancer triplets per
batch, in addition to the four project-freelancer triplets.
For the trainings based on the InfoNCE loss, the models
were trained for two epochs with one project proposal per
batch. The same freelancer sampling strategy as used with
the triplet loss was employed, but with the addition of thirty
weak negative freelancers.
2 https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1
3 https://huggingface.co/intfloat/multilingual-e5-small
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Evaluation metrics</title>
        <p>Each of the aforementioned models aims at encoding
projects and freelancers for retrieval within the proposal
step of a recommender system. However, evaluating them in
the same manner as we would evaluate the full pipeline,
using ranking or retrieval evaluation metrics, is not perfectly
suited. Moreover, the proposal-profile relevancy, in the
context of document retrieval, is only partially available in our
dataset, as the relevancy of all possible proposal-profile pairs
remains unknown. Finally, evaluating solely based on past
interactions may not provide a complete picture of the
intended performance of a retrieval system. Indeed, historical
data is biased by previous processes and lacks information
about preposterous project-freelancer pairs.</p>
        <p>To better reflect the future usage of the models, we chose to
evaluate them in three different settings:
• Supervised, using only historical interactions and
the interacted profiles,
• Unsupervised, by simulating a retrieval with
k-nearest-neighbor (k-NN) using all test profiles,
• Weakly supervised, using the simulated retrieval
enriched with the historical interactions.</p>
        <p>The cosine similarity or the Euclidean distance was used to
compute the predicted proposal-profile score, depending on whether
the training loss was InfoNCE or the triplet loss, respectively. Both were
converted to scores or distances according to the
requirements of the metrics and the k-NN algorithm.</p>
        <p>Considering the supervised experiments, we tested the
recall metric using score comparison, denoted as s(·), to
determine true and false positives. This metric, which we will
later refer to as "valid score recall," is computed as follows:</p>
        <p>\text{recall}_{\text{single}}(p, F^+, F^-) = \frac{1}{|F^+|} \sum_{f^+ \in F^+} \mathbb{1}\left[ s(e_p, e_{f^+}) > s(e_p, e_{f^-}),\ \forall f^- \in F^- \right], \tag{15}
where p is a project proposal, e_p corresponds to its
encoding, f^+ \in F^+ are the relative positive freelancers, f^- \in F^-
represents its negative freelancers, and e_{f^+}, e_{f^-} are their
respective embedding vectors.</p>
        <p>In this setting, we also compare the average score of positives
against the average score of negatives, as follows:</p>
        <p>\text{recall}_{\text{all}}(p, F^+, F^-) = \mathbb{1}\left[ \frac{1}{|F^+|} \sum_{f^+ \in F^+} s(e_p, e_{f^+}) > \frac{1}{|F^-|} \sum_{f^- \in F^-} s(e_p, e_{f^-}) \right]. \tag{16}
Inverting the comparison and using a distance function
allows for the computation of the metric when the triplet loss
was used.</p>
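<p>Both recall variants can be sketched for a single proposal as follows (a minimal numpy illustration; `score` defaults to a dot product here and would be swapped for cosine similarity or a negated distance depending on the training loss):</p>

```python
import numpy as np

def valid_score_recall(e_p, pos, neg, score=lambda a, b: a @ b):
    """Eq. (15): fraction of positives scoring above every negative
    for one proposal embedding e_p (a sketch, not the paper's code)."""
    neg_best = max(score(e_p, e_n) for e_n in neg)
    hits = sum(score(e_p, e_f) > neg_best for e_f in pos)
    return hits / len(pos)

def recall_all(e_p, pos, neg, score=lambda a, b: a @ b):
    """Eq. (16): 1 if the mean positive score beats the mean negative score."""
    mean_pos = np.mean([score(e_p, e_f) for e_f in pos])
    mean_neg = np.mean([score(e_p, e_n) for e_n in neg])
    return float(mean_pos > mean_neg)
```

<p>The per-proposal values would then be averaged over the test set, as done for the reported metrics.</p>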
        <p>For the unsupervised setting, a retrieval is first simulated
using a k-NN algorithm for each project on all freelancers
in the test set. Then, the proportion of retrieved freelancers who
have declared the same category as the one in the project
proposal is computed.</p>
        <p>We introduce the A-overlap metric to quantify lexical
similarity between freelancers and project proposals using exact
lexical matching on a predefined set of terms:</p>
        <p>\text{A-overlap}(F; x, x_p) = \frac{1}{|F|} \sum_{f \in F} \frac{|x_f \cap x_p|}{|x_f| \cdot |x_p|}, \tag{17}
where x denotes either the set of skills S or of categories
C. The overlap score is calculated for each freelancer, and
the results are averaged across all retrieved freelancers F.</p>
        <p>Finally, for the weakly supervised setting, both the
simulated retrieval and the historical interaction data are
combined. This allows us to compute the proportion of retrieved
positives:
\text{retrieved-positives}(p, F, F^+) = \frac{|F \cap F^+|}{|F^+|}, \tag{18}
and retrieved negatives:
\text{retrieved-negatives}(p, F, F^-) = \frac{|F \cap F^-|}{|F^-|}, \tag{19}
among the retrieved freelancers F.</p>
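<p>The three retrieval metrics of eqs. (17)-(19) reduce to set arithmetic and can be sketched as follows (a minimal illustration over hypothetical skill sets and freelancer ids):</p>

```python
def a_overlap(retrieved, project_terms):
    """Eq. (17) as we read it: exact-match overlap between a project's
    terms (skills or categories) and each retrieved freelancer's terms,
    averaged over the retrieved set (a sketch)."""
    def overlap(f_terms):
        if not f_terms or not project_terms:
            return 0.0
        return len(f_terms & project_terms) / (len(f_terms) * len(project_terms))
    return sum(overlap(f) for f in retrieved) / len(retrieved)

def retrieved_positives(retrieved, positives):
    """Eq. (18): share of known positive freelancers in the retrieved set."""
    return len(retrieved & positives) / len(positives)

def retrieved_negatives(retrieved, negatives):
    """Eq. (19): share of known negative freelancers in the retrieved set."""
    return len(retrieved & negatives) / len(negatives)

score = a_overlap([{"python", "sql"}, {"design"}], {"python", "sql"})
```
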
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Results</title>
        <p>The following results, reported in Table 1, consist of the
metrics defined previously, computed across all models
described in Section 4.1. Each metric represents the average
value computed over all project proposals in the test dataset.
Gecko and Section-S-BERT, corresponding to the first and
fourth rows of Table 1, are evaluated in a zero-shot
manner, and provide strong results for the unsupervised metrics
related to category and skill overlap. These results reflect
the effectiveness of the semantic-aware pretraining and the
models' ability to organize documents in the latent space
based on their content. However, due to the lack of
finetuning on our dataset, the supervised valid score recall
metric reveals these models' inability to accurately structure
the latent space according to our data domain and historical
interactions.</p>
        <p>Adapting the models to our domain through full retraining
yields improved results, as shown in the second and third
rows of the table. However, the low values obtained for
the weakly-supervised and unsupervised metrics suggest a
degradation in the semantic space organization established
during the pretraining phase.</p>
        <p>The results reported in the last three rows of the table,
corresponding to models with frozen backbones, appear to offer
the best of both worlds. These models successfully retain
the benefits of the semantic pretraining of the backbones
while adapting to our domain. This is evidenced by the
improved supervised metrics compared to the fully retrained
versions, without significantly degrading the unsupervised
metrics.</p>
        <p>Adding synthetic weak negatives in the training improves
the unsupervised metrics, as reported in the last two rows
of the table. We hypothesize that this improvement is due to
the backbones being adapted to both edge cases and broader
scenarios using more extensive data. In contrast, the other
models adapt to our domain by focusing exclusively on edge
cases during training.</p>
        <p>Finally, our latest approach appears to be the most effective
in leveraging content. Not only does it avoid degrading the
unsupervised metrics, but it also achieves the best overall
results in this regard. It seems to adapt the backbone to edge
cases based on historical data while simultaneously
adapting it to our domain, all while being 50 times smaller than PaLM 2. It
is important to note that our approach enables the retrieval
of more positives than negatives from the historical data,
although the number of negatives remains substantial. This
should be carefully considered when designing the entire
pipeline to ensure proper filtering of these freelancers.
In addition to the aforementioned quantitative metrics, we
present the freelancers' latent space organization for our
trained approach in Fig. 4. A two-dimensional projection of
the freelancers' density is illustrated both by job family on
the left of the figure, and by category on its right, specifically
for the web, graphic and design job family. When focusing
on the density distributions by family, denser spots can be
observed. Indeed, some families exhibit a single dense spot,
while others display multiple dense areas.</p>
        <p>When exploring the latent space representing the job
categories, it can be observed that the previously mentioned
denser spots correspond to regions of space reserved for
categories within the families, even though some categories
may have multiple dense spots. We hypothesize that this
may be due to the presence of sub-categories and different
typologies of freelancers.</p>
        <p>This latent space can be leveraged not only for retrieval, but
also to gain a deeper understanding of our market and
freelancers. In particular, it could be used to drive the evolution
of our job category taxonomy. Overall, our model appears
to effectively organize freelancers based on semantic
relationships.</p>
        <p>In conclusion of these experiments, we report metrics on
the language alignment aspect of our approach, as shown
in Table 2. These results highlight the model's ability to
maintain pretrained language alignment while
simultaneously adapting the backbone to our domain. This result is
particularly notable given that our dataset contains
approximately ten times more French profiles than profiles in other
languages. Hence, achieving effective language alignment
solely based on this data would be challenging without the use of
pretrained backbones, highlighting the good performance
of our approach.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Deployment in production</title>
      <p>As outlined in Section 1, a legacy recommender system
with one filtering phase and one ranking phase was already
in place. This section details the integration of the newly
developed retriever within the existing infrastructure and
examines the impact of its deployment in our production
environment.</p>
      <sec id="sec-5-1">
        <title>5.1. Infrastructure</title>
        <p>As illustrated in Fig. 1, we implemented a conventional
architecture for deploying our neural retriever. The
embeddings for all freelancer profiles are precomputed and stored
in a vector database. For this purpose, we selected Qdrant
(https://qdrant.tech/), due to its high performance and advanced filtering
capabilities, including geographical filtering. The vector database
is updated daily to incorporate new and updated profiles.</p>
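<p>The overall flow (precomputed profile vectors, a hard filter, then a nearest-neighbor search) can be sketched in memory as follows; the profile data, field names, and filter below are illustrative stand-ins for the Qdrant setup, not Malt's actual schema:</p>

```python
import numpy as np

# Toy stand-in for the vector database: precomputed profile embeddings
# plus metadata used for hard filtering (all values are made up).
profiles = {
    "f1": {"vec": np.array([0.9, 0.1]), "country": "FR"},
    "f2": {"vec": np.array([0.1, 0.9]), "country": "FR"},
    "f3": {"vec": np.array([0.8, 0.2]), "country": "DE"},
}

def retrieve(project_vec, country, k=2):
    """Cosine k-NN over the profiles that pass the hard filter."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    candidates = [(fid, cos(project_vec, p["vec"]))
                  for fid, p in profiles.items() if p["country"] == country]
    return [fid for fid, _ in sorted(candidates, key=lambda t: -t[1])[:k]]

top = retrieve(np.array([1.0, 0.0]), country="FR")
```

<p>In production, the brute-force scan is replaced by an approximate nearest-neighbor index with filter support, and the returned candidates feed the legacy ranker.</p>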
        <p>At inference time, upon receiving a new project proposal,
its embedding is computed on the fly. We then use that
embedding to perform an approximate nearest neighbors
search, taking into account hard filters, to retrieve a set of
candidates for the project. These candidates are subsequently
passed to the legacy ranking system for final evaluation.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. A/B test</title>
        <p>To ensure that the new retrieval phase did not negatively
impact conversion rates, an A/B test was conducted over
the course of November 2023. As anticipated, there was a
significant improvement in response times, with the 95th
percentile latency decreasing from tens of seconds (and
occasionally exceeding a minute) to a maximum of 3 seconds.
More unexpectedly, a 5.63% improvement in conversion for
effective matches (i.e., cases where both the freelancer and
client confirm the match) was observed. We hypothesize
that the new retriever, with its more advanced technology,
effectively eliminates suboptimal candidates that might have
been selected by the legacy ranker.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This research aimed to develop two encoder models that
generate embeddings for freelancer profiles and project
proposals, enabling efficient vector-based retrieval. The models
were designed to handle multilingual data, exploit
document structure and content, and scale effectively as part of
a candidate proposal system.</p>
      <p>We proposed and validated an architecture and training
approach, which outperformed other methods we tested.
When deployed in production, it improved latency without
any matching performance loss, leading even to an improved
conversion rate.</p>
      <p>Future work could explore improving data preparation for
contrastive learning, such as creating better positive and
negative pairs. Additionally, using continuous values in
the adjacency matrix, rather than binary edges, could
enhance the learning of semantic similarity, particularly for the
synthetic relationships between freelancers. Combining lexical and
semantic retrieval methods, such as integrating BM25 scores,
also presents a potential area for improvement.</p>
      <p>While our models are effective for recommendations, they
are less suited for search tasks where the inputs are just short
queries instead of complete project proposals. A possible
direction for future research could involve a three-tower
architecture, with one tower dedicated to handling search
queries.</p>
      <p>Finally, an important area of research within human
resources technology involves studying the biases induced by
such models. This topic is deeply important to Malt, notably
in the context of the AI Act. That is why we plan on investigating and
mitigating potential biases in our retriever models. This
would ensure fairness and transparency in the
recommendations, while exploring techniques to identify and correct
any unintended biases that may arise during training and
deployment.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kenthapadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Le</surname>
          </string-name>
          , G. Venkataraman,
          <article-title>Personalized job recommendation system at linkedin: Practical challenges and lessons learned</article-title>
          , in: RecSys,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Geyik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ozcaglar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Thakkar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kenthapadi</surname>
          </string-name>
          ,
          <article-title>Talent search and recommendation systems at linkedin: Practical challenges and lessons learned</article-title>
          ,
          <source>in: SIGIR</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Mitra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          ,
          <article-title>An introduction to neural information retrieval, Foundations and Trends in Information Retrieval (</article-title>
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks, in: EMNLP-IJCNLP, Association for Computational Linguistics</article-title>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <article-title>Making monolingual sentence embeddings multilingual using knowledge distillation</article-title>
          ,
          <source>EMNLP</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>Multilingual e5 text embeddings: A technical report</article-title>
          , arXiv preprint arXiv:2402.05672 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Mpela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zuva</surname>
          </string-name>
          ,
          <article-title>A mobile proximity job employment recommender system</article-title>
          , in: icABCD,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>O.</given-names>
            <surname>Chenni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bouda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Benachour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zakaria</surname>
          </string-name>
          ,
          <article-title>A content-based recommendation approach using semantic user profile in e-recruitment</article-title>
          ,
          <source>in: TPNC</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <article-title>Topic modeling driven content based jobs recommendation engine for recruitment industry, Procedia computer science (</article-title>
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hong</surname>
          </string-name>
          , S.-W. Kim,
          <article-title>Exploiting job transition patterns for effective job recommendation</article-title>
          ,
          <source>in: IEEE SMC</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Reusens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lemahieu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Baesens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sels</surname>
          </string-name>
          ,
          <article-title>A note on explicit versus implicit information for job recommendation, Decision Support Systems (</article-title>
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Hoq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Adnan</surname>
          </string-name>
          ,
          <article-title>User interaction analysis to recommend suitable jobs in career-oriented social networking sites</article-title>
          , in: IEEE ICoDSE,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <surname>X. Zhang,</surname>
          </string-name>
          <article-title>Resumegan: an optimized deep representation learning framework for talentjob fit via adversarial learning</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <article-title>Learning to match jobs with resumes from sparse interaction data using multi-view co-teaching network</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V. S.</given-names>
            <surname>Dave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Al Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>AlJadda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Korayem</surname>
          </string-name>
          ,
          <article-title>A combined representation learning approach for better job and skill recommendation</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <article-title>Learning effective representations for person-job fit by feature fusion</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Bogers</surname>
          </string-name>
          ,
          <article-title>Effectiveness of job title based embeddings on résumé to job ad recommendation</article-title>
          ,
          <source>in: CEUR-WS</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>E.</given-names>
            <surname>Lacic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Reiter-Haas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Duricic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Slawicek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Lex</surname>
          </string-name>
          ,
          <article-title>Should we embed? a study on the online performance of utilizing embeddings for real-time job recommendations</article-title>
          ,
          <source>in: ACM RecSys</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-R.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <article-title>Modeling two-way selection preference for person-job fit</article-title>
          ,
          <source>in: ACM RecSys</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Qin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <article-title>Enhancing person-job fit for talent recruitment: An ability-aware neural network approach</article-title>
          ,
          <source>in: The 41st international ACM SIGIR conference on research &amp; development in information retrieval</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>25</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ramanath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Inan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Polatkan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ozcaglar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kenthapadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Geyik</surname>
          </string-name>
          ,
          <article-title>Towards deep and representation learning for talent search at linkedin</article-title>
          ,
          <source>in: CIKM</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.-L.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <article-title>Person-job fit estimation from candidate profile and related recruitment history with co-attention neural networks</article-title>
          ,
          <source>Neurocomputing</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Medentsiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Graus</surname>
          </string-name>
          ,
          <article-title>Consultantbert: Fine-tuned siamese sentence-bert for matching jobs and job seekers</article-title>
          ,
          <source>HR@RecSys</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Distributed representations of sentences and documents</article-title>
          ,
          <source>in: ICML, PMLR</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1188</fpage>
          -
          <lpage>1196</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Olah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Document embedding with paragraph vectors</article-title>
          ,
          <source>arXiv preprint arXiv:1507.07998</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vincent</surname>
          </string-name>
          ,
          <article-title>Representation learning: A review and new perspectives</article-title>
          ,
          <source>IEEE PAMI</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Metz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chintala</surname>
          </string-name>
          ,
          <article-title>Unsupervised representation learning with deep convolutional generative adversarial networks</article-title>
          ,
          <source>ICLR</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Le-Khac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Healy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Smeaton</surname>
          </string-name>
          ,
          <article-title>Contrastive representation learning: A framework and review</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zolfaghari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gehler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Brox</surname>
          </string-name>
          ,
          <article-title>Crossclr: Cross-modal contrastive learning for multi-modal video representations</article-title>
          ,
          <source>in: ICCV</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1450</fpage>
          -
          <lpage>1459</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hermans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Leibe</surname>
          </string-name>
          ,
          <article-title>In defense of the triplet loss for person re-identification</article-title>
          ,
          <source>arXiv preprint arXiv:1703.07737</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>F.</given-names>
            <surname>Schroff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kalenichenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Philbin</surname>
          </string-name>
          ,
          <article-title>Facenet: A unified embedding for face recognition and clustering</article-title>
          ,
          <source>in: Proceedings CVPR</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>A.</given-names>
            <surname>van den Oord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <article-title>Representation learning with contrastive predictive coding</article-title>
          ,
          <source>arXiv preprint arXiv:1807.03748</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>A.</given-names>
            <surname>van den Oord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <article-title>Representation learning with contrastive predictive coding</article-title>
          ,
          <source>arXiv preprint arXiv:1807.03748</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>P.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Teterwak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maschinot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <article-title>Supervised contrastive learning</article-title>
          ,
          <source>NeurIPS</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>Sequence to sequence learning with neural networks</article-title>
          ,
          <source>NIPS</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: pretraining of deep bidirectional transformers for language understanding</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Majumder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>Text embeddings by weakly-supervised contrastive pre-training</article-title>
          ,
          <source>arXiv preprint arXiv:2212.03533</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>