=Paper=
{{Paper
|id=Vol-2512/paper2
|storemode=property
|title=Tripartite Vector Representations for Better Job Recommendation
|pdfUrl=https://ceur-ws.org/Vol-2512/paper2.pdf
|volume=Vol-2512
|authors=Mengshu Liu,Jingya Wang,Kareem Abdelfatah,Mohammed Korayem
|dblpUrl=https://dblp.org/rec/conf/kdd/LiuWAK19
}}
==Tripartite Vector Representations for Better Job Recommendation==
Mengshu Liu, Jingya Wang, Kareem Abdelfatah, and Mohammed Korayem
Careerbuilder LLC, Norcross, GA 30092, USA
Abstract. Job recommendation is a crucial part of the online job recruitment business. To match the right person with the right job, a good representation of job postings is required. Such representations should ideally recommend jobs with fitting titles, aligned skill sets, and a reasonable commute. To address these aspects, we utilize three information graphs (job-job, skill-skill, job-skill) from historical job data to learn a joint representation for both job titles and skills in a shared latent space. This allows us to obtain a representation of job postings/resumes using both elements, which is subsequently combined with location. In this paper, we first present how the representation of each component is obtained, and then discuss how these different representations are combined into one single space to acquire the final representation. The comparison of the proposed methodology against baseline methods shows significant improvement in terms of relevancy.
Keywords: job recommendation · vector representation

DI2KG 2019, August 5, 2019, Anchorage, Alaska. Copyright held by the author(s). Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
1 Introduction
Online recruiting and job portals like Careerbuilder.com, Linkedin.com, and Indeed.com have become the norm in the talent acquisition business. Millions of jobs are posted and even more resumes are uploaded daily. Different machine learning and information retrieval models have been applied to analyze these resumes and job descriptions, and multiple efforts have been made to match the two parties of the recruiting process. A good job/resume representation helps to improve many downstream products that in turn support the company's goal of empowering employment and helping job seekers find jobs and the training they need. Specifically, it facilitates matching job seekers and employers by improving our search and recommendation products.
The content of a job posting incorporates all aspects of a position. Entities like job title, required skills, experience, degrees, benefits, company culture, location, etc., can be extracted and normalized. Among these, title, skills, and location are the top factors when defining a job position. While a job title determines the nature of a job, skills enrich the job title and differentiate jobs with the same title by identifying their nuances. In order to offer a good job match, an accurate representation consisting of both is needed. Rather than simply using various word representations of job title and skills, recent works [6, 9, 17] have started to consider both the text content and the relationship between matching pairs. In our previous work, we proposed a novel representation-learning-based solution, which learns job and skill vector representations in a shared latent space using three pre-processed graphs [4]. To extend this work, we consider the interconnection of job title and skills to learn the vector representation of job postings/resumes. Using a retrofitting model similar to the work of [5], we combine the pre-trained representations of job title and skills into one vector to represent a job or resume. In terms of location, the majority of job seekers prefer jobs within a reasonable commute distance. For many people, relocation is not an option, and a short commute is always a big plus. Similarly, most companies prefer not to hire remote employees. Therefore, we explicitly include a location vector in our representation. To achieve quick, accurate vector search and recommendation, we utilize Faiss [7], a library for efficient similarity search and clustering of dense vectors.
Our contributions in this paper are as follows:
• The proposed vector representation is applicable to both job postings and resumes. It is not only a flexible representation for obtaining similar jobs or similar candidates, but also provides a direct mapping between jobs and resumes.
• We generate a more holistic vector representation jointly learned for both titles and skills, which can be used in the job recommendation system.
• We incorporate location explicitly in our representation.
• We employ retrofitting to refine the job vectors using the skill vectors, as detailed in Section 3.3.
Fig. 1. Workflow to generate recommendations using our tripartite vector representations. (Diagram: job postings and resumes go through a parser that extracts title, skills, and location; titles and skills pass through joint representation learning and the retrofitter to form the title + skills vector; location goes through 3D vectorization to form the location vector; concatenation yields the title + skills + location vector, which Faiss uses for recommendation.)
2 Related Work
The two major solutions for job recommendation in existing works are content-based representation learning and collaborative filtering (CF). CF builds a very high-dimensional sparse matrix to maintain the relationship between user and product [16, 18]; in our paper, this relationship is between each pair of job and skill. Inspired by the skip-gram approach [9] from natural language processing (NLP), [1] proposes to learn item embeddings based on interaction history in the form of a sequence. [19] presents a CF architecture and points out the common cold-start problem: CF relies on historical interactions, so when a new candidate registers or a new job is posted, no interaction record is available. Content-based methods, on the other hand, make recommendations according to the characteristics of users and products, as in [12].
Most recent works, including our proposed system, use hybrid approaches combining both profile properties and interaction history [11]. [6, 17] jointly learn feature representations of users and their selected items in one convolutional neural network. [3] concatenates the learned sparse and dense representations of a user's activity history, query, and profile as input; in this case, the recommendation problem is treated as an extreme multi-class classification problem, and the classifier is a series of non-linear activation functions followed by a softmax. [8] employs a convolutional neural network to encode the user profile, followed by a long short-term memory (LSTM) network [15] to encode the user's interaction history during training, and continues the LSTM along the time axis as a decoder to predict the user's next activity at test time.
Our work is similar to these hybrid approaches in that it learns embeddings of job titles and skills based on both their characteristics and their interactions. It differs in that, instead of simply pairing related jobs and skills, our profile representation models three specific relationships between jobs and required skills simultaneously. Moreover, we employ retrofitting to achieve a better representation.
3 Methodology
In this section, we describe the design and methodology of our tripartite vector representation. A job posting contains important information about the hiring position; most importantly, it includes the job title, skills, and location. Here, we use a combined vector representation of job titles and skills, and obtain a vector representation for location transformed from latitude and longitude. Figure 1 shows the design of our method. Job posting and resume data go through an in-house job parser, where job title, skills, and location are extracted. While titles and skills are jointly trained by a representation learning framework (Section 3.1), location information is also vectorized and normalized (Section 3.2). In order to combine the three parts, the title representation is first retrofitted with the list of skills associated with each job before being concatenated with the location vector (Section 3.3).
3.1 Job Title and Skill Vector
Title and skills are the defining features of any job. Embeddings for both titles and skills are learned in the same k-dimensional space [4] by utilizing three types of information networks from historical job data: (i) the job-job transition network, (ii) the skill-skill co-occurrence network, and (iii) the job-skill co-occurrence network. Our goal is to encode the local neighborhood structures captured by the three networks.
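Concretely, each network can be reduced to training triplets before optimization. The following is an illustrative sketch (not the authors' pipeline) of how one of the three graphs might be turned into triplets (x, y, z) with random negative sampling; the function and parameter names are hypothetical.

```python
import random

# Hypothetical sketch: build (x, y, z) training triplets from one of the
# three information networks. y is a true neighbor of x in the graph;
# z is a randomly sampled node not connected to x (a negative example).

def make_triplets(edges, all_nodes, negatives_per_edge=5):
    neighbors = {}
    for x, y in edges:
        neighbors.setdefault(x, set()).add(y)
    triplets = []
    for x, y in edges:
        for _ in range(negatives_per_edge):
            z = random.choice(all_nodes)
            if z != x and z not in neighbors[x]:  # keep only true negatives
                triplets.append((x, y, z))
    return triplets
```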
For the job-job transition graph, we assume that a transition between similar jobs $x$ and $y$ is more likely to happen than between non-similar jobs $x$ and $z$. Let $A^{jj}_{xy} = \langle w_x, w_y \rangle$, the dot product of the two embedding vectors, be the affinity score between job $x$ and job $y$, and let $D^{jj}$ (job-job) represent the transition relationships as job triplets $(x, y, z)$. The objective is to learn the representation $W$ so that

$$O_{jj} = \min_{W} \sum_{(x,y,z) \in D^{jj}} -\ln \sigma(A^{jj}_{xy} - A^{jj}_{xz}) \tag{1}$$

where the sigmoid function $\sigma(\nu) = \frac{1}{1 + e^{-\nu}}$ is used as the probability function that preserves the order $A^{jj}_{xy} > A^{jj}_{xz}$.
Similarly, for the skill-skill graph, coexisting skills $x$ and $y$, which appear in the same job posting or the same resume, are closer to each other than non-coexisting skills such as $x$ and $z$. Let $D^{ss}$ (skill-skill) be the set of training triplets of skills with a coexisting relationship; our objective here is

$$O_{ss} = \min_{W'} \sum_{(x,y,z) \in D^{ss}} -\ln \sigma(A^{ss}_{xy} - A^{ss}_{xz}) \tag{2}$$
Moreover, for the job-skill graph, if skill $y^s$ appears in the advertisement of job $x^j$, its embedding vector is more similar to that of $x^j$ than a non-related skill $z^s$. Given the set of training triplets $D^{js}$ (job-skill), the desired vector representations of jobs $W$ and skills $W'$ are learned according to the objective

$$O_{js} = \min_{W, W'} \sum_{(x^j, y^s, z^s) \in D^{js}} -\ln \sigma(A^{js}_{xy} - A^{js}_{xz}) \tag{3}$$
Finally, to achieve high-quality job and skill embeddings, we optimize the joint objective function

$$O(W, W') = \min_{W, W'} O_{jj} + O_{ss} + O_{js} + \lambda \cdot (\| W \|_F^2 + \| W' \|_F^2) \tag{4}$$

where $\lambda$ is the coefficient of the $l_2$ regularization term $\| \cdot \|_F^2$ used to avoid over-fitting. Vectors of dimension 50 are obtained for 4325 unique job titles and 6214 skills, using joint Bayesian Personalized Ranking (BPR) [14].
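As a concrete illustration, a minimal numpy sketch of the joint objective in Eq. (4) could look as follows. It assumes the triplet sets $D^{jj}$, $D^{ss}$, $D^{js}$ are lists of index triplets and the embeddings are rows of matrices W (jobs) and W' (skills); this is a didactic loss computation, not the production BPR trainer.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def joint_bpr_loss(W, W_prime, d_jj, d_ss, d_js, lam=0.01):
    """Sketch of Eq. (4): the three triplet objectives (Eqs. 1-3)
    plus l2 regularization. W: (num_jobs, k); W_prime: (num_skills, k)."""
    loss = 0.0
    for x, y, z in d_jj:    # job-job transition triplets, Eq. (1)
        loss -= np.log(sigmoid(W[x] @ W[y] - W[x] @ W[z]))
    for x, y, z in d_ss:    # skill-skill co-occurrence triplets, Eq. (2)
        loss -= np.log(sigmoid(W_prime[x] @ W_prime[y] - W_prime[x] @ W_prime[z]))
    for x, y, z in d_js:    # job-skill co-occurrence triplets, Eq. (3)
        loss -= np.log(sigmoid(W[x] @ W_prime[y] - W[x] @ W_prime[z]))
    return loss + lam * (np.sum(W ** 2) + np.sum(W_prime ** 2))
```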
3.2 Location Vector
Location is another key factor in a job posting; commute time matters when people are choosing a position. The common practice when dealing with location is to pre-filter or post-filter the recommendations with a fixed radius. This method has a few downsides: 1) some jobs are less sensitive to distance than others (for example, people are more willing to commute longer for a highly compensated job than for a minimum-wage part-time one); 2) the system is more difficult to implement because of the extra layer of filtering; 3) specifically in our case, where Faiss is used for similarity search, embedding location in the vector means only one index file is needed.
Before latitude and longitude can be added to our embedding model, they need to be transformed, since they are not on the same scale as the title and skill vector. Geo-locations are three-dimensional in nature, and to represent location in a similar fashion to the title and skill vectors, we perform a transformation of latitude and longitude, as shown in Figure 2:

$$x = \cos(\theta) \cdot \cos(\phi) \tag{5}$$
$$y = \cos(\theta) \cdot \sin(\phi) \tag{6}$$
$$z = \sin(\theta) \tag{7}$$

where $\theta$ represents the latitude and $\phi$ the longitude. Thus, location is represented as a normalized three-dimensional vector, which can later be combined with the title and skill vector.
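A minimal sketch of this transformation (Eqs. 5-7), assuming latitude and longitude arrive in degrees; the function name and the example coordinates are illustrative.

```python
import math

def location_vector(lat_deg, lon_deg):
    """Map latitude/longitude (degrees) to a unit vector on the sphere,
    per Eqs. (5)-(7), so the result is on a normalized scale."""
    theta = math.radians(lat_deg)  # latitude
    phi = math.radians(lon_deg)    # longitude
    return (math.cos(theta) * math.cos(phi),
            math.cos(theta) * math.sin(phi),
            math.sin(theta))

# e.g., approximate coordinates for Norcross, GA (illustrative only)
x, y, z = location_vector(33.94, -84.21)
```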
Fig. 2. Latitude and longitude can be converted into three-dimensional coordinates.
3.3 Combination of Title, Skill, and Location
Job postings with the same title might require different skill sets at different companies in different industries. Given that both title and skill vectors are trained in the same latent space, the next natural question is how to combine them into a personalized vector for a specific job.

To assemble the vectors back into one vector that represents the job posting, we apply the retrofitting method [5] to combine the vector of the job title with the vectors of all skills in the job posting. This method adjusts the position of the job title based on the skills that appear in the job posting. For example, consider a job posting looking for a web developer, and a person whose most recent title is JavaScript developer. The two job titles are similar, though still different. If JavaScript is listed as a top skill in the job posting, the distance between the job and the resume will be shortened. This gives the person a better position in the ranking, even though the job title is not a perfect match.
In Faruqui's work [5], they implement a retrofitting method to adjust any pre-trained embedding using semantic lexicons:

$$q_i = \frac{\sum_{j:(i,j) \in E} \beta_{ij} q_j + \alpha_i \hat{q}_i}{\sum_{j:(i,j) \in E} \beta_{ij} + \alpha_i} \tag{8}$$

where $q_i$ is the modified vector, $\hat{q}_i$ is the initial vector, and the $q_j$ are its neighbors. The update is derived by minimizing the distance between each vector and both its initial value and its neighbors. In their case, the results converge after 10 iterations.
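For illustration, here is a small sketch of this iterative update under the simplifying assumption of uniform edge weights ($\beta_{ij} = 1$); the data structures and defaults are hypothetical.

```python
import numpy as np

def retrofit(embeddings, lexicon, iterations=10, alpha=1.0):
    """Sketch of Eq. (8) with uniform edge weights (beta_ij = 1).
    embeddings: dict word -> initial vector; lexicon: dict word -> neighbors."""
    q_hat = {w: v.copy() for w, v in embeddings.items()}  # initial vectors
    q = {w: v.copy() for w, v in embeddings.items()}      # vectors to update
    for _ in range(iterations):
        for word, neighbors in lexicon.items():
            nbrs = [n for n in neighbors if n in q]
            if not nbrs:
                continue
            # Eq. (8): weighted average of neighbors and the initial vector
            q[word] = (sum(q[n] for n in nbrs) + alpha * q_hat[word]) / (len(nbrs) + alpha)
    return q
```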
Intuitively, if we add two vectors together, the resulting vector lies between the two initial vectors. Adjusting vectors by adding the vectors of neighboring words will therefore bring words with similar meanings closer to each other. For example, in Figure 3, $\hat{Smile}$ and $\hat{Tears}$ are the initial vectors for two very different words, so they are far apart in direction (cosine similarity). After tuning with a neighboring word, happy, the new vectors for smile and tears are closer than before, since the tears are in the context of "happy tears".
Similarly, we apply the above operation to our job posting data and generate a combined representation of the job posting using the title and the skills. Jobs with a similar set of skills as well as the same title will rank higher than ones with the same title but fewer overlapping skills. Rather than going through several iterations as in the original paper, we apply the update only once, since the skill vectors are not updated in the process. We can view this one-time update as a fine-tuning of the title vector, or as the calculation of the "title + skills" vector shown below, where $n$ is the number of skills in a job posting, all given the same weight:

$$q_{job} = \frac{n \cdot q_{title} + \sum q_{skill}}{2n} \tag{9}$$

The location vector is then concatenated with this new vector. The weight of the location vector can be adjusted based on how sensitive the jobs are to distance.

$$q_{job+location} = [q_{job}, x, y, z] \tag{10}$$
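In code, the one-shot update of Eq. (9) and the concatenation of Eq. (10) reduce to a few lines. The following numpy sketch assumes the title and skill vectors come from the jointly trained 50-dimensional space of Section 3.1; note that Eq. (9) corresponds to Eq. (8) with $\beta_{ij} = 1$ and $\alpha_i = n$.

```python
import numpy as np

def title_skills_vector(q_title, skill_vectors):
    """One-shot retrofit (Eq. 9): combine a title vector with n equally
    weighted skill vectors. Equivalent to Eq. (8) with beta=1, alpha=n."""
    n = len(skill_vectors)
    if n == 0:
        return q_title  # fall back to the bare title vector
    return (n * q_title + np.sum(skill_vectors, axis=0)) / (2 * n)

def tripartite_vector(q_job, loc_xyz, loc_weight=1.0):
    """Eq. (10): concatenate the title+skills vector with the (optionally
    weighted) 3D location vector."""
    return np.concatenate([q_job, loc_weight * np.asarray(loc_xyz)])
```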
Fig. 3. Addition of embedding vectors.
4 Results
To test and evaluate our proposed system, we use a real dataset (jobs and users) from CareerBuilder.com. CareerBuilder operates the largest job posting board in the U.S. and has an extensive and growing global presence, with millions of job postings, more than 60 million actively searchable resumes, over one billion searchable documents, and more than a million searches per hour.
Each job/resume is parsed into a job title, a skill list, and a location (Figure 1). We then translate these three parts into their vector representations, which are later combined into the final job posting vector. We have also curated a list of top skills for each job title, in case skills are not available; the curation is based on millions of resumes and job postings, as well as human knowledge. We use this skill list to generate a vector if no skill is given, which provides a more accurate representation of the job than the job title alone. These vectors all go into our recommendation engine for different recommendation products.
We calculate the vectors for 300K actual job postings on the CareerBuilder website using 6 different models (FastText, Word2Vec, Glove-6B-300, Glove-840B-300 [2, 10, 13], and our proposed retrofitter model with and without location embedding), so each job posting has 6 vectors for comparison. For each of these vectors, we calculate the top 50 similar vectors using Faiss. For the job posting data, a flat index is used. For the resume data, which is much larger in size, we compress the vectors into 16 blocks in 64 dimensions and build an inverted index of size 65535. Four metrics are computed for evaluation: 1) distance, the geographic distance between the recommended job and the input; 2) in-range job count, the number of jobs within 50 miles among the top 50 recommended jobs; 3) title match rate, the percentage of recommended jobs with the same job title as the input; and 4) title coverage, which measures the title match rate within 50 miles. The results for the different models are computed and compared below.
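As a sketch of the indexing setup (not the production configuration), the flat and compressed Faiss indexes described above might be built as follows. The vector counts, the padding to 64 dimensions, and the reduced nlist used for this toy sample are illustrative; the paper's inverted index uses 65535 lists with 16-block product quantization.

```python
import faiss
import numpy as np

# Assume combined vectors are float32 rows with d = 53 dimensions
# (50 for title+skills plus 3 for location); data here is a placeholder.
d = 53
jobs = np.random.rand(5000, d).astype("float32")
resumes = np.random.rand(20000, d).astype("float32")

# Exact (flat) index for the job posting data.
job_index = faiss.IndexFlatL2(d)
job_index.add(jobs)

# Compressed inverted index for the larger resume set. The number of
# PQ blocks (m = 16) must divide the dimension, so we pad to 64. nlist
# is scaled down so this toy sample can train the coarse quantizer;
# the paper uses 65535.
d_pad = 64
resumes_pad = np.pad(resumes, ((0, 0), (0, d_pad - d)))
quantizer = faiss.IndexFlatL2(d_pad)
resume_index = faiss.IndexIVFPQ(quantizer, d_pad, 256, 16, 8)
resume_index.train(resumes_pad)
resume_index.add(resumes_pad)

# Top-50 similar jobs for one query vector.
distances, ids = job_index.search(jobs[:1], 50)
```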
Table 1 shows that by explicitly including latitude and longitude in our embedding, the average distance of recommended jobs is reduced by 90%; all top 50 jobs are within a reasonable proximity of the original input job.

Table 2 shows that the number of jobs within 50 miles among the top 50 jobs increases dramatically for the retrofitter with location embedding. Both Table 1 and Table 2 show that the baseline methods are not location sensitive, even though location appears in the job posting text. Implicitly having location in the embedding is far from enough, and the advantage of explicitly including location in the embedding model is clear.
Table 3 shows the title match rate and coverage. Our retrofitter model without location embedding has a much higher title match rate and coverage than the baseline methods as well as the retrofitter model with location. However, the drop in performance of our location retrofitter model comes purely from the limited number of relevant jobs in close proximity, which is a trade-off we have to make: a perfect job match from ten thousand miles away is simply not a perfect job for most people. Even so, our retrofitter model still yields better results on both measurements than the baseline methods.
Table 1. Distance statistics for different representations.

| Model | Average | Median | STD |
|---|---|---|---|
| FastText | 940.85 | 760.69 | 819.33 |
| W2V-300 | 936.46 | 754.67 | 818.58 |
| Glove-6B-300 | 959.54 | 785.40 | 823.14 |
| Glove-840B-300 | 942.22 | 763.72 | 816.83 |
| Retrofitter-no loc | 902.918 | 727.087 | 836.275 |
| Retrofitter-loc | 90.417 | 70.177 | 95.673 |
Table 2. Number of jobs with a distance less than 50 miles among the top 50 jobs.

| Model | Average | Median |
|---|---|---|
| FastText-300 | 5.637 | 3.0 |
| W2V-300 | 5.67 | 3.0 |
| Glove-6B-300 | 5.108 | 3.0 |
| Glove-840B-300 | 5.503 | 3.0 |
| Retrofitter-no loc | 1.614 | 1.0 |
| Retrofitter-loc | 21.420 | 20.0 |
Table 3. Title match rate (the percentage of recommended jobs with the same title as the input) and coverage (the percentage of recommended jobs within 50 miles with the same job title).

| Model | Carotene-Match (%) | Coverage (%) |
|---|---|---|
| FastText-300 | 11.0 | 3.1 |
| W2V-300 | 11.2 | 3.1 |
| Glove-6B-300 | 10.6 | 3.0 |
| Glove-840B-300 | 10.6 | 3.0 |
| Retrofitter-no loc | 68.9 | 14.7 |
| Retrofitter-loc | 14.1 | 5.5 |
5 Conclusion and Future Work
In this paper, we discuss a representation model that can be used in a recommendation system and that is currently being utilized within CareerBuilder. Three facets of a job posting are considered: job title, job skills, and location. While the job title carries the most weight in determining what a job is, the skill set defines the nuances that differ from job to job. Most job seekers are also very sensitive to the location of a job. Our model encompasses all three of these aspects and is able to give location-sensitive, highly relevant job recommendations to our users.
There are a number of improvements being worked on, such as:
• Develop an inductive learning framework to accommodate newly emerged job titles and skills, as the current model is transductive and representation vectors only exist for titles and skills present in the input graph.
• Incorporate more features in the job representation, such as education and previous experience.
• Adjust the location embedding in a more quantifiable way to control the radius of recommended jobs.
• Combine the current representation model with other models to provide better results.
• Apply different weights to skills based on their importance to the job title.
References
1. Barkan, O., Koenigstein, N.: Item2vec: neural item embedding for collaborative filtering. In: 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP). pp. 1–6. IEEE (2016)
2. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics 5, 135–146 (2017)
3. Covington, P., Adams, J., Sargin, E.: Deep neural networks for YouTube recommendations. In: Proceedings of the 10th ACM Conference on Recommender Systems. pp. 191–198. ACM (2016)
4. Dave, V.S., Zhang, B., Al Hasan, M., AlJadda, K., Korayem, M.: A combined representation learning approach for better job and skill recommendation. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management. pp. 1997–2005. ACM (2018)
5. Faruqui, M., Dodge, J., Jauhar, S.K., Dyer, C., Hovy, E., Smith, N.A.: Retrofitting word vectors to semantic lexicons. In: Proceedings of NAACL (2015)
6. Huang, W., Wu, Z., Liang, C., Mitra, P., Giles, C.L.: A neural probabilistic model for context based citation recommendation. In: Twenty-Ninth AAAI Conference on Artificial Intelligence (2015)
7. Johnson, J., Douze, M., Jégou, H.: Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734 (2017)
8. Li, L., Jing, H., Tong, H., Yang, J., He, Q., Chen, B.C.: NEMO: Next career move prediction with contextual embedding. In: Proceedings of the 26th International Conference on World Wide Web Companion. pp. 505–513. International World Wide Web Conferences Steering Committee (2017)
9. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems. pp. 3111–3119 (2013)
10. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 3111–3119. Curran Associates, Inc. (2013), http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
11. Oh, K.J., Lee, W.J., Lim, C.G., Choi, H.J.: Personalized news recommendation using classified keywords to capture user preference. In: 16th International Conference on Advanced Communication Technology. pp. 1283–1287. IEEE (2014)
12. Van den Oord, A., Dieleman, S., Schrauwen, B.: Deep content-based music recommendation. In: Advances in Neural Information Processing Systems. pp. 2643–2651 (2013)
13. Pennington, J., Socher, R., Manning, C.D.: GloVe: Global vectors for word representation. In: Empirical Methods in Natural Language Processing (EMNLP). pp. 1532–1543 (2014), http://www.aclweb.org/anthology/D14-1162
14. Rendle, S., Freudenthaler, C., Gantner, Z., Schmidt-Thieme, L.: BPR: Bayesian personalized ranking from implicit feedback. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. pp. 452–461. UAI '09, AUAI Press, Arlington, Virginia, United States (2009), http://dl.acm.org/citation.cfm?id=1795114.1795167
15. Sak, H., Senior, A., Beaufays, F.: Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Fifteenth Annual Conference of the International Speech Communication Association (2014)
16. Sedhain, S., Menon, A.K., Sanner, S., Xie, L.: AutoRec: Autoencoders meet collaborative filtering. In: Proceedings of the 24th International Conference on World Wide Web. pp. 111–112. ACM (2015)
17. Tang, D., Qin, B., Liu, T., Yang, Y.: User modeling with neural network for review rating prediction. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
18. Wang, H., Wang, N., Yeung, D.Y.: Collaborative deep learning for recommender systems. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1235–1244. ACM (2015)
19. Wu, L., Shah, S., Choi, S., Tiwari, M., Posse, C.: The browsemaps: Collaborative filtering at LinkedIn. In: RSWeb@RecSys (2014)