Using vector representations for matching tasks to skills

Miriam Amin 1,*, Jan-Peter Bergmann 1 and Yuri Campbell 1

1 Fraunhofer Center for International Management and Knowledge Economy (IMW), Neumarkt 9-19, 04109 Leipzig, Germany

Abstract
Science, Technology and Innovation (ST&I) companies as well as large research organizations repeatedly face the problem of matching an emerging task with the appropriate skill that is present somewhere in an organizational unit. Many organizations already have skill or competence taxonomies that can be useful in this regard. In this working paper, we present our experiments on automatically recommending suitable skills from the internal skill taxonomy of the Fraunhofer Society research organization to incoming research requests in order to support human decision-making processes. We applied three different vector-based approaches to this end: one based on language models, one on word embeddings and one on a simple one-hot encoding of keywords. Our results show that the language-model-based approach outperforms the other methods and is able to recommend skills for research requests with a MAP of 0.82. These first findings pave the way for further improvements of our method and for its transfer to other, related problems.

Keywords
Recommender Systems, Knowledge Management, Skill Taxonomy, Competence Taxonomy, Task-Skill Matching

RecSys in HR'22: The 2nd Workshop on Recommender Systems for Human Resources, in conjunction with the 16th ACM Conference on Recommender Systems, September 18–23, 2022, Seattle, USA.
* Corresponding author.
miriam.amin@imw.fraunhofer.de (M. Amin); jan-peter.bergmann@imw.fraunhofer.de (J. Bergmann); yuri.cassio.campbell.borges@imw.fraunhofer.de (Y. Campbell)
ORCID: 0000-0001-6912-4122 (M. Amin); 0000-0001-8918-0551 (J. Bergmann); 0000-0002-4166-2081 (Y. Campbell)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

1. Introduction

Recommender systems are widely used in Human Resources, mainly in the processes of hiring and recruiting. A frequent field of application is the matching of suitable job seekers to a job vacancy. The methods applied to this end range from LSTMs [1] and word embeddings [2] to state-of-the-art language models. Many authors do not merely rely on pretrained language models like BERT, but fine-tune these models with resume and job vacancy data [3]. Although applications of AI and NLP in the field of Knowledge Management have recently been identified as promising [4], research in this area is just starting to gain traction.

Especially ST&I companies, but also large research organizations, repeatedly face the problem of matching an emerging task with the skill required for completing it, in order to forward the task to the appropriate organizational unit. Many organizations already have skill or competence taxonomies that can be useful in this regard. Skill taxonomies (also called skill frameworks, competence taxonomies or competence frameworks) list and describe the skills that are present or desired in an organization, cluster them in a hierarchical manner and store them in a database.

Nevertheless, because such taxonomies are often very extensive, the manual matching of incoming queries with skills remains laborious. Systems that support experts by recommending a set of highly suitable skills can therefore be very helpful in the human decision-making process. We present an approach that automatically matches a research request with the most suitable technological skills and demonstrate its application on real examples from the Fraunhofer Society. To the best of our knowledge, no comparable system has been presented so far.

The Fraunhofer Society is a German publicly funded research organization operating 76 research institutes and units that work in different areas. Its more than 20,000 scientific, technical and administrative workers (in 2020) cover a very broad spectrum of competences. With contract research as a main source of revenue, the organization regularly receives research requests, which must then be forwarded to the appropriate organizational units. In this working paper, we present our experiments on automatically recommending suitable skills from the Fraunhofer internal skill taxonomy for incoming research requests.

In the next section, we describe our datasets, the Fraunhofer skill taxonomy and the corpus of research requests, in greater detail. After that, we discuss the different methods that we used and compared for matching the two datasets with each other. In the subsequent results chapter, we briefly present the results of our experiments. The article concludes with a discussion of our results and methods and highlights the next steps we want to take with these approaches.
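The matching idea introduced above — representing both skills and requests as vectors in a shared space and recommending the closest skills for each request — can be illustrated with a minimal sketch. The function name, the toy skill names and the three-dimensional vectors below are purely invented for illustration; a real system would obtain its vectors from keywords, word embeddings or a language model:

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_n_skills(request_vec, skills, n=5):
    """Return the names of the n skills whose vectors lie closest
    to the request vector in the shared vector space."""
    ranked = sorted(skills,
                    key=lambda name: cosine(request_vec, skills[name]),
                    reverse=True)
    return ranked[:n]

# Toy three-dimensional "embeddings" with invented values.
skills = {
    "Energy informatics": [0.9, 0.1, 0.2],
    "Epitaxy": [0.0, 1.0, 0.1],
    "Data management": [0.7, 0.0, 0.6],
}
request = [0.8, 0.1, 0.4]

print(top_n_skills(request, skills, n=2))  # ['Energy informatics', 'Data management']
```

The request vector here points much more in the direction of the two energy- and data-related toy skills than of "Epitaxy", so those are returned as the closest matches.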
2. Data and Methods

2.1. Data and Preprocessing

The Fraunhofer Society combines a wide variety of specialized institutes under one umbrella. To handle the variety of skills contained in the different institutes, the Fraunhofer Society developed an overview of its already existing competences as well as prospective ones. It is planned that employees will be able to subscribe to the skills and topics that interest them, i.e. skills are not automatically assigned to employees. Based on their individual selections, employees can then receive relevant messages and notifications about incoming research requests. These skills are hierarchically structured in a taxonomy with a tree-like structure with four levels: the root; the first level, the scientific disciplines; the second level, their research fields; and finally the skills, which form the leaves of this skill tree.

The entire dataset includes approximately 1,000 skills that are written in German, in English or in mixed English and German. Disciplines and research fields have a similar language composition in their descriptions. That means that even when a leaf is described in English, such as Machine Learning, its research field Künstliche Intelligenz (artificial intelligence) can be written in German, and vice versa. In order to give more contextual information to single skills, we concatenate skill, research field and scientific discipline to build one textual representation for every skill. These preprocessed skills have an average length of 128 characters. Table 1 shows an example of a skill hierarchy and the concatenated skill string. In this specific case, all levels are in English.

Table 1: Example of a skill hierarchy and the concatenated skill string.

    Scientific discipline:      Simulation, control and operational management of energy supply systems
    Research field:             Energy informatics
    Skill:                      AI-based autonomous actions
    Concatenated skill string:  Simulation, control and operational management of energy supply systems Energy informatics AI-based autonomous actions

Research requests, on the other hand, are short texts of approximately 1,112 characters in length. Since they come from different authors, they are very diverse both structurally and stylistically. They also cover a large variety of research fields and can be written in German or English, though mainly in German. Our experimental corpus of research requests comprises approximately 100 documents. Table 2 shows an example of such a research request.

Table 2: Example research request.

    "We are searching for a solution to link a smart metering system of high-resolution electricity, gas and heat data with our intelligent cloud solution. In the cloud, we want to automatically process the data using machine learning to check for consistency and completeness and to enable load forecasts and cost optimization. We are also looking for the joint development of innovative business models."

2.2. Methods

In order to support the matching between research requests and in-house skills in large organizations, we propose a vector-based approach which draws on recent Transfer Learning advances in Natural Language Processing. Firstly, we represent the skills in the taxonomy with a vector model. Then, with the same vector representation approach, we transform the requests and project them into the same vector space. Finally, every research request acts as a query for which we retrieve matching documents. In this Information Retrieval setting, we return the n closest skill vectors to a specific query vector as matches for that request.

In this framework, we test three distinct approaches to creating useful vector representations for the task at hand: Keyword-Binarizer (KB), Keyword-Embedding (KE) and Language Model (LM).

In the KB approach, we extract keywords from the text descriptions of skills and requests using the keyword extraction algorithm YAKE! [5]; a binary vector is then constructed in a one-hot-encoding manner for all skills and requests. It is important to note that YAKE! extracts keywords as well as keyphrases (combinations of two or more words). From now on, we will refer to both as keywords.

In the KE vector model, the texts undergo the same keyword extraction procedure as in KB. However, the final step of the vector construction is different. Here, given a skill or a request, we create the corresponding vector representation by averaging the Word2Vec embeddings of the keywords belonging to that skill or request. We use Word2Vec word embeddings that were trained by Deepset (https://www.deepset.ai/german-word-embeddings) on the whole German Wikipedia corpus. In cases where the vector representation for a specific word is not found in the embedding dictionary, we apply compound splitting and attempt vector retrieval for the resulting components. This procedure is especially useful for German, since many German words have a compositional structure, for example Forschungsprojekt = Forschung (research) + Projekt (project). Words for which no representation can be found receive a 0-vector, which practically cancels any impact they might have on the average representation.

Finally, in the LM approach, we use a multilingual language model that is fine-tuned on the task of semantic similarity. More precisely, we use the model paraphrase-multilingual-mpnet-v2 provided by Sentence-Transformers (https://www.sbert.net/docs/pretrained_models.html). This model is suitable for creating vector representations of sentences and paragraphs for information retrieval, clustering or sentence similarity tasks. It is the multilingual version of the model all-mpnet-base-v2 (https://huggingface.co/sentence-transformers/all-mpnet-base-v2) and is trained via multilingual knowledge distillation [6]. In other words, a smaller multilingual model, in this case XLM-RoBERTa [7], is used as the student model, while a bigger monolingual MPNET model [8] is used to guide the multilingual vector representations of translated pairs by means of a double mean squared error loss on the generated representations of each multilingual training pair. The pre-trained monolingual teacher model MPNET was fine-tuned with an SBERT-like objective [9] on more than 1 billion pairs of sentences and paragraphs. The pre-training objective of the teacher model is a usual contrastive learning objective: for a given pair of sentences, paragraphs, or a sentence and a paragraph, the model predicts which candidates, out of a set of randomly constructed pairs sharing at least one component with the original pair, were actually paired in the billion-pair dataset. In our use case, only the trained student model is used to create multilingual vector representations of skills and requests. Neither skills nor requests require further pre-processing steps, as the XLM-RoBERTa model has SentencePiece as its base tokenizer and was pre-trained on many languages, among them English and German.

2.3. Validation

In order to validate the three approaches described in the preceding section, we took two different samples of the request corpus, retrieved the top five skill recommendations from each method and assessed their relevance. For the experiments at hand, we needed to conduct the relevance assessment manually. In the near future, however, a completely expert-labeled ground truth dataset will be at hand, recording all relevant skills for each request. We labeled a request-skill pair with the relevance value '2' when the skill was completely relevant for the request, '1' when it was not completely relevant but also not irrelevant, and '0' when it was completely irrelevant.

We took two samples of ten requests each, using a different sampling method for each. For the sampling method 'expert', we selected ten requests for which the authors of this paper themselves have expert knowledge of the required skills, resulting in ten IT- and AI-related requests. For the sampling method 'top similarities', we considered the five skills with the highest similarity scores for each request and took the mean of these top five similarity scores. For each vectorization method, we then selected the ten requests with the highest mean similarities. Note that the 'top similarities' sample sets therefore differ among the methods. In addition, we calculated the mean value over the 'expert' and the 'top similarities' samples.

With 20 relevance assessments for each method, we were able to calculate the Normalized Discounted Cumulative Gain@5 (NDCG@5) and the Mean Average Precision (MAP) for each system. In order to calculate these measures despite the missing ground truth, we assumed that there are five matching skills for each request. In order to calculate the MAP, which requires binary relevance, we considered the relevance labels '1' and '2' as relevant and '0' as irrelevant.

3. Results

The purpose of our experiment was to find out which NLP method yields the best results for the task of recommending skills from a standardised skill ontology for a specific task or request. Table 3 shows an overview of the NDCG@5 and MAP scores obtained during our experiments.

Table 3: NDCG@5 and MAP values for the three vectorization methods and the two sampling methods. LM = Language Model, KE = Keyword-Embedding, KB = Keyword-Binarizer.

    Sampling method    Expert          Top similarities    Mean
    Method             NDCG    MAP     NDCG    MAP         NDCG    MAP
    LM                 0.70    0.89    0.63    0.76        0.67    0.82
    KE                 0.25    0.37    0.13    0.16        0.19    0.27
    KB                 0.28    0.39    0.15    0.28        0.21    0.33

From the data, it is apparent that the language-model-based method yielded by far the best results. Over all samples, the language model achieved an impressive MAP of 0.82 and an NDCG of 0.67. The other two methods are far behind.
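The evaluation protocol described in the validation section can be made concrete with a short sketch of the two measures. The binarization of labels '1' and '2' to "relevant" and the assumption of five matching skills per request follow the text; the choice of ideal ranking for NDCG@5 (the observed labels sorted in descending order) is our own assumption, since the exact normalization is not spelled out. The example labels are those of the five Language Model recommendations in Table 5:

```python
from math import log2

def average_precision(labels, n_relevant=5):
    """Average precision for one ranked recommendation list.
    Graded labels 1 and 2 are binarized to 'relevant'; the denominator
    reflects the assumption of five matching skills per request."""
    hits, score = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label >= 1:
            hits += 1
            score += hits / rank
    return score / n_relevant

def ndcg_at_5(labels):
    """Graded-relevance NDCG@5. The ideal ranking is taken to be the
    observed labels sorted in descending order -- an assumption, since
    the paper does not specify its normalization."""
    def dcg(ls):
        return sum(l / log2(rank + 1) for rank, l in enumerate(ls, start=1))
    ideal = dcg(sorted(labels, reverse=True)[:5])
    return dcg(labels[:5]) / ideal if ideal > 0 else 0.0

# Assessment labels of the five Language Model recommendations in Table 5.
lm_labels = [2, 2, 2, 1, 2]
print(average_precision(lm_labels))  # 1.0 -- every position is relevant
print(round(ndcg_at_5(lm_labels), 3))
```

Under the binarized view, all five Language Model recommendations for this request count as relevant, so the average precision is 1.0; the graded NDCG@5 stays just below 1 because the label-1 item sits at rank 4 rather than last.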
To illustrate the findings of these first experiments, we show the top five skill recommendations of each method for one example request in Table 5.

Table 5: Top 5 recommendations of all three methods for one example research request. Each entry lists the research field, the recommended skill and the assessment label.

    Research request: "We are searching for a solution to link a smart metering system of high-resolution electricity, gas and heat data with our intelligent cloud solution. In the cloud, we want to automatically process the data using machine learning to check for consistency and completeness and to enable load forecasts and cost optimization. We are also looking for the joint development of innovative business models."

    Language Model
      1. Energy Information Technology - Data Science, Statistics, Time Series Analyses, AI/ML (label 2)
      2. Energy Information Technology - Data Management (label 2)
      3. Economic and regulatory assessment - Energy system analyses (label 2)
      4. Energy Information Technology - AI-based methods of optimized, predictive network operation management (label 1)
      5. Energy Information Technology - Standards and interfaces for interoperable communication (label 2)

    Keyword-Embedding
      1. Storage & storage systems - Integration of new storage systems (label 0)
      2. Lightweight construction technologies - Functional integration in lightweight construction (label 0)
      3. Power grids - Modeling of power grids (label 0)
      4. Artificial Intelligence Methods - Generation of Synthetic Training Data (label 0)
      5. Artificial Intelligence Methods - AI Technologies in Production & Logistics (label 1)

    Keyword-Binarizer
      1. Module manufacturing/integration - Packaging for RF and analog mixed-signal modules (label 0)
      2. Process Technologies - Epitaxy (label 0)
      3. Component manufacturing - High- and ultra-high-frequency components (High-Frequency Devices) (label 0)
      4. Component manufacturing - Actuators, MEMS actuators (label 0)
      5. Component packaging, module manufacturing/integration - Display, RFID packaging (label 0)

4. Discussion

The results of these preliminary experiments are very satisfactory. We have shown that our language-model-based method in particular performed very well at matching skills to specific tasks. That was somewhat surprising against the background that the skills have a comparatively short text length and thus do not provide much context for the language model to compute semantic similarities. Equally surprising was that the word-embedding-based method (KE), which was supposed to perform well even without context, showed such poor performance. We suspect that this is due to the rather technical vocabulary in both the skills and the requests, which is not present in our word embedding vocabulary. Our attempt to counteract this with the compound splitting described above does not seem to have achieved the expected results.

Nevertheless, we are convinced that the performance, particularly that of the LM approach, can still be improved by further tuning. In future work, we want to experiment with further text preprocessing and prompt engineering methods. For example, we are interested in how the transformation of the skill string into a real-language sentence impacts the performance. For this, a sentence template with slots for the hierarchical elements of the skill string can be used. Table 4 shows an example of such a transformed string. With such a transformation, we hope to provide even more context to the language model, especially to its attention mechanism. Moreover, LMs are trained and optimized on whole natural sentences, not on syntaxless word groups.

Table 4: An example of the prompt engineering of a skill string.

    Before prompt engineering:  Simulation, control and operational management of energy supply systems Field of competence energy informatics AI-based autonomous actions
    After prompt engineering:   We work on AI-based autonomous actions. Our field of competence is energy informatics, within the research field of simulation, control and operational management of energy supply systems.

Again, we should address that the sample size of this experiment is still rather small; the results need to be confirmed as soon as the entire dataset of research requests has been labelled with the matching skills. We also hope to make further improvements to our approach with such a ground truth at hand. Not only would this allow us to calculate more evaluation measures, such as precision@k and F1@k, we could also fine-tune the vector-space model. With contrastive learning, we could optimize the vector space in such a way that requests move closer to the matching skills and further away from the mismatching skills, hoping that this new vector space is transferable to unknown requests.

Last, and maybe most importantly, we want to explore the transferability of our method to other, related problems. These include, for example, recommending skills for more general tasks and work assignments, or even finding the worker or team with the optimal skill set for requests, tasks and work assignments. However, it remains very important to mention that such recommender systems are only useful and properly utilized when they are designed to support an essentially human-driven decision-making process.

References

[1] C. Qin, H. Zhu, T. Xu, C. Zhu, L. Jiang, E. Chen, H. Xiong, Enhancing person-job fit for talent recruitment, in: K. Collins-Thompson (Ed.), The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, ACM, New York, NY, 2018, pp. 25–34. doi:10.1145/3209978.3210025.
[2] J. Zhao, J. Wang, M. Sigdel, B. Zhang, P. Hoang, M. Liu, M. Korayem, Embedding-based recommender system for job to candidate matching on scale, 2021. URL: https://arxiv.org/pdf/2107.00221.
[3] D. Lavi, V. Medentsiy, D. Graus, conSultantBERT: Fine-tuned Siamese Sentence-BERT for matching jobs and job seekers, in: The Workshop on Recommender Systems for Human Resources (RecSys in HR 2021), 2021.
[4] M. H. Jarrahi, D. Askay, A. Eshraghi, P. Smith, Artificial intelligence and knowledge management: A partnership between human and AI, Business Horizons (2022). doi:10.1016/j.bushor.2022.03.002.
[5] R. Campos, V. Mangaravite, A. Pasquali, A. Jorge, C. Nunes, A. Jatowt, YAKE! Keyword extraction from single documents using multiple local features, Information Sciences 509 (2020) 257–289. doi:10.1016/j.ins.2019.09.013.
[6] N. Reimers, I. Gurevych, Making monolingual sentence embeddings multilingual using knowledge distillation, 2020. URL: https://arxiv.org/abs/2004.09813. doi:10.48550/ARXIV.2004.09813.
[7] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, 2019. URL: https://arxiv.org/abs/1911.02116. doi:10.48550/ARXIV.1911.02116.
[8] K. Song, X. Tan, T. Qin, J. Lu, T.-Y. Liu, MPNet: Masked and permuted pre-training for language understanding, 2020. URL: https://arxiv.org/abs/2004.09297. doi:10.48550/ARXIV.2004.09297.
[9] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using Siamese BERT-networks, 2019. URL: https://arxiv.org/abs/1908.10084. doi:10.48550/ARXIV.1908.10084.