Modeling, Measuring and Exploiting Concept Drift in the Labour Market Domain

Panos Alexopoulos, Textkernel B.V., Nieuwendammerkade 26a5, 1022 AB, Amsterdam, The Netherlands, alexopoulos@textkernel.com
Spyretta Leivaditi, Kentivo B.V., Kerksteeg 1, 3582 CV, Utrecht, The Netherlands, spyretta.leivaditi@kentivo.com

ABSTRACT
The Labour Market domain is a relatively narrow domain in terms of the concept types that appear in it (as it typically consists of professions, skills and qualifications) but a very broad one in terms of actual concepts (as these professions and skills can belong to all kinds of domains, such as Technology, Education, Finance, etc.). More importantly, it is a quite volatile domain, in the sense that the meaning of many concepts changes (at different rates) over time. This phenomenon, known as semantic or concept drift, poses a challenge for the maintenance and evolution of knowledge graphs that represent such domains, and it requires dedicated approaches for tackling it so as to prevent such graphs from becoming irrelevant. With that in mind, in this paper we describe our experiences from dealing with concept drift in an in-house developed labour market knowledge graph, and provide insights on: i) how concept drift can be effectively defined and modeled for labour market concepts, and ii) how it can be detected, measured and effectively incorporated in the knowledge graph lifecycle.

KEYWORDS
Knowledge Graphs, Concept Drift, Labour Market

1 INTRODUCTION
A few years after Google announced that their knowledge graph allowed searching for "things, not strings" (googleblog.blogspot.com), knowledge graphs have been gaining momentum in the world's leading organisations as a means to integrate, share and exploit the data and knowledge they need in order to stay competitive [11]. Apart from Google, prominent examples of companies that develop knowledge graphs include Microsoft (https://arstechnica.com/information-technology/2012/06/inside-the-architecture-of-googles-knowledge-graph-and-microsofts-satori/), LinkedIn (https://engineering.linkedin.com/blog/2016/10/building-the-linkedin-knowledge-graph), BBC (http://www.bbc.co.uk/ontologies) and Thomson Reuters (https://www.scribd.com/document/288608104/Creating-the-Thomson-Reuters-knowledge-graph-and-open-permID-ODI-Summit-2015). At Textkernel, we have been developing and using a similar knowledge graph for the recruitment and labour market domain for the last couple of years, aiming to significantly improve the way our semantic software modules parse, retrieve and match CVs and job vacancies.

Our knowledge graph defines and interrelates concepts and entities about the labour market and recruiting domain, such as professions, skills and qualifications, for multiple languages and countries. Using the graph, an agent (human or computer system) can answer questions like "What are the most important skills for a certain profession?", "What professions are specializations of Profession X?" or "What qualifications do I need in order to acquire skill Y?". Moreover, we use the graph within our systems for a) performing entity recognition and disambiguation in CVs and vacancies, and b) determining the semantic similarity between these entities when searching or matching CVs and vacancies.

Constructing the knowledge graph in an efficient and cost-effective way is a quite challenging task, not only because the labour market domain is quite broad but also because it is very heterogeneous (different industries and business areas, languages, labour markets, educational systems, etc.). What is equally challenging, however, is dealing with the concept drift that the domain's concepts undergo as time goes by, changing their meaning [15]. In particular, drift in our graph is mainly observed in Professions, Skills and Qualifications. Take, for example, journalists. Before the proliferation of the Internet and social media, a reporter would have to research stories through contacts, speaking to people, door knocking and visiting the local library to consult past publications. She would also most likely not know how to do her own video production editing, but would rely on experts to do that for her. Nowadays, however, it is more likely to meet a reporter who can use Google, Twitter and other modern information channels effectively, and, to a still low yet increasing extent, data analysis and visualization tools [10]. Similar arguments can be made for other professions, but also for qualifications and skills. A contemporary degree in Finance, for example, has definitely different content, and even somewhat different learning objectives, than it had 30 years ago. Similarly, being an expert in Marketing nowadays is highly associated with being an expert in Search Engine Optimization and Social Media.

These changes can be bigger or smaller, faster or slower, and more or less profound, depending on the concept type and, of course, the real-world dynamics. In any case, such changes can affect the quality of a knowledge graph and, therefore, dedicated frameworks for modeling, measuring and exploiting semantic drift in the context of knowledge graph maintenance and evolution are needed [14]. In this short paper, we corroborate this argument and extend it with the following two arguments:

(1) The definition and modeling of semantic drift for a given knowledge graph should take into account the graph's content, domain and application context, and be adapted accordingly.
While generic formalizations of concept drift are very useful (like, for example, modeling drift in terms of label, intension and extension [15]), these are not necessarily directly or completely applicable to all domains and/or graphs, the reason being that not all aspects of a concept's meaning contribute to its drift in the same way and to the same extent.

(2) There is not a unique optimal way to measure concept drift for a given knowledge graph, but rather multiple ways whose outcomes can have different interpretations and usages. Indeed, the values one gets when measuring concept drift can be quite different, depending on the metrics, data sources and methods/algorithms used for the measurement. Therefore, it is important that a) for a given drift measurement approach, the drift values it produces can be clearly interpreted and used, and b) for a desired interpretation/usage, an appropriate drift measurement method can be selected.

In the rest of the paper we further explain and exemplify these arguments by describing how we model and measure concept drift in our Labour Market Knowledge Graph, as well as how we apply the measurement results, not only for improving the graph but also for gaining business benefits.

2 DRIFT MODELING FOR LABOUR MARKET CONCEPTS

2.1 Concept Representation
The Textkernel knowledge graph consists primarily of the following concept types:

• Professions: Concepts that represent groupings of jobs that involve similar tasks and require similar skills and competencies.
• Skills: Concepts that represent tools, techniques, methodologies, areas of knowledge, activities, and generally anything that a person can "have knowledge of", "be experienced in" or "be expert at" (e.g., Economics, Software Development, "doing sales in Africa", etc.). Also concepts that represent personality traits, including communication abilities, personal habits, cognitive or emotional empathy, time management, teamwork and leadership traits (usually referred to as soft skills).
• Qualifications: Concepts that represent "formal outcomes of assessment and validation processes which are obtained when a competent body determines that an individual has achieved learning outcomes to given standards" (European Qualifications Framework, http://ec.europa.eu/eqf/home_en.htm).
• Organizations: Concepts that represent organizations of different types, including public organizations and institutes, private companies and enterprises, educational institutes (of all educational levels) and others.
• Industries: Concepts that represent industrial groupings of companies based on similar products and services, technologies and processes, markets and other criteria.

The different ways a concept can be expressed in a text (its surface forms) are represented in the graph via the well-known SKOS relations prefLabel and altLabel [9]. Moreover, concepts can be taxonomically related to other concepts of the same type via the SKOS relations broader and narrower (e.g., "Software Developer" is broader than "Java Developer", and "Economics" is broader than "Microeconomics").

Additional relations are defined per concept type. In particular, professions are linked to the skills and activities they involve, as well as to the locations, organizations and industries where they are found. They are also linked to qualifications that are (formally or informally) required for their exercise (e.g., the bar exam for practicing law in the United States) and, of course, to other professions that are similar to them. Skills, in turn, are linked to similar skills and activities, to the professions and industries they are mostly demanded by, and to the qualifications that develop and verify them. Finally, qualifications are linked, apart from skills, to the organizations that provide them, as well as to the educational levels they cover.

Most of the above relations are extracted and incorporated into the knowledge graph in a semi-automatic way from a variety of structured and unstructured data sources, including CVs, job vacancies and Wikipedia [16] [17], as well as search query logs [3]. Moreover, many of these relations are vague, i.e., there are (or could be) pairs of concepts for which it is indeterminate whether they stand in the relation or not (e.g., the similarity between different skills, or the importance of a skill for a profession) [2]. The problem with vague relations is that their interpretation is highly subjective, context-dependent, and usually a matter of degree, thus making it hard to achieve a global consensus over their veracity. For this reason, in our graph, such relations have the following three properties:

• Strength: A number (typically from 0 to 1) indicating the strength/confidence of the relation.
• Applicability Context: The contexts (location, language, industry, etc.) in which the relation has been discovered and is considered to be true.
• Provenance: Information about how the relation has been added to the graph (source, method, process).

These properties do not, of course, remove vagueness, but they help towards making the relations better interpretable by both humans and systems, and towards reducing disagreements [1]. Moreover, as we show below, these properties play an important role in the measurement of concept drift.
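A graph edge carrying these three properties can be sketched as a simple record type. The following Python snippet is only an illustration: the field names, concept identifiers and values are invented for this example and do not reflect the graph's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class VagueRelation:
    """One (vague) relation between two concepts, carrying the three
    properties described in the text: strength, applicability context
    and provenance. All names here are illustrative."""
    source: str        # subject concept, e.g. a profession
    target: str        # object concept, e.g. a skill
    relation: str      # relation type, e.g. "involvesSkill"
    strength: float    # strength/confidence score, typically in [0, 1]
    context: dict = field(default_factory=dict)     # applicability context
    provenance: dict = field(default_factory=dict)  # source, method, process

# A hypothetical edge stating that journalism nowadays involves
# social-media skills, as discovered from vacancy data.
edge = VagueRelation(
    source="profession:journalist",
    target="skill:social-media",
    relation="involvesSkill",
    strength=0.82,
    context={"country": "NL", "language": "en"},
    provenance={"source": "vacancies", "method": "co-occurrence"},
)
```

Keeping context and provenance as structured fields, rather than free text, is what later allows drift to be measured per data source or per country, as discussed in Section 3.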
2.2 Concept Drift
Concept drift in the semantic knowledge representation literature is usually modeled (and measured) with respect to three aspects of a concept's meaning, namely its labels (i.e., the words used to express the concept), its intension (i.e., the concept's characteristics as expressed via its properties and relations), and its extension (i.e., the set of the concept's instances) [15] [13]. The extension's role in drift is disputed by [5], who suggest that it depends on the kind of concepts under consideration.

In our knowledge graph, we adopt this latter perspective by not considering extensions as part of our concepts' meaning and drift. One reason for that is that concepts like skills and professions are rather abstract and do not have straightforward instances (e.g., professions do not refer to specific persons or jobs). One could consider as profession instances the people that exercise them or the vacancies that are available for them, but then a change in the workforce size does not alter the profession's meaning. Instead, it is the qualitative characteristics of this workforce that signify a change, and that is exactly what we capture via the concepts' intension.

Nevertheless, we do not consider all properties and relations of our concepts to be part of their meaning and drift, nor all to the same extent. In particular:

• We do consider as drift changes in a concept's labels, yet only when these changes are not merely additions or removals of spelling and/or morphosyntactic variations of existing labels (e.g., part-of-speech or plural forms). Moreover, we consider changes in preferred labels as slightly more important than changes in alternative labels, as the former are typically more suggestive of the concept's meaning.
• We do consider as drift changes in a concept's broader and narrower relations, with broader changes suggesting, in general, a more fundamental drift in the concept's meaning than narrower ones.
• For profession concepts, meaning is primarily defined by the skills and activities they involve (see the example of the journalist above). Essential skills for a profession are more important than optional skills, though the two can be hard to distinguish. Profession meaning also changes, though to a lesser extent, when the industries the profession is found in change (e.g., journalists start working in the tech sector). On the other hand, a profession concept does not drift when the locations or companies it is most popular in change.
• For skill concepts, meaning is primarily defined by their similar skills and activities, as these describe for what tasks and in what contexts a skill is used. It also changes, though to a lesser extent, when the skill starts being applied in different professions and industries, as part of possessing a skill includes having experience in its application contexts.
• For qualification concepts, meaning is primarily defined by the skills they develop and/or verify, and secondarily by the professions they regulate and/or are useful for (especially since, in some countries, qualifications are the main criterion for entering a profession).

It is worth noting that we are aware of the distinction between concept drift and concept replacement (i.e., change in the concept's core meaning) [8], but we do not really tackle this issue in our graph, because a) it can be quite difficult to define the core meaning of a concept in a way that is easily detectable, and b) it is a phenomenon that is rather rare and has not caused any observable problems to our graph and its applications so far.

3 DRIFT MEASUREMENT FOR LABOUR MARKET CONCEPTS
Concept drift is typically detected and quantified by measuring the difference in meaning between two or more versions of the same concept at different points in time [13] [12] [7] [6]. The more dissimilar the two versions are to each other, the greater the drift is.

Measuring concept meaning similarity is obviously dependent on how meaning is modeled. Thus, for example, in [15] and [13], where the authors consider as meaning the concept's intension, extension and labeling, they define corresponding similarity functions for each of these aspects. In particular, they employ string similarity metrics for measuring labeling drift, and set similarity metrics for measuring intension and extension drift. For our graph, we follow a similar approach, but with some important differences.

First, for labeling we do not use string similarity to measure change, one reason being that we do not consider spelling or morphosyntactic change to be drift. Instead, we consider labels as part of the concept's intension and we use set similarity metrics to measure the difference between a concept's changing label sets.

Second, since many of the concept relations are vague, with their validity quantified by some strength score, when we calculate similarity based on them we use metrics that can take this strength into consideration. One approach that we use, for example, is as follows: given two versions of the same concept and a (vague) relation that influences drift, we derive the top-N related concepts for each version (based on the strength score), and we calculate their similarity using the generalized Kendall's tau [4], which can measure the distance between rankings. In that way, for example, if the "Data Scientist" profession continues having the same top 10 related skills but differently ranked, a drift will be detected.

Third, in order to be able to understand and interpret concept drift better, we need a versatile measurement framework that enables the dynamic and highly configurable measurement and presentation of drift. Such a framework should take as input a set of parameters specifying the scope, type and other characteristics of the drift we want to measure, and generate corresponding output. Examples of parameters we consider are:

• Target concept types (Professions, Skills, etc.).
• Time scope (either as a specific time period or as specific releases to be included).
• Relations and properties to be included.
• Relation applicability context and provenance.

The reason we need all these parameters is that different values for them can yield different drift, not only in terms of intensity but also in terms of interpretation. For example, if we calculate a concept's drift using only CVs as a data source, then the drift we measure will reflect the change in the way the workforce side of the labour market interprets and uses the concept. On the other hand, if we use only vacancies, we shall get an idea of how the same concept changes from the industry's perspective. Similarly, if we use news articles, we will measure the change in the general perception of the concept, while the usage of more encyclopedic and definitional data sources (e.g., Wikipedia or specialized dictionaries) may indicate changes in more core aspects of the concept's meaning.

Finally, as suggested in the previous section, different relations have a different influence on concept drift, and that difference needs to be considered when relation-specific drifts are aggregated. A similar argument can be made for other drift aspects like provenance or context (e.g., the change of a profession concept in a country with a more advanced economy may be more important/crucial than the change in a less developed country). For that reason, our drift framework supports the definition of drift aspect importance weights that are used for combining and aggregating partial drift scores.
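The measurement ingredients described in this section can be sketched in a few lines of Python. This is not the authors' actual implementation but a minimal illustration under stated assumptions: the Kendall's tau variant follows the top-k comparison of Fagin et al. [4], but the normalization by the number of compared pairs and the default penalty p = 0.5 for pairs seen together in only one list are our own simplifications, and the set-similarity and aggregation functions use invented names.

```python
from itertools import combinations

def kendall_top_n_distance(ranking_a, ranking_b, p=0.5):
    """Simplified generalized Kendall's tau distance between two top-N
    lists (after Fagin et al.), normalized by the number of item pairs.
    0.0 means identical rankings; larger values mean heavier drift.
    p penalizes pairs appearing together in only one of the lists."""
    pos_a = {c: r for r, c in enumerate(ranking_a)}
    pos_b = {c: r for r, c in enumerate(ranking_b)}
    penalty, pairs = 0.0, 0
    for i, j in combinations(sorted(set(pos_a) | set(pos_b)), 2):
        pairs += 1
        in_a, in_b = (i in pos_a, j in pos_a), (i in pos_b, j in pos_b)
        if all(in_a) and all(in_b):
            # Both concepts ranked in both versions: penalize discordance.
            if (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) < 0:
                penalty += 1
        elif all(in_a) and any(in_b):
            # Both in version A, only one in version B: the absent one is
            # implicitly ranked below, so A must agree with that order.
            present = i if i in pos_b else j
            absent = j if present == i else i
            if pos_a[absent] < pos_a[present]:
                penalty += 1
        elif all(in_b) and any(in_a):
            present = i if i in pos_a else j
            absent = j if present == i else i
            if pos_b[absent] < pos_b[present]:
                penalty += 1
        elif all(in_a) or all(in_b):
            # Both concepts appear in only one of the two lists.
            penalty += p
        else:
            # Each concept appears in a different list: maximal disagreement.
            penalty += 1
    return penalty / pairs if pairs else 0.0

def label_drift(old_labels, new_labels):
    """Set-based labeling drift: 1 - Jaccard similarity of the label sets."""
    union = old_labels | new_labels
    if not union:
        return 0.0
    return 1.0 - len(old_labels & new_labels) / len(union)

def aggregate_drift(partial_scores, weights):
    """Combine relation-specific drift scores into a single concept-level
    score via importance weights (a weighted average)."""
    total = sum(weights[k] for k in partial_scores)
    return sum(s * weights[k] for k, s in partial_scores.items()) / total

# Same top skills for a concept, differently ranked: a drift is detected.
v_old = ["python", "statistics", "sql"]
v_new = ["python", "sql", "statistics"]
ranking_drift = kendall_top_n_distance(v_old, v_new)
```

In this sketch, `ranking_drift` comes out greater than zero for the reordered skill lists even though the sets of skills are identical, which is exactly the behaviour the strength-aware metric is meant to provide over a plain set comparison.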
4 DRIFT EXPLOITATION
The modeling and measurement of concept drift in our knowledge graph serves mainly two purposes, one engineering-related and one of a business nature. On the engineering side, the measurement and monitoring of drift helps us quantify and better understand the dynamics of our domain and of our graph's content. This, in turn, enables us to plan and prioritize the maintenance and evolution of the knowledge graph much more effectively by, for example, identifying highly volatile graph aspects that need more frequent updates, and allocating more resources to them. This applies not only to computational resources (data storage capacity, data processing efficiency, etc.) but also to human ones (knowledge engineers, quality analysts, annotators, etc.).

On the business side, the drift in our knowledge graph indicates to a large extent the changes that take place in the labour market, especially the drift that we derive from CVs and vacancies. These changes we can then communicate to job seekers, candidate seekers, education and training providers, policy makers, and generally anyone who can gain an advantage from knowing the dynamics of the labour market.

For example, most job holders have a narrow perception of what their profession entails and of the extent and rate at which it evolves over time, as they usually operate in a narrow context. As a result, when these people become job seekers, they have to change this perception, otherwise they may fail to secure a new job that may have the same title but quite different content. The same applies to organizations that need to hire people but fail to do so, mainly because their job definitions are too restrictive and not in sync with the supply side of the market.

5 CONCLUSION
In this short paper we have described how we have been modeling, measuring and exploiting concept drift in a Knowledge Graph for the Labour Market domain, making the case for a more flexible, adaptable, and domain/application-dependent approach to tackling drift. We have shown that not all aspects of a concept's meaning contribute to its drift in the same way and to the same extent, thus requiring a careful analysis and selection of them for the domain and graph at hand. We have also shown how varied the outcome of measuring concept drift can be (depending on the metrics, data sources and methods/algorithms used for the measurement), suggesting nevertheless that this versatility can actually be useful and is, therefore, in need of proper management.

Our parameter-based drift management framework is still work in progress, requiring further research and development on how it can be properly operationalized within our enterprise. This includes full-fledged UI support, additional drift metrics, guidelines for interpreting and acting on the metrics, and a more formal user- and data-driven evaluation.

REFERENCES
[1] Panos Alexopoulos, Silvio Peroni, Boris Villazón-Terrazas, Jeff Z. Pan, and José Manuél Gómez-Pérez. 2014. A Metaontology for Annotating Ontology Entities with Vagueness Descriptions. In Uncertainty Reasoning for the Semantic Web III - ISWC International Workshops, URSW 2011-2013, Revised Selected Papers. 100–121. https://doi.org/10.1007/978-3-319-13413-0_6
[2] Panos Alexopoulos, Manolis Wallace, Konstantinos Kafentzis, and Dimitris Askounis. 2012. IKARUS-Onto: a methodology to develop fuzzy ontologies from crisp ones. Knowl. Inf. Syst. 32, 3 (2012), 667–695. https://doi.org/10.1007/s10115-011-0457-6
[3] Khalifeh AlJadda, Mohammed Korayem, and Trey Grainger. 2015. Improving the quality of semantic relationships extracted from massive user behavioral data. In 2015 IEEE International Conference on Big Data, Big Data 2015, Santa Clara, CA, USA, October 29 - November 1, 2015. 2951–2953. https://doi.org/10.1109/BigData.2015.7364133
[4] Ronald Fagin, Ravi Kumar, and D. Sivakumar. 2003. Comparing Top K Lists. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '03). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 28–36. http://dl.acm.org/citation.cfm?id=644108.644113
[5] Antske Fokkens, Serge Ter Braake, Isa Maks, and Davide Ceolin. 2016. On the Semantics of Concept Drift: Towards Formal Definitions of Concept Drift and Semantic Change. In Proceedings of the 1st Workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, co-located with the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2016), Bologna, Italy, November 2016. 10–17.
[6] Jon Atle Gulla, Geir Solskinnsbakk, Per Myrseth, Veronika Haderlein, and Olga Cerrato. 2010. Semantic Drift in Ontologies. In WEBIST 2010, Proceedings of the 6th International Conference on Web Information Systems and Technologies, Volume 2, Valencia, Spain, April 7-10, 2010. 13–20.
[7] Adam Jatowt and Kevin Duh. 2014. A Framework for Analyzing Semantic Change of Words Across Time. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '14). IEEE Press, Piscataway, NJ, USA, 229–238. http://dl.acm.org/citation.cfm?id=2740769.2740809
[8] Jouni-Matti Kuukkanen. 2008. Making Sense of Conceptual Change. History and Theory 47, 3 (2008), 351–372.
[9] Alistair Miles and Sean Bechhofer. 2009. SKOS Simple Knowledge Organization System Reference. W3C Recommendation 18 August 2009. http://www.w3.org/TR/2009/REC-skos-reference-20090818/
[10] Nic Newman. 2017. Journalism, Media, and Technology Trends and Predictions 2017. Technical Report. Reuters Institute for the Study of Journalism.
[11] Jeff Pan, Guido Vetere, Jose Manuel Gomez-Perez, and Honghan Wu. 2017. Exploiting Linked Data and Knowledge Graphs in Large Organisations. Springer International Publishing Switzerland. https://doi.org/10.1007/978-3-319-45654-6
[12] Gabriel Recchia, Ewan Jones, Paul Nulty, John Regan, and Peter de Bolla. 2016. Tracing Shifting Conceptual Vocabularies Through Time. In Drift-a-LOD@EKAW (CEUR Workshop Proceedings), Vol. 1799. CEUR-WS.org, 2–9.
[13] Thanos G. Stavropoulos, Stelios Andreadis, Efstratios Kontopoulos, Marina Riga, Panagiotis Mitzias, and Yiannis Kompatsiaris. 2016. SemaDrift: A Protégé Plugin for Measuring Semantic Drift in Ontologies. In 1st International Workshop on Detection, Representation and Management of Concept Drift in Linked Open Data (Drift-a-LOD), in conjunction with the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW). CEUR Workshop Proceedings, Vol. 1799. Bologna, Italy, 34–41.
[14] Ljiljana Stojanovic, Alexander Maedche, Boris Motik, and Nenad Stojanovic. 2002. User-Driven Ontology Evolution Management. In Proceedings of the 13th International Conference on Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web (EKAW '02). Springer-Verlag, London, UK, 285–300. http://dl.acm.org/citation.cfm?id=645362.650868
[15] Shenghui Wang, Stefan Schlobach, and Michel C. A. Klein. 2011. Concept drift and how to identify it. Journal of Web Semantics 9 (2011), 247–265.
[16] Meng Zhao, Faizan Javed, Ferosh Jacob, and Matt McNair. 2015. SKILL: A System for Skill Identification and Normalization. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI'15). AAAI Press, 4012–4017. http://dl.acm.org/citation.cfm?id=2888116.2888273
[17] Wenjun Zhou, Yun Zhu, Faizan Javed, Mahmudur Rahman, Janani Balaji, and Matt McNair. 2016. Quantifying skill relevance to job titles. In 2016 IEEE International Conference on Big Data, BigData 2016, Washington DC, USA, December 5-8, 2016. 1532–1541. https://doi.org/10.1109/BigData.2016.7840761