=Paper= {{Paper |id=Vol-201/paper-32 |storemode=property |title=Semantic Annotation of Documents Applied to E-Recruitment |pdfUrl=https://ceur-ws.org/Vol-201/10.pdf |volume=Vol-201 |dblpUrl=https://dblp.org/rec/conf/swap/YahiaouiBP06 }} ==Semantic Annotation of Documents Applied to E-Recruitment== https://ceur-ws.org/Vol-201/10.pdf
> 10 <                                                                                                                                     1




Semantic Annotation of Documents
Applied to e-recruitment

L. Yahiaoui, Z. Boufaida and Y. Prié

Manuscript received October 23, 2006. This work was supported in part by the Laboratory “LIRE”, Computer Science Department, University of Constantine, Algeria.
L. Yahiaoui is with the Computer Science Department, Laboratory “LIRE”, University Mentouri Constantine, Constantine 25000 Algeria (phone: 00213-31818817; fax: 00213-31818817; e-mail: yahiaoui_lilapg@yahoo.fr).
Z. Boufaida is with the Computer Science Department, Laboratory “LIRE”, University Mentouri Constantine, Constantine 25000 Algeria (phone: 00213-31818817; fax: 00213-31818817; e-mail: boufaida@hotmail.com).
Y. Prié is with the Laboratory “LIRIS”, UMR 5205 CNRS, University Claude Bernard Lyon 1, F-69622 Villeurbanne Cedex, France (phone: 0033-472431636; fax: 0033-472431536; e-mail: yannick.prie@liris.cnrs.fr).

Abstract: This paper presents an approach based on semantic annotation of CVs and job offers for automating recruitment on the web. The main idea consists in formally modeling the semantic content of these documents in terms of their acquirements (in the case of a CV) or requirements (in the case of a job offer), using an ontology shared between recruiters and job seekers. The domain ontology is inspired by the most significant parts of these documents (personal qualifications, diplomas and job experiences) and handles competency management. It describes a competency model as well as the hierarchies of topics that a competency can have. It allows the end user to explicitly enrich his document with metadata (annotations). Semantic matching between supply and demand, based on the computation of a coefficient, can be applied in a superficial or a deep way. Superficial matching deals with all acquirements/requirements mentioned explicitly by a job seeker/recruiter (a particular diploma, job experience or personal qualification), whereas competency based matching deals with all competencies underlying the user’s document.

Index Terms: Competency, Semantic annotation, Semantic matching, Semantic Web

I. INTRODUCTION

THE evolution of the job market has shown that traditional methods of recruitment are becoming inefficient. The Internet has introduced a new way of managing human resources. Nowadays, job seekers can send their CVs directly to companies (by email) or to dedicated servers on the Web. Recruiters, on the other side, can publish their job offers on the Web with a significant reduction in cost and time. In this context, electronic recruitment tends to automate matching between the published CVs and job offers. The major problem is that these resources are often badly used because the available management techniques and tools are purely syntactic and remain limited in the face of the increasing number of documents to process and the need for a more semantic interpretation of their content.

Automatic matching between supply and demand requires the use of new approaches based on semantic Web technologies. The idea consists in extending the syntactic structures of documents with semantic content in order to make them machine-understandable [8]. For that, two approaches are proposed: (i) semantic annotation of documents, which consists in using a shared ontology to enrich documents with metadata [11], and (ii) semantic indexing of documents, based on the construction of an index whose structure is inspired by the ontology used.

In what follows, we propose a simple approach based on semantic annotation of documents to automate the e-recruitment process. The main idea consists in formally modeling the semantic content of these documents in terms of their acquirements/requirements, in a simple and efficient way, by using an ontology shared between job seekers and recruiters. The concepts of this ontology are inspired by the most significant parts of these documents, and competency is considered the crucial element in the proposed modeling. This objective requires the description of the global architecture of our semantic annotation and matching system (presented in Section II), the definition of a competency model (Section III), the development of an ontology used to annotate documents with their semantic contents (Section IV) and the definition of an efficient and simple semantic matching process between CVs and job offers (Section V). A preliminary evaluation of our approach is shown in Section VI; a conclusion and perspectives for improving this work are given at the end of this paper.

II. THE ANNOTATION & MATCHING SYSTEM ARCHITECTURE

The architecture of the annotation and semantic matching system is illustrated in “Fig. 1”. It is composed of:

A. The ER-ontology
An ontology framework composed of interrelated ontologies, dedicated to annotating CVs and job offers with instances of its concepts. The metadata repository is used to store the generated annotations.
Fig. 1. Semantic annotation & matching system architecture.

B. The XML/HTML documents server
It allows the storage and the management of the documents to be annotated (CVs and job offers).

C. The system interface
It offers two functions. The annotation interface gives the end-user the possibility to annotate his document by using the ER-ontology and generating metadata (annotations). The matching interface allows the end-user to submit queries to the matching component and presents the returned results. A recruiter can find the most qualified candidate according to his needs, whereas a job seeker can find the job that best fits his qualifications.

D. The matching component
It interprets the user’s query to get the URI of the user’s document (CV/job offer) and the kind of semantic matching to apply, then it calculates the coefficients of semantic matching (superficial or competency based) between the user’s document and all available annotated documents of the opposite kind (if the end-user is a job seeker, the matching process will use his CV and all available job offers). The result is a set of pairs (URI/C_match), where URI is the identifier of the found document and C_match the associated coefficient (percentage) of semantic matching (superficial or competency based).

III. THE COMPETENCY MODEL

Human resources management is based on the knowledge of individuals and their competencies, as well as on the knowledge of the organization and its jobs. By mapping these competencies, it is possible to enhance recruitment [13]. This requires an explicit representation of competencies and thus a model for this concept. A competency can be identified as a set of knowledge used to accomplish a task [13]. It can appear as an aptitude (behaviour) or as a scientific and technical competency (knowledge or know-how). The scientific and technical competency can be specific (related to a particular domain) or general. In this work, we are interested in the "computer science and telecommunications" domain, so our scientific and technical competency relates a competency object to a competency level [10]. The competency object can be a «technology topic» or a «software artefact». The competency level can have one of the following values: Basic (B or 20%), Application (A or 50%), Mastership (M or 70%) or Expert (E or 90%). Aptitudes, identified by their names, are inspired by CIGREF [5]. “Fig. 2” illustrates the competency model adopted.

Fig. 2. The competency model.

This competency model seems simple compared with the one proposed in the project COMMONCV [4] because our objective is to find a compromise between simplicity and efficiency. Adding the context to the competency model would ensure a better matching between CVs and job offers, but it would also complicate this process.

IV. ONTOLOGY-BASED SEMANTIC CONTENT MODELING

An ontology, considered as a formal and explicit specification of a shared conceptualization, is the key element of the semantic web. Ontologies are crucial for e-recruitment because they allow recruiters and job seekers to share a common reference system to describe the contents of their documents in a non-ambiguous, simple, semantic and formal way. The importance of formalization is that it allows automatic matching between supply and demand.

A. The ER-ontology architecture
The elements of the ER-ontology (Electronic-Recruitment ontology) are inspired by the most significant parts common to CVs and job offers. This includes personal information, diplomas, job experiences and explicit competencies acquired by a candidate (CV) or required by a job position (job offer). Furthermore, a job or a diploma mobilizes a subset of elementary competencies [4], which makes competency the crucial element in the proposed modeling. For the construction of the ER-ontology, some ideas are taken from existing works [3] and [14]. We have chosen METHONTOLOGY [6] as a development method. “Fig. 3” illustrates the global architecture of the ER-ontology in terms of linked ontologies (or sub-ontologies). These ontologies are
detailed in “Fig. 4” as a set of concept hierarchies (shown as rectangles) with semantic relations between them. Competency model concepts are distinguished by doubled edges.

The domain of this ontology is “Computer Science and Telecommunications”. It is considered as a framework composed of five ontologies:

Fig. 3. Sub-ontologies of the ER-ontology.

1) The ontology “PERSON”: Composed of a single concept, “Person”, which describes the most important personal characteristics that a recruiter can require (and that a candidate can have). It includes: sex, maximum age, military service, residence (country and city), driving licence, marital status and nationality.
2) The ontology “DIPLOMA”: Describes concepts related to diplomas and trainings. This includes diploma families, valid domain diplomas and a diplomas reference system, inspired here by the Algerian higher education system [9]. It is related to the ontology “COMPETENCY” to attest the competencies mobilized by a particular diploma.
3) The ontology “JOB”: Describes concepts related to job experiences. This includes job families, existing domain jobs and a jobs reference system inspired by CIGREF [5]. It is related to the ontology “COMPETENCY” to attest the competencies mobilized by a particular job.
4) The ontology “COMPETENCY”: Describes the adopted competency model and the hierarchies of objects (“TechnologyTopic” or “SoftwareArtefact”) that a scientific and technical competency can relate to [10]. In the computer science domain, a topic can be general, mathematical or specific to this domain. The hierarchy of the general topic is inspired by the general knowledge of CIGREF [5] and that of the mathematical topic by the Algerian higher education programs in computer science [9], whereas the hierarchy of the computer technology topic is inspired by the information system competencies of CIGREF [5], the Algerian higher education programs in computer science [9] and other modeling works related to computer science disciplines [1]. The hierarchy of topics is built in order to cover the majority of computer science disciplines, including knowledge and know-how. Each topic is characterized by an attribute "weight" which represents the percentage of its contribution to its parent topic. This hierarchy allows persons holding diplomas to bring their knowledge closer to the competencies required by a particular job through the computation of a semantic matching coefficient.
5) The ontology “ANNOTATION”: Allows associating a resource with all its corresponding acquirements/requirements (case of a CV/a job offer). The concept “Resource” describes the document to be annotated through its URI (Uniform Resource Identifier) and type (CV or job position). The concept “AcquiRequi” is specialized into the elements that a resource can be annotated with and links the “ANNOTATION” ontology with the other sub-ontologies. The concept “Annotation” relates the two former concepts in order to annotate a specific resource with a set of acquirements/requirements. The role of this ontology can be replaced by a semantic annotation tool.

Fig. 4. The detailed architecture of the ER-ontology.
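
The competency model of Section III and the weighted topic hierarchy of the “COMPETENCY” ontology can be pictured as plain data structures. The following Python sketch is only an illustration and not the OWL implementation: the class names `Level`, `Topic` and `Competency`, the method `contribution_to` and the weights in the example hierarchy are all hypothetical. It shows how the "weight" attribute of each topic could be combined along the path to an ancestor topic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    """Competency levels with the percentages given in Section III."""
    BASIC = 0.20
    APPLICATION = 0.50
    MASTERSHIP = 0.70
    EXPERT = 0.90


@dataclass(frozen=True)
class Topic:
    """A node of the topic hierarchy; weight is its share in the parent."""
    name: str
    weight: float = 1.0
    parent: Optional["Topic"] = None

    def contribution_to(self, ancestor: "Topic") -> float:
        """Multiply the weights on the path from this topic up to ancestor."""
        share, node = 1.0, self
        while node is not None and node is not ancestor:
            share *= node.weight
            node = node.parent
        if node is None:
            raise ValueError(f"{ancestor.name} is not an ancestor of {self.name}")
        return share


@dataclass(frozen=True)
class Competency:
    """A scientific and technical competency: an object plus a level."""
    obj: Topic
    level: Level


# Hypothetical hierarchy; the weights are illustrative values only.
computer_science = Topic("Computer science")
software_engineering = Topic("Software engineering", 0.20, computer_science)
software_design = Topic("Software design", 0.25, software_engineering)

acquired = Competency(software_design, Level.MASTERSHIP)
share = software_design.contribution_to(computer_science)
print(acquired.level.name, round(share, 4))
```

Storing the weight on the child rather than on the parent keeps each topic self-describing, which mirrors the description of "weight" as an attribute of the topic itself.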
The ER-ontology contains 510 concepts, of which 109 belong to the general topic hierarchy, 351 to the computer technology topic hierarchy and 18 to the software artefact hierarchy. These concepts are characterised by 20 attributes and 17 relations. This ontology is implemented as a single ontology in OWL (Web Ontology Language) [2] using Protégé 3.1 [12].

B. Semantic annotation process
Concept instances of the sub-ontologies “JOB”, “DIPLOMA” and “COMPETENCY” are created by the system administrator (before any annotation by the end-user). The end-user can use these instances during the annotation of his document. An instance of the class “Competency” is created for each subclass of “TechnologyTopic” or “SoftwareArtefact” with the four possible competency levels {B, A, M, E}. Instances of the classes “Job” and “Diploma” are related to the instances of the class “Competency” that they mobilize. The role of the end-user consists in:
      --Creating an instance of the class “Resource” to describe the document to be annotated.
      --Creating an instance of the class “Person” to describe the personal information of a candidate (case of a CV) or the required personal information (case of a job offer).
      --Creating instances of the class “JobExperience” to describe the candidate’s job experiences or the job experiences required by a recruiter. This description includes the name of a job and the years of expertise.
      --Creating instances of the class “AcquiRequi” with all the requirements of a job offer or the acquirements of a candidate, by using the available instances.
      --Creating instances of the class “Annotation” to link acquirements/requirements to the annotated resource.

In our current version, instantiating classes is a manual process using the interface of Protégé-OWL [12].

V. SEMANTIC MATCHING BETWEEN DOCUMENTS

Once documents are annotated, a semantic matching algorithm can be applied between a particular CV (CV1) and a job offer (P1). This matching is based on the computation of a coefficient (percentage), which can be done according to two different but complementary techniques: (i) superficial semantic matching takes into account the requirements or acquirements that annotate a document at a superficial level, whereas (ii) competency based semantic matching uses all competencies underlying the annotated document.

A. Competency based semantic matching
This kind of semantic matching is concerned with the competencies underlying the annotated documents. The main idea consists in searching for each competency required by a job offer in the set of competencies acquired by a candidate (CV). If this competency exists, a weight is accumulated; otherwise the topic hierarchy of this competency (in the case of a scientific and technical competency) is exploited to calculate the level of the candidate in this topic. A weight is associated with each type of competency. For example, we can assign a weight of 2 to both “GeneralCompetency” and “Aptitude”, and a weight of 6 to “SpecificCompetency”. We have chosen these initial values according to the importance of each type of competency in the recruitment process, but they can be adjusted according to test results.

Fig. 5. Semantic matching algorithm.
Note: the parameter Coef used in the function Evaluate_Subtopics reflects the percentage of participation of the topic T in its parent topic. For example, we can estimate that the percentage of participation of the topic “Software design” in its parent topic “software engineering” is 25% (or 0.25).
The scientific and technical competency level is evaluated as (B ≅ 20%) if level < 25%, as (A ≅ 50%) if 25% ≤ level < 60%, as (M ≅ 70%) if 60% ≤ level ≤ 75% and as (E ≅ 90%) if level > 75%. “Fig. 5” illustrates the competency based semantic matching algorithm, which interests us most, between a CV (R_CV) and a job offer (R_OF). The following conventions are used:
     --C(I): I is an individual of the concept/class C (so class(I)=C).
     --I.atrName: the value of the attribute “atrName” of the individual I, or all individuals related to I by the role “atrName”.
     --A → C: class A is a sub-class of class C.

The function “Extraction_Competencies” extracts all competencies underlying the CV into the set C_CV and those of the job offer into the set C_OF. These competencies can be explicit (explicit annotations) or implicit (mobilized by a particular diploma or job experience).

In addition to the expressive power offered by OWL, used to implement the ER-ontology, powerful inference services are offered by a reasoner called RACER [7]. This reasoner is a knowledge representation system that implements a highly optimized tableau calculus for a very expressive description logic. It can interpret OWL documents and offers reasoning services for multiple T-Boxes and for multiple A-Boxes as well.

At the terminological level, various types of queries can be applied, for instance checking the consistency of a concept or examining relations between concepts (descendants or parents). The first functionality was used for the validation of the ER-ontology, while the second one can be used to exploit the topic hierarchy of the scientific and technical competency in the implementation of the competency based matching algorithm (for example, to implement F ← {Fi, i ≥ 0 / Fi → T}).

At the A-Box level, other queries are possible. The most interesting for us are: calculating the direct type (class) of an individual, which can be used in the IF-statements (for example: If GeneralCompetency(Ci)), and extracting the instances of a particular class, possibly according to various criteria based on analysing the roles and attributes of these instances. RQL (Racer Query Language), which is an extended query language for RACER, makes it possible to use complex queries on OWL documents that can be useful in the implementation of the extraction functions mentioned in the proposed matching algorithms (for example: Extraction_Competencies(C_CV, C_P)).

B. Superficial semantic matching
The acquirements or requirements that can explicitly annotate a document (CV/job offer) are of four types: a competency, a diploma, a job experience (job + years of expertise) or personal information. In superficial matching, the search for a particular job offer requirement in the candidate’s set of acquirements is exact (it exists or not). A weight is associated with each type of acquirement/requirement to reflect its importance in the computation of the matching coefficient. For example, we can assign the weight 8 to the type “Person” (1 for each personal qualification), 10 to the type “Diploma”, 20 to the type “JobExperience” and 5 to the type “Competency”. These weights can be adjusted according to test results. The matching coefficient compares the sum of the weights of the requirements satisfied by the candidate (calc_weight) with the sum of the weights of all job requirements (tot_weight). It is calculated as:
   C_match = (calc_weight / tot_weight) * 100.

VI. DIPLOMAS AND JOBS MATCHING

The test of our approach on a set of documents has given satisfying results. From the simplicity and efficiency viewpoints, the two proposed techniques of semantic matching offer the recruiter a deep view of the received CVs (satisfaction of superficial and deep requirements). Furthermore, a job seeker has the possibility to compare his competencies with those required by a particular job position. “Table I” shows the results of the competency based semantic matching between five job offers (with different job positions) and four CVs (candidates having distinct diplomas).

TABLE I
RESULTS OF THE COMPETENCY BASED SEMANTIC MATCHING (%)

Diplomas \ Jobs    TNT      AM       D        DBA      EOS
AB                 60.50    61.36    59.32    54.38    42.60
BSE                55.63    67.74    80.79    55.21    37.00
BIS                57.60    90.66    72.60    73.18    38.80
BSTIC              76.71    66.41    70.48    65.33    48.40

Jobs: TNT = technician in network and telecommunication, AM = applications manager, D = developer, DBA = database administrator, EOS = expert in operating systems.
Diplomas: AB = Academic Bachelor, BSE = Bachelor in Software Engineering, BIS = Bachelor in Information System, BSTIC = Bachelor in Science and Technology of Information and Communication.

These coefficients reflect the relation between jobs and diplomas. For instance, a candidate having a professional Bachelor in Science and Technology of Information and Communication (BSTIC) is the most qualified among the candidates for the position of technician in network and telecommunication (TNT), with a coefficient of 76.71%.

VII. CONCLUSION

In this paper, a simple and efficient approach based on semantic annotation for automating electronic recruitment was proposed. It is characterized by modeling the semantic content of CVs and job offers using an ontology shared between recruiters and job seekers. The elements of this ontology are inspired by the most significant parts of these documents. It
also allows competency management through a formal competency model. The ER-ontology is implemented in OWL; by using the powerful inference services of the RACER reasoner, all acquirements/requirements related to a particular document, including competencies, can be deduced (inferred) and used by the two proposed semantic matching algorithms (superficial and competency based).
   Future work aims to validate this approach on real data (a site of CVs and job offers) and to enhance the competency model as well as the semantic matching process. We also intend to implement an interface for annotating documents that is easier to use than the Protégé-OWL interface that we currently use, as well as to generalize the ER-ontology to other domains. Furthermore, the different conceptual models used in different countries, especially for describing diplomas and jobs, should be handled.


                              REFERENCES
[1] A. Abran, J. W. Moore, P. Bourque, R. Dupuis and L. L. Tripp, “A guide to the Software Engineering Body of Knowledge-SWEBOK”, IEEE Computer Society Professional project, 2004. Available: http://www.swebok.org
[2] S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, “OWL Web Ontology Language Reference”, 2004. Available: http://www.w3.org/TR/2004/REC-owl-ref-20040210/
[3] C. Bizer, R. Heese, M. Mocho, R. Oldakowski, R. Tolksdorf, R. Eckstein, “The Impact of Semantic Web Technologies on Job Recruitment Processes”, in International Conference workshop on computer science (WI’05), 2005.
[4] M. Bourse, M. Harzallah, M. Leclère, F. Trichet, “COMMONCV: modeling the competencies underlying a Curriculum Vitae”, IRIN research report N° 2, 2002.
[5] CIGREF, “Nomenclature 2005, les emplois-métiers du système d’information dans les grandes entreprises”, 2005. Available: http://www.cigref.fr/cigref/livelink.exe/Nomenclature_RH_2005.pdf?func=doc.Fetch&nodeId=401472&docTitle=Nomenclature_RH_2005%2Epdf
[6] M. Fernandez, P.-A. Gomez, N. Juristo, “Methontology: from ontological art toward ontological engineering”, in Spring symposium series on ontological engineering AAAI97, USA, 1997.
[7] V. Haarslev, R. Moller, M. Wessel, “RACER User’s Guide and Reference Manual (Version 1.7.19)”, 2004. Available: http://coli.lili.uni-bielefeld.de/~felix/lehre/ws04_05/ontologischeRessourcen/addLiterature/haarslev-undmoeller04.pdf
[8] P. Laublet, C. Reynaud, J. Charlet, “Sur quelques aspects du Web sémantique”, 2002. Available: sis.univ-tln.fr/gdri3/fichiers/assises2002/papers/03-WebSemantique.pdf
[9] M. E. S, “Reforme LMD de l’enseignement supérieur”, University of Constantine, Department of Computer Science, 2004.
[10] ONTOlogen Group (DFKI's Knowledge Management Department), “Competency Ontology: a project for modelling competency using Protégé-2000”, 2002. Available: www.dfki.uni-kl.de/~elst/ONTOlogen/doc/ONTOlogen-148.htm
[11] Y. Prié, S. Garlatti, “Annotations et Méta-données dans le Web sémantique”, in Revue I3 Information – Interaction – Intelligence, numéro Hors-série Web Sémantique, 24 pp, 2003.
[12] Protégé-OWL: http://protege.stanford.edu/plugins/owl
[13] C. Rieu, C.-M. Rousseau, C. Roche, “Gestion des compétences: un modèle opérationnel à base d’ontologie”, (du E-Management à la E-RH) Colloque, Paris-Dauphine Univ., Paris, France, 2005.
[14] F. Trichet, “AnnotatingWithCigref: a project for annotation CVs”, 2002. Available: www.sciences.univ-nantes.fr/irin/commoncv/ production.