=Paper= {{Paper |id=Vol-201/paper-32 |storemode=property |title=Semantic Annotation of Documents Applied to E-Recruitment |pdfUrl=https://ceur-ws.org/Vol-201/10.pdf |volume=Vol-201 |dblpUrl=https://dblp.org/rec/conf/swap/YahiaouiBP06 }} ==Semantic Annotation of Documents Applied to E-Recruitment== https://ceur-ws.org/Vol-201/10.pdf

> 10 < 1

Semantic Annotation of Documents
Applied to e-recruitment
L. Yahiaoui, Z. Boufaida and Y. Prié

Abstract: This paper presents an approach based on the semantic annotation of CVs and job offers for automating recruitment on the Web. The main idea consists in formally modeling the semantic content of these documents in terms of their acquirements (in the case of a CV) or requirements (in the case of a job offer), using an ontology shared between recruiters and job seekers. The domain ontology is inspired by the most significant parts of these documents (personal qualifications, diplomas and job experiences) and handles competency management. It describes a competency model as well as the hierarchies of topics that a competency can have. It allows the end user to explicitly enrich his document with metadata (annotations). Semantic matching between supply and demand, based on the computation of a coefficient, can be applied in a superficial or a deep way. Superficial matching deals with all acquirements/requirements mentioned explicitly by a job seeker/recruiter (a particular diploma, job experience or personal qualification), whereas competency based matching deals with all competencies underlying the user’s document.

Index Terms: Competency, Semantic annotation, Semantic matching, Semantic Web

I. INTRODUCTION

THE evolution of the job market has shown that traditional recruitment methods are becoming inefficient. The Internet has introduced a new way of managing human resources. Nowadays, job seekers can send their CVs directly to companies (by email) or to dedicated servers on the Web. Recruiters, for their part, can publish their job offers on the Web with a significant reduction in cost and time. In this context, electronic recruitment tends to automate the matching between published CVs and job offers. The major problem is that these resources are often poorly exploited, because the available management techniques and tools are purely syntactic and remain limited given the increasing number of documents to process and the need for a more semantic interpretation of their content.

Automatic matching between supply and demand requires new approaches based on Semantic Web technologies. The idea consists in extending the syntactic structures of documents with semantic content in order to make them machine-understandable [8]. Two approaches have been proposed for this: (i) semantic annotation of documents, which consists in using a shared ontology to enrich documents with metadata [11], and (ii) semantic indexing of documents, based on the construction of an index whose structure is inspired by the ontology used.

In what follows, we propose a simple approach based on the semantic annotation of documents to automate the e-recruitment process. The main idea consists in formally modeling the semantic content of these documents in terms of their acquirements/requirements, in a simple and efficient way, by using an ontology shared between job seekers and recruiters. The concepts of this ontology are inspired by the most significant parts of these documents, and competency is considered the crucial element of the proposed modeling.

This objective requires the description of the global architecture of our semantic annotation and matching system (Section II), the definition of a competency model (Section III), the development of an ontology used to annotate documents with their semantic content (Section IV) and the definition of a simple and efficient semantic matching process between CVs and job offers (Section V). A preliminary evaluation of our approach is presented in Section VI, followed by a conclusion and perspectives for improving this work.

II. THE ANNOTATION & MATCHING SYSTEM ARCHITECTURE

The architecture of the annotation and semantic matching system is illustrated in “Fig. 1”. It is composed of:

A. The ER-ontology
An ontology framework composed of interrelated ontologies, dedicated to annotating CVs and job offers with instances of its concepts. The metadata repository is used to store the generated annotations.

Manuscript received October 23, 2006. This work was supported in part by the Laboratory “LIRE”, Computer Science Department, University of Constantine, Algeria.
L. Yahiaoui is with the Computer Science Department, Laboratory “LIRE”, University Mentouri Constantine, Constantine 25000 Algeria (phone: 00213-31818817; fax: 00213-31818817; e-mail: yahiaoui_lilapg@yahoo.fr).
Z. Boufaida is with the Computer Science Department, Laboratory “LIRE”, University Mentouri Constantine, Constantine 25000 Algeria (phone: 00213-31818817; fax: 00213-31818817; e-mail: boufaida@hotmail.com).
Y. Prié is with the Laboratory “LIRIS”, UMR 5205 CNRS, University Claude Bernard Lyon 1, F-69622 Villeurbanne Cedex, France (phone: 0033-472431636; fax: 0033-472431536; e-mail: yannick.prie@liris.cnrs.fr).

Fig. 1. Semantic annotation & matching system architecture.

B. The XML/HTML documents server
It stores and manages the documents to be annotated (CVs and job offers).

C. The system interface
It offers two functions. The annotation interface lets the end-user annotate his document by using the ER-ontology and generating metadata (annotations). The matching interface lets the end-user submit queries to the matching component and presents the returned results. A recruiter can find the most qualified candidate according to his needs, whereas a job seeker can find the job that best fits his qualifications.

D. The matching component
It interprets the user’s query to get the URI of the user’s document (CV/job offer) and the kind of semantic matching to apply. It then calculates the semantic matching coefficients (superficial or competency based) between the user’s document and all available annotated documents (if the end-user is a job seeker, the matching process uses his CV and all available job offers). The result is a set of pairs (URI/C_match), where URI is the identifier of the found document and C_match the associated coefficient (percentage) of semantic matching (superficial or competency based).

III. THE COMPETENCY MODEL

Human resources management is based on the knowledge of individuals and their competencies, as well as on the knowledge of the organization and its jobs. By mapping these competencies, it is possible to enhance recruitment [13]. This requires an explicit representation of competencies and thus a model for this concept. A competency can be defined as a set of knowledge used to accomplish a task [13]. It can take the form of an aptitude (behaviour) or a scientific and technical competency (knowledge or know-how). A scientific and technical competency can be specific (related to a particular domain) or general. In this work, we are interested in the “computer science and telecommunications” domain, so our scientific and technical competency relates a competency object to a competency level [10]. The competency object can be a «technology topic» or a «software artefact». The competency level can take one of the following values: Basic (B or 20%), Application (A or 50%), Mastership (M or 70%) or Expert (E or 90%). Aptitudes, identified by their names, are inspired by CIGREF [5]. “Fig. 2” illustrates the competency model adopted.

Fig. 2. The competency model.

This competency model seems simple compared with the one proposed in the COMMONCV project [4], because our objective is to find a compromise between simplicity and efficiency. Adding context to the competency model would ensure a better matching between CVs and job offers, but it would also complicate the process.

IV. ONTOLOGY BASED SEMANTIC CONTENT MODELING

An ontology, considered as a formal and explicit specification of a shared conceptualization, is the key element of the Semantic Web. Ontologies are crucial for e-recruitment because they allow recruiters and job seekers to share a common reference system for describing the content of their documents in an unambiguous, simple, semantic and formal way. Formalization matters because it makes automatic matching between supply and demand possible.

A. The ER-ontology architecture
The elements of the ER-ontology (Electronic-Recruitment ontology) are inspired by the most significant parts that CVs and job offers have in common. These include personal information, diplomas, job experiences and explicit competencies acquired by a candidate (CV) or required by a job position (job offer). Furthermore, a job or a diploma mobilizes a subset of elementary competencies [4], which makes competency the crucial element of the proposed modeling. For the construction of the ER-ontology, some ideas are borrowed from existing works [3] and [14]. We have chosen METHONTOLOGY [6] as the development method. “Fig. 3” illustrates the global architecture of the ER-ontology in terms of linked ontologies (or sub-ontologies). These ontologies are detailed in “Fig. 4” as a set of concept hierarchies (shown as rectangles) with semantic relations between them. Competency model concepts are distinguished by doubled edges.

The domain of this ontology is “Computer Science and Telecommunications”. It is a framework composed of five ontologies:

Fig. 3. Sub-ontologies of the ER-ontology.

1) The ontology “PERSON”: composed of a single concept, “Person”, which describes the most important personal characteristics that a recruiter can require (and that a candidate can have). It includes: sex, maximum age, military service, residence (country and city), driving licence, marital status and nationality.

2) The ontology “DIPLOMA”: describes concepts related to diplomas and training. This includes diploma families, the diplomas valid in the domain and a diploma reference system, inspired here by the Algerian higher education system [9]. It is related to the ontology “COMPETENCY” in order to attest the competencies mobilized by a particular diploma.

3) The ontology “JOB”: describes concepts related to job experiences. This includes job families, the existing jobs of the domain and a job reference system inspired by CIGREF [5]. It is related to the ontology “COMPETENCY” in order to attest the competencies mobilized by a particular job.

4) The ontology “COMPETENCY”: describes the adopted competency model and the hierarchies of objects (“TechnologyTopic” or “SoftwareArtefact”) that a scientific and technical competency can bear on [10]. In the computer science domain, a topic can be general, mathematical or specific to this domain. The hierarchy of general topics is inspired by the general knowledge of CIGREF [5] and that of mathematical topics by the Algerian higher education programs in computer science [9], whereas the hierarchy of computer technology topics is inspired by the information system competencies of CIGREF [5], the Algerian higher education programs in computer science [9] and other modeling works related to computer science disciplines [1]. The topic hierarchy is built to cover the majority of computer science disciplines, including knowledge and know-how. Each topic is characterized by an attribute “weight”, which represents the percentage of its contribution to its parent topic. This hierarchy allows diploma holders to relate their knowledge to the competencies required by a particular job through the computation of a semantic matching coefficient.

5) The ontology “ANNOTATION”: associates a resource with all its corresponding acquirements/requirements (in the case of a CV/a job offer). The concept “Resource” describes the document to be annotated through its URI (Uniform Resource Identifier) and type (CV or job offer). The concept “AcquiRequi” is specialized into the elements that a resource can be annotated with, and links the “ANNOTATION” ontology to the other sub-ontologies. The concept “Annotation” relates the two former concepts in order to annotate a specific resource with a set of acquirements/requirements. The role of this ontology can be taken over by a semantic annotation tool.

Fig. 4. The detailed architecture of the ER-ontology.
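As an illustration only, the competency model and the “ANNOTATION” concepts described above can be pictured as plain data structures. The sketch below is not the OWL implementation of the ER-ontology: the Python class and field names, the example URI and the "CV"/"JobOffer" labels are assumptions made for this example. The four level percentages and the 25% weight of “Software design” in “Software engineering” are the values quoted in the paper.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    """Competency levels with the percentages given in the competency model."""
    BASIC = 20
    APPLICATION = 50
    MASTERSHIP = 70
    EXPERT = 90


@dataclass
class Topic:
    """A node of a topic hierarchy; weight is its contribution to its parent (0..1)."""
    name: str
    weight: float = 1.0
    subtopics: list["Topic"] = field(default_factory=list)


@dataclass(frozen=True)
class Competency:
    """A scientific and technical competency: a competency object plus a level."""
    topic_name: str
    level: Level


@dataclass
class Resource:
    """The annotated document, identified by its URI and its type."""
    uri: str
    kind: str  # "CV" or "JobOffer" (hypothetical labels)


@dataclass
class Annotation:
    """Links a resource to its acquirements/requirements."""
    resource: Resource
    acqui_requi: list[Competency] = field(default_factory=list)


# Hypothetical instances: one topic with one weighted subtopic, one annotated CV.
design = Topic("Software design", weight=0.25)
engineering = Topic("Software engineering", subtopics=[design])
cv = Resource(uri="http://example.org/cv1.html", kind="CV")
annotation = Annotation(cv, [Competency("Software design", Level.APPLICATION)])
```

In the actual system these would be OWL classes and individuals edited in Protégé; the dataclasses only make the relations between the concepts easier to read.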

The ER-ontology contains 510 concepts, including 109 concepts in the general topic hierarchy, 351 in the computer technology topic hierarchy and 18 in the software artefact hierarchy. These concepts are characterised by 20 attributes and 17 relations. The ontology is implemented as a single ontology in OWL (Web Ontology Language) [2] using Protégé 3.1 [12].

B. Semantic annotation process
Concept instances of the sub-ontologies “JOB”, “DIPLOMA” and “COMPETENCY” are created by the system administrator (before any annotation by the end-user). The end-user can use these instances when annotating his document. An instance of the class “Competency” is created for each subclass of “TechnologyTopic” or “SoftwareArtefact” with each of the four possible competency levels {B, A, M, E}. Instances of the classes “Job” and “Diploma” are related to the instances of the class “Competency” that they mobilize. The role of the end-user consists in:
--Creating an instance of the class “Resource” to describe the document to be annotated.
--Creating an instance of the class “Person” to describe the personal information of a candidate (in the case of a CV) or the required personal information (in the case of a job offer).
--Creating instances of the class “JobExperience” to describe the candidate’s job experiences or the job experiences required by a recruiter. This description includes the name of a job and the years of expertise.
--Creating instances of the class “AcquiRequi” with all the requirements of a job offer or the acquirements of a candidate, by using the available instances.
--Creating instances of the class “Annotation” to link acquirements/requirements to the annotated resource.

In our current version, classes are instantiated manually through the Protégé-OWL interface [12].

V. SEMANTIC MATCHING BETWEEN DOCUMENTS

Once documents are annotated, a semantic matching algorithm can be applied between a particular CV (CV1) and a job offer (P1). This matching is based on the computation of a coefficient (percentage), which can be done according to two different but complementary techniques: (i) superficial semantic matching takes into account the requirements or acquirements that annotate a document at a superficial level, whereas (ii) competency based semantic matching uses all competencies underlying the annotated document.

A. Competency based semantic matching
This kind of semantic matching considers the competencies underlying the annotated documents. The main idea consists in searching for each competency required by a job offer in the set of competencies acquired by a candidate (CV). If the competency is found, a weight is accumulated; otherwise, in the case of a scientific and technical competency, the topic hierarchy of this competency is exploited to calculate the candidate’s level in this topic. A weight is associated with each type of competency. For example, we can assign a weight of 2 to both “GeneralCompetency” and “Aptitude”, and a weight of 6 to “SpecificCompetency”. We chose these initial values according to the importance of each type of competency in the recruitment process, but they can be adjusted according to test results.

Fig. 5. Semantic matching algorithm.
Note: the parameter Coef used in the function Evaluate_Subtopics reflects the percentage of participation of the topic T in its parent topic. For example, we can estimate that the percentage of participation of the topic “Software design” in its parent topic “software engineering” is 25% (or 0.25).
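The idea of the competency based matching can be sketched as follows. This is a minimal illustration, not the algorithm of “Fig. 5”: the function names, the recursive weighted sum over subtopics, and the rule that a requirement counts as satisfied when the estimated level reaches the required one are assumptions of this example. The subtopics “Software construction” and “Software testing” and their coefficients are hypothetical; only the 0.25 coefficient of “Software design” and the type weights 2 and 6 come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    coef: float = 1.0  # share of this topic in its parent topic (0..1)
    subtopics: list["Topic"] = field(default_factory=list)


def estimate_level(topic, acquired):
    """Return the candidate's level (0..1) in `topic`.

    `acquired` maps topic names to explicitly acquired levels (0..1).
    """
    if topic.name in acquired:
        return acquired[topic.name]
    # Not acquired explicitly: combine the subtopics, each scaled by its coef.
    return sum(sub.coef * estimate_level(sub, acquired) for sub in topic.subtopics)


def competency_match(required, acquired):
    """`required` is a list of (topic, required_level, type_weight) triples."""
    total = sum(weight for _, _, weight in required)
    reached = sum(weight for topic, level, weight in required
                  if estimate_level(topic, acquired) >= level)
    return 100.0 * reached / total if total else 0.0


software_engineering = Topic("Software engineering", subtopics=[
    Topic("Software design", coef=0.25),
    Topic("Software construction", coef=0.50),  # hypothetical subtopic and coef
    Topic("Software testing", coef=0.25),       # hypothetical subtopic and coef
])
acquired = {"Software design": 0.9, "Software construction": 0.7}
required = [
    (software_engineering, 0.5, 6),  # specific competency, weight 6
    (Topic("English"), 0.5, 2),      # general competency, weight 2
]
c_match = competency_match(required, acquired)
```

Here the candidate never declares “Software engineering” itself, but the levels acquired in two of its subtopics are enough to reach the required Application level (0.5), so the weight 6 is accumulated; the missing general competency is not, which gives 6 out of 8.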

The scientific and technical competency level is evaluated as B (≅ 20%) if level < 25%, as A (≅ 50%) if 25% ≤ level < 60%, as M (≅ 70%) if 60% ≤ level ≤ 75% and as E (≅ 90%) if level > 75%. “Fig. 5” illustrates the competency based semantic matching algorithm, which is the one that interests us most, between a CV (RCV) and a job offer (ROF). The following conventions are used:
--C(I): I is an individual of the concept/class C (so class(I) = C).
--I.atrName: the value of the attribute “atrName” of the individual I, or all individuals related to I by the role “atrName”.
--A → C: class A is a sub-class of class C.

The function “Extraction_Competencies” extracts all competencies underlying the CV into the set CCV and those of the job offer into the set COF. These competencies can be explicit (explicit annotations) or implicit (mobilized by a particular diploma or job experience).

In addition to the expressive power offered by OWL, which is used to implement the ER-ontology, powerful inference services are offered by the reasoner RACER [7]. This reasoner is a knowledge representation system that implements a highly optimized tableau calculus for a very expressive description logic. It can interpret OWL documents and offers reasoning services for multiple T-Boxes and for multiple A-Boxes as well.

At the terminological level, various types of queries can be applied, for instance checking the consistency of a concept or querying the relations between concepts (descendants or parents). The first functionality was used for the validation of the ER-ontology, while the second can be used to exploit the topic hierarchy of the scientific and technical competency in the implementation of the competency based matching algorithm (for example, to implement F ← {Fi, i ≥ 0 / Fi → T}).

At the A-Box level, other queries are possible. The most interesting for us are: calculating the direct type (class) of an individual, which can be used in the IF-statements (for example: If GeneralCompetency(Ci)), and extracting the instances of a particular class, possibly according to various criteria based on the roles and attributes of these instances. RQL (Racer Query Language), an extended query language for RACER, makes it possible to run complex queries on OWL documents, which can be useful for implementing the extraction functions mentioned in the proposed matching algorithms (for example: Extraction_Competencies(Ccv, Cp)).

B. Superficial semantic matching
The acquirements or requirements that can explicitly annotate a document (CV/job offer) are of four types: a competency, a diploma, a job experience (job + years of expertise) or personal information. In superficial matching, a particular job offer requirement is searched for in the candidate’s set of acquirements by exact match (it exists or not). A weight is associated with each type of acquirement/requirement to reflect its importance in the computation of the matching coefficient. For example, we can assign a weight of 8 to the type “Person” (1 for each personal qualification), 10 to the type “Diploma”, 20 to the type “JobExperience” and 5 to the type “Competency”. These weights can be adjusted according to test results. The matching coefficient is the ratio of the sum of the weights of the requirements satisfied by the candidate (calc_weight) to the sum of the weights of all job requirements (tot_weight). It is calculated as:
C_match = (calc_weight / tot_weight) * 100.

VI. DIPLOMAS AND JOBS MATCHING

Testing our approach on a set of documents gave satisfactory results. In terms of simplicity and efficiency, the two proposed semantic matching techniques offer the recruiter an in-depth view of the received CVs (satisfaction of superficial and deep requirements). Furthermore, a job seeker can compare his competencies with those required by a particular job position. “Table I” shows the results of the competency based semantic matching between five job offers (with different job positions) and four CVs (candidates holding distinct diplomas).

TABLE I
RESULTS OF THE COMPETENCY BASED SEMANTIC MATCHING (%)

Diplomas \ Jobs    TNT      AM       D        DBA      EOS
AB                 60.50    61.36    59.32    54.38    42.60
BSE                55.63    67.74    80.79    55.21    37.00
BIS                57.60    90.66    72.60    73.18    38.80
BSTIC              76.71    66.41    70.48    65.33    48.40

Jobs: TNT = Technician in network and telecommunication, AM = applications manager, D = developer, DBA = database administrator, EOS = expert in operating systems.
Diplomas: AB = Academic Bachelor, BSE = Bachelor in Software Engineering, BIS = Bachelor in Information Systems, BSTIC = Bachelor in Science and Technology of Information and Communication.

These coefficients clearly reflect the relation between jobs and diplomas. For instance, a candidate holding a professional Bachelor in Science and Technology of Information and Communication (BSTIC) is the most qualified of the candidates for the position of Technician in network and telecommunication (TNT).

VII. CONCLUSION

In this paper, a simple and efficient approach based on semantic annotation for automating electronic recruitment was proposed. It is characterized by modeling the semantic content of CVs and job offers using an ontology shared between recruiters and job seekers. The elements of this ontology are inspired by the most significant parts of these documents. It also supports competency management through a formal competency model. The ER-ontology is implemented in OWL; by using the powerful inference services of the RACER reasoner, all acquirements/requirements related to a particular document, including competencies, can be deduced (inferred) and used by the two proposed semantic matching algorithms (superficial and competency based).

Future work aims to validate this approach on real data (a site of CVs and job offers) and to enhance the competency model as well as the semantic matching process. We also intend to implement an annotation interface that is easier to use than the Protégé-OWL interface that we currently use, and to generalize the ER-ontology to other domains. Furthermore, the different conceptual models used in different countries, especially for describing diplomas and jobs, should be handled.
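As an illustration of the superficial matching coefficient defined in Section V.B, the computation can be sketched in a few lines. This is a minimal sketch under assumptions: the representation of acquirements/requirements as (type, value) pairs, the function name and the sample values are hypothetical; the weights are the example values given in Section V.B (1 per personal qualification, 10 per diploma, 20 per job experience, 5 per competency).

```python
# Example weights quoted in Section V.B; they are meant to be tuned on test results.
TYPE_WEIGHTS = {"Person": 1, "Diploma": 10, "JobExperience": 20, "Competency": 5}


def superficial_match(requirements, acquirements):
    """Return C_match (%) for sets of (type, value) pairs.

    A requirement is satisfied only if exactly the same pair appears
    in the candidate's acquirements (it exists or not).
    """
    tot_weight = sum(TYPE_WEIGHTS[kind] for kind, _ in requirements)
    calc_weight = sum(TYPE_WEIGHTS[kind] for kind, value in requirements
                      if (kind, value) in acquirements)
    return 100.0 * calc_weight / tot_weight if tot_weight else 0.0


# Hypothetical job offer and CV annotations.
job_offer = {
    ("Diploma", "BSTIC"),
    ("JobExperience", ("Developer", 2)),
    ("Competency", ("Java", "A")),
    ("Person", ("driving licence", "yes")),
}
cv = {
    ("Diploma", "BSTIC"),
    ("Competency", ("Java", "A")),
    ("Person", ("driving licence", "yes")),
}
c_match = superficial_match(job_offer, cv)
```

In this example the candidate satisfies the diploma, competency and personal requirements (weights 10 + 5 + 1) but not the job experience (weight 20), so the coefficient is 16/36 of 100.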

REFERENCES
[1] A. Abran, J. W. Moore, P. Bourque, R. Dupuis and L. L. Tripp, “A guide
to the Software Engineering Body of Knowledge-SWEBOK”, IEEE
Computer Society Professional project, 2004. Available:
http://www.swebok.org
[2] S. Bechhofer, F.-V. Harmelen, J. Hendler, I. Horrocks, “OWL Web
Ontology Language Reference”, 2004. Available:
http://www.w3.org/TR/2004/REC-owl-ref-20040210/
[3] C. Bizer, R. Heese, M. Mocho, R. Oldakowski, R. Tolksdorf, R.
Eckstein, “The Impact of Semantic Web Technologies on Job
Recruitment Processes”, in International Conference workshop on
computer science (WI’05), 2005.
[4] M. Bourse, M. Harzallah, M. Leclère, F. Trichet, “COMMONCV:
modeling the competencies underlying a Curriculum Vitae”, IRIN
research report N° 2, 2002.
[5] CIGREF, “Nomenclature 2005, les emplois-métiers du système
d’information dans les grandes entreprises”, 2005. Available:
http://www.cigref.fr/cigref/livelink.exe/Nomenclature_RH_2005.pdf?fu
nc=doc.Fetch&nodeId=401472&docTitle=Nomenclature_RH_2005%2E
pdf
[6] M. Fernandez, P.-A. Gomez, N. Juristo, “Methontology: from
ontological art toward ontological engineering”, In Spring symposium
series on ontological engineering AAAI97, USA, 1997.
[7] V. Haarslev, R. Moller, M. Wessel, “RACER User’s Guide and
Reference Manual (Version 1.7.19)”, 2004. Available:
http://coli.lili.uni-bielefeld.de/~felix/lehre/ws04_05/
ontologischeRessourcen/addLiterature/haarslev-undmoeller04.pdf
[8] P. Laublet, C. Reynaud, J. Charlet, “Sur quelques aspects du Web
sémantique”, 2002. Available:
sis.univ-tln.fr/gdri3/fichiers/assises2002/papers/03-WebSemantique.pdf
[9] M. E. S, “Reforme LMD de l’enseignement supérieur”, university of
Constantine, department of computer science, 2004.
[10] ONTOlogen Group (DFKI's Knowledge Management Department),
“Competency Ontology: a project for modelling competency using
Protégé-2000”, 2002. Available:
www.dfki.uni-kl.de/~elst/ONTOlogen/doc/ONTOlogen-148.htm
[11] Y. Prié, S. Garlatti, “ Annotations et Méta-données dans le Web
sémantique”, in Revue I3 Information – Interaction – Intelligence,
numéro Hors-série Web Sémantique, 24 pp, 2003.
[12] Protégé-OWL: http://protege.stanford.edu/plugins/owl
[13] C. Rieu, C.-M. Rousseau, C. Roche, “Gestion des compétences: un
modèle opérationnel à base d’ontologie”, (du E-Management à la E-
RH) Colloque, Paris-Dauphine Univ., Paris, France, 2005.
[14] F. Trichet, AnnotatingWithCigref : a project for annotation CVs, 2002.
Available : www.sciences.univ-nantes.fr/irin/commoncv/ production.