<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Labour Market Intelligence for Supporting Decision Making</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roberto Boselli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirko Cesarini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Mercorio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario Mezzanzanica</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CRISP Research Centre, Univ. of Milano-Bicocca</institution>
          ,
          <addr-line>Italy</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Statistics and Quantitative Methods, Univ. of Milano-Bicocca</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Over the past decade, a growing number of employers have been using the Web for advertising job opportunities through Web job vacancies, which usually specify a job position along with a set of skills a candidate should possess. Reasoning over these Web job advertisements can effectively support the decision-making processes of several labour market stakeholders, including public organisations, educational and employment agencies, and analysts as well. Here, Labour Market Intelligence refers to the design and definition of automated methodologies and tools for supporting real-time labour market monitoring at a very fine-grained level. This, in turn, represents a competitive advantage to labour market stakeholders with respect to classical survey-based analyses, which are quite expensive and may require up to one year before becoming available. In this paper we discuss how Web job vacancies have been collected from selected websites, processed, and classified over a standard taxonomy through machine learning algorithms, extracting the most relevant skills from raw texts. Then, we show how our approach has been applied to some real-life studies and we discuss the benefits provided to end users.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Text Classification</kwd>
        <kwd>Big Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>as they are "valued for fostering job-specific and transversal skills, facilitating
the transition into employment and maintaining and updating the skills of the
workforce according to sectorial, regional and local needs".1 In 2014, the Cedefop
EU Agency2 launched a call-for-tender3 aimed at collecting Web job vacancies
from five EU countries and extracting the requested skills from the data. The
rationale behind the project is to turn data extracted from Web job vacancies into
knowledge (and thus value) for policy design and evaluation through fact-based
decision making. In 2016, the EU launched the ESSnet Big Data project,
involving 22 EU member states with the aim of "integrating big data in the
regular production of official statistics, through pilots exploring the potential of
selected big data sources and building concrete applications".</p>
      <p>The rationale behind all these initiatives is that reasoning over Web job
vacancies represents an added value for both public and private labour market
operators to deeply understand Labour Market dynamics, occupations, skills,
and trends: (i) by reducing the time-to-market with respect to classical
survey-based analyses (official Labour Market survey results actually require up to one
year before becoming available); (ii) by overcoming linguistic boundaries through
the use of standard classification systems rather than proprietary ones; (iii)
by representing the resulting knowledge over several dimensions (e.g., territory,
sectors, contracts, etc.) at different levels of granularity; and (iv) by evaluating and
comparing international labour markets to support fact-based decision making.</p>
      <p>Paper's Goal. This paper summarises some results of our research
activities and outcomes in Web Labour Market Intelligence in two distinct research
projects, WollyBI4 and Cedefop3, by focusing on (i) the job vacancy classification
and skill extraction tasks, and (ii) the support to decision-making activities that
our approach provided to the involved stakeholders.</p>
    </sec>
    <sec id="sec-2">
      <title>Web Labour Market in the Literature</title>
      <p>LMI is an emerging cross-disciplinary field of studies that is attracting research
interest in both industrial and academic communities, as we summarise below.</p>
      <p>
        Scientific Literature. Since the early 90s, text classification (TC) has been
an active research topic. It has been defined as "the activity of labelling
natural language texts with thematic categories from a predefined set" [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Most
popular techniques are based on the machine learning paradigm, according to
which an automatic text classifier is created by using an inductive process able
to learn, from a set of pre-classified documents, the characteristics of the
categories of interest. Recently, text classification has proven to give good results
in categorizing many real-life Web-based data such as, for instance, news and
social media [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], and sentiment analysis [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. On the other side, skill extraction
from Web job vacancies can be framed in the Information Extraction field [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
and Named Entity Recognition [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The latter has been applied to solve
numerous domain-specific problems in the areas of Information Extraction and
Normalization [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In recent years, public administrations have started exploring
new ways of supporting knowledge management (see, e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) as well as of
obtaining detailed and fresh information about the Labour Market. Here,
administrative information collected by public administrations has been used for
studying Italian Labour Market dynamics, performing both data quality [
        <xref ref-type="bibr" rid="ref5 ref9">5,9</xref>
        ]
and knowledge discovery activities [
        <xref ref-type="bibr" rid="ref10 ref4">10,4</xref>
        ] through AI techniques (see, e.g., [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).
Unfortunately, administrative data are collected when people are hired (and only in
countries where the state collects such information); therefore they do not provide
information about the labour demand.
1 The Commission Communication "A New Skills Agenda for Europe", COM(2016) 381/2, available
at https://goo.gl/Shw7bI
2 The Cedefop European agency supports the development of European Vocational
Education and Training (VET) policies and contributes to their implementation:
http://www.cedefop.europa.eu/
3 "Real-time Labour Market information on skill requirements: feasibility study and working
prototype". Cedefop Reference number AO/RPA/VKVET-NSOFRO/Real-time LMI/010/14. Contract
notice 2014/S 141-252026 of 15/07/2014, https://goo.gl/qNjmrn
4 www.wollybi.com
      </p>
      <p>Industries. This problem is also relevant for business purposes, and this
motivates the growth of several commercial products providing job seekers and
companies with skill-matching tools. Firms strongly need to
automate Human Resource (HR) department activities; as a consequence, a
growing number of commercial skill-matching products has been developed in
recent years, for instance BurningGlass, Workday, Pluralsight, EmployInsight,
and TextKernel. To date, the only commercial solution that uses standard
taxonomies as thesauri is Janzz: a Web-based platform to match labour demand
and supply in both public and private sectors. It also provides API access to its
knowledge base, but it is not aimed at classifying job vacancies. Worth
mentioning is the Google Job Search API, a pay-as-you-go service announced in 2016
for classifying job vacancies through the Google Machine Learning service over
O*NET, that is, the US standard occupation taxonomy. Though this commercial
service is still a closed alpha, it is quite promising and it also sheds light on
the need for reasoning over Web job vacancies using a common taxonomy.</p>
      <p>All these approaches are quite relevant and effective, and they also
demonstrate the importance of the Web for labour market information.
Nonetheless, they differ from our approach in two respects. First, we aim to classify job
vacancies according to a target classification system for building a
(language-independent) knowledge base for analysis purposes, rather than matching
resumes to job vacancies. Second, our approach aims to build a knowledge graph
for supporting fact-based decision-making activities for LMI.</p>
    </sec>
    <sec id="sec-3">
      <title>An Approach to deal with Web Job Vacancies</title>
      <p>Background on Web Labour Market Information. An extracted Web job offer
can be seen as a document mainly composed of a pair of texts: a title and
a (full job) description. The title summarises the working position offered by the
employer, while the description usually provides the position details, including
all the relevant required skills, according to the employer's preferences.</p>
      <p>One of the most important classification systems designed for this purpose is
ISCO: the International Standard Classification of Occupations, a four-level
classification that represents a standardised way of organising labour market
occupations. ESCO is the multilingual classification system of European Skills,
Competences, Qualifications and Occupations, and it is the European standard for
supporting the whole labour market intelligence over 24 EU languages. Basically,
the ESCO data model includes the whole ISCO structure and extends it through
(i) a further level of fine-grained occupation descriptions and (ii) a taxonomy of
skills and competences.</p>
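      <p>As an illustration of the four-level ISCO structure just described, each additional digit of an ISCO code refines the previous level. The following minimal sketch (the unit-group code 2512 is used purely for illustration) expands a code into its ancestry:</p>

```python
# ISCO is a four-level hierarchical classification: each additional digit of a
# code refines the previous level (major, sub-major, minor, unit group).
ISCO_LEVELS = ["major group", "sub-major group", "minor group", "unit group"]

def isco_ancestry(code):
    """Expand a 4-digit unit-group code into its four hierarchy levels."""
    return [(code[:i + 1], ISCO_LEVELS[i]) for i in range(len(code))]

# e.g. isco_ancestry("2512") yields ("2", "major group"), ("25", "sub-major group"),
# ("251", "minor group"), ("2512", "unit group")
```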
      <p>
        Web job vacancy classification through Text Classification. Text
categorization aims at assigning a Boolean value to each pair (d_j, c_i) ∈ D × C, where D
is a set of documents and C a set of predefined categories. A true value assigned
to (d_j, c_i) indicates that document d_j is to be filed under category c_i, while a false
value indicates that d_j cannot be assigned to c_i. In our scenario, we consider a set
of job vacancies J as a collection of documents, each of which has to be assigned
to one (and only one) ISCO occupation code. We can model this problem as a
text classification problem, relying on the definition of [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Formally speaking,
let J = {J_1, ..., J_n} be a set of job vacancies; the classification of J under |O|
ESCO occupation labels consists of |O| independent problems of classifying each
job vacancy J ∈ J under a given ESCO occupation code o_i for i = 1, ..., |O|.
Then, a classifier for o_i is a function φ : J × O → {0, 1} that approximates an
unknown target function φ̇ : J × O → {0, 1}. Clearly, as we deal with a
single-label classifier, for every j ∈ J the following constraint must hold: Σ_{o∈O} φ(j, o) = 1.
Notice that in this way we can also extend the ESCO skills taxonomy through
the skills extracted from the job vacancies, as well as weight both occupations
and skills with respect to the frequency with which they appear in the Web
Labour Market.
      </p>
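      <p>The single-label constraint above can be sketched as follows. The keyword-overlap scoring and the two-code mini-taxonomy are hypothetical stand-ins for the actual trained classifiers; the point is only that the indicator φ sums to 1 over all codes for each vacancy:</p>

```python
# Sketch of the single-label constraint: each vacancy receives exactly one
# occupation code, so the indicator function phi sums to 1 over all codes.
# Scoring is a toy keyword overlap, not the paper's actual classifiers.

OCCUPATION_KEYWORDS = {  # hypothetical mini-taxonomy standing in for ISCO codes
    "2512": {"software", "developer", "java", "python"},
    "2521": {"database", "sql", "administrator"},
}

def phi(vacancy_text, code):
    """Indicator: 1 iff `code` is the argmax-scoring occupation for the text."""
    tokens = set(vacancy_text.lower().split())
    scores = {c: len(tokens.intersection(kws))
              for c, kws in OCCUPATION_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return 1 if code == best else 0

job = "Java software developer with Python experience"
# single-label constraint: exactly one code fires for the vacancy
assert sum(phi(job, c) for c in OCCUPATION_KEYWORDS) == 1
```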
      <p>
        Building up the Machine Learning Model. We built a machine learning model
for classifying multilingual Web job vacancies, exploiting a single-label classifier
that uses both titles and descriptions. Indeed, titles often do not contain enough
information for performing a correct classification, as we showed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Several
machine learning techniques have been evaluated for developing the text
classifier, and they have been comparatively evaluated on a data set of 75,546 job
vacancies in Italian. The evaluated techniques are: Support Vector Machines
(SVMs), in particular linear SVM and SVM with RBF kernel, Random Forests (RFs),
and Artificial Neural Networks (ANNs).
      </p>
      <p>Focusing on Italian Labour Market Information, a set of 57,740
vacancies previously classified by domain experts belonging to the ENRLMM5 was used.
5 The European Network on Regional Labour Market Monitoring
The set of already classified vacancies was split into train, validation, and test
sets. Then, a grid search was performed over each classifier's parameter space to
identify the values maximizing the classification effectiveness (using the training and
validation sets). Tab. 1 shows the scores computed over the test set.</p>
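      <p>The grid-search step can be sketched as follows. The toy threshold/weight "model" and the grid values are hypothetical stand-ins for the actual SVM/RF/ANN hyper-parameter grids, which are not reported here; only the search pattern (train on one split, pick the parameters maximizing validation accuracy) reflects the procedure described above:</p>

```python
from itertools import product

# Toy "classifier": predicts True when weight * feature exceeds threshold.
def validation_accuracy(threshold, weight, validation_set):
    hits = sum(1 for feature, label in validation_set
               if ((weight * feature) > threshold) == label)
    return hits / len(validation_set)

# Hypothetical parameter grid; real grids would cover e.g. SVM C/gamma values.
grid = {"threshold": [0.3, 0.5, 0.7], "weight": [0.5, 1.0, 2.0]}

# Tiny stand-in validation set of (feature, label) pairs.
validation_set = [(0.9, True), (0.2, False), (0.8, True), (0.1, False)]

# Exhaustively evaluate every parameter combination, keep the best score.
best_score, best_params = max(
    (validation_accuracy(t, w, validation_set), (t, w))
    for t, w in product(grid["threshold"], grid["weight"])
)
```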
      <p>Skills Identification. The text pieces stating the required skills are usually
concentrated in a small portion of the job vacancy descriptions. Thus, these relevant
text pieces are extracted through a look-up search of sentinel expressions
selected by domain experts involved in the projects. The domain experts kept
adding new sentinel expressions until the number of n-grams identified in the
next step no longer grew (i.e., the set converged to a stable set). Then, the n-gram
Document Frequency (DF), i.e., the number of vacancies where the n-gram is found, is
computed considering as a scope both (1) the whole dataset and (2) the vacancy
subsets homogeneous w.r.t. the ISCO occupation code.</p>
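      <p>The Document Frequency computation over the whole dataset can be sketched as follows; the vacancy texts are invented for illustration, and the per-ISCO-code scope would simply apply the same function to each homogeneous subset:</p>

```python
from collections import Counter

def ngrams(text, n):
    """Set of word n-grams in a text; a set so each vacancy counts one once."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def document_frequency(vacancies, n=2):
    """DF(g): number of vacancies whose text contains the n-gram g."""
    df = Counter()
    for text in vacancies:
        df.update(ngrams(text, n))
    return df

# Invented vacancy snippets for illustration.
vacancies = [
    "required skills relational databases and sql",
    "experience with relational databases required",
    "team player with good communication skills",
]
df = document_frequency(vacancies)
# "relational databases" occurs in two distinct vacancies
```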
      <p>The n-grams produced by the previous step underwent a string similarity
comparison with respect to the ESCO skill concept labels. The pairs &lt;skill candidate
n-gram, ESCO Skill label&gt; matching the following criterion were suggested for
domain-expert approval. The string similarity was computed as the mean value
of the following well-known string metrics: Levenshtein distance, Jaccard
similarity, and the Sørensen-Dice index6. The pairs having a similarity lower
than 70% were dropped, while the others were proposed to the domain experts for
evaluation. Each &lt;candidate n-gram, ESCO Skill label&gt; pair above the threshold has
been reviewed by a domain expert to decide whether to consider the n-gram as:
(1) a skill described in ESCO; (2) a skill not enlisted in ESCO (i.e., a novel skill);
or (3) to reject the proposed n-gram as a skill concept. The outcome of this
process can be seen as a dictionary of n-gram-related skills and their corresponding
ESCO skill concepts (when available). Finally, the n-gram dictionary produced
by the previous steps and the mappings to the ESCO skill concepts have
been used to look for skills among the downloaded vacancies.</p>
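      <p>The similarity filter can be sketched as follows, assuming plain implementations of the three metrics averaged against the 70% threshold mentioned above; the candidate pairs are invented, and the exact tokenization used in the project may differ:</p>

```python
def levenshtein_similarity(a, b):
    """1 minus the normalized edit distance (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return 1 - prev[-1] / max(len(a), len(b), 1)

def jaccard_similarity(a, b):
    """Intersection over union of word tokens."""
    ta, tb = set(a.split()), set(b.split())
    union = len(ta.union(tb))
    return len(ta.intersection(tb)) / union if union else 0.0

def dice_similarity(a, b):
    """Soerensen-Dice coefficient over character bigrams."""
    def bigrams(s):
        return {s[i:i + 2] for i in range(len(s) - 1)}
    ba, bb = bigrams(a), bigrams(b)
    total = len(ba) + len(bb)
    return 2 * len(ba.intersection(bb)) / total if total else 0.0

def mean_similarity(ngram, esco_label):
    a, b = ngram.lower(), esco_label.lower()
    return (levenshtein_similarity(a, b) + jaccard_similarity(a, b)
            + dice_similarity(a, b)) / 3

# Pairs scoring below the 70% threshold are dropped; the rest go to experts.
candidates = [("sql", "SQL"), ("team working", "teamwork")]
proposed = [(g, l) for g, l in candidates if mean_similarity(g, l) >= 0.7]
```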
      <p>LM Knowledge as a Graph. The resulting knowledge on the LM is then
modelled as a graph. In Fig. 1 we report a selection of the graph-db data model
according to the Neo4j property graph structure. The model is basically
composed of two main node labels, occupations and skills. The former are the ISCO
occupation codes, whilst the latter are the union of the ESCO skills and the
skills recognised as novel in the skill extraction phase, as described above. Then,
two distinct directed relationships are allowed between skills and occupations to
model that a skill s belongs to a given occupation o. The :BELONG relation
represents an occupation o requiring an ESCO skill s with a relevance of w
in the ESCO taxonomy. This relationship measures the importance of the skill
for a given occupation according to a set of labour market experts. Differently,
the :BELONG_DATA relation models that w job vacancies have asked for skill
s in the Web job vacancy text.</p>
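      <p>A minimal sketch of the two weighted relationship types, using a plain Python mapping in place of the actual Neo4j store; the node names and weights are illustrative, not taken from the project data:</p>

```python
# Property-graph sketch: (occupation, relation, skill) -> weight w.
# :BELONG carries the ESCO expert relevance; :BELONG_DATA the number of
# Web job vacancies asking for the skill. Values below are invented.
# A Neo4j equivalent would be, e.g.:
#   MATCH (o:Occupation)-[r:BELONG_DATA]->(s:Skill) RETURN s.name, r.w

graph = {
    ("mathematician", ":BELONG", "statistics"): 0.9,       # expert relevance
    ("mathematician", ":BELONG_DATA", "statistics"): 1200,  # vacancies asking it
    ("mathematician", ":BELONG_DATA", "sql"): 3400,         # demand seen in data
}

def skills_of(occupation, relation):
    """All (skill, weight) pairs reachable from an occupation via one relation."""
    return {skill: w for (occ, rel, skill), w in graph.items()
            if occ == occupation and rel == relation}
```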
      <p>Such a knowledge graph allowed us to perform several path-traversal
analyses, such as Skill2Job, which identifies the most promising occupations one could
be interested in given a set of skills, and Gap Analysis, which identifies the most
important skills one should acquire given the set of skills a candidate
already holds. Due to space restrictions, here we only give an idea of how
the graph structure can be used to identify occupation groups on the basis of
the skills they have in common, distinguishing between groups according to
ESCO and to real data. To this end, we employed a local-clustering-coefficient
metric to identify all the occupations that share at least k% of their skills (i.e.,
having a clustering coefficient equal to 1). We employed a weighted Jaccard
metric to compute the similarity between occupations on the basis of the skills in
common. Fig. 2 shows the ESCO skill categories requested for a group of
occupations related to mathematicians and statisticians. The outer circle refers to
ESCO skills whose relevance has been computed using the :BELONG relation,
whilst the inner circle refers to ESCO skills with a relevance computed using the
:BELONG_DATA relation (i.e., Web job vacancies). As one might note,
computing skills account for 6% according to the ESCO experts, whilst this value grows
up to 66% in the real data, which mainly specify skills such as SQL, Relational
Databases, Python and Data Warehouse. Conversely, the Business &amp;
Administration sector seems to be overestimated by the ESCO taxonomy, which indicates up
to 56 B&amp;A-related skills whilst only a few of them are actually repeatedly asked
for by companies. This analysis would allow one to measure the gap between (1)
an LMI system built through an expert-driven approach, as in the ESCO case,
where labour market experts indicate a list of skills that are relevant to an
occupation profile, and (2) a data-driven system, where skills are recognised as
important on the basis of real labour market expectations.
6 Though the latter is not a proper distance metric, it has been selected as we do not require string
metrics to satisfy the triangle inequality.</p>
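      <p>The weighted Jaccard similarity between two occupations can be sketched as follows; the occupations and skill weights are invented for illustration, but the formula (sum of minimum weights over sum of maximum weights, across the union of skills) is the standard weighted variant of the Jaccard index:</p>

```python
def weighted_jaccard(skills_a, skills_b):
    """sum(min w) / sum(max w) over the union of skills; 1.0 means identical."""
    union = set(skills_a).union(skills_b)
    num = sum(min(skills_a.get(s, 0.0), skills_b.get(s, 0.0)) for s in union)
    den = sum(max(skills_a.get(s, 0.0), skills_b.get(s, 0.0)) for s in union)
    return num / den if den else 0.0

# Invented skill-weight profiles for two related occupations.
mathematician = {"statistics": 0.9, "python": 0.6, "sql": 0.4}
statistician = {"statistics": 0.9, "python": 0.3, "r": 0.5}
sim = weighted_jaccard(mathematician, statistician)
```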
    </sec>
    <sec id="sec-4">
      <title>Some Results and Concluding Remarks</title>
      <p>The WollyBI Experience. WollyBI is a SaaS tool for collecting and classifying
Web job vacancies over the ISCO/ESCO standard taxonomies, and for extracting the
most requested skills from job descriptions. It has been designed to provide five
distinct entry points to the user on the basis of the analysis purposes, that is,
Geographical Area, Skill, Firm, Occupation, and free OLAP queries.
Competition Analysis for Strategic Decision Making. WollyBI supported a
recruitment agency in identifying and measuring its market share with respect to
its competitors, which included the most relevant recruitment agencies in Italy,
namely: GiGroup, Adecco, ManPower, RandStad, ObiettivoLavoro, and Umana.
In Fig. 4 we report the market share distribution over the top-10 ISCO
occupations, obtained by analysing the Italian Web Job Vacancies from February 2013 to
April 2015. Clearly, due to non-disclosure agreements, the agency labels reported
in Fig. 4 have been anonymized. We analysed about 850K Web job vacancies
posted by these agencies. This competition analysis has been validated as helpful
to the customer (Agency B) as it allowed it to (i) measure its position in the
market and the gap with respect to its competitors; and (ii) drive the identification
of strategic decisions to improve its market share and, in turn, identify the
corresponding strategies that allow achieving the desired goals. Just to give a
few examples, our analysis revealed that Agency B is the leader in recruiting "shop
sales assistants", whilst its market share ranges between 9% and 15% in the
remaining professions. Thanks to these results, Agency B was put in a position
to design its strategic interventions through fact-based decision-making.
[Fig. 4: number of Web job vacancies per anonymized agency over the top-10 ISCO occupations.]</p>
      <p>
        The Cedefop Experience. In 2014, the experience of WollyBI laid the basis
for a prototype system we realised within a call-for-tender for the Cedefop EU
Agency, aimed at collecting Web job vacancies from five EU countries and
extracting the requested skills from the data. The rationale behind the project is
to turn data extracted from Web job vacancies into knowledge (and thus value)
for policy design and evaluation through fact-based decision making. The
architecture of the system basically relies on WollyBI, which has been running on the
Cedefop data centre since June 2016, gathering and classifying job vacancies
from 5 EU countries, namely: United Kingdom, Ireland, Czech Republic, Italy
and Germany. To date, the system has collected 7+ million job vacancies over the
5 EU countries, and it counts among the research projects that a selection of
Italian universities addressed in the context of big data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In Fig. 3 we report a
snapshot of the project dashboard, which allows users to browse the data over the ISCO
taxonomy and the ESCO skills extracted from the data.
      </p>
      <p>Concluding Remarks and Research Directions. In this paper we
described our approach to Web Labour Market Intelligence, focusing on the
realisation of a machine learning model for classifying job vacancies and showing the
benefits of a graph-based representation of the knowledge base. Our research
goes in two directions. From an application point of view, we have been
commissioned by Cedefop to extend the prototype to the whole EU community,
i.e., to all 28 EU countries, building the system for the EU Web Labour Market
Monitoring7. From a methodological perspective, reasoning over Web job
vacancies raises some interesting research issues, such as the automatic synthesis
of labour market knowledge through word embeddings, the identification of
AI heuristic-search algorithms for path traversal over big knowledge graphs, as
well as the design of novel AI techniques for data cleansing in a big data scenario.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amato</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boselli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cesarini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercorio</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mezzanzanica</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moscato</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Persia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Picariello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Challenge: Processing web texts for classifying job offers</article-title>
          .
          <source>In: IEEE International Conference on Semantic Computing</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Amato</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colace</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greco</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moscato</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Picariello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semantic processing of multimedia data for e-government applications</article-title>
          .
          <source>J. Vis. Lang. Comput</source>
          .
          <volume>32</volume>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bergamaschi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carlini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ceci</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furletti</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giannotti</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malerba</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mezzanzanica</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Monreale</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedreschi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , et al.:
          <article-title>Big data research in Italy: A perspective</article-title>
          .
          <source>Engineering</source>
          <volume>2</volume>
          (
          <issue>2</issue>
          ),
          <fpage>163</fpage>
          -
          <lpage>170</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Boselli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cesarini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercorio</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mezzanzanica</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Inconsistency knowledge discovery for longitudinal data management: A model-based approach</article-title>
          . In:
          <article-title>SouthCHI13 special session on Human-Computer Interaction</article-title>
          &amp; Knowledge Discovery. Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Boselli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mezzanzanica</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cesarini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercorio</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Planning meets data cleansing</article-title>
          .
          <source>In: The 24th International Conference on Automated Planning and Scheduling (ICAPS</source>
          <year>2014</year>
          ). pp.
          <fpage>439</fpage>
          -
          <lpage>443</lpage>
          .
          <string-name>
            <surname>AAAI</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <issue>6</issue>
          .
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kayed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girgis</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shaalan</surname>
            ,
            <given-names>K.F.</given-names>
          </string-name>
          :
          <article-title>A survey of web information extraction systems</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>18</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1411</fpage>
          -
          <lpage>1428</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Khan</surname>
            ,
            <given-names>F.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bashir</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qamar</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          : Tom:
          <article-title>Twitter opinion mining framework using hybrid classification scheme</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>57</volume>
          ,
          <fpage>245</fpage>
          -
          <lpage>257</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mercorio</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Model checking for universal planning in deterministic and nondeterministic domains</article-title>
          .
          <source>AI Communications</source>
          <volume>26</volume>
          (
          <issue>2</issue>
          ),
          <fpage>257</fpage>
          –
          <lpage>259</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Mezzanzanica</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boselli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cesarini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercorio</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Data quality through model checking techniques</article-title>
          .
          In:
          <source>Intelligent Data Analysis (IDA), LNCS</source>
          , pp.
          <fpage>270</fpage>
          –
          <lpage>281</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mezzanzanica</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boselli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cesarini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercorio</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>A model-based evaluation of data quality activities in KDD</article-title>
          .
          <source>Information Processing &amp; Management</source>
          <volume>51</volume>
          (
          <issue>2</issue>
          ),
          <fpage>144</fpage>
          –
          <lpage>166</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vaithyanathan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Thumbs up?: Sentiment classification using machine learning techniques</article-title>
          .
          <source>In: Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics</source>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Machine learning in automated text categorization</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>34</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          –
          <lpage>47</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Tjong Kim Sang</surname>
            ,
            <given-names>E.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Meulder</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition</article-title>
          .
          <source>In: Conference on Natural Language Learning at HLT-NAACL</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Javed</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacob</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNair</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>SKILL: A system for skill identification and normalization</article-title>
          .
          <source>In: Twenty-Seventh AAAI Conference on Innovative Applications of Artificial Intelligence</source>
          . pp.
          <fpage>4012</fpage>
          –
          <lpage>4018</lpage>
          . AAAI
          (
          <year>2015</year>
          )
          . "Real-time Labour Market information on Skill Requirements: Setting up the EU system for online vacancy analysis AO/DSL/VKVET-GRUSSO/Real-time LMI 2/009/16". Contract notice 2016/S 134-240996 of 14/07/2016, https://goo.gl/5FZS3E
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>