<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Synthesizing a Knowledge Graph of Data Scientist Job Offers with MINTE+</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mikhail Galkin</string-name>
          <email>galking@cs.uni-bonn.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Collarana</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mayesha Tasnim</string-name>
          <email>mayesha.tasnim@iais.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria-Esther Vidal</string-name>
          <email>maria.vidal@tib.eu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Intelligent Analysis and Information Systems</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ITMO University</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>L3S Institute at University of Hannover</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>TIB Leibniz Information Centre for Science and Technology</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Bonn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Data Scientist is one of the most sought-after jobs of this decade. In order to analyze the job market in this domain, interested institutions have to integrate numerous job advertisements coming from heterogeneous Web sources, e.g., job portals, company websites, and professional community platforms such as StackOverflow, GitHub, etc. In this demo, we show the application of the RDF Molecule-Based Integration Framework MINTE+ in the domain-specific application of job market analysis. The use of RDF molecules for knowledge representation, a core element of the framework, gives MINTE+ enough flexibility to integrate job advertisements from different Web resources and countries. Attendees will observe how exploration and analysis of the data science job market in Europe can be facilitated by synthesizing, at query time, a consolidated knowledge graph of job advertisements. The demo is available at: https://github.com/RDF-Molecules/MINTE/blob/master/README.md#live-demo</p>
      </abstract>
      <kwd-group>
        <kwd>Data Integration</kwd>
        <kwd>RDF Knowledge Graphs</kwd>
        <kwd>RDF Molecules</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>According to recent research, e.g., by PwC<sup>1</sup> and LinkedIn<sup>2</sup>, the demand for data science professionals is still growing, pushing data science and machine learning to the first places of various ratings of top emerging and sought-after jobs. To reach a broader audience of possible candidates, employers disseminate job offers and hiring news through numerous Web sources, including corporate websites, public job portals, community forums, social networks, and many more. These Web sources exhibit a high degree of heterogeneity, as there is no single agreed format for publishing job ads and vacancies. Therefore, in order to provide a holistic view of the job market and enable market analysis, job descriptions have to be integrated using universal knowledge representation mechanisms. A flexible semantic data model provided by RDF and ontologies solves the knowledge representation task. Interoperability is tackled by a manifold of data integration frameworks [<xref ref-type="bibr" rid="ref1 ref2 ref5">1, 2, 5</xref>]. In this demo, we demonstrate MINTE+, an RDF Molecule-Based Integration Framework that applies semantic data integration techniques to synthesize a knowledge graph of job ads collected from heterogeneous Web sources. The main features and applications of MINTE+ are reported in Collarana et al. [<xref ref-type="bibr" rid="ref1">1</xref>], while this paper illustrates the MINTE+ data integration techniques in a Job Market Analysis application. Attendees of this demo will be able to examine different MINTE+ components, i.e., source description, configuration of the integration process by tweaking semantic similarity functions, and the building of a unified knowledge graph.</p>
      <p><sup>1</sup> https://www.pwc.com/us/en/library/data-science-and-analytics.html</p>
      <p><sup>2</sup> https://economicgraph.linkedin.com/research/LinkedIns-2017-US-Emerging-Jobs-Report</p>
      <p>[Fig. 1: The MINTE+ architecture. Recoverable labels: User Interface; MINTE+ Framework; 1. RDF Molecules Creation (Wrappers over API1 to API4, Mediator, RDF molecules M1 ... Mn); 2. Semantically Equivalent Molecules Identification (Dataset Partitioner, Bipartite Graph over Φ(G) and Φ(D), 1-1 Weighted Perfect Matching Calculator); 3. Integrated Molecules (Equivalent RDF Molecules (MG1, MD1) ... (MGn, MDn), RDF Molecule Integrator).]</p>
    </sec>
    <sec id="sec-6">
      <sec id="sec-6-2">
        <sec id="sec-6-2-1">
          <title>2 Architecture</title>
          <p>
MINTE+ generates RDF molecules, i.e., sets of RDF triples that share the same
subject. A MINTE+ knowledge graph consists of two components: ontologies
and schema definitions (TBox), and RDF molecules that contain knowledge
annotated with those ontologies (ABox). The integration task, therefore, is to
achieve an RDF molecule representation for data gathered from heterogeneous
data sources; an ontology and configurable parameters are received as input. Fig.
1 shows the MINTE+ architecture and its three integration steps.</p>
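          <p>The notion of an RDF molecule can be illustrated with a minimal Python sketch. It assumes that triples are plain string tuples; the function name build_molecules and the ex: identifiers are hypothetical and are not part of MINTE+ itself. The prefixed predicates follow those visible in Fig. 2b.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def build_molecules(triples):
    """Group triples by subject; each group is one RDF molecule."""
    molecules = defaultdict(set)
    for subject, predicate, obj in triples:
        molecules[subject].add((predicate, obj))
    return dict(molecules)


# Hypothetical triples for two job ads; the ex: identifiers are made up.
triples = [
    ("ex:job1", "sdo:jobTitle", "Senior Data Scientist Digital Services"),
    ("ex:job1", "sdo:hiringOrganization", "BMW Group"),
    ("ex:job1", "saro:requiredSkill", "python"),
    ("ex:job2", "sdo:jobTitle", "Big Data Engineer (m/w)"),
    ("ex:job2", "saro:requiredSkill", "spark"),
]

molecules = build_molecules(triples)
for subject, tail in sorted(molecules.items()):
    # Print each molecule's subject and the number of triples it holds.
    print(subject, len(tail))
```
]]></preformat>
          <p>Each dictionary entry corresponds to one molecule: the key is the shared subject and the value is the set of predicate-object pairs attached to it.</p>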
          <p>[Fig. 2: Faceted browsing user interface. (a) Exploration: a result list of job ads, each shown with title, company, description, skills, and date, e.g., "Senior Data Scientist Digital Services" (BMW Group; skills: Python, Tensorflow; 2016-05-07), "Wissenschaftliche/r Mitarbeiter/in am Zentrum für Lern- und Wissensmanagement" (Zentrum für Lern- und Wissensmanagement der RWTH Aachen; skills: Data Mining, Algorithms; 2016-05-02), "IT-Berater/Cloud Computing" (entero AG; skills: Java, Web Technology; 2016-05-02), and "Big Data Engineer (m/w)" (Telefonica Deutschland; skills: Big Data, Spark). (b) RDF Molecule: the integrated molecule for "Senior Data Scientist Digital Services" with sdo:jobTitle, sdo:description, sdo:datePosted "2016-05-07"^^xsd:date, sdo:hiringOrganization "BMW Group", saro:source adzuna and jooble, and saro:requiredSkill python and tensorflow; its integration log reads Threshold 0.6, Ontology SARO, Similarity function GADES, Fusion Policy Union.]</p>
          <p>At the RDF Molecules Creation step, wrappers collect data from the data sources; a mediator uses an ontology O, e.g., SARO [<xref ref-type="bibr" rid="ref4">4</xref>] for job ads, to create RDF molecules. During the RDF Molecules Integration step, RDF molecules are partitioned in a bipartite graph using a semantic similarity function Sim<sub>f</sub> and a threshold; all similarity scores below the threshold are discarded. The most similar molecules in the bipartite graph are identified via the 1-1 Weighted Perfect Matching Calculator. Finally, the RDF Molecule Integrator component merges the identified similar RDF molecules following the rules specified in a fusion policy. Details of the architecture can be found in [<xref ref-type="bibr" rid="ref1">1</xref>].</p>
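          <p>The thresholded 1-1 matching over a bipartite graph of molecules can be sketched in Python as follows. This is an illustration under stated assumptions, not the MINTE+ implementation: Jaccard overlap stands in for the semantic similarity function, an exhaustive search over assignments stands in for the matching calculator (practical only for very small inputs), and all names and identifiers (jaccard, match_molecules, adz:, joo:) are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from itertools import permutations


def jaccard(a, b):
    """Stand-in similarity: overlap of (predicate, object) pairs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_molecules(left, right, threshold):
    """Best 1-1 matching between two molecule sets by exhaustive search.

    Edges scoring below the threshold are discarded (weight 0).
    Returns a list of (left_key, right_key, score) triples.
    """
    left_keys, right_keys = sorted(left), sorted(right)
    # Pad the smaller side with None so both sides have equal size.
    size = max(len(left_keys), len(right_keys))
    left_keys += [None] * (size - len(left_keys))
    right_keys += [None] * (size - len(right_keys))

    def weight(l_key, r_key):
        if l_key is None or r_key is None:
            return 0.0
        score = jaccard(left[l_key], right[r_key])
        return score if score >= threshold else 0.0

    best, best_total = [], -1.0
    for perm in permutations(right_keys):
        pairs = [(l, r, weight(l, r)) for l, r in zip(left_keys, perm)]
        total = sum(w for _, _, w in pairs)
        if total > best_total:
            best, best_total = pairs, total
    # Keep only pairs that survived the threshold.
    return [(l, r, w) for l, r, w in best if w > 0.0]


# Hypothetical molecules from two sources.
title = ("sdo:jobTitle", "Senior Data Scientist Digital Services")
org = ("sdo:hiringOrganization", "BMW Group")
date = ("sdo:datePosted", "2016-05-07")
adzuna = {
    "adz:1": {title, org, date, ("saro:requiredSkill", "python")},
    "adz:2": {("sdo:jobTitle", "Big Data Engineer (m/w)"),
              ("saro:requiredSkill", "spark")},
}
jooble = {
    "joo:7": {("sdo:jobTitle", "IT-Berater/Cloud Computing"),
              ("saro:requiredSkill", "java")},
    "joo:9": {title, org, date, ("saro:requiredSkill", "python"),
              ("saro:requiredSkill", "tensorflow")},
}

matches = match_molecules(adzuna, jooble, threshold=0.6)
print(matches)
```
]]></preformat>
          <p>In this toy input only one pair of molecules shares enough predicate-object pairs to exceed the threshold of 0.6; raising the threshold to 0.9 leaves no matched pair.</p>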
          <p>
            MINTE+ Configuration: In this demonstration, Adzuna, Indeed, Jooble,
and XING are the sources. Wrappers implemented in Scala create RDF molecules
using the Web APIs provided by each Web source. The SARO ontology [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ] is
used to semantically describe the job ads; the semantic similarity measure GADES [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] is used to decide relatedness between RDF molecules; and, for the sake
of simplicity, a union fusion policy is followed to merge input RDF molecules
into an RDF molecule that preserves all the properties of the original molecules.
The demo and all the components of MINTE+ are publicly available<sup>3</sup>.
          </p>
        </sec>
        <sec id="sec-6-2-2">
          <title>3 Demonstration of Use Cases</title>
          <p>
In this demo, MINTE+ integrates publicly available data about job ads. MINTE+
provides neither a monitoring service nor persistent storage, thus avoiding data
protection risks by design. MINTE+ serves as the back-end of a faceted browsing
user interface, illustrated in Fig. 2, which will be used during the demo.
We will present the following use cases:</p>
          <p>Building RDF molecules from Web sources. Given a keyword query,
e.g., 'data scientist', we will demonstrate how wrappers
interact with source APIs and how the RDF molecules are built by the mediator
using the SARO ontology. Attendees will be able to add new predicates to the
formal job ad description and link them to particular attributes of job postings
available at the original Web data sources. They will also observe how RDF molecules
enable the description of data gathered from heterogeneous sources.</p>
          <p><sup>3</sup> https://github.com/RDF-Molecules/MINTE</p>
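          <p>Linking source attributes to ontology predicates, as in the first use case, can be pictured with the following Python sketch. The field names in EXAMPLE_MAPPING and the function record_to_molecule are hypothetical; they do not reflect the actual schema of any source API or the Scala wrappers shipped with MINTE+.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical mapping from a source's field names to ontology predicates.
EXAMPLE_MAPPING = {
    "title": "sdo:jobTitle",
    "company": "sdo:hiringOrganization",
    "created": "sdo:datePosted",
    "skills": "saro:requiredSkill",
}


def record_to_molecule(record, mapping, source):
    """Translate one source record into a set of (predicate, object) pairs."""
    molecule = {("saro:source", source)}
    for field, predicate in mapping.items():
        value = record.get(field)
        if value is None:
            continue  # attribute missing in this posting
        values = value if isinstance(value, list) else [value]
        for item in values:
            molecule.add((predicate, str(item)))
    return molecule


# A made-up record shaped like a job posting returned by a Web API.
record = {
    "title": "Big Data Engineer (m/w)",
    "company": "Telefonica Deutschland",
    "skills": ["Big Data", "Spark"],
}
molecule = record_to_molecule(record, EXAMPLE_MAPPING, "adzuna")
for pair in sorted(molecule):
    print(pair)
```
]]></preformat>
          <p>Adding a new predicate then amounts to adding one entry to the mapping, which is the kind of change attendees can try during the demo.</p>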
          <p>Effects of changing integration parameters on a knowledge graph.
Attendees will be able to adjust core integration parameters, e.g., the threshold
of the semantic similarity function, the number of attached sources, and the fusion policy
rules, in order to observe how the knowledge graph evolves in an ad-hoc fashion.
Fig. 2b illustrates an example of an integrated RDF molecule of the same job
ad published on two websites, i.e., Adzuna and Jooble, obtained after applying
the GADES similarity function with a threshold of 0.6 and the union fusion policy.</p>
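          <p>A union fusion policy of the kind used for Fig. 2b can be sketched in Python as a plain set union of two matched molecules. The function names union_fusion and by_predicate are hypothetical, and the two input molecules are simplified versions of the ad shown in the figure.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def union_fusion(molecule_a, molecule_b):
    """Union policy: keep every (predicate, object) pair of both inputs."""
    return molecule_a | molecule_b


def by_predicate(molecule):
    """Group object values per predicate for display."""
    grouped = defaultdict(set)
    for predicate, obj in molecule:
        grouped[predicate].add(obj)
    return {predicate: sorted(objs) for predicate, objs in grouped.items()}


adzuna_ad = {
    ("sdo:jobTitle", "Senior Data Scientist Digital Services"),
    ("sdo:hiringOrganization", "BMW Group"),
    ("saro:source", "adzuna"),
    ("saro:requiredSkill", "python"),
}
jooble_ad = {
    ("sdo:jobTitle", "Senior Data Scientist Digital Services"),
    ("sdo:hiringOrganization", "BMW Group"),
    ("saro:source", "jooble"),
    ("saro:requiredSkill", "tensorflow"),
}

merged = by_predicate(union_fusion(adzuna_ad, jooble_ad))
for predicate in sorted(merged):
    print(predicate, merged[predicate])
```
]]></preformat>
          <p>Shared values such as the job title appear once in the merged molecule, while source-specific values such as the required skills and the source names are accumulated.</p>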
          <p>Faceted knowledge graph browser. A synthesized knowledge graph of
RDF molecules for unique job ads relevant to a given keyword query is presented
in Fig. 2a. A graphical user interface with a faceted browser allows attendees to
configure MINTE+ integration parameters, apply filters on gathered predicates
and values, inspect the contents of each RDF molecule including its integration
log, and navigate and explore the knowledge graph.
          </p>
        </sec>
        <sec id="sec-6-2-3">
          <title>4 Conclusions</title>
          <p>
The application to job market analysis is just one of many possible
applications of MINTE+. This demo of MINTE+ emphasizes the flexibility and
advantages of the RDF molecule-based semantic integration approach. Attendees
will explore the use cases and understand the semantic integration mechanisms
employed by the framework. More importantly, evidence of the relevance of
creating meaningful knowledge graphs from heterogeneous sources will be provided.</p>
          <p>Acknowledgements: Work supported by the European Commission (project
SlideWiki, grant no. 688095) and the German Ministry of Education and
Research (BMBF) in the context of the project InDaSpacePlus (grant no. 01IS17031).</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Collarana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Galkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Scerri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.-E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          .
          <article-title>Synthesizing knowledge graphs from web sources with the MINTE+ framework</article-title>
          .
          <source>Accepted for publication at ISWC</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R.</given-names>
            <surname>Isele</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Active learning of expressive linkage rules using genetic programming</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>23</volume>
          :
          <fpage>2</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>I. T.</given-names>
            <surname>Ribon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          , B. Kampgen, and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sure-Vetter</surname>
          </string-name>
          .
          <article-title>GADES: A graph-based semantic similarity measure</article-title>
          .
          <source>In Proceedings of SEMANTICS 2016</source>
          , pages
          <fpage>101</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Sibarani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Scerri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Morales</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Collarana</surname>
          </string-name>
          .
          <article-title>Ontology-guided job market demand analysis: A cross-sectional study for the data science field</article-title>
          .
          <source>In Proceedings of SEMANTICS 2017</source>
          , pages
          <fpage>25</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Taheriyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Ambite</surname>
          </string-name>
          .
          <article-title>Rapidly Integrating Services into the Linked Data Cloud</article-title>
          .
          <source>In Proceedings of the 11th International Semantic Web Conference (ISWC</source>
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>