<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extracting Functional Job Roles From Professional Social Networking Sites Profiles</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National Research University Higher School of Economics St. Petersburg</institution>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Despite the employment crisis on the Russian job market, demand for IT
specialists is not stagnating. The growth of flexible specialisation on the market
leaves a trace in skill descriptions [8, p. 119]. Using modern Network Science
and Machine Learning methods, we analysed profile data from the business social
networking site MoiKrug.ru and extracted a skill map and patterns
characterising functional job roles. This paper is part of a project aimed
at comparing signals on two sides of the Russian IT job market: requirements
extracted from job advertisements and skills extracted from profiles of
potential employees. At this stage we attempt to understand how the supply
is represented, which functional job roles exist, and how they are connected with
each other.</p>
      <p>
        Todd, McKeen, and Gallupe [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] analysed roles in the Information Systems area and their dynamics
between 1970 and 1990, uncovering three main roles: computer programmers,
systems analysts, and information systems managers. The research was based on
job advertisements from newspapers. They showed a growing role of communication
and business skills during that period, compared to knowledge of several
programming languages and other technical skills. Systems analysts needed to
grow in both directions, although the requirement for technical skills
increased dramatically in the mid-80s.
      </p>
      <p>
        Later, Byrd and Turner [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] divided management skills into technology
management skills, business functional skills, and interpersonal skills, while the
classification of technical skills remained unchanged. Noll and Wilkins [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] chose the following occupations for their analysis
of future demand for skills: programmers, analysts,
and end-user support. "Soft skills" continued to play a significant role, while
technical skills shifted somewhat toward web-based languages.
Over time, more and more attention was paid to technical skills. Litecky
et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], analyzing job advertisements with the help of statistical tools and
clustering, identified 20 professional categories and their respective skill sets.
Assessing the similarity of skills, the researchers combined them into more general
occupations: web developers, software developers, database developers, managers (the
largest area), and analysts.
      </p>
      <p>
        Changes in the demand for skills contribute to the emergence of new
professions. Debortoli et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] studied the sharp rise of Big Data jobs compared to more
traditional "business intelligence" (BI) jobs, using Latent Semantic Analysis on job
advertisements devoted to these areas. They found similarities and differences in
application areas (about 15) and in required skills. The methods and concepts
specific to BI were database administration, software engineering, and
BI architecture, whereas for Big Data quantitative analysis, machine learning,
database administration, software engineering, software testing, and data
warehousing were more salient.
      </p>
      <p>
        Another study with similar goals and objectives is the research
of Wowczko [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Based on frequent terms selected from job titles,
professional subsets were identified: Administrator (keyword: Administrator),
Analyst (keyword: Analyst), Support (keyword: Engineer), Lead (keywords: Lead,
Manager), and Test (keywords: Test, Tester, Quality, QA). Using these general
categories, 4755 jobs were classified. During the analysis of their descriptions,
a term matrix based on n-grams was constructed; these terms are to some extent
similar to skills, although they are not as clear as in the previous study.
In general, the categories included both technical and managerial disciplines,
similarly to our work.
      </p>
      <p>While these studies show the emergence of new specialisations and demonstrate
the development of the IT area, an up-to-date comprehensive skill map of the
Russian IT job market is lacking, and this paper is a step in that direction.</p>
    </sec>
    <sec id="sec-2">
      <title>Data and Methods</title>
      <p>
        Using the rvest package, we downloaded all user profiles available at that
moment (11.2015) from the business social networking site MoiKrug.ru. In total there were
11000 profiles, containing more than 1000 unique skill tags. After
preprocessing (removing punctuation, building a DocumentTermMatrix and a correlation
matrix), we extracted the hierarchy of tags [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] using the co-occurrence of the tags
in profiles and their network characteristics, and built a hierarchical skill map.
      </p>
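      <p>The co-occurrence counting that underlies the tag hierarchy step can be sketched as follows. This is a minimal Python illustration on made-up profiles, not the R pipeline used in the study; the names profiles and cooccurrence_counts, and the toy data, are assumptions introduced only for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each profile is the set of skill tags it lists.
profiles = [
    {"python", "django", "postgresql"},
    {"python", "django", "linux"},
    {"php", "mysql", "laravel"},
    {"python", "mysql"},
]


def cooccurrence_counts(profiles):
    """Count, for every unordered tag pair, the profiles listing both tags."""
    counts = Counter()
    for tags in profiles:
        # sorted() gives each pair one canonical (alphabetical) key
        for a, b in combinations(sorted(tags), 2):
            counts[(a, b)] += 1
    return counts


pair_counts = cooccurrence_counts(profiles)
# Number of profiles in which each single tag occurs
tag_counts = Counter(tag for tags in profiles for tag in tags)
```
]]></preformat>
      <p>Pair counts of this kind, together with the per-tag frequencies, are the sort of input from which tag similarity and tag network characteristics can be derived.</p>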
      <p>
        Hierarchical clustering based on the distance matrix was then performed to
analyze skill areas underrepresented in the dataset. To assess the quality of
the clustering, we used a silhouette [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] plot (Fig. 1), which shows the distribution of
observations in a cluster and their fit. Silhouette width measures how close an
object is to the other objects within its cluster in comparison with objects from the
other clusters; the higher it is, the better the alignment of the elements in the
cluster [0:1].
      </p>
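      <p>As an illustration of the silhouette width, the sketch below computes it for a toy clustering of tags. It assumes a Jaccard distance between the sets of profiles that list each tag (the Results section refers to the Jaccard index); the toy data and the function names are hypothetical. For an object i, a is its mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the width is (b - a) / max(a, b), which lies between -1 and 1.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def jaccard_distance(a, b):
    """One minus the Jaccard index of two sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def silhouette_widths(items, labels, dist):
    """Silhouette width of every item; assumes at least two clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the item's own cluster
        a = sum(dist(items[i], items[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(items[i], items[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom else 0.0)
    return widths


# Hypothetical toy data: tag -> ids of the profiles that list the tag
tag_profiles = {
    "python": {1, 2, 3},
    "django": {1, 2},
    "php": {4, 5},
    "laravel": {4, 5, 6},
}
tags = list(tag_profiles)
items = [tag_profiles[t] for t in tags]
labels = [0, 0, 1, 1]  # assumed cluster assignment for the four tags
widths = silhouette_widths(items, labels, jaccard_distance)
```
]]></preformat>
      <p>In this toy case every tag shares profiles only with the other member of its own cluster, so all widths are positive; values near zero or below would indicate tags sitting between clusters.</p>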
      <p>In addition, using the Apriori association rule learning algorithm (Agrawal and
Imieliński) on transactions between users and skills, we extracted frequent
combinations of skills characterising job profiles for the largest clusters.</p>
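      <p>The level-wise search behind Apriori can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used in the study: the transactions are invented (only the skill names are borrowed from the paper), the support threshold is arbitrary, and the function names apriori and confidence are introduced for the example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all frequent itemsets.

    Candidates of size k are joined from frequent itemsets of size k - 1
    and dropped if any (k - 1)-subset is infrequent.
    Assumes a non-empty list of transactions.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Share of transactions that contain every item of the itemset
        return sum(1 for t in transactions if itemset <= t) / n

    current = {}
    for item in {item for t in transactions for item in t}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        current = {}
        for cand in candidates:
            if all(frozenset(sub) in prev for sub in combinations(cand, k - 1)):
                s = support(cand)
                if s >= min_support:
                    current[cand] = s
        frequent.update(current)
        k += 1
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return frequent[both] / frequent[frozenset(antecedent)]


# Hypothetical transactions: one set of skills per user
transactions = [
    {"SMM", "Sales", "Internet Marketing"},
    {"SMM", "Internet Marketing"},
    {"Sales", "Project management"},
    {"SMM", "Internet Marketing", "Project management"},
]
frequent = apriori(transactions, min_support=0.5)
rule_conf = confidence(frequent, {"SMM"}, {"Internet Marketing"})
```
]]></preformat>
      <p>With sparse data the support threshold prunes most candidates early, which is consistent with rules being recoverable only for the largest clusters.</p>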
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <p>Analysis of the CV tag hierarchy allowed us to identify two large skill clusters,
containing the areas of general-purpose Web development and High-load systems, and
of Web, UI, and graphics design, Project management, and Internet marketing.</p>
      <p>Clustering based on the Jaccard index allowed us to extract 9
professional fields (Table 1), which we described as general web development, design,
backend, mobile application development, infrastructure, frontend, systems
administration, testing, and an administrative cluster.</p>
      <p>Further comparison of the clusters using the dissimilarity metric made it
possible to identify two large areas similar to those found in the tag
hierarchy, and in addition allowed us to analyze smaller skill clusters (Fig. 2).</p>
      <p>The administrative cluster includes marketing, analytics, sales, content, and
part of management. All of these areas are quite close to each other and less
represented than other sectors. Association rule analysis showed
that the most common combinations of skills in the administrative cluster are SMM,
Sales, Internet Marketing, Human Resource management, and Project management.
Roles here are therefore quite mixed. The most common skills in the cluster
of designers were UX design, the Adobe product family, Web design, and Design of
mobile applications. In the development of mobile applications there are
three main programming languages, Java, Objective-C, and C++, with
links to them from skills such as development for Android and for iOS, XML, Qt, and SVN.
In the backend sphere the most common skills are Python and PHP. With Python
we usually find Django, Linux, and PostgreSQL, while PHP is linked with Redis,
Laravel, MongoDB, Git, Symfony 2, Zend framework, Yii framework, and Node.js.
MySQL is connected with both of them. In the area of software development,
the prevalent associations were the trivial ones between HTML, CSS, Git, JavaScript,
and jQuery. A more detailed study of the links between the clusters revealed
that system administration, software development, and machine learning are
closely linked through the multipurpose programming language Python.</p>
      <p>Unfortunately, the algorithm did not reveal association rules for testing, frontend,
and system administration because of the lack of data (CVs) in these sectors.</p>
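      <table-wrap>
        <label>Table 1</label>
        <table>
          <tbody>
            <tr>
              <td>Wordpress, Javascript, Web development, Angular.js, JQuery, HTML, CSS, Node.js, Git</td>
              <td>25</td>
            </tr>
            <tr>
              <td>UI design, Adobe Indesign, Graphic design, Adobe Illustrator, UX design, Web design</td>
              <td>15</td>
            </tr>
            <tr>
              <td>Python, Java, SQL, C#, Ruby, XML, MongoDB, MySQL, Yii framework, PHP</td>
              <td>68</td>
            </tr>
            <tr>
              <td>Swift, Development for iOS, Objective-C, Unity3d, Development for Android, Jira, Shell</td>
              <td>15</td>
            </tr>
            <tr>
              <td>SVN, Unix, WCF, Wpf, Software Development, C, C++, Delphi, Linux, Microsoft SQL server</td>
              <td>19</td>
            </tr>
            <tr>
              <td>Adaptive layout, Cross-browser layout, Grunt, Jade, Gulp, Bower, Less, Stylus, Sass</td>
              <td>11</td>
            </tr>
            <tr>
              <td>System administration, Network Administration, Linux Administration, Project Management</td>
              <td>9</td>
            </tr>
            <tr>
              <td>Software Testing, Testing Websites, Functional testing, Manual testing</td>
              <td>12</td>
            </tr>
            <tr>
              <td>SMM, Internet marketing, Sales, Product Management</td>
              <td>24</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>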
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Future work</title>
      <p>This paper presents the results of an exploratory analysis of the Russian IT market,
based on data from the business social network MoiKrug.ru. Skill clusters
detected by the methods of social network analysis revealed two large groups of
functional roles. Hierarchical clustering and association rules allowed us to form
nine clusters, which are closer to professional fields. In addition, the analysis
gives an idea of the connectedness (common skills) and separateness of the areas.</p>
      <p>Although the current results do not allow us to make a direct comparison with
the results of previous studies of the IT market due to sampling bias, we
underline some contemporary trends relative to previous research, in particular
the mixing of roles and the emergence of a large cluster of design jobs
interlinked with other IT areas.</p>
      <p>While this sample is not representative of the whole Russian IT industry,
being biased towards web development and IT startup roles with administrative sector
jobs underrepresented, we still consider the results interesting, as they allow
us to discover flexible, data-grounded job roles and skill patterns. Our current task is
to improve our skill matching approach so that comparisons take into account
skills on different levels of the generalisation hierarchy, and to compare these results with
the structure based on the skills in job advertisements.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>We would like to express our gratitude to Ekaterina Mekhnetsova, Stanislav
Pozdniakov, Daria Kharkina, Vadim Voskresenskii, Paul Okopny, and Viktor
Karepin.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Byrd</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Turner</surname>
          </string-name>
          .
          <article-title>An exploratory analysis of the value of the skills of IT personnel: Their relationship to IS infrastructure and competitive advantage</article-title>
          .
          <source>Decision Sciences</source>
          ,
          <volume>32</volume>
          (
          <issue>1</issue>
          ):
          <fpage>21</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Debortoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Müller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>vom Brocke</surname>
          </string-name>
          .
          <article-title>Comparing Business Intelligence and Big Data Skills</article-title>
          .
          <source>Business &amp; Information Systems Engineering</source>
          ,
          <volume>6</volume>
          (
          <issue>5</issue>
          ):
          <fpage>289</fpage>
          –
          <lpage>300</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Litecky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Aken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H. J.</given-names>
            <surname>Nelson</surname>
          </string-name>
          .
          <article-title>Mining for computing jobs</article-title>
          .
          <source>IEEE Software</source>
          ,
          <volume>27</volume>
          (
          <issue>1</issue>
          ):
          <fpage>78</fpage>
          –
          <lpage>85</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Noll</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wilkins</surname>
          </string-name>
          .
          <article-title>Critical skills of IS professionals: A model for curriculum development</article-title>
          .
          <source>Journal of Information Technology Education</source>
          ,
          <volume>1</volume>
          (
          <issue>3</issue>
          ):
          <fpage>143</fpage>
          –
          <lpage>154</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Rousseeuw</surname>
          </string-name>
          .
          <article-title>Silhouettes: A graphical aid to the interpretation and validation of cluster analysis</article-title>
          .
          <source>Journal of Computational and Applied Mathematics</source>
          ,
          <volume>20</volume>
          :
          <fpage>53</fpage>
          –
          <lpage>65</lpage>
          ,
          <month>Nov.</month>
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>G.</given-names>
            <surname>Tibély</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pollner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Vicsek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Palla</surname>
          </string-name>
          .
          <article-title>Extracting tag hierarchies</article-title>
          .
          <source>PLoS ONE</source>
          ,
          <volume>8</volume>
          (
          <issue>12</issue>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Todd</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>McKeen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Gallupe</surname>
          </string-name>
          .
          <article-title>The evolution of IS job skills: a content analysis of IS job advertisements from 1970 to 1990</article-title>
          .
          <source>MIS Quarterly</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>27</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Watson</surname>
          </string-name>
          .
          <source>Sociology, Work and Industry</source>
          .
          <publisher-name>Routledge</publisher-name>
          ,
          <edition>fourth edition</edition>
          ,
          <publisher-loc>London, UK</publisher-loc>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>I. A.</given-names>
            <surname>Wowczko</surname>
          </string-name>
          .
          <article-title>Skills and Vacancy Analysis with Data Mining Techniques</article-title>
          .
          <source>Informatics</source>
          , volume
          <volume>2</volume>
          , pages
          <fpage>31</fpage>
          –
          <lpage>49</lpage>
          . Multidisciplinary Digital Publishing Institute,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>