<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BPM in German Companies - Information Gathering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Institut für Rechnergestützte Ingenieursysteme</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Universität Stuttgart</string-name>
        </contrib>
      </contrib-group>
      <fpage>33</fpage>
      <lpage>37</lpage>
      <abstract>
        <p>This position paper outlines a method for gathering information on companies involved in BPM from publicly available data. The method is based on searching job search engines for job openings related to BPM. The assumption is that companies seeking new employees with BPM skills are either already involved in BPM or about to become involved.</p>
      </abstract>
      <kwd-group>
        <kwd>BPM</kwd>
        <kwd>BPM in Germany</kwd>
        <kwd>Crawler</kwd>
        <kwd>Publicly Available Data</kwd>
        <kwd>Web Scraping</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>This work is based on the automated search and indexing of job advertisements
related to BPM. Two sources of information were chosen: the homepage of the
Bundesagentur für Arbeit<sup>1</sup>, with currently 829516 listed open positions (as
of 2015-01-27), and the German homepage of Stepstone<sup>2</sup>, with currently
56398 listed open positions (as of 2015-01-27). The homepage of
the Bundesagentur für Arbeit was chosen because it belongs to the governmental
institution that handles unemployed persons and persons currently searching for a
job. Furthermore, this site integrates other job search engines within its own
search engine and makes those data sources available through a single interface.
Stepstone, a privately owned entity, was chosen as a data source to counterbalance the governmental
page of the JobBoerse. By its own account, Stepstone is
Germany's number one job search engine (“Deutschlands Jobbörse Nr.1”). Other
job search engines, e.g. the German homepage of Monster<sup>3</sup>, were not used
because they either gave no account of the number of jobs they list,
are available through the search engine of the JobBoerse (e.g. XING<sup>4</sup>), or would
only yield duplicates of jobs found on the two search engines used.</p>
      <sec id="sec-1-1">
        <title>Research questions</title>
        <p>We have identified the following set of questions that we want to answer.</p>
        <list list-type="order">
          <list-item>
            <p>Questions related to enterprises: (a) Is this method valid for acquiring data on
enterprises employing BPM? (b) Which enterprises use BPM? (c) What
kind of companies are these? (d) Where are they geographically located?
(e) What is the size of these companies? (f) In which domain(s) are these
companies active?</p>
          </list-item>
          <list-item>
            <p>Questions related to the qualification of BPM practitioners: (a) What
qualities are applicants expected to have? (b) Is there an overlap between
BPM and other fields of work such as cloud computing? (c) What positions are
BPM practitioners employed in? (d) What level of experience is required?
(e) Is the required skill set changing? (f) Do enterprises look for the same
qualities as described in the scientific literature [<xref ref-type="bibr" rid="ref8">8</xref>]?</p>
          </list-item>
          <list-item>
            <p>General questions: (a) What is the duration of each job advertisement? This
could be indicative of the abundance of qualified employees. (b) Are the job
openings recurrent (for seemingly the same position)? (c) Is there a general
trend of enterprises employing BPM?</p>
          </list-item>
        </list>
      </sec>
      <sec id="sec-1-2">
        <title>Implementation</title>
        <p>Starting with the analysis of the data sources, we devised a crawler that
searches those data sources for a given keyword using the search functionality
provided by the sites. The crawler consists of scripts which split the work
into the following parts:</p>
        <list list-type="order">
          <list-item><p>Form the search query</p></list-item>
          <list-item><p>Filter the results according to the information searched for</p></list-item>
          <list-item><p>Gather further information, e.g. the URL of the employer</p></list-item>
          <list-item><p>Save the results persistently</p></list-item>
        </list>
        <p>The crawler uses a variety of Linux software to fulfil these tasks
(e.g. wget<sup>5</sup> and curl<sup>6</sup>) and to filter the results using stream processing tools
(e.g. awk<sup>7</sup> and sed<sup>8</sup>). For documentation purposes, the respective HTML pages are stored
twice: as HTML files and, audit-proof, as PDF. The
extracted information is stored in corresponding XML files and additionally in
a SQLite<sup>9</sup> database.</p>
        <p><sup>1</sup> https://jobboerse.arbeitsagentur.de
<sup>2</sup> http://www.stepstone.de
<sup>3</sup> http://www.monster.de
<sup>4</sup> http://www.xing.de</p>
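        <p>The four crawler steps described in this section can be illustrated with a minimal sketch. It is written in Python rather than with the shell tools the crawler actually uses, and it does not access the network. The endpoint <monospace>SEARCH_URL</monospace>, the query parameter <monospace>q</monospace>, the record fields and the table <monospace>job_ad</monospace> are hypothetical, and the sample results are made up for illustration.</p>
        <preformat>
```python
import sqlite3
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name; the real sites use their own.
SEARCH_URL = "https://jobs.example.org/search"


def form_query(keyword: str) -> str:
    """Step 1: build the search URL for one keyword or phrase."""
    return SEARCH_URL + "?" + urlencode({"q": keyword})


def filter_results(results: list[dict], keyword: str) -> list[dict]:
    """Step 2: keep only hits whose title or text mentions the keyword."""
    needle = keyword.lower()
    return [r for r in results
            if needle in (r["title"] + " " + r["text"]).lower()]


def gather_details(result: dict, employer_urls: dict[str, str]) -> dict:
    """Step 3: enrich a hit, here with the employer URL from a lookup table."""
    return {**result, "employer_url": employer_urls.get(result["employer"])}


def save_results(conn: sqlite3.Connection, keyword: str, rows: list[dict]) -> None:
    """Step 4: persist the hits in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_ad ("
        "keyword TEXT, title TEXT, employer TEXT, employer_url TEXT)"
    )
    conn.executemany(
        "INSERT INTO job_ad VALUES (?, ?, ?, ?)",
        [(keyword, r["title"], r["employer"], r["employer_url"]) for r in rows],
    )
    conn.commit()


# Made-up search results standing in for a downloaded result page.
sample = [
    {"title": "BPMN Process Analyst", "text": "Model processes", "employer": "Acme"},
    {"title": "Java Developer", "text": "Backend services", "employer": "Beta"},
]
conn = sqlite3.connect(":memory:")
hits = filter_results(sample, "bpmn")
rows = [gather_details(h, {"Acme": "https://acme.example.org"}) for h in hits]
save_results(conn, "bpmn", rows)
stored = conn.execute("SELECT title, employer_url FROM job_ad").fetchall()
print(form_query("business process management"))
print(stored)
```
        </preformat>
        <p>Keeping each step in its own function mirrors the split into separate scripts, so a single step can be replaced, for example when a site changes its result layout, without touching the others.</p>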
      </sec>
      <sec id="sec-1-3">
        <title>Data Acquisition</title>
        <p>The keywords/phrases are sent to the respective data sources/search
engines daily. The crawler handles the communication with the search engines
and stores the data persistently. Data collection started on 2014-09-18 and is
ongoing. Currently, 131 days' worth of data is available. Nine search
queries, one per keyword/phrase, go to each of the two search engines. The search
phrases in use are: 1. “yawl” 2. “epk” 3. “epc” 4. “business process management”
5. “prozess modell” 6. “prozessmanager” 7. “sysml” 8. “bpmn” 9. “bpm”.</p>
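        <p>As a rough illustration of the daily workload, the following Python sketch enumerates one query per search engine and keyword and recomputes the length of the collection period from the two dates given in the text. The engine labels and the function name <monospace>daily_queries</monospace> are assumptions made for this example, not part of the crawler.</p>
        <preformat>
```python
from datetime import date
from itertools import product

# The nine search phrases listed in the text.
KEYWORDS = [
    "yawl", "epk", "epc", "business process management", "prozess modell",
    "prozessmanager", "sysml", "bpmn", "bpm",
]
# Short labels for the two data sources; the labels themselves are assumptions.
ENGINES = ["jobboerse", "stepstone"]


def daily_queries(day: date) -> list[tuple[date, str, str]]:
    """Return one (day, engine, keyword) job per engine and keyword."""
    return [(day, engine, kw) for engine, kw in product(ENGINES, KEYWORDS)]


jobs = daily_queries(date(2015, 1, 27))
days_collected = (date(2015, 1, 27) - date(2014, 9, 18)).days
print(len(jobs))        # 18 queries per day
print(days_collected)   # 131
```
        </preformat>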
      </sec>
      <sec id="sec-1-4">
        <title>Preliminary Data Analysis</title>
        <p>The current data set consists of 767314 HTML pages, of which 104791 (13%)
originate from JobBoerse, 289065 (37%) from Stepstone and another 204737
(26%) from third parties accessed through the JobBoerse search engine.
The remaining 24% of the pages are broken or otherwise impaired. These HTML
pages have not yet been filtered for duplicates, so the data set certainly contains duplicate
entries that will be filtered out. Searching for unique job titles yields 7231 results.
On average there seem to be 55 job openings per day, spread over 2 search
engines and 9 different keywords/phrases. Taking the different keywords and
search engines into account, this results in 3.06 job openings per day
per keyword and search engine.</p>
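        <p>The figures in this section can be retraced with a few lines of Python. The sketch assumes that the daily average is the number of unique job titles divided by the 131 collection days; the text does not state this, but 7231 / 131 is close to 55. Shares recomputed directly from the page counts may differ by a point or two from the rounded percentages quoted in the text.</p>
        <preformat>
```python
TOTAL_PAGES = 767314
PAGES = {"JobBoerse": 104791, "Stepstone": 289065, "third parties": 204737}
UNIQUE_TITLES = 7231
DAYS = 131
ENGINES = 2
KEYWORDS = 9

# Share of each source in percent, computed from the page counts.
shares = {name: round(100 * n / TOTAL_PAGES, 1) for name, n in PAGES.items()}
# Pages not attributed to any source, i.e. broken or otherwise impaired.
impaired = TOTAL_PAGES - sum(PAGES.values())

# Assumption: the daily average is unique job titles per collection day.
per_day = UNIQUE_TITLES / DAYS
# The rounded daily figure divided by engines times keywords.
per_day_per_keyword = round(per_day) / (ENGINES * KEYWORDS)

print(shares)
print(impaired, round(100 * impaired / TOTAL_PAGES, 1))
print(round(per_day, 1), round(per_day_per_keyword, 2))
```
        </preformat>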
      </sec>
      <sec id="sec-1-5">
        <title>Challenges</title>
        <p>After the preliminary analysis the following challenges have arisen:</p>
        <list list-type="bullet">
          <list-item>
            <p>Job openings by consultancy companies. Job advertisements by these
companies cannot necessarily be attributed to the specific enterprise which does the
actual BPM work. Employees might be loaned to other companies or work
as consultants for other companies. These job advertisements are identifiable
by the company name and its classification as a consultancy company.</p>
          </list-item>
          <list-item>
            <p>Variations of the labour market. Other factors, e.g. crises,
wars or weather, might influence the overall labour market. Including search
phrases from other areas of information technology in the
data collection can reduce the effect of these variations, as they can function
as a baseline. Suggested terms include 1. cloud computing 2. data mining
3. java (programmer) 4. service engineer.</p>
          </list-item>
        </list>
        <p><sup>5</sup> https://www.gnu.org/s/wget
<sup>6</sup> http://curl.haxx.se
<sup>7</sup> http://www.gnu.org/s/gawk
<sup>8</sup> https://www.gnu.org/software/sed
<sup>9</sup> http://www.sqlite.org</p>
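        <p>One possible way to use baseline terms, sketched here in Python under stated assumptions, is to divide the daily number of BPM-related hits by the number of hits for a baseline term on the same day. The text does not prescribe this particular normalisation; the function name <monospace>relative_demand</monospace> and all counts below are made up for illustration.</p>
        <preformat>
```python
def relative_demand(bpm_counts: list[int], baseline_counts: list[int]) -> list[float]:
    """Divide each day's BPM hit count by the same day's baseline hit count."""
    if len(bpm_counts) != len(baseline_counts):
        raise ValueError("both series must cover the same days")
    return [b / base for b, base in zip(bpm_counts, baseline_counts)]


# Made-up daily counts: the whole market halves on the third day.
bpm = [50, 60, 30]
baseline = [500, 600, 300]   # e.g. hits for "java" on the same days
ratios = relative_demand(bpm, baseline)
print(ratios)
```
        </preformat>
        <p>In this made-up series the absolute BPM counts drop on the third day, but the ratio stays constant, which would point to a market-wide variation rather than a change specific to BPM. The sketch does not handle days on which the baseline count is zero.</p>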
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Conclusion and Outlook</title>
      <p>The main contribution of our work will be an extensive set of data relating to
companies involved in BPM, especially their geographic distribution, company
size, domain and business. A further contribution will be a data set of the
skills and related areas of business of people practising BPM in the real world.
This data can be extended where necessary and used to support existing BPM research
that uses surveys and interviews based on participant selection. This is an
ongoing experiment, and we will be able to produce results without the need to
halt data collection. A longer duration of data collection is necessary to track
variations and trends in BPM and the associated job advertisements.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cebeci</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kol</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Analysis for the Implementation of the Business Process Management in Selected Turkish Enterprises</article-title>
          .
          <source>International Journal of Economics and Financial Issues</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <fpage>420</fpage>
          -
          <lpage>425</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Herrouz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khentout</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Djoudi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of Web Content Mining Tools</article-title>
          .
          <source>The International Journal Of Engineering And Science (IJES)</source>
          <volume>2</volume>
          ,
          <fpage>106</fpage>
          -
          <lpage>110</lpage>
          (7
          <year>2013</year>
          ), http://adsabs.harvard.edu/abs/2013arXiv1307.1024H
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Imanipour</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Talebi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rezazadeh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Obstacles in Business process Management implementation and adoption in SMEs</article-title>
          (1
          <year>2012</year>
          ), http://ssrn.com/abstract=1990609
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Knuppertz</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schnägelberger</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clauberg</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <source>Umfrage Status Quo Prozessmanagement 2010/2011</source>
          . Tech. rep. (
          <year>2011</year>
          ), http://www.bpmo.de/bpmo/export/sites/default/de/know_how/content/downloads/Status_Quo_Prozessmanagement_2011.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Marres</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weltevrede</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Scraping the Social? Issues in live social research</article-title>
          .
          <source>Journal of Cultural Economy</source>
          <volume>6</volume>
          ,
          <fpage>313</fpage>
          -
          <lpage>335</lpage>
          (4
          <year>2013</year>
          ), http://dx.doi.org/10.1080/17530350.2013.772070
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thome</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vogeler</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <source>Zukunftsthema Geschäftsprozessmanagement</source>
          . Tech. rep. (
          <year>2011</year>
          ), http://www.pwc.de/de/prozessoptimierung
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harmon</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>The State of Business Process Management 2012</source>
          . Tech. rep. (
          <year>2012</year>
          ), http://www.bptrends.com/bpt/wp-content/surveys/2012-_BPT%20SURVEY-3-12-12-CW-PH.pdf
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          :
          <article-title>Enterprise Systems: State-of-the-Art and Future Trends</article-title>
          .
          <source>In: IEEE Transactions on Industrial Informatics</source>
          . vol.
          <volume>7</volume>
          , pp.
          <fpage>630</fpage>
          -
          <lpage>640</lpage>
          . IEEE (09
          <year>2011</year>
          ), http://dx.doi.org/10.1109/TII.2011.2167156
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>