<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Self-Organizing Feature Maps in Correlating Groups of Time Series: Experiments with Indicators Describing Entrepreneurship</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marta Czyżewska</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaroslaw Szkola</string-name>
          <email>jszkola@wsiz.rzeszow.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krzysztof Pancerz</string-name>
          <email>kpancerz@wsiz.rzeszow.pl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Information Technology and Management</institution>
          <addr-line>Sucharskiego Str. 2, 35-225 Rzeszow</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the paper, we briefly describe the problem of identifying entrepreneurship determinants with respect to the economic development of countries. To solve this problem, we need to identify correlations between entrepreneurship and macroeconomic indicators. The paper focuses mainly on selecting a proper computer tool for solving this problem. Self-Organizing Feature Maps (SOMs) have been chosen as the tool supporting the identification. We propose some modifications of the clustering process using SOMs to improve classification results and the efficiency of the learning process. Finally, we indicate some challenges for further research.</p>
      </abstract>
      <kwd-group>
        <kwd>self-organizing feature maps</kwd>
        <kwd>correlation</kwd>
        <kwd>entrepreneurship</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The phenomenon of entrepreneurship is observed at various levels, such as
the entrepreneur, the industry, the region, or the nation, and with respect to
many aspects reflecting the level of entrepreneurship. The worldwide interest in
entrepreneurship, especially innovative entrepreneurship based on advanced
knowledge and technology, shows the importance of the phenomenon, particularly
for underdeveloped countries, for nations with aging societies, for those with a
growing rate of youth unemployment, and for the global economy as well. Our research
concerns designing effective methods for computer support of the identification of
entrepreneurship determinants, as entrepreneurship is the key factor of countries' economic
development. Therefore, we are going to build a specialized computer-aided
system that applies neural networks to determine cross-country
differences in the propensity for entrepreneurship and in the country
framework, in order to assess policy gaps and opportunities for future actions. The
multidimensional analysis enables us to form specific recommendations to a country's
government on how to conduct a policy toward entrepreneurship development. The
question of how to increase the development level through an entrepreneurship
stimulation policy is still open. Building an effective system for boosting entrepreneurship
is a challenge of the century, see, e.g., [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
    </sec>
    <sec>
      <title>The Clustering Procedure using Self-Organizing Feature Maps</title>
      <p>
        The concept of a Self-Organizing Feature Map (SOM) was originally developed
by T. Kohonen [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. SOMs are neural networks composed of a two-dimensional
grid (matrix) of artificial neurons that attempt to represent high-dimensional data
in a low-dimensional structure. Each neuron is equipped with modifiable
connections.
      </p>
      <p>In this section, we describe the clustering procedure used in our experiments for
finding correlations between groups of multidimensional objects using Self-Organizing
Feature Maps. To improve classification results and the efficiency of the learning
process, we propose several modifications, among others:
<list list-type="bullet">
<list-item><p>a modified coefficient for adjusting weights,</p></list-item>
<list-item><p>a modified way of adjusting the weights of neighboring neurons (the modification
coefficient is not constant, but decreases with the distance from the
pattern neuron),</p></list-item>
<list-item><p>a modified learning process (only the neighboring neurons of the
pattern neuron for a given input vector are trained).</p></list-item>
</list></p>
      <p>The input for the procedure is a matrix of real numbers. Each row of the
matrix represents the feature vector of one object (corresponding to one country)
subjected to clustering. All rows (feature vectors) have the same dimension, and the
input matrix must have at least two rows (feature vectors). A fragment of
exemplary data (for the indicator "New business density") subjected to clustering
is shown in Table 1.</p>
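      <p>The input requirements above can be checked while the matrix is assembled. The following Python sketch assumes that each country's annual values are held in a dictionary keyed by country name; the function name build_input_matrix and this data layout are our hypothetical choices, not the authors' implementation, and the numbers in the usage line are placeholders rather than WDI values.</p>
      <preformat preformat-type="python">
```python
from typing import Dict, List, Tuple


def build_input_matrix(
    series: Dict[str, List[float]],
) -> Tuple[List[str], List[List[float]]]:
    """Turn {country: annual values} into (row labels, input matrix).

    Each row is the feature vector of one country; all rows must have the
    same dimension and there must be at least two rows.
    """
    if 2 > len(series):
        raise ValueError("the input matrix must have at least two rows")
    labels = sorted(series)  # deterministic row order, one row per country
    dimensions = {len(series[name]) for name in labels}
    if len(dimensions) != 1:
        raise ValueError("all feature vectors must have the same dimension")
    matrix = [[float(value) for value in series[name]] for name in labels]
    return labels, matrix


# Placeholder values, not real WDI data.
labels, matrix = build_input_matrix({"Country B": [1.0, 2.0], "Country A": [3.0, 4.0]})
```
      </preformat>
      <p>Sorting the labels only fixes a reproducible row order; any consistent order would serve, since each row is later tracked by its label.</p>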
      <p>The parameter n, which determines the maximal size n × n of the output map, is calculated as
<disp-formula><tex-math>n = \left\lceil \sqrt{2m} + 0.5 \right\rceil,</tex-math></disp-formula>
where ceil (⌈·⌉) is a function rounding up its argument and m is the number of feature
vectors. The initial size of the output matrix is 2 × 2. During the learning process,
this size is increased up to n × n. The learning process is performed iteratively. In
our research, the number of iterations has been set to 100; more iterations did
not improve the quality of classification. Each feature vector is associated with an
individual map, i.e., a matrix of neurons, so there are as many maps as there are
feature vectors. We treat this set of maps as a multilayer map labeled M.
The initial values of the weights of the map are set on the basis
of the following formula:</p>
      <p><disp-formula><tex-math>M[x][y][i] = \frac{\mathrm{random}(\min, \max)}{10},</tex-math></disp-formula>
where x and y determine a position in the map and i is the index of the feature
vector, an integer in the interval [1, k], where k is the number of all
feature vectors subjected to clustering; random(min, max) is a pseudorandom
number generator returning a number from the interval [min, max], where min
and max are the minimal and maximal values of the input feature vectors. The
learning process includes the following steps:
<list list-type="order">
<list-item><p>Calculating the current coefficient for the modification of the weights of the map.</p></list-item>
<list-item><p>Calculating a new desired size of the map.</p></list-item>
<list-item><p>Randomly selecting the order of feature vectors for training the network.</p></list-item>
<list-item><p>Modifying the weights of the map after calculating the error on the basis
of an input feature vector and the current weights of the map.</p></list-item>
</list></p>
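      <p>A minimal Python sketch of the map size n and the initial weights. It assumes that i indexes the components of a feature vector (so each neuron stores as many weights as a feature vector has components, which is what input[i] in the difference formula given later in this section requires) and that random(min, max) draws uniformly; the helper names max_map_size and init_weights are hypothetical.</p>
      <preformat preformat-type="python">
```python
import math
import random
from typing import List

Map3D = List[List[List[float]]]


def max_map_size(m: int) -> int:
    """n = ceil(sqrt(2m) + 0.5), where m is the number of feature vectors."""
    return math.ceil(math.sqrt(2 * m) + 0.5)


def init_weights(size: int, k: int, lo: float, hi: float,
                 rng: random.Random) -> Map3D:
    """M[x][y][i] = random(lo, hi) / 10 for every neuron of a size x size map.

    lo and hi play the role of min and max over the input feature vectors;
    k is the number of weights stored per neuron.
    """
    return [[[rng.uniform(lo, hi) / 10 for _ in range(k)]
             for _ in range(size)]
            for _ in range(size)]


rng = random.Random(0)
n = max_map_size(8)                      # 8 feature vectors give n = 5
M = init_weights(2, 3, 0.0, 10.0, rng)   # initial 2 x 2 map, 3 weights per neuron
```
      </preformat>
      <p>Passing an explicit random.Random instance keeps runs reproducible, which helps when comparing clusterings of different indicators.</p>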
      <p>Steps 1 to 4 are performed iteratively up to the fixed number of iterations
(in our case, 100). After the learning process has finished, the testing process is
run. In this process, the classification result is assessed for each input
feature vector used in the learning process. The assessment consists in calculating the
differences between a given feature vector and the weights of all neurons. The neuron
with the smallest difference is selected and identified as the pattern neuron for
this feature vector. On the basis of the pattern neurons, a map including all feature
vectors and their assignments to centroids is created.</p>
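      <p>The testing pass can be sketched as follows, assuming that the "difference" is the Euclidean distance d defined later in this section. The function names are hypothetical, and ties between equally distant neurons are broken by scanning order, which the text does not specify.</p>
      <preformat preformat-type="python">
```python
import math
from typing import Dict, List, Sequence, Tuple

Map3D = List[List[List[float]]]


def difference(weights: Sequence[float], vector: Sequence[float]) -> float:
    """Euclidean difference d between a neuron's weights and a feature vector."""
    return math.sqrt(sum((w - v) ** 2 for w, v in zip(weights, vector)))


def pattern_neuron(M: Map3D, vector: Sequence[float]) -> Tuple[int, int]:
    """Position (x, y) of the neuron whose weights differ least from vector."""
    positions = [(x, y) for x in range(len(M)) for y in range(len(M[x]))]
    return min(positions, key=lambda pos: difference(M[pos[0]][pos[1]], vector))


def assign_to_neurons(M: Map3D, labels: List[str],
                      rows: List[List[float]]) -> Dict[Tuple[int, int], List[str]]:
    """Testing pass: group objects (countries) by their pattern neuron."""
    groups: Dict[Tuple[int, int], List[str]] = {}
    for label, row in zip(labels, rows):
        groups.setdefault(pattern_neuron(M, row), []).append(label)
    return groups
```
      </preformat>
      <p>Each key of the returned dictionary is one occupied neuron, so the groups correspond to the assignments of feature vectors to centroids described above.</p>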
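      <p>The current coefficient η for the modification of weights is calculated as
<disp-formula><tex-math>\eta = e^{-\frac{e_c}{e_m}},</tex-math></disp-formula>
where e<sub>c</sub> is the current epoch (its index changes from 1 to e<sub>m</sub>) and e<sub>m</sub> is the maximal
number of epochs.</p>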
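      <p>A short Python sketch of the coefficient schedule, assuming the coefficient (written η here) equals exp(−e_c/e_m); the function names weight_coefficient and coefficient_schedule are hypothetical.</p>
      <preformat preformat-type="python">
```python
import math
from typing import List


def weight_coefficient(ec: int, em: int) -> float:
    """Assumed form eta = exp(-ec / em) for the current epoch ec in 1..em."""
    if ec > em or 1 > ec:
        raise ValueError("epoch index must lie between 1 and em")
    return math.exp(-ec / em)


def coefficient_schedule(em: int) -> List[float]:
    """Coefficient for every epoch; it decreases from exp(-1/em) to exp(-1)."""
    return [weight_coefficient(ec, em) for ec in range(1, em + 1)]


schedule = coefficient_schedule(100)  # 100 epochs, as in the experiments
```
      </preformat>
      <p>With e_m = 100, this reading of the formula lets the coefficient fall from about 0.99 in the first epoch to about 0.37 in the last one.</p>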
      <p>The new desired size n<sub>d</sub> of the map is calculated as
<disp-formula><tex-math>n_d = \frac{2(e_c - 1)(n - n_s)}{e_m},</tex-math></disp-formula>
where e<sub>c</sub> is the current epoch, n is the maximal size of the map, n<sub>s</sub> is the initial
size of the map, and e<sub>m</sub> is the maximal number of epochs. If the new desired size of
the map is greater than the current one, the size n<sub>c</sub> of the map is increased in
the following way:</p>
      <p><disp-formula><tex-math>n_c' = n_c + 1,</tex-math></disp-formula>
provided that n<sub>c</sub>′ does not exceed the maximal size n of the map.</p>
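      <p>The growth rule can be simulated in a few lines. This sketch assumes the size formula n_d = 2(e_c − 1)(n − n_s)/e_m, adds at most one row and one column per epoch, and stops growing once the map reaches n × n; the function names are hypothetical.</p>
      <preformat preformat-type="python">
```python
from typing import List


def desired_size(ec: int, em: int, n: int, ns: int) -> float:
    """n_d = 2 (ec - 1)(n - ns) / em."""
    return 2 * (ec - 1) * (n - ns) / em


def grow(nc: int, nd: float, n: int) -> int:
    """Add one row and one column when n_d exceeds n_c, never going past n."""
    if nd > nc and n >= nc + 1:
        return nc + 1
    return nc


def size_schedule(em: int, n: int, ns: int) -> List[int]:
    """Map size after step 2 of every epoch ec = 1..em."""
    nc, sizes = ns, []
    for ec in range(1, em + 1):
        nc = grow(nc, desired_size(ec, em, n, ns), n)
        sizes.append(nc)
    return sizes


sizes = size_schedule(em=100, n=5, ns=2)
```
      </preformat>
      <p>With n = 5, n_s = 2 and 100 epochs, the simulated map grows to 3 × 3 in epoch 35, to 4 × 4 in epoch 52 and to 5 × 5 in epoch 68, so the last third of training runs at full size.</p>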
      <p>After the size of the map has changed, the weights need to be modified. For the
modification of weights in the map, we calculate auxiliary variables x<sub>temp</sub> and
y<sub>temp</sub> from the position (x, y) of each neuron. The weights of the map for each
input feature vector are then changed in the following way:
<disp-formula><tex-math>M[x][y][i] = c_1 M[x-1][y][i] + c_2 M[x][y-1][i] + c_3 M[x-1][y-1][i] + c_4 M[x][y][i]</tex-math></disp-formula>
for i = 1, …, k.</p>
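      <p>The coefficients c_1, …, c_4 are not given in closed form here. The sketch below substitutes plain bilinear interpolation, which combines exactly the four neighbouring weights M[x−1][y], M[x][y−1], M[x−1][y−1] and M[x][y] of the formula above; treating c_1, …, c_4 as bilinear weights and x_temp, y_temp as rescaled coordinates in the old map is our assumption, not the authors' definition. The name resize_map is hypothetical.</p>
      <preformat preformat-type="python">
```python
from typing import List

Map3D = List[List[List[float]]]


def resize_map(M: Map3D, new_size: int) -> Map3D:
    """Resample a square map to new_size x new_size (both sizes at least 2).

    Assumption: c1..c4 are bilinear interpolation weights of the four old
    neurons around the rescaled position (x_temp, y_temp).
    """
    old = len(M)
    scale = (old - 1) / (new_size - 1)
    resized: Map3D = []
    for x in range(new_size):
        row = []
        for y in range(new_size):
            x_temp, y_temp = x * scale, y * scale   # position in the old map
            x0 = min(int(x_temp), old - 2)          # lower corner of the old cell
            y0 = min(int(y_temp), old - 2)
            fx, fy = x_temp - x0, y_temp - y0       # fractional offsets in [0, 1]
            c1 = (1 - fx) * fy                      # weight of M[x0][y0 + 1]
            c2 = fx * (1 - fy)                      # weight of M[x0 + 1][y0]
            c3 = (1 - fx) * (1 - fy)                # weight of M[x0][y0]
            c4 = fx * fy                            # weight of M[x0 + 1][y0 + 1]
            row.append([
                c1 * a + c2 * b + c3 * c + c4 * d
                for a, b, c, d in zip(M[x0][y0 + 1], M[x0 + 1][y0],
                                      M[x0][y0], M[x0 + 1][y0 + 1])
            ])
        resized.append(row)
    return resized
```
      </preformat>
      <p>Applied to the initial 2 × 2 map, resize_map(M, 3) keeps the four corner weights and fills the new middle row and column with averages of their neighbours, so growing the map does not discard what has already been learned.</p>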
      <p>The difference between a given input feature vector and the weights of each neuron
is calculated from the formula
<disp-formula><tex-math>d = \sqrt{\sum_{i=1}^{k} \left( M[x][y][i] - \mathrm{input}[i] \right)^{2}}</tex-math></disp-formula>
for each neuron in position (x, y). Next, the neuron with the smallest d is
selected, and its neighboring neurons (±1 neuron in both directions, i.e., x and y)
are modified according to:</p>
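      <p><disp-formula><tex-math>M'[x][y][i] = M[x][y][i] + \eta \left( \mathrm{input}[i] - M[x][y][i] \right) \left( 1 - 0.4\,\mathrm{abs}(x - x_{top}) \right) \left( 1 - 0.4\,\mathrm{abs}(y - y_{top}) \right)</tex-math></disp-formula>
for each i = 1, …, k, where η is the current coefficient for the modification of weights,
abs is the absolute value function, (x<sub>top</sub>, y<sub>top</sub>) is the position of the pattern
neuron, and x and y range over x<sub>top</sub> − 1, x<sub>top</sub>, x<sub>top</sub> + 1 and
y<sub>top</sub> − 1, y<sub>top</sub>, y<sub>top</sub> + 1 within the map, respectively.</p>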
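      <p>One training step for a single input vector can be sketched as follows. It assumes that (x_top, y_top) is the position of the pattern neuron, that η is the coefficient of the current epoch, and that neighbours falling outside the map are skipped; train_step is a hypothetical name.</p>
      <preformat preformat-type="python">
```python
import math
from typing import List, Sequence, Tuple

Map3D = List[List[List[float]]]


def train_step(M: Map3D, vector: Sequence[float], eta: float) -> Tuple[int, int]:
    """Find the pattern neuron and pull it and its +-1 neighbours toward vector.

    The factor (1 - 0.4|x - x_top|)(1 - 0.4|y - y_top|) is 1 for the pattern
    neuron, 0.6 for edge neighbours and 0.36 for diagonal ones.
    Weights are modified in place; the pattern neuron position is returned.
    """
    size = len(M)

    def d(pos: Tuple[int, int]) -> float:
        w = M[pos[0]][pos[1]]
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, vector)))

    x_top, y_top = min(((x, y) for x in range(size) for y in range(size)), key=d)
    for x in range(max(x_top - 1, 0), min(x_top + 2, size)):
        for y in range(max(y_top - 1, 0), min(y_top + 2, size)):
            factor = eta * (1 - 0.4 * abs(x - x_top)) * (1 - 0.4 * abs(y - y_top))
            M[x][y] = [w + factor * (v - w) for w, v in zip(M[x][y], vector)]
    return x_top, y_top
```
      </preformat>
      <p>Only the 3 × 3 block around the pattern neuron is touched, which reflects the modification that only neighbouring neurons of the pattern neuron are trained.</p>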
      <p>Results of the clustering process are presented in the form of minimal
spanning trees with respect to distances between feature vectors and centroids.
</p>
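      <p>The text does not name the algorithm used to build the minimal spanning trees. As a stand-in, the sketch below applies Prim's algorithm to the complete graph whose vertices are the given points (for example, feature vectors together with centroid weight vectors) and whose edge lengths are Euclidean distances; minimum_spanning_tree is a hypothetical name.</p>
      <preformat preformat-type="python">
```python
import math
from typing import List, Sequence, Tuple


def minimum_spanning_tree(points: List[Sequence[float]]) -> List[Tuple[int, int, float]]:
    """Prim's algorithm on the complete Euclidean graph over points.

    Returns edges (i, j, length) in the order they join the tree;
    O(p^2) time, which is adequate for a few hundred countries and centroids.
    """
    p = len(points)
    if p == 0:
        return []
    in_tree = [False] * p
    best = [math.inf] * p   # shortest known link from each point to the tree
    parent = [-1] * p
    best[0] = 0.0
    edges = []
    for _ in range(p):
        u = min((i for i in range(p) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        for v in range(p):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if best[v] > dist:
                    best[v] = dist
                    parent[v] = u
    return edges
```
      </preformat>
      <p>math.dist requires Python 3.8 or later. For dense complete graphs such as this one, Prim's quadratic version avoids the edge sorting that Kruskal's algorithm would need.</p>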
    </sec>
    <sec id="sec-2">
      <title>Examined Data</title>
      <p>
        The examined data consist of entrepreneurship and macroeconomic indicators
from the World Development Indicators (WDI) published by the World Bank [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
The indicators published in this report that describe entrepreneurship and that
we have chosen for our research include:
<list list-type="bullet">
<list-item><p>New business density,</p></list-item>
<list-item><p>Start-up procedures to register a business,</p></list-item>
<list-item><p>Firms using banks to finance investment,</p></list-item>
<list-item><p>Time to resolve insolvency,</p></list-item>
<list-item><p>Strength of legal rights index,</p></list-item>
<list-item><p>Time to prepare and pay taxes,</p></list-item>
<list-item><p>Firms expected to give gifts in meetings with tax officials,</p></list-item>
<list-item><p>Researchers in R&amp;D,</p></list-item>
<list-item><p>Patent and trademark applications,</p></list-item>
<list-item><p>High-technology exports.</p></list-item>
</list>
      </p>
      <p>The data are annual and cover developing and high-income economies.
For each selected country, we have a time series consisting of annual values of
a given indicator (cf. Table 1). Therefore, for the clustering process, we have
as many feature vectors as there are selected countries. Each clustered object
represents a time series. The examined indicators cover years from the first decade
of the 21st century.</p>
    </sec>
    <sec id="sec-3">
      <title>Challenges</title>
      <p>The presented paper constitutes the first attempt to deal with the problem
of identifying correlations between groups of time series obtained from the
clustering process. Therefore, it has a rather rudimentary (introductory)
character. In this section, we outline some challenges for further research.</p>
      <p>As the result of the clustering process for the set of time series corresponding to
a given indicator, we obtain a minimal spanning tree with respect to the distances
between feature vectors and centroids. An exemplary spanning tree is shown in
Figure 1. It presents clusters of countries regarding the indicator "New
business density", i.e., new business registrations per thousand people aged
15–64. In Figure 1, we can notice several groups of
countries with similar values of the indicator: one cluster is formed by Vanuatu,
Spain, Romania, Ireland, Latvia, Denmark, and Singapore; the second one
covers Bolivia, Philippines, Algeria, Argentina, Guatemala, Uganda, Jordan,
Zambia, and Morocco; and the third contains Malaysia, Kazakhstan, Uruguay,
Croatia, Netherlands, France, Finland, Belgium, Portugal, and Sweden.</p>
      <p>In order to identify correlations between groups of time series formed in the
clustering process, we need to apply methods for comparing the topological
structures of minimal spanning trees. In the simple case, we can make a one-to-one
comparison, i.e., we compare the minimal spanning tree of one indicator
with that of another indicator. The results of the comparison process should enable
us to identify entrepreneurship determinants.</p>
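      <p>The paper leaves the comparison measure open. As one simple, hypothetical option for the one-to-one comparison, two minimal spanning trees built over the same set of countries can be compared by the Jaccard index of their undirected edge sets; this measure and the function names below are our illustration, not a method proposed in the paper.</p>
      <preformat preformat-type="python">
```python
from typing import FrozenSet, Iterable, Set, Tuple


def edge_set(tree: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Undirected edges of a spanning tree, labelled by country names."""
    return {frozenset(edge) for edge in tree}


def tree_similarity(tree_a: Iterable[Tuple[str, str]],
                    tree_b: Iterable[Tuple[str, str]]) -> float:
    """Jaccard index of the two edge sets: 1.0 for identical topologies."""
    a, b = edge_set(tree_a), edge_set(tree_b)
    union = a.union(b)
    if not union:
        return 1.0
    return len(a.intersection(b)) / len(union)
```
      </preformat>
      <p>Edge overlap ignores edge lengths and only rewards identical adjacencies; measures that also credit countries placed close together in both trees would be a natural refinement.</p>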
      <p>
        Moreover, we plan to test other clustering methods, among others, the one
we have proposed (see [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) based on the ant principle. It is worth noting that we need
to use clustering methods without a predetermined number of clusters, since a fixed
number of clusters can disturb the process of searching for correlations.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. WDI, http://data.worldbank.org/data-catalog/world-development-indicators</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Amit</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glosten</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Challenges to theory development in entrepreneurship research</article-title>
          .
          <source>Journal of Management Studies</source>
          <volume>30</volume>
          (
          <issue>5</issue>
          ),
          <fpage>815</fpage>
          –
          <lpage>834</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Audretsch</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (ed.):
          <article-title>Entrepreneurship, innovation and economic growth</article-title>
          .
          <source>Edward Elgar Publishing Limited</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Cios</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pedrycz</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swiniarski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurgan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Data mining. A knowledge discovery approach</article-title>
          . Springer, New York (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Harper</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          :
          <article-title>Foundations of Entrepreneurship and Economic Development</article-title>
          . Routledge Taylor &amp; Francis Group, London and New York (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kohonen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Self-organized formation of topologically correct feature maps</article-title>
          .
          <source>Biological Cybernetics</source>
          <volume>43</volume>
          (
          <issue>1</issue>
          ),
          <fpage>59</fpage>
          –
          <lpage>69</lpage>
          (
          <year>1982</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pancerz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewicki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tadeusiewicz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Ant based clustering of time series discrete data - a rough set approach</article-title>
          . In:
          <string-name>
            <surname>Panigrahi</surname>
            ,
            <given-names>B.K.</given-names>
          </string-name>
          , et al. (eds.)
          <source>Swarm, Evolutionary, and Memetic Computing</source>
          , Lecture Notes in Computer Science, vol.
          <volume>7076</volume>
          , pp.
          <fpage>645</fpage>
          –
          <lpage>653</lpage>
          . Springer-Verlag, Berlin Heidelberg (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>