<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Business Activity Clustering: A Use Case in Curitiba</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yuri S. Bichibichi</string-name>
          <email>yuribichibichi@alunos.utfpr.edu.br</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nádia P. Kozievitch</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo da S. Dutra</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artur Ziviani</string-name>
          <email>ziviani@lncc.br</email>
        </contrib>
        <aff>Petrópolis, RJ - Brazil</aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>41</fpage>
      <lpage>48</lpage>
      <abstract>
        <p>In the context of smart cities, information on business licenses has the potential to reveal economic characteristics of the observed urban environment. This work performs an initial analysis of business activity clustering using the k-means algorithm with data on business licenses granted (from 1980 to 2016) in the city of Curitiba, Brazil. Nowadays, large-scale data analytics enables the analysis of socio-economic factors in metropolitan areas with fewer resources than traditional surveys, such as censuses and questionnaires. The challenge typically resides, however, in how to explore the available data to achieve relevant results [Silva and Loureiro 2016]. In this context, business licenses, which are granted by the municipality to entities that intend to carry out commercial activity, can be used to estimate the development and distribution of commercial agglomerates [Carr et al. 2003]. This type of analysis can benefit the community, for example by informing public strategies. In addition, studying the development of a commercial area offers indicators that may help identify potential new commercial areas.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>neighborhood has old mansions of the “Mate Barons”, and its largest group of households
(28.5%) has an income between 5 and 10 times the Brazilian minimum wage.</p>
      <p>This paper presents a business license analysis using the k-means algorithm, with
a use case in Curitiba. Several values of k are tested using the Elbow Method. The
objective was to identify correlations between the category, location, and creation
date of business licenses through clustering. The contribution is an investigation of the
relation between these types of data. The results show that no strong relation could be found,
probably due to the characteristics of the data.</p>
      <p>The remainder of this paper is organized as follows. Section 2 discusses related
work. Section 3 presents the studied dataset as well as data processing details. Section 4
shows the study about the business activity clustering. Finally, Section 5 presents the
conclusion of the paper and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Curitiba open data has been explored in urban mobility research projects that aim at
modeling the operation of the public transportation system [da Silva et al. 2016b, Bona et al. 2016,
Stenneth et al. 2011], gaining insight into its quality of service [da Silva et al. 2016a],
suggesting data exploitation for new services [Diniz Junior 2017] or
technologies [Sebastiani et al. 2016], and assessing environmental impacts [Dreier and Silveira 2016,
Dreier et al. 2015]. [Gay et al. 2016] presented a study involving data obtained from
the GeoSampa portal of the city of São Paulo-SP. Using GIS, the researchers identified
the most vulnerable places in São Paulo as those with the highest levels of flooding and
the lowest levels of accessibility. Geospatial open data is also used for risk assessment in
Curitiba<sup>4</sup> through special applications (Vicon SAGA<sup>5</sup>). Many of these application systems
are based on global positioning systems (GPS) [Stenneth et al. 2011], using specific data
(such as real-time bus locations, spatial rail and spatial bus stop information) and specific
techniques
        <xref ref-type="bibr" rid="ref10">(such as spatial data mining [Mennis and Guo 2009])</xref>
        .
      </p>
      <p>In particular, considering business licenses, [Rosa et al. 2016] analyzed the
entropy of the Center, Batel, and Tatuquara districts, concluding that it tends to decrease
when there is a large number of business licenses, i.e., the business categories tend to be
dispersed. [Bichibichi et al. 2018] presented a study of the areas surrounding two
shopping malls within the Batel district. The same business license data was used for general
clustering with heatmaps in [Vila et al. 2016]. Finally, [Kozievitch et al. 2017],
on the other hand, discussed the challenges related to the data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Dataset and Pre-Processing Steps</title>
      <p>A dataset of historical data on business licenses was provided by the Curitiba City Hall
(Prefeitura Municipal de Curitiba – PMC). This dataset covers business licenses granted
from January 1st, 1980 to December 31st, 2015, totaling 172,173 records. For this
analysis, only the data from the Batel district is used initially, totaling 7,268 records.</p>
      <p>Approximately 0.05% of the records presented problems in the address geocoding
process, requiring manual intervention to correct the latitude and longitude information.
The dataset has only the creation dates for the entries and it was not possible to infer
if the business eventually stopped operating. Economic activities, initially cataloged in
2,977 detailed types (such as dancing restaurant, pizza restaurant, etc.) were reassembled
into 73 types of aggregated activities (such as office, restaurant, banking, construction,
etc.), following the recently presented steps in [Kono 2016]. Table 1 shows the attributes
present within the data along with their description.</p>
      <p><sup>4</sup> https://www.viconsaga.com.br/grrd – Last accessed on May 24th, 2018.</p>
      <p><sup>5</sup> https://www.viconsaga.com.br/site/language=us – Last accessed on May 24th, 2018.</p>
      <p>The Curitiba municipality data was imported into a PostGIS server, where
tablespaces, indexes, and schemas were created. Subsequently, the QGIS<sup>6</sup> software was
used to integrate the data with other tools: Google Maps<sup>7</sup> and OpenStreetMap<sup>8</sup>. The
k-means algorithm and the Elbow Method distortion analysis were implemented in Python (along with the
scikit-learn, pandas, and matplotlib libraries).</p>
      <p>Table 1. Attributes present in the data (only the following attribute names are recoverable): NOME EMPRESARIAL, NUMERO DO ALVARA, ATIVIDADE SECUNDARIA1, ATIVIDADE SECUNDARIA2.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Business activity analysis</title>
      <p>From the attributes listed in Table 1, only location, time, and aggregated business license
types were used. The underlying hypothesis is that there are clusters (within a
combination of location, time and type) which present more affinity (for example, given the
location of a restaurant, we might have a drugstore nearby).</p>
      <p>The attributes for location (latitude and longitude) were both initially normalized
to the interval [0, 1] according to the minimum and maximum values of latitude and
longitude registered for Curitiba. The accuracy of the data remains similar, since the
curvature effect of the globe captured by latitude and longitude is negligible within a single
city district. The attribute INICIO ATIVIDADE was also normalized, so that zero represents
the oldest date and one represents the newest date. A matrix was created with one column for each
value of the attribute that represents the aggregated business activity, such that only the
actual activity is marked as one, while the others are marked as zero. Table 2 shows an
example of that procedure. Note that all normalized values are between 0 and 1 and the
columns referring to the business categories are sparse (in this example, Gym, Butcher
shop, and Agency).</p>
      <p><sup>6</sup> http://www.qgis.org/en/site/ – Last visited on May 24th, 2018.</p>
      <p><sup>7</sup> https://www.google.com.br/maps – Last visited on May 24th, 2018.</p>
      <p><sup>8</sup> https://www.openstreetmap.org/ – Last visited on May 24th, 2018.</p>
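      <p>The pre-processing described above can be illustrated with a minimal, standard-library-only sketch. It is not the original implementation: the sample records, the dictionary keys (lat, lon, start, type), and the helper names are hypothetical, and the scaling uses the minimum and maximum of the sample itself rather than the bounds registered for Curitiba.</p>
      <preformat>
```python
from datetime import date

# Hypothetical sample records (coordinates, dates, and types are made up).
records = [
    {"lat": -25.4430, "lon": -49.2900, "start": date(1980, 1, 1), "type": "Gym"},
    {"lat": -25.4405, "lon": -49.2871, "start": date(1998, 6, 15), "type": "Butcher shop"},
    {"lat": -25.4452, "lon": -49.2925, "start": date(2015, 12, 31), "type": "Agency"},
]


def min_max(values):
    """Scale numbers linearly so the smallest maps to 0 and the largest to 1.

    Assumes the values are not all identical (otherwise the range is zero).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def build_features(rows):
    """Return (column names, feature rows): scaled lat, lon, time, one-hot type."""
    lat = min_max([r["lat"] for r in rows])
    lon = min_max([r["lon"] for r in rows])
    # Dates become ordinal day numbers so they can be scaled like any number.
    time = min_max([r["start"].toordinal() for r in rows])
    types = sorted({r["type"] for r in rows})
    matrix = []
    for i, r in enumerate(rows):
        # Exactly one type column is 1 for each record; the others are 0.
        one_hot = [1 if r["type"] == t else 0 for t in types]
        matrix.append([lat[i], lon[i], time[i]] + one_hot)
    return ["lat", "lon", "time"] + types, matrix


columns, matrix = build_features(records)
print(columns)
for row in matrix:
    print([round(v, 3) for v in row])
```
      </preformat>
      <p>Each printed row has three values in [0, 1] followed by the sparse type columns, which mirrors the layout described for Table 2.</p>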
      <p>The k-means algorithm was used for the clustering analysis. The
number of clusters k was chosen using the Elbow Method, which plots the sum of squared
errors (SSE) for different values of k. Figure 1 shows the Elbow Method for the following
combinations of the clustering parameters: space; time; type (of business license); time
× type; space × type; space × time; space × type × time; and street × time × type.
Note that “space × type × time”, “space × type”, and “time × type” suggest k ≈ 3, while
“space × time” suggests k ≈ 5.</p>
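      <p>As an illustration of how an elbow curve is obtained, the sketch below computes the SSE of k-means for several values of k. It is a simplified, standard-library-only stand-in, not the scikit-learn based implementation mentioned in Section 3: the synthetic points, the function names, and the deterministic farthest-first seeding are all assumptions made for the example.</p>
      <preformat>
```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the first
    point, then repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while k > len(centroids):
        centroids.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=100):
    """Plain Lloyd iterations; returns (centroids, labels, sse)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    sse = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def elbow_curve(points, k_values):
    """Map each k to the SSE reached by k-means with that number of clusters."""
    return {k: kmeans(points, k)[2] for k in k_values}


# Synthetic data: three tight groups of four points in the unit square.
centers = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
offsets = [(-0.02, -0.02), (-0.02, 0.02), (0.02, -0.02), (0.02, 0.02)]
points = [[cx + dx, cy + dy] for cx, cy in centers for dx, dy in offsets]

curve = elbow_curve(points, range(1, 7))
for k, sse in curve.items():
    print(k, round(sse, 4))
```
      </preformat>
      <p>For this synthetic data the SSE falls sharply up to k = 3 and changes little afterwards, which is the kind of bend the Elbow Method looks for. With scikit-learn, a comparable curve could be read from the inertia_ attribute of fitted KMeans models.</p>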
      <p>Figure 2 (for Batel) and Figure 3 (for Curitiba) present more details about all the
aggregated business licenses over time: light green represents the most common types and
blue lines represent the “others” cluster. Note that (i) not all aggregated license types are
present in the Batel district (only 54 out of 73) and that (ii) the office and retail trade types
are the most common types both for the city of Curitiba and for the Batel district.</p>
      <p>Table 3. Cluster Nr.: 0, 1, 2, 3, 4.</p>
      <p>The data clustering suggested that the attributes space, time and type produce
clusters which are divided only by the type of business license (as shown in Table 3,
which uses k = 5). Figure 4 shows the five clusters for Batel. Note that clusters such as
office and retail trade ignore the time factor. Darker colors indicate a larger number of
business licenses of that type on a specific date.</p>
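      <p>One way to check whether clusters are divided only by license type is to count the types inside each cluster. The sketch below does this for a hypothetical list of (cluster label, aggregated type) pairs; the pairs and the function names are invented for illustration and do not reproduce Table 3.</p>
      <preformat>
```python
from collections import Counter, defaultdict

# Hypothetical (cluster label, aggregated type) pairs; not taken from Table 3.
assignments = [
    (0, "Office"), (0, "Office"), (0, "Office"),
    (1, "Retail trade"), (1, "Retail trade"),
    (2, "Hospital services"), (2, "Hospital services"),
    (3, "Restaurant"),
    (4, "Gym"), (4, "Agency"), (4, "Gym"),
]


def composition(pairs):
    """Map each cluster label to a Counter of the license types it contains."""
    by_cluster = defaultdict(Counter)
    for label, kind in pairs:
        by_cluster[label][kind] += 1
    return by_cluster


def dominant_types(pairs):
    """For each cluster, return (most frequent type, its share of the cluster)."""
    summary = {}
    for label, counts in sorted(composition(pairs).items()):
        kind, n = counts.most_common(1)[0]
        summary[label] = (kind, n / sum(counts.values()))
    return summary


for label, (kind, share) in dominant_types(assignments).items():
    print(f"cluster {label}: {kind} ({share:.0%})")
```
      </preformat>
      <p>A share of 100% means that a cluster contains a single license type, while a lower share, as in the last hypothetical cluster, indicates a mixed cluster.</p>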
      <p>The experiments with several values of k indicated that space, time and business license
type are disjoint when clustered: in summary, the three dimensions are unrelated. The
most common types (Office and Retail Trade, representing half of the data) are
distributed evenly in time and space (Figures 2 and 3). Since the number of
records differs for each type of business license (as shown in Figure 6), the less common business
license types end up having no influence on the larger clusters. This had a major impact
on the analysis, since the less common business license types are the ones
responsible for characterizing districts. Batel, for example, is the district that has the majority
of hospital services in Curitiba. That information can be seen in the third cluster in
Table 3.</p>
      <p>In other words, the most common business license types do not present correlation,
since they are present throughout the whole period and the whole space. This probably occurs because
the most common business licenses are spread homogeneously and have more influence
than the smaller clusters. Future studies include the analysis of smaller clusters (such as
Hospital Services in Batel, Figure 2) so that the business licenses characterizing the districts
of the city can be better understood.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper presented a business license analysis using the k-means algorithm, with the
Elbow Method. The objective was to identify the affinity of data, considering the
different types of business, and their distribution along time and space, in a use case of three
decades of data in Curitiba.</p>
      <p>The results for Batel indicated that offices, retail trade, hospital services and
restaurants are the activities which identify the first four clusters, followed by the rest
of business license types.</p>
      <p>The studies found no correlation among location, time and business license
type, indicating that: (i) these types of data are unrelated, i.e., opening a restaurant does
not indicate a new pharmacy, for example, at the same location and in the same year; and (ii) the
most common business licenses (Office, Retail Trade and others) are distributed evenly
in time and space, and the small clusters, which show more interesting relations, are
overshadowed by the biggest clusters.</p>
      <p>Future work includes the integration of other techniques, along
with the analysis of smaller clusters, which are present only in specific districts.</p>
      <p>Acknowledgments. We would like to thank the Municipality of Curitiba, IPPUC,
CAPES, CNPq, Fapemig, FAPERJ, FAPESP, EUBra-BIGSEA project (EC/MCTIC 3rd
Coordinated Call), and INCT em Ciência de Dados (INCT-CiD).</p>
      <p>[Bona et al. 2016] Bona, A. A. D., Fonseca, K. V. O., Rosa, M. O., Luders, R., and
Delgado, M. R. B. S. (2016). Analysis of public bus transportation of a brazilian city
based on the theory of complex networks using the p-space. Mathematical Problems
in Engineering, 2016:1–12.</p>
      <p>[Dreier et al. 2015] Dreier, D., Silveira, S., Khatiwada, D., Fonseca, K. V. O., Nieweglowski, R., and
Schepanski, R. (2015). Energy use and CO2 emissions of city buses in
Curitiba, Brazil. In Systems Analysis 2015, International Institute for Applied Systems
Analysis (IIASA), Laxenburg, Austria, 11.-13. November 2015. KTH Royal Institute of
Technology.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bichibichi et al. 2018]
          <string-name>
            <surname>Bichibichi</surname>
            ,
            <given-names>Y. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozievitch</surname>
            ,
            <given-names>N. P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Carvalho</surname>
            ,
            <given-names>R. A. M.</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Análise de evolução de emissão de alvarás</article-title>
          . In Anais da XIV Escola Regional de Banco de Dados.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Carr et al. 2003]
          <string-name>
            <surname>Carr</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Education</surname>
            ,
            <given-names>D. R. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Schultz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Mastering Real Estate Appraisal</article-title>
          . Kaplan Financial Series. Kaplan.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [da Silva et al. 2016a]
          <string-name>
            <surname>da Silva</surname>
            ,
            <given-names>E. L. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Oliveira Rosa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>K. V. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luders</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Kozievitch</surname>
            ,
            <given-names>N. P.</given-names>
          </string-name>
          (
          <year>2016a</year>
          ).
          <article-title>Combining k-means method and complex network analysis to evaluate city mobility</article-title>
          .
          <source>In ITSC'2016</source>
          , pages
          <fpage>1666</fpage>
          -
          <lpage>1671</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [da Silva et al. 2016b]
          <string-name>
            <surname>da Silva</surname>
            ,
            <given-names>E. L. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>K. V. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Oliveira Rosa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Munaretto</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2016b</year>
          ).
          <article-title>Analysis of curitiba's public transport system as a complex network</article-title>
          .
          <source>In ISPE TE</source>
          , pages
          <fpage>267</fpage>
          -
          <lpage>276</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Diniz Junior 2017]
          <string-name>
            <surname>Diniz Junior</surname>
            ,
            <given-names>P. C.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Serviços telemáticos em uma rede de transporte público baseados em veículos conectados e dados abertos</article-title>
          .
          <source>MSc thesis</source>
          , UTFPR.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Dreier and Silveira 2016]
          <string-name>
            <surname>Dreier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Silveira</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Smart City Concepts in Curitiba-innovation for sustainable mobility and energy efficiency: Project NEWSLETTER</article-title>
          ,
          <year>January 2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Dreier et al. 2015]
          <string-name>
            <surname>Dreier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silveira</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khatiwada</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>K. V. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nieweglowski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Schepanski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
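          <article-title>Energy use and CO2 emissions of city buses in Curitiba, Brazil</article-title>
          . In Systems Analysis 2015, International Institute for Applied Systems Analysis (IIASA), Laxenburg, Austria, 11.-13. November 2015. KTH Royal Institute of Technology.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Gay et al. 2016]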
          <string-name>
            <surname>Gay</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giannotti</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Tomasiello</surname>
            ,
            <given-names>D. B.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Accessibility and flood risk spatial indicators as measures of vulnerability</article-title>
          .
          <source>XVII GEOINFO</source>
          , (
          <volume>17</volume>
          ):
          <fpage>93</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Kono 2016]
          <string-name>
            <surname>Kono</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Um modelo de representação computacional baseado em conceitos de crescimento urbano associados a alvarás e primitivas em banco de dados espacial</article-title>
          .
          <source>Master's thesis</source>
          , Department of Informatics, UTFPR, Brazil.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Kozievitch et al. 2017]
          <string-name>
            <surname>Kozievitch</surname>
            ,
            <given-names>N. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>T. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ziviani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Costa</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lugo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Three decades of business activity evolution in curitiba: A case study</article-title>
          .
          <source>Annals of Data Science</source>
          ,
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <fpage>307</fpage>
          -
          <lpage>327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [Mennis and Guo 2009]
          <string-name>
            <surname>Mennis</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>Spatial data mining and geographic knowledge discovery - an introduction</article-title>
          .
          <source>Computers, Environment and Urban Systems</source>
          ,
          <volume>33</volume>
          (
          <issue>6</issue>
          ):
          <fpage>403</fpage>
          -
          <lpage>408</lpage>
          .
          <article-title>Spatial Data Mining-Methods and Applications</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Rosa et al. 2016]
          <string-name>
            <surname>Rosa</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>T. H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozievitch</surname>
            ,
            <given-names>N. P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ziviani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Ciência de dados: Explorando três décadas de evolução da atividade econômica em curitiba</article-title>
          .
          <source>In Anais da XII Escola Regional de Banco de Dados</source>
          , pages
          <fpage>139</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Sebastiani et al. 2016]
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lüders</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>K. V. O.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Evaluating electric bus operation for a real-world brt public transportation using simulation optimization</article-title>
          .
          <source>ITSC'</source>
          <year>2016</year>
          ,
          <volume>17</volume>
          (
          <issue>10</issue>
          ):
          <fpage>2777</fpage>
          -
          <lpage>2786</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Stenneth et al. 2011]
          <string-name>
            <surname>Stenneth</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolfson</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>P. S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Transportation mode detection using mobile phones and gis information</article-title>
          .
          <source>In GIS '11</source>
          , pages
          <fpage>54</fpage>
          -
          <lpage>63</lpage>
          , New York, NY, USA. ACM.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Vila et al. 2016]
          <string-name>
            <surname>Vila</surname>
            ,
            <given-names>J. J. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozievitch</surname>
            ,
            <given-names>N. P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gadda</surname>
            ,
            <given-names>T. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosa</surname>
            ,
            <given-names>M. O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomes-Jr</surname>
            ,
            <given-names>L. C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Akbar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>Urban mobility challenges-an exploratory analysis of public transportation data in curitiba</article-title>
          . Revista de Informática Aplicada,
          <volume>12</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>