<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Supporting open dataset publication decisions based on Open Source Software reuse</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alvaro E. Prieto</string-name>
          <email>aeprieto@unex.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adolfo Lozano-Tello</string-name>
          <email>alozano@unex.es</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jose-Norberto Mazón</string-name>
          <email>jnmazon@dlsi.ua.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis-Daniel Ibáñez</string-name>
          <email>l.d.ibanez@southampton.ac.uk</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad de Alicante, San Vicente del Raspeig</institution>
          ,
          <addr-line>Alicante</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universidad de Extremadura</institution>
          ,
          <addr-line>Cáceres</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universidad de Extremadura</institution>
          ,
          <addr-line>Cáceres</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Southampton</institution>
          ,
          <addr-line>Southampton</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Publishing and maintaining open data is a costly task for public institutions, which becomes even more challenging in the context of Smart Cities, where large amounts of varied data are generated from different domains. To optimize resources, they should prioritize the publication and maintenance of the datasets most likely to generate social and economic impact. However, there is currently a lack of decision-support tools to help public sector data publishers evaluate datasets in the light of their particular reuse goals. In this paper, we propose suggesting to data publishers the dataset categories with the most potential impact, based on the impact of already published datasets of the same category. To measure impact, we propose a set of indicators based on the amount and quality of the Open Source Software projects that use datasets. To aggregate indicators according to specific reuse goals, we provide a tool based on the Analytic Hierarchy Process.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        One of the most important challenges faced by Smart Cities is
creating an ecosystem of public and private actors that reuse open
data in order to produce IT services and products that (i)
improve citizens’ quality of life and (ii) contribute
to economic growth [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ]. However, few city open data portals
currently track data usage or consider the impact of data
when deciding which datasets to maintain or which complementary
datasets to publish. Cities are not even aware of what kinds of
apps are developed, with which data, and how many there are.
Answering these questions is a significant research issue [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]
that would allow cities to prioritize which categories of data should be
published and maintained according to the applications that
use them (i.e., the impact that a category of open data generates).
      </p>
      <p>
        To reverse this situation, publishers of open data need
a decision support system to select the categories of
datasets that offer the highest potential to generate value [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Such a
system must consider indicators about the impact of the already
published open datasets, as well as the strategy of the Smart
City. For example, a small town could provide a rather unknown open
data portal with many high-quality datasets, while the technological
fabric of the city is made up of small IT companies. The goal of
such a city could be to extend the use of its open data portal by
prioritizing the datasets that belong to categories likely to generate
a large number of projects, albeit simpler ones that involve fewer
people. On the other hand, a big city with consolidated open data
portals may prefer opening datasets that could be used in complex and
mature software applications involving big teams, since these are
more relevant to its specific technological industry context.
      </p>
      <p>
        Unfortunately, to the best of our knowledge, Smart Cities
lack such a decision support system, mainly because calculating
the indicators that such a system would use is not a trivial
task. According to Janssen et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], “there is no
way to predict and calculate the return of investment (ROI) in
advance [...]. The main challenge is that open data has no value
in itself; it only becomes valuable when used”. Therefore, the
main problem is that data owners have a limited understanding
of how open data is reused, and thus lack knowledge about the
impact generated by reusing the published open data.
      </p>
      <p>
        Better indicators of the use of open datasets could
help to identify which categories of datasets are more likely to
be reused and, in this way, to generate some type of
economic impact for people or enterprises. In this sense, good
indicators could come from the reuse of datasets within the open
source community. The Tenth Annual Future of Open Source
Survey [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] reflects the increasing adoption of open source and
highlights the abundance of organizations participating in the
open source community. Concretely, this survey estimates that
65% of companies currently participate in open source projects.
Open Source Software (hereon OSS) is considered to encourage
the creation of SMEs and jobs, by providing a skills development
environment valued by employers and by retaining a greater share
of the generated value locally [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Focusing on Europe, one study
estimated the contribution of OSS to the European economy at 450
billion euros per year [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>Based on these figures, an estimate of the use of the different
categories of datasets by the OSS community could be a good
indicator of their potential impact. Therefore, when Smart Cities
decide which data to publish, they could prioritize the
publication of data that allow a community of developers
to generate impact and effectively release the benefits of open data
through OSS projects.</p>
      <p>
        In this paper, we present an approach based on the estimation
of indicators of the use of open datasets in OSS projects. The
goal of this approach is to provide Smart Cities with a Decision
Support System which provides an ordered list of categories of
datasets most suitable to be published or maintained in their open
data portal. To do so, we have carried out a set of actions aimed
at estimating useful impact indicators related to the datasets of
the same category already published by open data portals of
other cities. Concretely, to calculate our proposed indicators we
needed two kinds of data sources: (i) already published Smart City
datasets (and their metadata) and (ii) OSS projects (together with
information about them) which referenced the gathered datasets;
i.e., we needed to know which open datasets were being used in
which OSS projects. To collect already published open datasets,
we chose Socrata [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] because it is one of the most widely used open
data platforms, notably by some of the most important
US cities. We also measured the existence of potential reuses
within a community in order to measure open data impact. To
do this, we used GitHub [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], because it is the largest web-based
distributed revision control and source code repository in the
world, and the source of several empirical studies such as in Yu
et al. [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ].
      </p>
      <p>
        Using the indicators obtained from these sources, we provide
an Analytic Hierarchy Process (hereon AHP)-based [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] tool¹
that allows decision makers to weigh these indicators, taking into
account the reuse objectives of the city, in order to offer an ordered list of
categories of datasets recommended for publication.
      </p>
      <p>This paper is structured as follows: Section 2 describes a new
approach to select the most relevant categories of data to be
published in a Smart City open data portal. Section 3 presents
toy examples of two different stereotypical Smart Cities using our
approach and, finally, Section 4 summarizes other work related
to the publishing of open data in Smart Cities.</p>
    </sec>
    <sec id="sec-2">
      <title>USING REUSE INDICATORS BASED ON</title>
    </sec>
    <sec id="sec-3">
      <title>DATA FROM OSS PROJECTS IN GITHUB</title>
    </sec>
    <sec id="sec-4">
      <title>FOR SELECTING DATASETS TO OPEN</title>
      <p>This section describes the steps carried out to build
an AHP-based process that ranks categories of datasets according
to the preferences of the decision maker. These preferences are
applied to a set of indicators derived from data about the
reuse of datasets in OSS projects hosted on GitHub. Concretely,
these steps² are detailed in the following subsections and are
summarized below:
(1) Studying, from GitHub repositories, the characteristics of
OSS projects that use open datasets. This information was
analyzed to establish a set of reuse indicators.
(2) Gathering datasets from 32 cities of the United States (such
as San Francisco, Chicago or New York) that use Socrata
as their open data repository. It should also be noted that,
although these cities are in the same country, they have
different cultural, social and economic characteristics, which
leads us to consider the results obtained from their data
sufficiently generalizable to Smart Cities located in other
countries.
(3) Classifying the datasets according to a set of categories
specifically designed for Smart Cities.
(4) Searching GitHub for references to the datasets obtained from
Socrata, in order to calculate the indicators.
(5) Using the reuse indicators established in step 1 as
criteria and the values from step 4, creating a Google
Spreadsheet [w3] based on AHP that allows decision
makers to prioritize the most relevant categories of datasets
to be published in a Smart City open data portal.
¹ https://goo.gl/HcUc1e
² A repository containing all the scripts and detailed instructions needed to carry out
a functional application of our approach is available at GitHub: https://goo.gl/TDp1xi</p>
    </sec>
    <sec id="sec-5">
      <title>A proposal of indicators of reuse based on</title>
    </sec>
    <sec id="sec-6">
      <title>GitHub</title>
      <p>
        Smart Cities should follow a strategy for opening data as
described in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. This strategy should prioritize the publication of
data that allow a community of developers to generate
impact and effectively release the benefits of open data through OSS
projects [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]. A Smart City could in fact prioritize the publication
of open data with more reuse potential depending on the
category to which the data belong. However, due to the “open-data
by default” approach [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], data is usually published without
establishing specific goals and without imposing usage or
authentication restrictions on infomediaries and end users. As
a result, collecting usage information and measuring the impact
generated by open datasets may become very complex.
      </p>
      <p>
        To overcome this situation, our approach is based on the
assumption that the more an open dataset is used by OSS projects,
the more impact it generates. Therefore, we borrowed some
well-known indicators that measure the success of OSS projects
and used them as a starting point to develop indicators that
measure such success when open data is reused. These
indicators allow Smart Cities to measure which categories of
open data have more reuse potential and to decide which data should
be released according to the requirements of each city. The
following indicators from the existing research literature on OSS are
considered [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. First of all, we included (i) the number of people
who agree to receive information about the project because they
find it interesting (subscribers), and (ii) the number of people who
actually work on the OSS project (developers). On the one hand,
subscribers choose to receive information on the project
and thus reveal a deeper interest in it. The
subscriber indicator measures not only interest within the project
but also the reputation of the project within the community and its
dissemination through the community. On the
other hand, the number of developers working on a project is
critical to its success, since the survival of an OSS project depends on
continued contribution from developers [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Another
measure of the success of OSS projects [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] is (iii) the age of
an active project, which is positively related to the progress of the
project toward completion, as well as to the experience of its community of
developers.
      </p>
      <p>Based on these three indicators described in the literature
on the success of OSS projects, we developed a set of three
indicators that measure the success of open source projects that
reuse open datasets (they are summarized in Table 1). The aim is
to compare how successful the projects that use different categories
of datasets are. First, we define the reputation, within a community
of developers, of the OSS projects that reuse open data from a
category. Projects that reuse open data from some specific
categories can be perceived by developers as highly appealing.
Smart Cities are interested in opening data that will be reused in
these kinds of projects with a view to creating a community around
open data, thus allowing an open data portal to attract the attention
of potential developers. Therefore, the reputation indicator measures
how well known the projects reusing data from a specific category are
within the community of developers. Second, the size of the
community is defined as the number of developers involved in the
projects that use open data from a given category. A city needs to
match the size of this community to its budget and available infrastructure.
Finally, the maturity of the projects that use an open data category is
proposed. Maturity means that the community has been working
on the project for some time without abandoning it. A Smart City
may want to select the datasets that help in promoting fewer
projects stretching over longer periods of time,
rather than a larger number of short-term projects.</p>
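      <p>Table 1, which holds the exact formulas, is not reproduced in this section, so the following Python sketch only illustrates one plausible reading of the three indicators: each is averaged over the repositories that reference datasets of a category, with reputation taken from subscribers, community size from contributors, and maturity from the days between repository creation and last push. The averaging, the RepoRecord type and the category_indicators function are hypothetical; the field names follow the GitHub repository data listed in the subsection on collecting data from GitHub.</p>
      <preformat>
```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class RepoRecord:
    # Hypothetical record: one GitHub repository that references a dataset category.
    category: str
    subscribers_count: int
    total_contributors: int
    created_at: str  # ISO 8601 timestamp as returned by GitHub, e.g. "2015-03-01T10:00:00Z"
    pushed_at: str


def parse_ts(ts):
    # Replace the trailing "Z" so datetime.fromisoformat accepts it on Python 3.7+.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def category_indicators(repos):
    """Assumed per-category averages standing in for the Table 1 formulas."""
    by_category = {}
    for repo in repos:
        by_category.setdefault(repo.category, []).append(repo)
    indicators = {}
    for category, rs in by_category.items():
        indicators[category] = {
            # Reputation: mean number of subscribers per repository.
            "reputation": mean(r.subscribers_count for r in rs),
            # Community size: mean number of contributors per repository.
            "size": mean(r.total_contributors for r in rs),
            # Maturity: mean active age in days, from creation to last push.
            "maturity": mean(
                (parse_ts(r.pushed_at) - parse_ts(r.created_at)).days for r in rs
            ),
        }
    return indicators
```
      </preformat>
      <p>Means are only one option; medians would be less sensitive to a few very popular repositories dominating a category.</p>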
      <p>An additional indicator has been developed in order to assess
the impact of a dataset category, i.e., the likelihood that datasets
from each category are used. To do so, we defined the efficiency
of an open data category as the probability that a dataset of that
category is referenced by an OSS project. This indicator
determines how relevant a category of datasets is: Smart Cities
will use it to know which categories of open data
are most likely to be reused. However, in a scenario where a
Smart City has the chance to open a large number of datasets,
the efficiency indicator becomes secondary to the effort of
publishing a wide variety of datasets.</p>
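      <p>Read literally, the efficiency of a category can be estimated as the fraction of its published datasets that at least one OSS repository references. The Python sketch below assumes that datasets are keyed by their Socrata identifiers and that repeated references to the same dataset count once; the function name efficiency and its arguments are illustrative, and the formula actually used is the one in Table 1.</p>
      <preformat>
```python
from collections import Counter


def efficiency(published, referenced_ids):
    """Share of each category's published datasets referenced by at least one repository.

    published: mapping dataset_id -> category
    referenced_ids: dataset ids found in GitHub code (duplicates allowed)
    """
    total = Counter(published.values())
    # set() collapses repeated references; unknown ids are ignored.
    hits = Counter(published[d] for d in set(referenced_ids) if d in published)
    return {category: hits[category] / n for category, n in total.items()}
```
      </preformat>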
      <p>As mentioned above, these indicators are derived from well-known
indicators of OSS success, so they can be generalized to any
OSS repository. It is worth noting
that our proposal of indicators is not set in stone; consequently,
more indicators could be created and tested for use by Smart
Cities according to their requirements.</p>
    </sec>
    <sec id="sec-7">
      <title>Search of smart city datasets on Socrata</title>
      <p>Once the impact indicators have been established
and defined, the data needed to compute them must be gathered.
This gathering focuses on datasets specifically related to Smart
Cities, so as to obtain a more accurate assessment of the collected
data.</p>
      <p>
        Socrata is a software company focused “exclusively on
democratizing access to public sector data around the world”. It
provides an Open Data Platform that allows local, regional or
national governments to release data. Socrata is a partner of the
USA National League of Cities [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] for the development of open
data strategies. Today, the Socrata Open Data Platform is
used by some of the most important US cities, such as New York,
Chicago, San Francisco and Los Angeles. Socrata is therefore
well suited to a proof of concept of our approach, since dataset
identifiers and their metadata can be collected precisely: every
Socrata dataset has its own endpoint and is designated by a unique
dataset identifier, and every Socrata open data portal provides a
list of its published datasets.
      </p>
      <p>In this step, we had to choose the taxonomy of dataset categories
to be analyzed. There is no common agreement on the best way
of classifying Smart City open datasets. However, 14 high-value
data categories are suggested by the G8 Open Data Charter [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
These categories, together with example datasets for each one,
are shown in Table 2.
      </p>
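      <p>The identifier-and-endpoint scheme mentioned above can be illustrated with a small Python helper. It assumes that Socrata identifiers follow the usual "four-by-four" shape (two groups of four lowercase letters or digits joined by a hyphen, i.e. eight significant characters) and builds the SODA resource URL https://{domain}/resource/{id}.json; the helper name soda_endpoint and the example identifier are hypothetical, and Socrata's own API documentation is the authority on the exact identifier format.</p>
      <preformat>
```python
import re

# Assumed Socrata "four-by-four" id: four lowercase alphanumerics, a hyphen, four more.
FOUR_BY_FOUR = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def soda_endpoint(domain, dataset_id, fmt="json"):
    """Build the SODA resource URL for one dataset on a Socrata portal."""
    if not FOUR_BY_FOUR.fullmatch(dataset_id):
        raise ValueError(f"not a Socrata four-by-four id: {dataset_id!r}")
    return f"https://{domain}/resource/{dataset_id}.{fmt}"
```
      </preformat>
      <p>For example, soda_endpoint("data.cityofchicago.org", "abcd-1234") yields https://data.cityofchicago.org/resource/abcd-1234.json, where "abcd-1234" is a placeholder rather than a real dataset.</p>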
      <p>
        These categories seem to be a good way to classify Smart
City datasets; however, some of them, such as Global
Development and Science and Research, might not be used in the
Smart City context. Thus, the specific domains that can generate
data within a Smart City must be taken into account. In this sense,
a survey [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] of Smart City initiatives proposes a classification
divided into domains and subdomains, shown in Table 3.
Table 3 (subdomains 10 to 28 only): 10. Facility management; 11. Building services; 12. Housing quality; 13. Entertainment; 14. Hospitality; 15. Pollution control; 16. Public safety; 17. Healthcare; 18. Welfare and social inclusion; 19. Culture; 20. Public spaces management; 21. E-government; 22. E-democracy; 23. Procurement; 24. Transparency; 25. Innovation and entrepreneurship; 26. Cultural heritage management; 27. Digital Education; 28. Human capital management.
      </p>
      <p>Establishing an exhaustive classification of open data
categories for Smart Cities is beyond the scope of this paper.
However, this work proposes an initial classification of open data
categories for Smart Cities that stays as close as possible to
the G8 Open Data Charter while incorporating modifications to
encompass the aforementioned domains and subdomains specific
to Smart Cities. This proposed classification is given in Table 4,
together with example datasets for each category.</p>
      <p>Once the categories were established, we had to classify the
collected datasets according to them. Due to its
characteristics, this step requires the participation of experts to
be executed adequately. The research groups that developed
this approach include researchers working in related fields such
as open data and knowledge representation. These researchers
were responsible for classifying the datasets following the steps
described below:
(1) Extracting the different themes from the US city datasets. In our
case, 215 different themes were extracted.
(2) Mapping every theme to one of the available categories.
Themes without a clear fit had to be classified as ‘Others’ in
order to be discarded later. When we performed this step,
211 themes could be mapped to the established categories
and 4 were classified as ‘Others’.
(3) Automatically classifying the datasets that have a theme according
to the mapping in step 2. In our case, 8299 datasets were
classified according to the established categories, 11 were
categorized as ‘Others’ and 650 were not categorized due
to their lack of a theme.
(4) Optionally, manually categorizing the datasets that have no
theme, using other metadata such as keywords.
This step can be carried out when the number of datasets
without a theme is considered high enough to distort the
value of the indicators. In our case, although the datasets
without a theme represented less than 10% of the total, they were also categorized manually.
(5) As a result of this process, 8949 datasets were adequately
categorized and 11 were discarded due to their unclear fit.</p>
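      <p>Steps (2) to (4) of this classification can be sketched as follows. The Python sketch assumes that each dataset exposes at most one theme and that the expert-built theme-to-category mapping is available as a dictionary; themes missing from the mapping are treated like unclear ones. The classify function, the sample themes and the category names used in testing it are hypothetical.</p>
      <preformat>
```python
OTHERS = "Others"


def classify(datasets, theme_to_category):
    """Assign each dataset the category of its theme.

    datasets: mapping dataset_id -> theme (None or "" when the portal gives no theme)
    theme_to_category: expert-built mapping theme -> category (or "Others")
    Returns (assigned, untagged): assigned maps dataset_id -> category;
    untagged lists the datasets left for optional manual review (step 4).
    """
    assigned, untagged = {}, []
    for dataset_id, theme in datasets.items():
        if not theme:
            untagged.append(dataset_id)
        else:
            # Unmapped themes fall into 'Others', to be discarded later.
            assigned[dataset_id] = theme_to_category.get(theme, OTHERS)
    return assigned, untagged
```
      </preformat>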
    </sec>
    <sec id="sec-8">
      <title>Collecting data from GitHub to calculate indicators</title>
      <p>In order to calculate the above-described indicators on the
success of OSS projects that reuse open data, we decided to collect
data from GitHub. GitHub, as mentioned previously, is a
platform for collaborative software development based on the Git
version control system. It is used by individuals, communities and businesses
alike to develop software projects. GitHub is free to use for public
and open source projects, and it is widely used in studies on
Software Engineering. Therefore, it offers useful data about open
source software projects, including information on whether they
are using open data.</p>
      <p>
        GitHub has been used for collecting data and calculating
indicators related to OSS success in several works such as [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
where GitHub allows researchers to collect several measures
regarding open source projects, such as forks and stars.
GitHub provides an API that can be used to collect all the required data about
an open source software project. More specifically, data can
be acquired from repositories and from users. A repository is
a kind of software project folder that contains all the project
files. Valuable data from a repository that can be collected
using the API, apart from the code itself, are the following:
repository_id, user_id, stargazers_count, watchers_count, language,
forks_count, subscribers_count, network_count, created_at,
updated_at, pushed_at, total_contributors, total_contributions. GitHub
user data also provide interesting data to be considered, such as
followers_user, following_user, public_repos_user, location_user,
updated_at_user, created_at_user. The indicators used in our
approach are based on these data. We established a process for
identifying which OSS projects were using open datasets from
Socrata US cities. Our process consists of the following steps
(it was implemented using the GitHub API within a Pentaho
Data Integration [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] process):
(1) Searching the code of OSS repositories hosted on GitHub for
every eight-character identifier of the existing Socrata datasets
belonging to US cities (obtained as described in Section 2.2),
in order to know which projects are reusing open data. When we
performed this step, 350644 references were found from 2517
repositories to 5874 of the 8949 categorized datasets.
(2) Gathering the required data from GitHub on the repositories
that reference open datasets, in order to estimate the
indicators. In our case, we found that 2501 of the 2517
repositories had all the needed data.
      </p>
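      <p>The reference search in step (1) can be sketched as a plain Python pass over file contents that have already been retrieved from GitHub; it does not call the GitHub API and does not model the Pentaho process. It assumes that Socrata identifiers have the four-by-four shape (eight alphanumeric characters split by a hyphen) and keeps only matches present in the collected Socrata catalogue, since unrelated strings can share that shape. The names find_references and ID_PATTERN are hypothetical.</p>
      <preformat>
```python
import re
from collections import defaultdict

# Assumed four-by-four id shape; \b stops matches inside longer alphanumeric tokens.
ID_PATTERN = re.compile(r"\b[a-z0-9]{4}-[a-z0-9]{4}\b")


def find_references(files, known_ids):
    """Map each known dataset id to the repositories whose code mentions it.

    files: iterable of (repository, file_text) pairs
    known_ids: set of dataset ids collected from the Socrata portals
    Every mention is kept, so repeated references are deduplicated later.
    """
    refs = defaultdict(list)
    for repo, text in files:
        # Ids are assumed lowercase, so the text is lowercased before matching.
        for match in ID_PATTERN.findall(text.lower()):
            if match in known_ids:
                refs[match].append(repo)
    return refs
```
      </preformat>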
      <p>After this process, we estimated the indicators for use
with AHP. We defined a process consisting of
the following steps:
(1) Discarding repositories that do not have all the data required
to estimate the indicators. When we
performed this step, 2501 repositories remained.
(2) Discarding all repeated references to a specific dataset
from a specific repository. When we performed this step,
32551 unique references from 2501 repositories
remained.
(3) Estimating the indicators. When we
performed this step, we applied the formulas previously
presented in Table 1.
(4) Normalizing the indicators in order to use the ideal mode
of AHP. When we applied this step to our case, the value of each
category for an indicator was divided by the maximum value
obtained by any category for that indicator. Thus, all the
indicators of each category were normalized to the 0-1 range.</p>
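      <p>Steps (2) and (4) reduce to two small operations, sketched here in Python under the assumption that references are held as (repository, dataset) pairs and each indicator as a mapping from category to value. The helper names are illustrative; dividing by the best category is what the ideal mode described above does, so the top category scores 1.</p>
      <preformat>
```python
def unique_references(refs):
    """Collapse repeated (repository, dataset) references into one pair each."""
    return {(repo, dataset) for repo, dataset in refs}


def normalize_ideal(indicator):
    """Divide each category's value by the best category's value (AHP ideal mode)."""
    best = max(indicator.values())
    if best == 0:
        # No category scores above zero: keep everything at 0 instead of dividing by 0.
        return {category: 0.0 for category in indicator}
    return {category: value / best for category, value in indicator.items()}
```
      </preformat>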
    </sec>
    <sec id="sec-9">
      <title>Use of AHP to weight indicators</title>
      <p>
        Our model is based on the decision-making method known as the
Analytic Hierarchy Process, hereinafter referred to as
AHP [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. It is a powerful and flexible tool for decision-making
in complex multi-criteria problem situations and is useful for
comparing several alternatives when several objectives need to
be borne in mind at the same time.
      </p>
      <p>Following this method, each criterion receives a
normalized weight that indicates the importance of
that criterion with regard to the final objective. First,
the AHP method compares the relative importance of each
criterion in relation to all the others; this assessment enables
the relative weights of the criteria to be calculated. Finally, the
method normalizes the weights in order to obtain the measures
for the existing alternatives. For this reason, AHP is a widely used
option to assist multi-criteria decision making. This
method allows people to gather knowledge about a particular
problem, to quantify subjective opinions and to force the
comparison of alternatives in relation to established criteria. The method
consists of the following steps:
(1) Define the problem and the main objective in making the
decision.
(2) If required, build a hierarchy tree in this way: the root node
is the objective of the problem, the intermediate levels are
the criteria, and the lowest level contains the alternatives.
(3) At each level, build a pairwise comparison matrix for the
siblings (children of the same node). The matrix contains the
weights of the pairwise comparisons between sibling nodes.
This provides a pairwise comparison matrix (see
a simple example in Table 5) for each parent node.
(4) For each comparison matrix A, calculate the principal
eigenvector $\mathbf{w}$, which satisfies $A\mathbf{w} = \lambda_{\max}\mathbf{w}$,
where $\lambda_{\max}$ is the largest root of the characteristic
equation $|A - \lambda I| = 0$ and I is the identity matrix; the
normalized entries of $\mathbf{w}$ are the weights of the compared nodes.
This calculation must be performed for each level of the tree.
(5) Rate each alternative (leaf nodes) with a previously
calculated fixed value for every criterion. The scales for rating
alternatives should be established and described in a
precise way.
(6) Determine the value of each alternative using a weighted
addition formula, with the weights from the previous steps.
These results ascend the tree to calculate the final value
of the objective (root). This final value is used to make a
decision about the alternative to choose.</p>
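      <p>Steps (3) and (4) can be sketched in Python. The sketch approximates the principal eigenvector by power iteration rather than solving the characteristic equation symbolically, and it also returns Saaty's consistency ratio (CR), a standard AHP check that the steps above do not mention; a CR above about 0.1 is conventionally read as a sign that the pairwise judgements should be revised. The function names and the test matrices are hypothetical.</p>
      <preformat>
```python
# Saaty's random consistency index RI for matrices of size n (1 to 10).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def ahp_weights(matrix, tol=1e-12, max_iter=1000):
    """Return (weights, consistency_ratio) for a reciprocal comparison matrix.

    The weights approximate the principal eigenvector w of A (A w = lambda_max w),
    scaled so that they sum to 1.
    """
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(max_iter):
        aw = mat_vec(matrix, w)
        total = sum(aw)
        new_w = [x / total for x in aw]
        delta = max(abs(a - b) for a, b in zip(new_w, w))
        w = new_w
        if tol > delta:
            break
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = mat_vec(matrix, w)
    lambda_max = sum(x / y for x, y in zip(aw, w)) / n
    if RANDOM_INDEX[n] == 0.0:
        return w, 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    ci = (lambda_max - n) / (n - 1)
    return w, ci / RANDOM_INDEX[n]
```
      </preformat>
      <p>With four criteria (efficiency, reputation, community size and maturity), the matrix is 4x4 and its weights feed the weighted addition of step (6).</p>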
      <p>As the final stage, we used this method to create a Google
Spreadsheet based on AHP that uses the reuse indicators as
criteria of the process. Concretely, this spreadsheet is composed of
three sheets:
(1) ‘Indicators’. This sheet provides the normalized indicators
that were calculated from GitHub in the previous step.
(2) ‘AHP Criterion Pair Comparison’. This sheet allows
assessing the relative importance between pairs of indicators
using AHP. Thereby, a decision maker can weigh the
importance of the indicators set out in the previous steps,
taking into account the characteristics and objectives of
the city. These weights can be assigned according to the
institution’s strategic reuse objectives. Indeed, different Smart
Cities may have different objectives, strategies and target
audiences when deciding which datasets should have
priority of publication. Each city has its own idiosyncrasy
defining what is most important or of particular interest,
and it is unlikely that two cities share the same priorities with
regard to their respective reuse objectives. Cities can be
characterized by their size, the importance of the tourism
sector, or their residential, commercial or industrial sectors,
among others. Cities may also have different priorities for
publishing datasets depending on the type of reuse they want
to promote. The result of this step is the eigenvector
of each comparison matrix, which gives the relative importance of the
established indicators.
(3) Finally, the ‘AHP Direct Results’ sheet shows a suitability
ranking of the dataset categories to publish according to the
weights introduced in the second sheet and the indicators
calculated from GitHub shown in the first sheet. That is,
the score used to build this ranking is obtained by
multiplying the relative importance of each indicator,
calculated in the second sheet, by the value of that indicator
for the corresponding category shown in the first sheet, and adding the products.
Thus, this tool allows Smart Cities to prioritize datasets
in a reasonable way based on the data collected from well-known
cities, the indicators taken into account and the open data strategy
of the city.</p>
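      <p>The computation behind the ‘AHP Direct Results’ sheet can be sketched as a weighted sum in Python. It assumes that the criterion weights come from the comparison sheet and already sum to 1, and that every category has a normalized value for every criterion; rank_categories and the sample values used in testing it are hypothetical, not taken from the spreadsheet.</p>
      <preformat>
```python
def rank_categories(weights, indicators):
    """Rank categories by the weighted sum of their normalized indicators.

    weights: criterion -> AHP weight (summing to 1)
    indicators: category -> {criterion -> normalized value in [0, 1]}
    """
    scores = {
        category: sum(weights[c] * values[c] for c in weights)
        for category, values in indicators.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```
      </preformat>
      <p>Because the score is linear in the weights, moving weight from one criterion to another can only swap two categories whose indicator values differ on those criteria, which is consistent with some categories keeping a stable position across weightings.</p>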
    </sec>
    <sec id="sec-10">
      <title>SIMULATING THE BEHAVIOUR OF THE</title>
    </sec>
    <sec id="sec-11">
      <title>TOOL ON STEREOTYPICAL CITIES</title>
      <p>In order to check how our proposal behaves under different motivations
in the weighting process, we simulated the behavior of the
tool taking into account the different prospects of two
stereotypical cities. We asked three experts to agree on the importance
assigned to the indicators under the assumptions for each of the two
cities.</p>
      <p>The first is a medium-sized town located in a rural region,
with small rather than large software companies in its area,
which is starting to develop its own open data portal. The second
is a big city with a well-known open data portal and many
cutting-edge software companies in its area of influence.</p>
      <p>In the first case, we assumed that the town would be
mainly interested in having its different datasets reused through
the development of simple applications by small local enterprises.
Hence, the town would assign a high weight to efficiency, whereas
reputation, community size and maturity would play a
secondary role.</p>
      <p>The weights applied under this assumption are shown in
Figure 1, and the resulting ranking is shown in Figure 2.
‘Geospatial’ keeps the first position it holds in the default
ranking (the same weight for all indicators) shown in Figure 3,
but the rest of the ranking varies.</p>
      <p>In the second case, we assumed that, because its portal is
already well known, the city does not seek more reuses (i.e.,
efficiency) but rather mature projects with a good reputation and
larger communities behind them. The weights applied under this
assumption are shown in Figure 4.</p>
      <p>The ranking obtained with these weights is shown in Figure 5.
Here, ‘Geospatial’ drops to third position and ‘Welfare’ takes
first place. As can be seen, the indicators obtained from GitHub
cause some categories to keep a fairly stable position in the
ranking regardless of the weights assigned with AHP; even so,
different combinations of weights may change the ranking.</p>
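      <p>The observation that different weight combinations can reorder
some categories while leaving others in place can be checked with a
small sensitivity sketch. It compares a default weighting (the same
weight for every indicator) with a hypothetical profile that favours
reputation, community size and maturity, in the spirit of the second
stereotypical city. The function name, category labels, indicator
values and weights are invented for illustration and are not the
values shown in the figures.</p>
      <preformat>
```python
# Sensitivity sketch: the same indicator values ranked under two
# weightings. All names and numbers here are hypothetical.


def ranking(indicator_values, weights):
    """Return category names ordered by weighted score, best first."""
    def score(category):
        return sum(v * w for v, w in zip(indicator_values[category], weights))
    return sorted(indicator_values, key=score, reverse=True)


# Indicator order: efficiency, reputation, community size, maturity.
VALUES = {
    "cat_A": [0.9, 0.5, 0.6, 0.5],
    "cat_B": [0.2, 0.7, 0.6, 0.7],
    "cat_C": [0.5, 0.5, 0.5, 0.5],
}
EQUAL = [0.25, 0.25, 0.25, 0.25]          # default: same weight for all
BIG_CITY_PROFILE = [0.1, 0.3, 0.3, 0.3]   # hypothetical second-city profile

if __name__ == "__main__":
    print("default:", ranking(VALUES, EQUAL))
    print("profile:", ranking(VALUES, BIG_CITY_PROFILE))
```
      </preformat>
      <p>With these invented numbers, cat_A and cat_B swap places while
cat_C stays last, which mirrors in miniature the behaviour described
above: some positions are stable, others depend on the weights.</p>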
    </sec>
    <sec id="sec-12">
      <title>RELATED WORK</title>
      <p>This section describes (i) relevant studies on the use of
GitHub to measure different indicators of Open Source Software
projects, (ii) applications of AHP in Smart Cities, and (iii) the
most relevant studies on how (local) governments publish open
data.</p>
      <p>
        Firstly, GitHub is used by individuals, communities and
businesses alike to develop software projects. GitHub is free to use
for public and OSS projects, and it has been widely used in
Software Engineering studies related to OSS success.
For instance, Bissyande et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] use GitHub to study a possible relation
between programming languages and project success. Marlow et
al. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] analyze GitHub project metadata to find out how its users
decide whom and what to keep track of, or where to contribute
next. Sheoran et al. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] investigate which kinds of contributors
act as “watchers” on GitHub. Jarczyk et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] study the
relation between the popularity of a GitHub project and its quality.
Muthukumaran et al. [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] use GitHub to propose change metrics
that can predict possible bugs. As far as we know, this is the first
time GitHub has been used to estimate indicators related to the reuse
of open data in OSS projects.
      </p>
      <p>
        Secondly, AHP is a multiple-criteria decision-making method
that has been used in many different decision-making
applications [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ]. Some works specifically use AHP in Smart
Cities and e-government. In this context, Bartolozzi et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
present a Decision Support System (DSS) that uses AHP to support
decision-making on Smart City issues. Sultan et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]
suggest the use of AHP to decide the most appropriate technology
for the development of e-government projects in Smart Cities.
Boselli et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] use AHP to rank the factors for innovating a
smart-mobility service in the city of Milan. A very interesting
use of AHP to evaluate open data portal quality can be found in
Kubler et al. [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The authors consider different dimensions
(completeness, openness, addressability and retrievability) to
assess the quality of 146 open data portals. Although there are
several applications of AHP in the domains of Smart Cities and
e-government, they all aim at assessing Smart City strategies or the
quality of open data portals. Instead, our approach uses AHP to
recommend the most appropriate datasets to be published.
      </p>
      <p>
        Finally, with respect to how (local) governments publish open
data, Conradie &amp; Choenni [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] explain that data release by local
governments is still a novel task, so knowledge about its benefits
and barriers is lacking. Therefore, they follow a participatory
action research approach to better understand how the internal
processes of local governments influence data release.
The authors found that the following indicators needed to be
addressed by local governments to overcome barriers to releasing
public sector information: (i) Data Storage, i.e., is data stored
centrally, or is it decentralized?; (ii) Use of data, i.e., the way data
is used by the department; (iii) Source of data, i.e., how is a set
of data obtained?; and (iv) Suitability of data for release, i.e., are
there rules and regulations that determine whether a dataset may
be released or not, such as privacy or copyright.
      </p>
      <p>
        However, these indicators concern the current state of the data
but do not address its actual use and benefits.
Indeed, Hossain et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] show that benefits associated
with opening data are ill-understood. In their systematic review
of open government data initiatives, Attard et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] explore
open data initiatives of a large number of governments, as well
as existing tools and approaches. They found that while efforts
have focused on developing tools for helping data publishers to
open data, there have been no initiatives related to strategies for
supporting decisions on which data to release. This means that
public entities may end up publishing data with no value, rather
than focusing on the relevance of the data they are publishing.
Therefore, success in opening data is not a matter of the amount
of data published, but of understanding how data is reused. As
highlighted by Zuiderwijk &amp; Janssen [
        <xref ref-type="bibr" rid="ref34">34</xref>
        ], since providers of open
data are not concerned with the needs of open data users, they do
not know how their data are reused, and business-related issues
(such as the creation of added-value services or products based on
open data) are not widely used as a decision criterion.
      </p>
      <p>
        Furthermore, Zuiderwijk et al. [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] argue that the publication
of open data is often cumbersome, so standard procedures and
processes for opening data are required. They found a series of
barriers preventing easy and low-cost publication of open data,
leading them to propose a set of five design principles for
improving the open data publishing process of public organizations:
(i) start thinking about the opening of data at the beginning of
the process; (ii) develop guidelines, especially about privacy and
policy sensitivity of data; (iii) provide decision support by
integrating insights into the activities of other actors involved in
the publishing process; (iv) make data publication an integral,
well-defined and standardized part of daily procedures and
routines; and (v) monitor how the published data are reused. Our
approach is related to principle (iii) since we provide a decision
support framework based on activities of data consumers. We
also contribute to principle (v) since our approach is useful for
monitoring how datasets are being reused in OSS applications.
Additionally, Jetzek et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] propose a framework to explain
how value is generated from open data. This framework is useful
for governments to understand the value of their open data. It
assesses the impact of open data along two dimensions: (i) how
openness generates value, and (ii) how society as a whole can get
value from openness. The authors identify four different
archetypical generative mechanisms (cause-effect relationships
between open data and value) in their framework: transparency
(open data helps to improve visibility to ensure socially
responsible resource allocation), participation (open data as a
mechanism for engaging stakeholders who help in solving social
problems), efficiency (open data to improve how resources are
used) and innovation (open data as a cornerstone
for generating new ideas, processes, services and products). The
authors claim that their framework can help governments in the
development of their strategy for opening data by considering
factors that can enable the generation of value from open data
through the mechanism of innovation.
      </p>
      <p>
        Moreover, Zuiderwijk &amp; Janssen [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ] state that different
types of open data users are often interested in different types
of data; therefore, data publication can be improved by taking
into account the preferences of particular open data users for
particular types of data.
      </p>
      <p>Overall, there are several methods that support opening
data, but to the best of our knowledge no approach focuses on
supporting Smart Cities in selecting and prioritizing which datasets
should be opened according to their preferences and the context of
the city. To fill this gap, we have presented an approach that
obtains indicators of reuse from Socrata and GitHub and combines
them using AHP.</p>
    </sec>
    <sec id="sec-13">
      <title>CONCLUSIONS</title>
      <p>Smart Cities usually have a limited budget and insufficient time
to release and maintain all available open data. In this paper, we
have presented an approach whose goal is to provide an AHP tool
that allows weighting different indicators of reuse, calculated
using Socrata and GitHub as sources of information, in order
to combine them according to objective criteria. This
approach is characterized by:
(1) A classification of 14 categories for Smart City open datasets
based on the G8 Open Data Charter and the Smart City
domain.
(2) A definition of 4 indicators based on the reuse of datasets
in OSS projects.
(3) Almost 9000 located open datasets from many of the most
important US cities.
(4) A catalogue of these US city datasets classified according
to the proposed categories.
(5) Around 32000 distinct references to two thirds of the
categorized datasets, made from 2500 different GitHub projects
and found through a search performed over all OSS projects
in GitHub.
(6) An estimation of the defined reuse indicators for every
Smart City dataset category.
(7) An AHP-based Decision Support System to recommend
Smart City dataset categories to prioritize, taking into
account the estimated indicators and the importance of
each indicator for the cities.</p>
      <p>This approach is completely functional and reproducible. We
provide a public repository containing the data obtained from
Socrata and GitHub, the scripts to collect and analyze the
information, and the AHP tool, so that users can use or modify
these processes. Thus, Smart Cities or any other public institution
can reuse and adapt them to their specific requirements. Further
applications of our approach that can be considered a continuation
of this research may include:
(1) Searching and categorizing open datasets of different cities,
regions, countries, companies or any other kind of
institution in order to get more data.
(2) Developing semantic-based software tools for automatic
classification of datasets.
(3) Analyzing the reuse of open datasets in proprietary
software projects, for instance, by developing a web repository
where developers could register their applications that use
open data and indicate which particular datasets they reuse.
(4) Analyzing the impact of open datasets in mass media,
social media, blogs, etc. by searching for references to the
datasets in these sites.
(5) Conducting a set of controlled experiments to demonstrate the
effectiveness of our approach in different scenarios.</p>
      <p>In summary, a successful publication of open datasets should
be based on the proper combination of the objectives of the open
data portal and the analysis of the impact of already available
open datasets. This approach provides a useful method for Smart
City decision makers to carry out this task in an objective and
analytic way.
</p>
    </sec>
    <sec id="sec-14">
      <title>ACKNOWLEDGEMENTS</title>
      <p>We would like to thank GitHub for allowing us to use its API
without limitations, and Socrata for providing a way to collect
precisely all the datasets published using its tools. This work
has been developed with the support of (i) the TIN2015-69957-R and
TIN2016-78103-C2-2-R (MINECO/ERDF, EU) projects, (ii) the POCTEP
4IE project (0045-4IE-4-P), (iii) Consejería de Economía e
Infraestructuras/Junta de Extremadura (Spain) - European Regional
Development Fund (ERDF)- GR15098 project and IB16055 project,
and (iv) Consejería de Educación y Empleo/Junta de Extremadura
(Spain) - Becas de Movilidad al Personal Docente e Investigador
Curso 2016/2017.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Judie</given-names>
            <surname>Attard</surname>
          </string-name>
          , Fabrizio Orlandi, Simon Scerri, and
          <string-name>
            <given-names>Sören</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A systematic review of open government data initiatives</article-title>
          .
          <source>Government Information Quarterly</source>
          <volume>32</volume>
          ,
          <issue>4</issue>
          (
          <year>2015</year>
          ),
          <fpage>399</fpage>
          -
          <lpage>418</lpage>
          . https://doi.org/10.1016/j.giq.2015.07.006
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Marco</given-names>
            <surname>Bartolozzi</surname>
          </string-name>
          , Pierfrancesco Bellini, Paolo Nesi, Gianni Pantaleo, and
          <string-name>
            <given-names>Luca</given-names>
            <surname>Santi</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A Smart Decision Support System for Smart City</article-title>
          . In 2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity). IEEE,
          <fpage>117</fpage>
          -
          <lpage>122</lpage>
          . https://doi.org/10.1109/SmartCity.2015.57
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Tegawende F.</given-names>
            <surname>Bissyande</surname>
          </string-name>
          , Ferdian Thung, David Lo, Lingxiao Jiang, and
          <string-name>
            <given-names>Laurent</given-names>
            <surname>Reveillere</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Popularity, interoperability, and impact of programming languages in 100,000 open source projects</article-title>
          .
          <source>In Proceedings - International Computer Software and Applications Conference</source>
          . IEEE,
          <fpage>303</fpage>
          -
          <lpage>312</lpage>
          . https://doi.org/10.1109/COMPSAC.2013.55
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Boselli</surname>
          </string-name>
          , Mirko Cesarini, Fabio Mercorio, and
          <string-name>
            <given-names>Mario</given-names>
            <surname>Mezzanzanica</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Applying the AHP to Smart Mobility Services: A Case Study</article-title>
          .
          <source>In Proceedings of 4th International Conference on Data Management Technologies and Applications - Volume</source>
          <volume>1</volume>
          : KomIS. SCITEPRESS,
          <fpage>354</fpage>
          -
          <lpage>361</lpage>
          . https://doi.org/10.5220/0005580003540361
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hitachi Vantara Community</surname>
          </string-name>
          .
          <year>2018</year>
          . Data Integration - Kettle. (
          <year>2018</year>
          ). http://community.pentaho.com/projects/data-integration/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Peter</given-names>
            <surname>Conradie</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sunil</given-names>
            <surname>Choenni</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>On the barriers for local government releasing open data</article-title>
          .
          <source>Government Information Quarterly</source>
          <volume>31</volume>
          ,
          <issue>SUPPL. 1</issue>
          (
          <year>2014</year>
          ),
          <fpage>S10</fpage>
          -
          <lpage>S17</lpage>
          . https://doi.org/10.1016/j.giq.2014.01.003
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Carlo</given-names>
            <surname>Daffara</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Estimating the Economic Contribution of Open Source Software to the European Economy</article-title>
          .
          <source>In The First Openforum Academy Conference Proceedings. OpenForum Europe LTD</source>
          ,
          <fpage>11</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Rishab</given-names>
            <surname>Aiyer Ghosh</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Economic impact of open source software on innovation and the competitiveness of the Information and Communication Technologies (ICT) sector in the EU</article-title>
          .
          <source>Technical Report</source>
          . Maastricht:
          <publisher-name>UNU-MERIT</publisher-name>
          . http://stuermer.ch/blog/documents/FLOSSImpactOnEU.pdf
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Github</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Github: The world's leading software development platform</article-title>
          . (
          <year>2018</year>
          ). https://www.github.com/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10] Group of Eight.
          <year>2013</year>
          .
          <article-title>G8 Open Data Charter</article-title>
          . (
          <year>2013</year>
          ). https://www.gov.uk/government/uploads/system/uploads/attachment_data/file/207772/Open_Data_Charter.pdf
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Hammond</surname>
          </string-name>
          , Paul Santinelli, Jay Jay Billings, and
          <string-name>
            <given-names>Bill</given-names>
            <surname>Ledingham</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The Tenth Annual Future of Open Source Survey</article-title>
          .
          <source>Technical Report. Black Duck Software and North Bridge</source>
          . https://www.blackducksoftware.com/2016-future-of-open-source
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Anders</given-names>
            <surname>Hjalmarsson</surname>
          </string-name>
          , Niklas Johansson, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Rudmark</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Mind the gap: Exploring stakeholders' value with open data assessment</article-title>
          .
          <source>In Proceedings of the Annual Hawaii International Conference on System Sciences. IEEE</source>
          ,
          <fpage>1314</fpage>
          -
          <lpage>1323</lpage>
          . https://doi.org/10.1109/HICSS.2015.160
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Mohammad Alamgir</given-names>
            <surname>Hossain</surname>
          </string-name>
          , Yogesh K. Dwivedi, and
          <string-name>
            <given-names>Nripendra P.</given-names>
            <surname>Rana</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>State of the Art in Open Data Research: Insights from Existing Literature and a Research Agenda</article-title>
          .
          <source>Journal of Organizational Computing and Electronic Commerce</source>
          <volume>26</volume>
          ,
          <issue>1-2</issue>
          (apr
          <year>2016</year>
          ),
          <fpage>14</fpage>
          -
          <lpage>40</lpage>
          . https://doi.org/10.1080/10919392.2015.1124007
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Janssen</surname>
          </string-name>
          , Yannis Charalabidis, and
          <string-name>
            <given-names>Anneke</given-names>
            <surname>Zuiderwijk</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Benefits, Adoption Barriers and Myths of Open Data and Open Government</article-title>
          .
          <source>Information Systems Management</source>
          <volume>29</volume>
          ,
          <issue>4</issue>
          (sep
          <year>2012</year>
          ),
          <fpage>258</fpage>
          -
          <lpage>268</lpage>
          . https://doi.org/10.1080/10580530.2012.716740
          arXiv:1011.1669v3
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Oskar</given-names>
            <surname>Jarczyk</surname>
          </string-name>
          , Blazej Gruszka, Szymon Jaroszewicz, and
          <string-name>
            <given-names>Leszek</given-names>
            <surname>Bukowski</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GitHub Projects. Quality Analysis of Open-Source Software</article-title>
          .
          <source>In SocInfo 2014: The 6th International Conference on Social Informatics</source>
          . Springer, Cham,
          <fpage>80</fpage>
          -
          <lpage>94</lpage>
          . https://doi.org/10.1007/978-3-319-13734-6_6
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Thorhildur</given-names>
            <surname>Jetzek</surname>
          </string-name>
          , Michel Avital, and
          <string-name>
            <given-names>Niels</given-names>
            <surname>Bjorn-Andersen</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Data-driven innovation through open government data</article-title>
          .
          <source>Journal of Theoretical and Applied Electronic Commerce Research</source>
          <volume>9</volume>
          ,
          <issue>2</issue>
          (aug
          <year>2014</year>
          ),
          <fpage>100</fpage>
          -
          <lpage>120</lpage>
          . https://doi.org/10.4067/S0718-18762014000200008
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Maxat</given-names>
            <surname>Kassen</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A promising phenomenon of open data: A case study of the Chicago open data project</article-title>
          .
          <source>Government Information Quarterly</source>
          <volume>30</volume>
          ,
          <issue>4</issue>
          (
          <year>2013</year>
          ),
          <fpage>508</fpage>
          -
          <lpage>513</lpage>
          . https://doi.org/10.1016/j.giq.2013.05.012
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Sylvain</given-names>
            <surname>Kubler</surname>
          </string-name>
          , Jérémy Robert, Yves Le Traon, Jürgen Umbrich, and
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Neumaier</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Open Data Portal Quality Comparison using AHP</article-title>
          .
          <source>In Proceedings of the 17th International Digital Government Research Conference on Digital Government Research - dg.o '16</source>
          . ACM Press, New York, New York, USA,
          <fpage>397</fpage>
          -
          <lpage>407</lpage>
          . https://doi.org/10.1145/2912160.2912167
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Marlow</surname>
          </string-name>
          , Laura Dabbish, and
          <string-name>
            <given-names>Jim</given-names>
            <surname>Herbsleb</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Impression Formation in Online Peer Production: Activity Traces and Personal Profiles in GitHub</article-title>
          .
          <source>In 16th ACM Conference on Computer Supported Cooperative Work</source>
          . ACM Press, New York, New York, USA,
          <fpage>117</fpage>
          -
          <lpage>128</lpage>
          . https://doi.org/10.1145/2441776.2441792
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>Muthukumaran</surname>
          </string-name>
          , Abhinav Choudhary, and
          <string-name>
            <given-names>N.L. Bhanu</given-names>
            <surname>Murthy</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Mining GitHub for Novel Change Metrics to Predict Buggy Files in Software Systems</article-title>
          .
          <source>In 2015 International Conference on Computational Intelligence and Networks</source>
          . IEEE,
          <fpage>15</fpage>
          -
          <lpage>20</lpage>
          . https://doi.org/10.1109/CINE.2015.13
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Neirotti</surname>
          </string-name>
          , Alberto De Marco, Anna Corinna Cagliano, Giulio Mangano, and
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Scorrano</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Current trends in smart city initiatives: Some stylised facts</article-title>
          .
          <source>Cities</source>
          <volume>38</volume>
          (
          <year>2014</year>
          ),
          <fpage>25</fpage>
          -
          <lpage>36</lpage>
          . https://doi.org/10.1016/j.cities.2013.12.010
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22] National League of Cities
          .
          <year>2018</year>
          . National League of Cities. (
          <year>2018</year>
          ). https://www.nlc.org/
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Monica</given-names>
            <surname>Palmirani</surname>
          </string-name>
          , Michele Martoni, and
          <string-name>
            <given-names>Dino</given-names>
            <surname>Girardi</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Beyond Transparency Introduction : OGA Beyond Transparency</article-title>
          .
          <source>Electronic Government and the Information Systems Perspective (EGOVIS 2014)</source>
          <volume>8650</volume>
          (
          <year>2014</year>
          ),
          <fpage>275</fpage>
          -
          <lpage>291</lpage>
          . https://doi.org/10.1007/978-3-
          <fpage>319</fpage>
          -10178-1_
          <fpage>22</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.L.</given-names>
            <surname>Saaty</surname>
          </string-name>
          .
          <year>1980</year>
          .
          <source>The Analytic Hierarchy Process</source>
          .
          <publisher-name>McGraw-Hill</publisher-name>
          ,
          <publisher-loc>New York</publisher-loc>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Jyoti</given-names>
            <surname>Sheoran</surname>
          </string-name>
          , Kelly Blincoe, Eirini Kalliamvakou, Daniela Damian, and
          <string-name>
            <given-names>Jordan</given-names>
            <surname>Ell</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Understanding "watchers" on GitHub</article-title>
          .
          <source>In MSR 2014: Proceedings of the 11th Working Conference on Mining Software Repositories</source>
          . ACM Press, New York, New York, USA,
          <fpage>336</fpage>
          -
          <lpage>339</lpage>
          . https://doi.org/10.1145/2597073.2597114
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <collab>Socrata</collab>
          .
          <year>2018</year>
          .
          <article-title>Socrata: Data-driven innovation of government programs</article-title>
          . (
          <year>2018</year>
          ). https://www.socrata.com/
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Katherine J.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anthony P.</given-names>
            <surname>Ammeter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Likoebe M.</given-names>
            <surname>Maruping</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Impacts of license choice and organizational sponsorship on user interest and development activity in open source software projects</article-title>
          .
          <source>Information Systems Research</source>
          <volume>17</volume>
          ,
          <issue>2</issue>
          (jun
          <year>2006</year>
          ),
          <fpage>126</fpage>
          -
          <lpage>144</lpage>
          . https://doi.org/10.1287/isre.1060.0082
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Chandrasekar</given-names>
            <surname>Subramaniam</surname>
          </string-name>
          , Ravi Sen, and
          <string-name>
            <given-names>Matthew L.</given-names>
            <surname>Nelson</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Determinants of open source software project success: A longitudinal study</article-title>
          .
          <source>Decision Support Systems</source>
          <volume>46</volume>
          ,
          <issue>2</issue>
          (jan
          <year>2009</year>
          ),
          <fpage>576</fpage>
          -
          <lpage>585</lpage>
          . https://doi.org/10.1016/j.dss.2008.10.005
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Abobakr</given-names>
            <surname>Sultan</surname>
          </string-name>
          , Khalid A. AlArfaj, and
          <string-name>
            <given-names>Ghassan A.</given-names>
            <surname>AlKutbi</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Analytic hierarchy process for the success of e-government</article-title>
          .
          <source>Business Strategy Series</source>
          <volume>13</volume>
          ,
          <issue>6</issue>
          (nov
          <year>2012</year>
          ),
          <fpage>295</fpage>
          -
          <lpage>306</lpage>
          . https://doi.org/10.1108/17515631211286146
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Jefrey</given-names>
            <surname>Thorsby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Genie N.L.</given-names>
            <surname>Stowers</surname>
          </string-name>
          , Kristen Wolslegel, and
          <string-name>
            <given-names>Ellie</given-names>
            <surname>Tumbuan</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Understanding the content and features of open data portals in American cities</article-title>
          .
          <source>Government Information Quarterly</source>
          <volume>34</volume>
          ,
          <issue>1</issue>
          (
          <year>2016</year>
          ),
          <fpage>53</fpage>
          -
          <lpage>61</lpage>
          . https://doi.org/10.1016/j.giq.2016.07.001
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Omkarprasad S.</given-names>
            <surname>Vaidya</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sushil</given-names>
            <surname>Kumar</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Analytic hierarchy process: An overview of applications</article-title>
          .
          <source>European Journal of Operational Research</source>
          <volume>169</volume>
          ,
          <issue>1</issue>
          (
          <year>2006</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          . https://doi.org/10.1016/j.ejor.2004.04.028
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Nils</given-names>
            <surname>Walravens</surname>
          </string-name>
          , Jonas Breuer, and
          <string-name>
            <given-names>Pieter</given-names>
            <surname>Ballon</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Open Data as a Catalyst For The Smart City as a Local Innovation Platform</article-title>
          .
          <source>Communications &amp; Strategies</source>
          <volume>96</volume>
          , 4th quarter (
          <year>2014</year>
          ),
          <fpage>15</fpage>
          -
          <lpage>33</lpage>
          . https://ssrn.com/abstract=2636315
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Liguo</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Alok</given-names>
            <surname>Mishra</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Deepti</given-names>
            <surname>Mishra</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>An Empirical Study of the Dynamics of GitHub Repository and Its Impact on Distributed Software Development</article-title>
          .
          <source>In Proceedings of the Confederated International Workshops on On the Move to Meaningful Internet Systems: OTM 2014 Workshops</source>
          - Volume
          <volume>8842</volume>
          . Springer-Verlag New York, Inc.,
          <fpage>457</fpage>
          -
          <lpage>466</lpage>
          . https://doi.org/10.1007/978-3-662-45550-0_46
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Anneke</given-names>
            <surname>Zuiderwijk</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Janssen</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A Coordination Theory Perspective to Improve the Use of Open Data in Policy-Making</article-title>
          .
          <source>In Proceedings of the 12th IFIP WG 8.5 International Conference on Electronic Government</source>
          - Volume
          <volume>8074</volume>
          . Springer-Verlag New York, Inc.,
          <fpage>38</fpage>
          -
          <lpage>49</lpage>
          . https://doi.org/10.1007/978-3-642-40358-3_4
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Anneke</given-names>
            <surname>Zuiderwijk</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Janssen</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Barriers and Development Directions for the Publication and Usage of Open Data: A Socio-Technical View</article-title>
          . In Open Government. Vol.
          <volume>4</volume>
          . Springer New York, New York, NY,
          <fpage>115</fpage>
          -
          <lpage>135</lpage>
          . https://doi.org/10.1007/978-1-4614-9563-5_8
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Anneke</given-names>
            <surname>Zuiderwijk</surname>
          </string-name>
          , Marijn Janssen, Sunil Choenni, and
          <string-name>
            <given-names>Ronald</given-names>
            <surname>Meijer</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Design principles for improving the process of publishing open data</article-title>
          .
          <source>Transforming Government: People, Process and Policy</source>
          <volume>8</volume>
          ,
          <issue>2</issue>
          (may
          <year>2014</year>
          ),
          <fpage>185</fpage>
          -
          <lpage>204</lpage>
          . https://doi.org/10.1108/TG-07-2013-0024
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Anneke</given-names>
            <surname>Zuiderwijk</surname>
          </string-name>
          , Iryna Susha, Yannis Charalabidis,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Parycek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Marijn</given-names>
            <surname>Janssen</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Open data disclosure and use: critical factors from a case study</article-title>
          .
          <source>In CeDEM 2015: Proceedings of the International Conference for E-Democracy and Open Government</source>
          <year>2015</year>
          .
          <publisher-name>Edition Donau-Universität Krems</publisher-name>
          ,
          <fpage>197</fpage>
          -
          <lpage>208</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>