<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Regional Population Estimation Using Satellite Imagery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aikaterini Koutsouri</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ilektra Skepetari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Konstantinos Anastasakis</string-name>
          <email>kon.anastasakis@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefanos Lappas</string-name>
          <email>steflappas@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University of Athens</institution>
          ,
          <addr-line>Iroon Politechniou 9, Athens</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National and Kapodistrian University of Athens</institution>
          ,
          <addr-line>Panepistimiou 30, Athens</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Population data provides statistical information that can support decision-making processes. It is essential to many activities, such as rescue operations and humanitarian actions, that require an estimate of the local population. Traditional collection approaches are possible, but they tend to be time consuming and expensive. Copernicus Earth Observation data (Sentinel-2 satellite images) provides high-resolution satellite imagery that could be a useful alternative to collecting census data, since it is significantly cheaper than traditional methods. This study examines a population extraction method based on open, high-resolution images retrieved from Sentinel-2, as proposed by the remote task[1] of the ImageCLEF 2017[2] campaign. The paper describes a methodology for deriving a population estimate from satellite imagery by combining classification techniques with a statistical forecast on historical data.</p>
      </abstract>
      <kwd-group>
        <kwd>Satellite Image Analysis</kwd>
        <kwd>Supervised Classification</kwd>
        <kwd>Population Estimation Mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The work described in this paper is directed at the automated estimation of
population using supervised classification, through a GIS platform, applied to
satellite imagery. The motivation for the work is that the collection of population
data, when conducted in the traditional manner, can be time consuming and
inefficient. This is especially the case in rural areas that lack sophisticated
communication and transport infrastructure, particularly in developing countries.
The solution proposed in this paper is to use satellite imagery to estimate the
population. A small sample of satellite images is used to build a classifier that
can then be used to predict populated areas. The main issue to be addressed is how
best to match the classified result to the number of people in the study area. To
address the task described above, the proposed methodology combines
(i) a supervised image classification technique, (ii) a statistical forecasting
procedure and (iii) a regional disaggregation technique, each of which is
explained in the following sections.</p>
    </sec>
    <sec id="sec-2">
      <title>Gathering and Processing Sentinel Data</title>
      <p>
        This section describes the methodology for downloading and analyzing Sentinel-2
data. The Semi-Automatic Classification Plug-in (SCP) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is
a free, open source plug-in for QGIS that supports the semi-automatic (supervised)
classification of remote sensing images and provides tools for image
preprocessing, classification post-processing, and raster calculation. It
is used to obtain the imagery needed for the purposes of this study.
      </p>
      <sec id="sec-2-1">
        <title>Gathering Raster Files</title>
        <p>To gather the satellite imagery, the Copernicus Open Access Hub can
be used, which provides complete, free and open access to Sentinel-1,
Sentinel-2 and Sentinel-3 user products. The SCP Sentinel-2 tab allows searching for
and downloading the desired imagery (Level-1C) using the Data Hub API.
The search area is defined for each area of interest so as to
include all of the respective subregions. For the purposes of this study, the date
of acquisition was restricted to 2016 or later in order to keep the classification
relevant to the desired results. The maximum cloud cover is set
to 10 % to help with the process, and bands 1, 9, and 10 are
not included.</p>
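        <p>The search criteria above can be illustrated with a minimal sketch. The Scene record, the scene names and the helper functions are hypothetical and are not part of SCP or the Data Hub API; only the thresholds (acquisition in 2016 or later, cloud cover of at most 10 %, bands 1, 9 and 10 excluded) come from the text.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass
from datetime import date

# The 13 Sentinel-2 MSI band identifiers.
ALL_BANDS = ["B01", "B02", "B03", "B04", "B05", "B06", "B07",
             "B08", "B8A", "B09", "B10", "B11", "B12"]
# Bands 1, 9 and 10 are left out, as stated in the text.
EXCLUDED_BANDS = {"B01", "B09", "B10"}


@dataclass
class Scene:
    """Hypothetical record for one search result (not an SCP or Data Hub type)."""
    name: str
    acquired: date
    cloud_cover: float  # percent


def select_scenes(scenes, earliest=date(2016, 1, 1), max_cloud=10.0):
    """Keep scenes acquired on or after `earliest` with cloud cover <= max_cloud."""
    return [s for s in scenes
            if s.acquired >= earliest and s.cloud_cover <= max_cloud]


def bands_to_download():
    """Return the band identifiers that remain after the exclusion."""
    return [b for b in ALL_BANDS if b not in EXCLUDED_BANDS]


# Illustrative candidates only; these are not real product names.
candidates = [
    Scene("scene_a", date(2015, 12, 30), 2.0),   # acquired too early
    Scene("scene_b", date(2016, 7, 14), 35.0),   # too cloudy
    Scene("scene_c", date(2017, 1, 3), 4.5),     # satisfies both criteria
]
selected = select_scenes(candidates)
print([s.name for s in selected])
print(bands_to_download())
```
]]></preformat>
        <p>In practice the filtering is performed by the SCP search dialog itself; the sketch only makes the selection rule explicit.</p>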
      </sec>
      <sec id="sec-2-2">
        <title>Supervised Classification</title>
        <p>In this section, the steps followed when classifying the satellite images are
explained. For the purposes of this study, several compounds of Uganda and Lusaka
are examined. The two regions are classified separately, following the same approach
for both. Note that multiple different satellite
images might be necessary to cover the entire study area. For the Sentinel-2
images, the bands are converted to reflectance and clipped to an extent just large
enough to cover each area examined, so as to reduce the computational time of
the process. The band set, which is the input image for SCP, is defined and the
regions of interest are created.</p>
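        <p>The two preprocessing steps mentioned above, conversion to reflectance and clipping, can be sketched as follows. This is an illustration under stated assumptions rather than the SCP implementation: it assumes that Level-1C digital numbers are scaled by a quantification value of 10000, it ignores any atmospheric correction SCP may apply, and the toy band values and function names are hypothetical.</p>
        <preformat><![CDATA[
```python
# Assumption: reflectance = DN / 10000 for Level-1C digital numbers.
# Any atmospheric correction applied by SCP is not reproduced here.
QUANTIFICATION_VALUE = 10000.0


def to_reflectance(band):
    """Convert a 2-D list of digital numbers to reflectance values."""
    return [[dn / QUANTIFICATION_VALUE for dn in row] for row in band]


def clip(band, row_start, row_stop, col_start, col_stop):
    """Return the sub-grid covering rows [row_start, row_stop) and
    columns [col_start, col_stop)."""
    return [row[col_start:col_stop] for row in band[row_start:row_stop]]


# Toy 4x4 band of digital numbers (illustrative values only).
band_dn = [
    [1200, 1300, 1250, 1100],
    [1500, 2400, 2600, 1000],
    [1450, 2500, 2700,  900],
    [1000, 1100, 1050,  950],
]

# Clip first, then convert, so that only the pixels of interest are processed.
window = clip(band_dn, 1, 3, 1, 3)
reflectance = to_reflectance(window)
print(reflectance)
```
]]></preformat>
        <p>Clipping before conversion reflects the motivation given in the text: the smaller the extent, the fewer pixels every later step has to process.</p>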
        <p>Iterating through the different color composites allows the
classification to be completed. The defined buildings macro-class is the area of
interest that will be extracted. This provides high-resolution, two-colored pixelated
images that hold information regarding the buildings of the areas. These data
are used to complete the methodology through the process
explained in the next sections.
The classification results can be validated through comparison with the original
raster files as well as older shapefile layers provided by various open data sources
(e.g. OpenLayers Plug-in, Geofabrik).</p>
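        <p>To make the idea of a supervised classification into a two-colored image concrete, the sketch below uses a minimum-distance classifier. This choice is an assumption made for illustration: the text does not state which SCP algorithm was used, and the training pixels, class names and function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import math


def mean_signature(samples):
    """Average the training pixels of one class into a spectral signature."""
    n = len(samples)
    return [sum(pixel[i] for pixel in samples) / n
            for i in range(len(samples[0]))]


def classify_pixel(pixel, signatures):
    """Return the class whose signature is closest (Euclidean) to the pixel."""
    return min(signatures, key=lambda name: math.dist(pixel, signatures[name]))


def classify_image(image, signatures, target="buildings"):
    """Produce a two-colour mask: white where the target class wins, else black."""
    white, black = (255, 255, 255), (0, 0, 0)
    return [[white if classify_pixel(p, signatures) == target else black
             for p in row] for row in image]


# Hypothetical training pixels (reflectance in three bands) for two macro-classes.
training = {
    "buildings": [[0.30, 0.28, 0.25], [0.32, 0.30, 0.27]],
    "vegetation": [[0.05, 0.08, 0.40], [0.06, 0.09, 0.45]],
}
signatures = {name: mean_signature(px) for name, px in training.items()}

# Toy 2x2 image; each pixel holds three band values.
image = [
    [[0.31, 0.29, 0.26], [0.05, 0.08, 0.42]],
    [[0.07, 0.10, 0.41], [0.29, 0.27, 0.24]],
]
mask = classify_image(image, signatures)
print(mask)
```
]]></preformat>
        <p>The white pixels of such a mask play the role of the buildings macro-class that the later steps count.</p>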
        <p>
          Following the completion of the process described above, the two classified
regions are separated using polygons that describe the sub-areas of interest. For
the purposes of this study, the PNG images of the clipped regions (17 for Uganda,
89 for Lusaka) are extracted, each one containing a certain number of white
pixels corresponding to the living areas of each district. Concerning their usage
in the estimation algorithm described later, it is important to note that
all extracted classified files of each study area must correspond correctly to the
respective area's land coverage ratio.
        </p>
        <p>
          While the results of the classification process provide a reasonable overview of
the population distribution, the buildings information must be combined with further
data in order to extract a population estimate. The historical census data of
the countries can be used to predict the aggregated population, which is
later split among the study areas according to the classified imagery. The
statistical analysis of the census data consists of two basic steps: (i) constructing
the population growth time series and (ii) fitting a forecasting model to obtain
the out-of-sample observation, that is, the current-year population.
        </p>
        <p>
          Census data for demographic purposes, obtained from the World Bank [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], are used for both areas. For the areas of interest of
the present study, the latest population data
available for both Uganda and Zambia was that of 2015.
        </p>
        <p>
          After collecting the population time series, we
proceed to choose the best-fitting model to forecast the 2017 population of the
two countries based on previously observed values. To choose the
forecasting technique we examine several error metrics (ME, MPE,
MAPE, MSE, sMAPE). The trend of the time series must also be taken into account
and, considering the growth rate of the examined African
regions, slightly optimistic methods are suitable for the purposes of this study.
The time-series dataset obtained by the analysis procedure described
above can be used as input to the forecasting support system
OMEN [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], a fully customizable web-based forecasting tool.
        </p>
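        <p>The error metrics named above can be written out as a short sketch. The definitions used here are the common textbook ones (error taken as actual minus forecast, percentages expressed in percent) and are an assumption; the exact formulas used by the forecasting platform are not given in the text. The sample values are illustrative and are not the study's data.</p>
        <preformat><![CDATA[
```python
def error_metrics(actual, forecast):
    """Return ME, MPE, MAPE, MSE and sMAPE for paired observations.

    Errors are defined as actual minus forecast; percentage metrics are
    expressed in percent.
    """
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    return {
        "ME": sum(errors) / n,
        "MPE": 100.0 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100.0 * sum(abs(e) / abs(a)
                            for e, a in zip(errors, actual)) / n,
        "MSE": sum(e * e for e in errors) / n,
        "sMAPE": 200.0 * sum(abs(e) / (abs(a) + abs(f))
                             for e, a, f in zip(errors, actual, forecast)) / n,
    }


# Illustrative values only; these are not the study's data.
actual = [100.0, 110.0, 120.0]
forecast = [90.0, 110.0, 126.0]
metrics = error_metrics(actual, forecast)
for name, value in metrics.items():
    print(name, round(value, 4))
```
]]></preformat>
        <p>ME and MPE keep the sign of the errors and therefore reveal systematic under- or over-forecasting, whereas MAPE, MSE and sMAPE measure the size of the errors only.</p>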
        <p>
          A cross-validation competition between the methods Naive, MAPA [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], ETS [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ],
Theta [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], ARIMA [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], SES, Holt and Damped [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] is performed through the
platform. The rolling origin evaluation procedure [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is used with the validation
window matching the forecasting horizon, and the performance of each method
is evaluated based on the minimization of the squared errors. The best-fitting
model is selected in this way; in this case it was the autoregressive integrated
moving average (ARIMA) model for both Uganda and Zambia, which is suitable for low
frequency data and short horizons.
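        </p>
        <table-wrap>
          <caption>
            <p>Error metrics of the candidate forecasting methods for Uganda.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Method</th><th>ME</th><th>MPE</th><th>MAPE</th><th>MSE</th><th>sMAPE</th></tr>
            </thead>
            <tbody>
              <tr><td>Theta</td><td>362,352.64</td><td>1.04 %</td><td>1.85 %</td><td>394,477,077,721</td><td>1.85 %</td></tr>
              <tr><td>Naive</td><td>928,936.47</td><td>2.99 %</td><td>2.99 %</td><td>937,103,090,150</td><td>3.03 %</td></tr>
              <tr><td>ETS</td><td>-4,143.2</td><td>-0.01 %</td><td>0.31 %</td><td>20,334,332,759</td><td>0.31 %</td></tr>
              <tr><td>ARIMA</td><td>-29,783.84</td><td>-0.08 %</td><td>0.09 %</td><td>14,403,091,721</td><td>0.09 %</td></tr>
              <tr><td>SES</td><td>701,607.45</td><td>2.01 %</td><td>3.96 %</td><td>1,816,289,360,487</td><td>3.93 %</td></tr>
              <tr><td>Holt</td><td>-4,140.01</td><td>-0.01 %</td><td>0.31 %</td><td>20,334,001,686</td><td>0.31 %</td></tr>
              <tr><td>Damped</td><td>1,361.34</td><td>0.01 %</td><td>0.3 %</td><td>20,354,628,986</td><td>0.3 %</td></tr>
              <tr><td>MAPA</td><td>-4,683,085.6</td><td>-17.93 %</td><td>19.79 %</td><td>42,915,827,697,557</td><td>17.12 %</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <caption>
            <p>Error metrics of the candidate forecasting methods for Zambia.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Method</th><th>ME</th><th>MPE</th><th>MAPE</th><th>MSE</th><th>sMAPE</th></tr>
            </thead>
            <tbody>
              <tr><td>Theta</td><td>132,245.72</td><td>0.91 %</td><td>1.5 %</td><td>50,440,089,294</td><td>1.5 %</td></tr>
              <tr><td>Naive</td><td>335,253.88</td><td>2.53 %</td><td>2.53 %</td><td>125,691,193,414</td><td>2.56 %</td></tr>
              <tr><td>ETS</td><td>-9,335.3</td><td>-0.06 %</td><td>0.3 %</td><td>6,423,231,745</td><td>0.3 %</td></tr>
              <tr><td>ARIMA</td><td>-6,156.78</td><td>-0.02 %</td><td>0.21 %</td><td>5,608,027,543</td><td>0.21 %</td></tr>
              <tr><td>SES</td><td>256,146.93</td><td>1.77 %</td><td>3.28 %</td><td>232,161,298,596</td><td>3.27 %</td></tr>
              <tr><td>Holt</td><td>-9,334.91</td><td>-0.06 %</td><td>0.3 %</td><td>6,423,208,336</td><td>0.3 %</td></tr>
              <tr><td>Damped</td><td>-5,292.38</td><td>-0.03 %</td><td>0.3 %</td><td>6,161,968,390</td><td>0.3 %</td></tr>
              <tr><td>MAPA</td><td>-1,742,891.55</td><td>-15.24 %</td><td>16.82 %</td><td>5,825,850,536,317</td><td>14.89 %</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The results extracted are 40,989,197 and 16,932,235 for the two regions
respectively.</p>
        <table-wrap>
          <caption>
            <p>Population time series for Uganda and Zambia, 2001 to 2017.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Year</th><th>Uganda</th><th>Zambia</th></tr>
            </thead>
            <tbody>
              <tr><td>2001</td><td>24,146,152</td><td>10,723,229</td></tr>
              <tr><td>2002</td><td>24,945,231</td><td>11,000,608</td></tr>
              <tr><td>2003</td><td>25,786,777</td><td>11,282,992</td></tr>
              <tr><td>2004</td><td>26,666,251</td><td>11,575,821</td></tr>
              <tr><td>2005</td><td>27,578,578</td><td>11,884,613</td></tr>
              <tr><td>2006</td><td>28,521,669</td><td>12,212,550</td></tr>
              <tr><td>2007</td><td>29,496,442</td><td>12,560,093</td></tr>
              <tr><td>2008</td><td>30,503,193</td><td>12,926,628</td></tr>
              <tr><td>2009</td><td>31,540,776</td><td>13,311,214</td></tr>
              <tr><td>2010</td><td>32,608,271</td><td>13,712,644</td></tr>
              <tr><td>2011</td><td>33,704,880</td><td>14,130,483</td></tr>
              <tr><td>2012</td><td>34,830,481</td><td>14,565,054</td></tr>
              <tr><td>2013</td><td>35,987,004</td><td>15,016,334</td></tr>
              <tr><td>2014</td><td>37,178,179</td><td>15,483,715</td></tr>
              <tr><td>2015</td><td>38,407,677</td><td>15,966,555</td></tr>
              <tr><td>2016</td><td>39,675,318</td><td>16,418,477</td></tr>
              <tr><td>2017</td><td>40,983,129</td><td>16,885,858</td></tr>
            </tbody>
          </table>
        </table-wrap>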
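        <p>The rolling origin evaluation used for the model selection can be sketched as follows. The two candidate methods, naive and drift, are simple stand-ins chosen for illustration and are not the ARIMA, ETS or Theta implementations of the platform; the horizon of two steps and the minimum training length are likewise assumptions. The input is the Uganda series for 2001 to 2015 as listed in this section.</p>
        <preformat><![CDATA[
```python
def naive(history, horizon):
    """Repeat the last observation for every step of the horizon."""
    return [history[-1]] * horizon


def drift(history, horizon):
    """Extend the average historical change (random walk with drift)."""
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (k + 1) for k in range(horizon)]


def rolling_origin_mse(series, method, horizon, min_train):
    """Mean squared error over all forecast origins.

    For each origin the method sees series[:origin] and is scored on the
    next `horizon` observations, so the validation window equals the horizon.
    """
    squared = []
    for origin in range(min_train, len(series) - horizon + 1):
        forecast = method(series[:origin], horizon)
        actual = series[origin:origin + horizon]
        squared.extend((a - f) ** 2 for a, f in zip(actual, forecast))
    return sum(squared) / len(squared)


def select_method(series, methods, horizon, min_train):
    """Return the name of the method with the lowest rolling-origin MSE."""
    scores = {name: rolling_origin_mse(series, m, horizon, min_train)
              for name, m in methods.items()}
    return min(scores, key=scores.get), scores


# Uganda population, 2001 to 2015, as listed in this section.
uganda = [24146152, 24945231, 25786777, 26666251, 27578578, 28521669,
          29496442, 30503193, 31540776, 32608271, 33704880, 34830481,
          35987004, 37178179, 38407677]

best, scores = select_method(uganda, {"naive": naive, "drift": drift},
                             horizon=2, min_train=5)
print(best, scores)
```
]]></preformat>
        <p>Each origin adds one more observation to the training part, so every method is scored on several out-of-sample windows rather than on a single split.</p>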
      </sec>
      <sec id="sec-2-3">
        <title>Regional Disaggregation and Adjustments</title>
        <p>The processes described above provide a numerical output for
Uganda's and Zambia's total population. This information is used in
the following step, which is a top-down division of each country's population in
order to obtain a result for each specific area of interest in the given
shapefiles. It is important to note, however, that the subareas of interest might not
add up to the entire country's extent, and therefore a straightforward division
of the entire population by the acreage of the living areas (as obtained from the
classification process) might not be possible. Because of this difficulty, a
different approach is examined. In order to split the total population among the
different areas, a weight variable is calculated for each pixel corresponding
to living areas in the classified images. To achieve this, we gather census data
about housing and population in a small subregion of each of the countries
of interest and estimate the multiplier for each pixel.</p>
        <p>
          Uganda. For the purposes of this study, we first examine the areas of
Uganda. For the region of Lake Katwe, according to census data obtained
from the Uganda Bureau of Statistics [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], the region had a total population of
about 23,559 people in 2014. The quotient of the compound's population and
Uganda's total population is therefore estimated to be 0.00063368 in 2014.
Assuming that the population density of the region has not changed significantly,
the compound's population for 2017 is estimated to be
25,970.
        </p>
        <p>We proceed to examine the classification result produced by the methodology
that was described in the previous section.</p>
        <p>We then sum the pixels of the living areas (rgb(255,255,255)) in
the region. This gives a total of 2,242 pixels that represent
living areas in the compound of Lake Katwe. Thus, the density
multiplier of each pixel that represents living area in the compounds of Uganda
equals 11.5834.</p>
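        <p>The arithmetic behind the Lake Katwe multiplier can be reproduced with a short sketch. The function name and its parameters are hypothetical; the inputs are the figures quoted in the text together with the 2014 and 2017 national values from the population time series, which is an assumption about which national totals were used, although they do reproduce the quoted results.</p>
        <preformat><![CDATA[
```python
def pixel_multiplier(sample_population, national_census_year,
                     national_target_year, living_pixels):
    """Return (share, projected population, people per living-area pixel).

    The share of the sample region in the national population is assumed
    to stay constant between the census year and the target year.
    """
    share = sample_population / national_census_year
    projected = round(share * national_target_year)
    return share, projected, projected / living_pixels


# Lake Katwe: 23,559 people in 2014; Uganda: 37,178,179 (2014) and
# 40,983,129 (2017); 2,242 living-area pixels in the classified image.
share, projected, multiplier = pixel_multiplier(23559, 37178179,
                                                40983129, 2242)
print(f"{share:.8f}", projected, f"{multiplier:.4f}")
# prints: 0.00063368 25970 11.5834
```
]]></preformat>
        <p>Rounding the projected population to whole people before dividing by the pixel count is what yields the four-decimal multiplier quoted in the text.</p>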
        <p>
          Lusaka. The same process is followed for the regions of Lusaka, where the
district of Kanyama is examined. According to the 2010 census data [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], the
Kanyama compound had a total population of about 366,170 people in 2010,
so the quotient of the compound's population and
Zambia's total population was 0.026703 in that year. Assuming that the
population density of the region has not changed significantly, the compound's
population for 2017 is estimated to be 388,002.
        </p>
        <p>We then proceed to examine the classification results for Lusaka. Note
that the Kanyama compound consists of three polygon shapes: (i) Old Kanyama
(ZMB TW0029), (ii) New Kanyama (ZMB TW0050) and (iii) Kanyama West
(ZMB TW0078). The aggregated result for those three subregions can be
examined.</p>
        <p>We sum the pixels of the living areas (rgb(255,255,255)) in
the three districts using the same methodology. A
total of 64,199 pixels represents living areas in the compound of Kanyama.
Therefore, the density multiplier of each pixel that represents living
area in our Lusaka study area equals 7.02352.</p>
        <p>After calculating the weight variables for both regions as described above, the
clipped pixelated images, along with the multipliers, can be used as input to
calculate the population of each region. For each compound, the number of white pixels is
multiplied by the estimated weight variable of the respective study area (11.5834
for Uganda, 7.02352 for Lusaka). The outputs form the final dataset
containing the population estimates for all the subregions of each country.</p>
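        <p>The final weighting step can be sketched as a pixel count multiplied by the study area's weight. The mask below is a toy example and the function names are hypothetical; only the two multipliers and the white living-area colour come from the text. Reading the actual PNG files would require an image library, which is left out here.</p>
        <preformat><![CDATA[
```python
WHITE = (255, 255, 255)   # living-area pixels in the classified images
BLACK = (0, 0, 0)

# Multipliers quoted in the text (people per living-area pixel).
MULTIPLIERS = {"Uganda": 11.5834, "Lusaka": 7.02352}


def count_living_pixels(image):
    """Count the white pixels of a classified image given as rows of RGB tuples."""
    return sum(1 for row in image for pixel in row if pixel == WHITE)


def estimate_population(image, study_area):
    """Multiply the living-area pixel count by the study area's weight."""
    return count_living_pixels(image) * MULTIPLIERS[study_area]


# Toy 2x3 mask standing in for one clipped compound image.
toy_mask = [
    [WHITE, BLACK, BLACK],
    [WHITE, WHITE, BLACK],
]
living = count_living_pixels(toy_mask)
estimate = estimate_population(toy_mask, "Uganda")
print(living, round(estimate, 4))
```
]]></preformat>
        <p>Applying the same function to every clipped image of a study area yields one population estimate per compound.</p>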
        <p>It is important to note that the accuracy of the procedure described depends
largely on the classification output. A well-classified image, similar to the one
obtained for the regions of Uganda during this study, can improve the results,
whereas a poorly classified region, similar to the one obtained for Lusaka,
results in a dataset that differs greatly from the actual population.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Notes and Comments</title>
      <p>The results obtained from the described methodology form the
final dataset containing the population estimates for each one of the
study areas. The same procedure can also be followed to estimate the number of
dwellings in the regions of interest by appropriately parameterizing the census
data input along with the forecasting model described in the previous sections.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Helbert</given-names>
            <surname>Arenas</surname>
          </string-name>
          , Bayzidul Islam, and
          <string-name>
            <given-names>Josiane</given-names>
            <surname>Mothe</surname>
          </string-name>
          .
          <article-title>Overview of the ImageCLEF 2017 Population Estimation Task</article-title>
          .
          In
          <source>CLEF 2017 Labs Working Notes, CEUR Workshop Proceedings</source>
          , Dublin, Ireland, September 11-14,
          <year>2017</year>
          . CEUR-WS.org &lt;http://ceur-ws.org&gt;.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Bogdan</given-names>
            <surname>Ionescu</surname>
          </string-name>
          , Henning Müller, Mauricio Villegas, Helbert Arenas, Giulia Boato, Duc-Tien Dang-Nguyen, Yashin Dicente Cid, Carsten Eickhoff, Alba Garcia Seco de Herrera, Cathal Gurrin, Bayzidul Islam, Vassili Kovalev, Vitali Liauchuk, Josiane Mothe, Luca Piras, Michael Riegler, and Immanuel Schwall.
          <article-title>Overview of ImageCLEF 2017: Information extraction from images</article-title>
          .
          In
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction, 8th International Conference of the CLEF Association, CLEF 2017</source>
          , volume
          <volume>10456</volume>
          of Lecture Notes in Computer Science, Dublin, Ireland, September 11-14,
          <year>2017</year>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Congedo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <year>2014</year>
          :
          <article-title>Semi-Automatic Classification Plugin User Manual</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Rajendran</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mani</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <year>2015</year>
          :
          <article-title>Quantifying the Dynamics of Landscape Patterns in Thiruvananthapuram Corporation Using Open Source GIS Tools</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Assimakopoulos</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nikolopoulos</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <year>2000</year>
          :
          <article-title>The theta model: a decomposition approach to forecasting</article-title>
          .
          <source>International Journal of Forecasting</source>
          <volume>16</volume>
          (
          <issue>4</issue>
          ),
          <fpage>521</fpage>
          -
          <lpage>530</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>E. S.</given-names>
          </string-name>
          ,
          <year>2006</year>
          :
          <article-title>Exponential smoothing: The state of the art part II</article-title>
          .
          <source>International Journal of Forecasting</source>
          <volume>22</volume>
          (
          <issue>4</issue>
          ),
          <fpage>637</fpage>
          -
          <lpage>666</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hyndman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khandakar</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <year>2008</year>
          .
          <article-title>Automatic time series forecasting: the forecast package for R</article-title>
          .
          <source>Journal of Statistical Software</source>
          <volume>26</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hyndman</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koehler</surname>
            ,
            <given-names>A. B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snyder</surname>
            ,
            <given-names>R. D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grose</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <year>2002</year>
          :
          <article-title>A state space framework for automatic forecasting using exponential smoothing methods</article-title>
          .
          <source>International Journal of Forecasting</source>
          <volume>18</volume>
          (
          <issue>3</issue>
          ),
          <fpage>439</fpage>
          -
          <lpage>454</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kourentzes</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petropoulos</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <year>2016</year>
          :
          <article-title>Multiple Aggregation Prediction Algorithm</article-title>
          , Version 2.0.1. https://CRAN.R-project.org/package=MAPA
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Skepetari</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spiliotis</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raptis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Assimakopoulos</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <year>2016</year>
          :
          <article-title>OMEN: Promoting Forecasting Support Systems</article-title>
          . 37th International Symposium on Forecasting
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Tashman</surname>
            ,
            <given-names>L. J.</given-names>
          </string-name>
          ,
          <year>2000</year>
          :
          <article-title>Out-of-sample tests of forecasting accuracy: an analysis and review</article-title>
          .
          <source>International Journal of Forecasting</source>
          <volume>16</volume>
          (
          <issue>4</issue>
          ),
          <fpage>437</fpage>
          -
          <lpage>450</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. The World Bank Group, Global Development Data</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Uganda Bureau of Statistics (UBOS),
          <year>2014</year>
          :
          <article-title>NPHC 2014 FINAL RESULTS REPORT</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          Republic of Zambia, Central Statistical Office:
          <article-title>2010 Census of Population and Housing</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>