<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hipi, as alternative for satellite images processing</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Juber Serrano Cervantes UNSA / Arequipa</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Pablo Yanyachi UNSA / Arequipa</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Wilder Nina Choquehuayta UNSA / Arequipa</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Yessenia Yari UNSA / Arequipa</institution>
        </aff>
      </contrib-group>
      <fpage>136</fpage>
      <lpage>137</lpage>
      <abstract>
        <p>Nowadays, large amounts of data are generated in many fields of both industry and academia, and frameworks that apply different techniques are essential for processing this data and extracting information from it. In the remote sensing field, large volumes of data (satellite images) are generated over short periods of time, yet the information systems that process this kind of image were not designed to be scalable. In this paper, we present an extension of the HIPI framework (Hadoop Image Processing Interface) for processing satellite image formats.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Keywords: Big Data, Remote Sensing, Hadoop, HIPI, MapReduce, Satellite Images</p>
      <p>
        The field of remote sensing is useful in many areas of both industry and academia because it
uses images of the Earth’s surface acquired from sources such as antennas and satellites,
which provide increasingly better image resolution as technology advances. Several
open source frameworks are now available for processing big data, such as Hadoop
        <xref ref-type="bibr" rid="ref4">(Shvachko et al., 2010)</xref>
        ,
H2O
        <xref ref-type="bibr" rid="ref1">(0xdata:H2O, 2015)</xref>
        and Spark
        <xref ref-type="bibr" rid="ref2">(Apache:Spark, 2015)</xref>
        , which are used for distributed and parallel processing of large volumes of data. HIPI
(Hadoop Image Processing Interface) is an image processing library designed to be used with
the Apache Hadoop MapReduce parallel programming framework. HIPI facilitates efficient,
high-throughput image processing with MapReduce-style parallel programs, typically executed on
a cluster. In the present work, the HIPI library was modified to add the ability to read
and process the GeoTIFF format, the format in which the USGS (United States Geological Survey)
provides Landsat satellite images.
      </p>
    </sec>
    <sec id="sec-2">
      <title>State of the Art</title>
      <p>
        Different techniques exist for classifying images into semantic taxonomy categories such as
vegetation and water
        <xref ref-type="bibr" rid="ref3">(Codella et al., 2011)</xref>
        ;
however, these methods do not consider scalability as part of their solutions. Zhang et al.
        <xref ref-type="bibr" rid="ref6">(Zhang et al., 2013)</xref>
        introduce an infrastructure for massive processing of satellite images in a
multi-datacenter environment, in which access security, an information service and a
scheduling strategy for data management are provided. It is also important to
consider HIPI
        <xref ref-type="bibr" rid="ref5">(Sweeney et al., 2011)</xref>
        as a state-of-the-art, extensible library for image processing and
computer vision applications, which helps to avoid the small-files problem and achieves
improvements in memory use and response time.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposal</title>
      <p>For this reason, this paper presents a modification of HIPI that
extends its functionality to work with TIFF and GeoTIFF images. To achieve this, we proceeded
as follows:</p>
      <list list-type="bullet">
        <list-item>
          <p>We decided to use the TIFF format for the satellite images obtained from the USGS
because, unlike other formats, it involves no compression or data loss (Adobe, 1992).</p>
        </list-item>
        <list-item>
          <p>The JAI API was chosen to read and write this format, since JAI offers more codecs and
features that can be useful for reading multiple formats.</p>
        </list-item>
        <list-item>
          <p>The classes needed to upload, encode and decode TIFF images were modified.</p>
        </list-item>
      </list>
      <p>Based on the tests, we analyzed the possibility of using HIPI for
processing multispectral and hyperspectral images. For such images, operations such as
PCA (principal component analysis) need to process all spectral bands together. We therefore
propose keeping the bands in the same zip file, or as separate images within a single TIFF file,
and then decoding and interpreting them as a conventional multiband image.</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>The experiments were performed on a local heterogeneous cluster; Table 1 shows the
characteristics of the master and slave nodes. We used satellite images from Landsat 7,
considering only the first 4 bands. Each satellite image was compressed into a .zip file,
and the .zip files were used to build inputs of up to 0.5 GB, 1 GB, 5 GB and 10 GB.
The algorithm tested was the average of channels, which is explained on the official
HIPI website. In each map task, we iterated over the bands of the satellite image, each
read as a FloatImage, summed the pixel values per channel, divided each sum by the number of
pixels (width × height), and returned the key of the satellite image together with the array
of computed averages. In each reduce task, we only computed the average of the per-channel
averages from each satellite image.</p>
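      <p>The map and reduce steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the HIPI code used in the experiments: nested lists stand in for the FloatImage bands, the function names map_band_means and reduce_band_means are hypothetical, and the reduce step is assumed to average the per-band means across images.</p>
      <preformat>
```python
def map_band_means(image_key, bands):
    """Map step: return (image_key, [mean pixel value of each band]).

    `bands` is a list of 2-D pixel grids (a list of rows per band), a
    plain-Python stand-in for the FloatImage objects HIPI would supply.
    """
    means = []
    for band in bands:
        height = len(band)
        width = len(band[0])
        total = sum(sum(row) for row in band)  # add every pixel of this band
        means.append(total / (width * height))  # divide by the pixel count
    return image_key, means


def reduce_band_means(per_image_means):
    """Reduce step (assumed): average the per-band means across images."""
    per_image_means = list(per_image_means)
    n_images = len(per_image_means)
    n_bands = len(per_image_means[0])
    return [
        sum(means[b] for means in per_image_means) / n_images
        for b in range(n_bands)
    ]


# Two tiny hypothetical "images", each with two 2x2 bands.
image_a = [[[1, 2], [3, 4]], [[10, 10], [10, 10]]]
image_b = [[[0, 0], [0, 2]], [[20, 20], [20, 20]]]

mapped = [map_band_means("a", image_a), map_band_means("b", image_b)]
overall = reduce_band_means(means for _, means in mapped)
```
      </preformat>
      <p>In a real job, the framework would call the map step once per image and pass the emitted averages to the reduce step; here the two calls are chained by hand to keep the sketch self-contained.</p>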
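      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Characteristics of the cluster nodes.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Node</th>
              <th>Characteristics</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>master</td>
              <td>Core i7, RAM 8 GB, Disk 100 GB, OS Ubuntu 64-bit</td>
            </tr>
            <tr>
              <td>slave 1</td>
              <td>Core i7, RAM 8 GB, Disk 100 GB, OS Ubuntu 64-bit</td>
            </tr>
            <tr>
              <td>slave 2</td>
              <td>Core 2 Duo, RAM 4 GB, Disk 100 GB, OS Ubuntu 64-bit</td>
            </tr>
            <tr>
              <td>slave 3</td>
              <td>Core 2 Duo, RAM 4 GB, Disk 100 GB, OS Ubuntu 64-bit</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>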
      <p>Table 2 shows the amount of data in GB on the x axis and the execution time
on the y axis. The Hadoop configuration used a data replication factor of 1, chunk sizes of
32 MB, 64 MB and 128 MB, and 4096 MB of memory for map and reduce tasks. The Java virtual
machine for map and reduce tasks was configured with at most 4096 MB.</p>
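      <p>For readers who want to reproduce a similar setup, the settings above could be expressed as Hadoop properties. The sketch below is an assumption on our part: the text does not name the properties or the Hadoop version, so the Hadoop 2.x property names, the choice of the 64 MB chunk size, and the helper name to_hadoop_xml are illustrative only.</p>
      <preformat>
```python
import xml.etree.ElementTree as ET

# Settings reported in the text, mapped to Hadoop 2.x property names.
# The mapping is an assumption; the paper does not list property names.
SETTINGS = {
    "dfs.replication": "1",
    "dfs.blocksize": str(64 * 1024 * 1024),  # one of the 32, 64, 128 MB runs
    "mapreduce.map.memory.mb": "4096",
    "mapreduce.reduce.memory.mb": "4096",
    "mapreduce.map.java.opts": "-Xmx4096m",
    "mapreduce.reduce.java.opts": "-Xmx4096m",
}


def to_hadoop_xml(settings):
    """Render a dict of properties as a Hadoop-style configuration document."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


xml_text = to_hadoop_xml(SETTINGS)
```
      </preformat>
      <p>In a typical Hadoop 2.x installation the dfs properties would go in hdfs-site.xml and the mapreduce properties in mapred-site.xml; a single document is produced here only for brevity.</p>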
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>Based on the review conducted and on the theoretical and
experimental tests of the modified version of HIPI, this
article concludes the following: it is possible to perform
various image processing operations, such as
fil</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          0xdata:H2O.
          <year>2015</year>
          .
          <article-title>H2O</article-title>
          . Website. In: http://0xdata.com/product/, accessed May 2015.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          Apache:Spark.
          <year>2015</year>
          .
          <article-title>Spark, Lightning-fast cluster computing</article-title>
          . Website. In: https://spark.apache.org, accessed 2015-05-20.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Noel C. F.</given-names>
            <surname>Codella</surname>
          </string-name>
          , Gang Hua, Apostol Natsev, and
          <string-name>
            <given-names>John R.</given-names>
            <surname>Smith</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Towards large scale land-cover recognition of satellite images</article-title>
          .
          <source>In Information, Communications and Signal Processing (ICICS)</source>
          ,
          <source>2011 8th International Conference on</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Shvachko</surname>
          </string-name>
          , Hairong Kuang, Sanjay Radia, and
          <string-name>
            <given-names>Robert</given-names>
            <surname>Chansler</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>The hadoop distributed file system</article-title>
          .
          <source>In Mass Storage Systems and Technologies (MSST)</source>
          ,
          <source>2010 IEEE 26th Symposium on</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Chris</given-names>
            <surname>Sweeney</surname>
          </string-name>
          , Liu Liu, Sean Arietta, and
          <string-name>
            <given-names>Jason</given-names>
            <surname>Lawrence</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Hipi: a hadoop image processing interface for image based mapreduce tasks</article-title>
          . University of Virginia.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Wanfeng</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Lizhe Wang, Dingsheng Liu, Weijing Song, Yan Ma, Peng Liu, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Towards building a multi-datacenter infrastructure for massive remote sensing image processing</article-title>
          .
          <source>Concurrency and Computation: Practice and Experience</source>
          ,
          <volume>25</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1798</fpage>
          -
          <lpage>1812</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>