<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ANIMITEX project: Image Analysis based on Textual Information</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hugo Alatrista-Salas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Kergosien</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mathieu Roche</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maguelonne Teisseire</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>LIRMM (CNRS, Univ. Montpellier 2)</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TETIS (Irstea, Cirad, AgroParisTech)</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <fpage>49</fpage>
      <lpage>52</lpage>
      <abstract>
        <p>With the growing amount of textual data available on the web, new knowledge extraction methodologies are being proposed. Some original methods allow users to combine different types of data in order to extract relevant information. In this context, this paper outlines the main objectives of the ANIMITEX project, which combines spatial and textual data. The data preprocessing step is detailed.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge extraction</kwd>
        <kwd>Text mining</kwd>
        <kwd>Satellite images</kwd>
        <kwd>Spatial feature identification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Aims of the ANIMITEX project</title>
      <p>A large amount of high-resolution satellite data is now
available. This raises the issue of fast and effective
satellite image analysis, which still requires costly
human involvement. In this context, remote sensing
approaches make it possible to tackle this challenge. The
exploratory and ambitious ANIMITEX project<sup>1</sup> aims
at processing massive and heterogeneous textual
data (i.e. a big data context) in order to provide
crucial information to enrich the analysis of satellite
images.</p>
      <p>This large volume of data comes with an increasing
temporal revisit frequency. For instance, around ten
images per year are available today (e.g. SPOT,
Landsat), and within 3 years one image every 5 days
(based on the Sentinel-2 satellites) will be available.</p>
      <p><sup>1</sup> http://www.lirmm.fr/~mroche/ANIMITEX/ (website in
French)</p>
      <p>
        The ANIMITEX project has many application
areas, such as image annotation
        <xref ref-type="bibr" rid="ref4">(Forestier et al. 2012)</xref>
        .
For instance, identifying the precise type of crop
or the function of a building is not always
possible using images alone. Textual data, however,
may contain this kind of information and
give additional meaning to the images. The
development of approaches based on image/text matching
therefore becomes crucial for completing image
analysis tasks
        <xref ref-type="bibr" rid="ref1">(Alatrista-Salas and Béchet 2014)</xref>
        . It also
enables a better classification of data.
      </p>
      <p>
        Moreover, image-text matching will enrich
Information Retrieval (IR) methods and provide
users with a more global context for the data
        <xref ref-type="bibr" rid="ref9">(Sallaberry et al.
2008)</xref>
        . This can be crucial for decision makers in
land-use planning projects, who have
to take into account the opinions of experts related to a
territory (managers, scientists, associations,
specialized companies, and so forth).
      </p>
      <p>In the context of the ANIMITEX project, we plan
to investigate two specific scenarios: (i) the
construction of a road north of Villeveyrac (a city
close to Montpellier, France), and (ii) a port activity
area, called Hinterland, in the Thau area (near Sète,
France). The aim of these case studies is to enrich
images with information present in documents, e.g.
the opinions about land-use planning extracted
from newspapers.</p>
      <p>Section 2 describes the proposed data
preprocessing step. Section 3 details the partners
involved in the project.</p>
    </sec>
    <sec id="sec-2">
      <title>Data preprocessing process</title>
      <p>
        The current work focuses on adapting Natural
Language Processing (NLP) techniques to the
recognition of Spatial Features (SF) and thematic/temporal
information
        <xref ref-type="bibr" rid="ref5 ref8">(Gaio et al. 2012; Maurel et al. 2011)</xref>
        .
In the proposed approach, an SF appearing in a text
is composed of at least one Named Entity (NE) and
one or more spatial indicators specifying its location
        <xref ref-type="bibr" rid="ref7">(Lesbegueries et al. 2006)</xref>
        . For this purpose, a set of articles
(i.e. 12,000 documents) concerning the Thau region
between 2010 and 2013 has been acquired. A
second part of the data set is composed of raster files
(Pleiades image mosaics, 2 m x 2 m spatial resolution,
4 spectral bands) covering all regions of the Thau
lagoon (see Figure 1). The satellite images are available
via the GEOSUD Equipex.
      </p>
      <p>A detailed classification of land cover is
currently in progress. It will lead to a digital
vector layer where each SF (represented by a polygon)
belongs to a specific land-use class. The
nomenclature of this classification is organized into four
hierarchical levels (see Figure 2). Moreover, we
investigate multi-scale information associated with
the different classification levels of satellite images.</p>
      <p>
        From this corpus, NLP methods have been
applied in order to identify linguistic features
concerning spatial, thematic, and temporal information
in the documents. The combined use of lexicons
and dedicated rules
        <xref ref-type="bibr" rid="ref5">(Gaio et al. 2012)</xref>
        allows us to
identify the absolute (e.g. Montpellier) and relative
(e.g. south of Montpellier) Spatial Features (ASF
and RSF)
        <xref ref-type="bibr" rid="ref6 ref7">(Lesbegueries et al. 2006; Kergosien et al.
2014)</xref>
        . A first framework based on sequential pattern
mining
        <xref ref-type="bibr" rid="ref3">(Cellier et al. 2010)</xref>
        has been proposed to
discover relationships between SF
        <xref ref-type="bibr" rid="ref1">(Alatrista-Salas
and Béchet 2014)</xref>
        . To this end, a two-step process
has been defined (see Figure 3).
      </p>
      <p>SF validation: for each identified ASF, we check
in external resources whether there is a corresponding
spatial representation. In particular, we have used layers
provided by the IGN<sup>3</sup> (municipalities, roads,
railways, buildings, etc.). In addition, if an ASF is
not present in the IGN resources, we use gazetteers
(Geonames and Open Street Maps) to complete the
information. Concerning the representation of RSF,
we use spatial indicators of topological order
associated with ASF.</p>
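      <p>The validation step above can be read as a lookup with a fallback: reference layers first, gazetteers second. The following is only a minimal sketch of that order of precedence. The dictionaries <monospace>IGN_LAYERS</monospace> and <monospace>GAZETTEERS</monospace>, the function <monospace>validate_asf</monospace> and the coordinates are hypothetical stand-ins introduced for illustration; they are not the project's data or interfaces.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)

# Hypothetical in-memory stand-ins for the external resources.
# The coordinates are approximate, illustrative values.
IGN_LAYERS: Dict[str, Dict[str, Coordinates]] = {
    "municipalities": {"Montpellier": (43.61, 3.88)},
    "roads": {},
}
GAZETTEERS: Dict[str, Dict[str, Coordinates]] = {
    "geonames": {"Villeveyrac": (43.50, 3.61)},
    "openstreetmap": {},
}


def validate_asf(name: str) -> Optional[dict]:
    """Return the first spatial representation found for an ASF name."""
    # Step 1: look in the reference layers first.
    for layer, entries in IGN_LAYERS.items():
        if name in entries:
            return {"name": name, "source": "IGN:" + layer,
                    "geometry": entries[name]}
    # Step 2: fall back to the gazetteers to complete the information.
    for gazetteer, entries in GAZETTEERS.items():
        if name in entries:
            return {"name": name, "source": gazetteer,
                    "geometry": entries[name]}
    # No spatial representation found: the ASF is not validated.
    return None
```
]]></preformat>
      <p>With these toy resources, <monospace>validate_asf("Montpellier")</monospace> is resolved from the municipalities layer, <monospace>validate_asf("Villeveyrac")</monospace> falls back to the first gazetteer, and an unknown name yields <monospace>None</monospace>.</p>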
      <p>
        Following the scopes proposed in
        <xref ref-type="bibr" rid="ref9">(Sallaberry et
al. 2008)</xref>
        , the spatial indicators of topological order
have been grouped into five categories:
      </p>
      <list list-type="bullet">
        <list-item>
          <p>Proximity: different indicators can express a proximity relationship, such as near, around, beside, close to, periphery, etc.</p>
        </list-item>
        <list-item>
          <p>Distance: the indicators used in this relationship are of the form x km, x miles, etc. Two representations are proposed in our approach: 1) computing the distance from the centroid of the ASF and constructing a circular buffer of size x around the centroid; 2) taking the shape of the ASF into account and building a buffer of size x from the edge of the processed ASF.</p>
        </list-item>
        <list-item>
          <p>Inclusion: this binary operation allows us to check whether an ASF is inside another, taking into account indicators such as center, in the heart, in, inside, etc.</p>
        </list-item>
        <list-item>
          <p>Orientation: this unary relationship has been broadly studied in the literature, and different approaches have been proposed to identify the cardinal points of an ASF. We have chosen to use the conical model proposed in (Frank 1991). For this, we take the centroid of the ASF and build a buffer around it. The size of this buffer is calculated taking into account the surface area of the studied ASF. We then decompose the buffer into four equal areas (forming an "X") from the centroid. The intersections between the buffer and the cones thus formed represent the four cardinal points.</p>
        </list-item>
        <list-item>
          <p>Geometry: geometry relations are built from at least two ASF. These relationships are, for example, the union, the adjacency, the difference, or the position of an ASF with respect to other ASF, for example C between A and B (where A, B and C are ASF).</p>
        </list-item>
      </list>
      <p><sup>3</sup> Institut National de l'Information Géographique et Forestière, i.e. the French National Institute of Geographic and Forest Information.</p>
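      <p>To make the orientation and distance representations more concrete, here is a minimal sketch of a cone-based test on a circular buffer. It assumes planar (projected) coordinates and takes the buffer size as a parameter, since the text does not specify how the size is derived from the surface of the ASF. The names <monospace>in_centroid_buffer</monospace> and <monospace>cardinal_sector</monospace> are hypothetical, and the code illustrates the idea rather than the project's implementation.</p>
      <preformat><![CDATA[
```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates, e.g. metres

# Cones listed counter-clockwise, starting with the one centred on the x axis.
SECTORS = ("east", "north", "west", "south")


def in_centroid_buffer(centroid: Point, size: float, target: Point) -> bool:
    """True if target lies in the circular buffer of the given size."""
    distance = math.hypot(target[0] - centroid[0], target[1] - centroid[1])
    return distance <= size


def cardinal_sector(centroid: Point, size: float,
                    target: Point) -> Optional[str]:
    """Return the cone of the buffer that contains target, or None."""
    if target == centroid or not in_centroid_buffer(centroid, size, target):
        return None
    dx = target[0] - centroid[0]
    dy = target[1] - centroid[1]
    # Angle of the target seen from the centroid, in degrees.
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # The two diagonals of the "X" lie at 45 and 135 degrees; shifting by
    # 45 makes each 90-degree cone map to one integer index.
    index = int(((angle + 45.0) % 360.0) // 90.0)
    return SECTORS[index]
```
]]></preformat>
      <p>For a buffer of size 10 centred on the origin, <monospace>cardinal_sector((0.0, 0.0), 10.0, (0.0, 5.0))</monospace> returns "north", while a target at (20.0, 0.0) lies outside the buffer and returns <monospace>None</monospace>. The edge-based buffer of the second distance representation would need a polygon library and is left out of this sketch.</p>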
    </sec>
    <sec id="sec-3">
      <title>Representation of the spatial footprint</title>
      <p>After the extraction step and the spatial representation of the
ASF and RSF, the spatial footprint associated with
the processed document can be mapped. In this
process, two main problems have been identified. The
first one is the persistent ambiguity of some NE
contained in SF, because some NE (e.g. Montagnac)
correspond to several places. To address this
issue, a configurable spatial filter based on predefined
scenarios has been developed. For example, to
identify events related to a specific land-use planning
project that occurred in part of the Thau lagoon
area, only the SF contained in this area are
explored. The second issue is related to the use of
external resources and the identification of the
spatial representation appropriate to each ASF. Taking
into account the spatial indicator (e.g. town, road,
etc.) preceding the toponym is a first
answer, because it allows us to specify the type of the
SF and thus to select the appropriate spatial
representation.</p>
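      <p>The spatial filter described above can be illustrated with a bounding-box test that keeps only the candidate places located inside a scenario area. This is a simplified sketch: the bounding box <monospace>THAU_AREA</monospace>, the candidate records and the function <monospace>filter_by_area</monospace> are hypothetical, with approximate coordinates chosen only for illustration. A real filter would likely use the actual geometry of the study area.</p>
      <preformat><![CDATA[
```python
from typing import Dict, List, Tuple

# (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]

# Illustrative scenario area roughly around the Thau lagoon (assumed values).
THAU_AREA: BBox = (3.4, 43.3, 3.8, 43.6)


def filter_by_area(candidates: List[Dict], area: BBox) -> List[Dict]:
    """Keep only the candidate places whose point falls inside the area."""
    min_lon, min_lat, max_lon, max_lat = area
    return [
        c for c in candidates
        if min_lon <= c["lon"] <= max_lon and min_lat <= c["lat"] <= max_lat
    ]


# Two hypothetical gazetteer hits for the same ambiguous toponym.
candidates = [
    {"id": "montagnac-a", "lon": 3.48, "lat": 43.48},
    {"id": "montagnac-b", "lon": 4.15, "lat": 43.93},
]
kept = filter_by_area(candidates, THAU_AREA)
```
]]></preformat>
      <p>Here only the first candidate falls inside the scenario area, so the ambiguity is resolved in its favour. Making the area a parameter mirrors the configurable, scenario-based nature of the filter.</p>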
      <p>
        Thematic information is identified using semantic
resources (e.g. the AGROVOC thesaurus, the nomenclature
resulting from image classifications, etc.)
        <xref ref-type="bibr" rid="ref2">(Buscaldi et al.
2013)</xref>
        .
      </p>
      <p>These linguistic features allow us to identify
specific phenomena in documents (e.g. land-use
planning, environmental change, natural disasters, etc.).
The main idea is to link the phenomena identified
in images with the subjects found in documents from
the same period. Overall, the ANIMITEX project
allows users to integrate different information
sources, i.e. both types of expression (texts and
images). The main objective is to enrich the
information conveyed by a text with images, and vice versa.</p>
    </sec>
    <sec id="sec-4">
      <title>Consortium of the project</title>
      <p>The multidisciplinary consortium of the project
involves three research domains: Computer Science,
Geography and Remote Sensing. More precisely,
expertise in remote sensing and in mining complex,
heterogeneous spatio-temporal data is one
of the foundations of the project.</p>
      <p>TETIS (Territories, Environment, Remote
Sensing and Spatial Information, Montpellier) aims
to produce and disseminate knowledge, concepts,
methods, and tools to characterize and understand
the dynamics of rural areas and territories, and
to manage spatial information on these systems. LIRMM
(Informatics, Robotics and Microelectronics,
Montpellier) focuses on knowledge extraction. ICube
(Strasbourg) specializes in image analysis and
complex data mining. ICube collaborates with
geographers from the LIVE laboratory (Image,
City, and Environment Laboratory) and with NLP specialists
(LiLPa lab: Linguistics, Language, Speech). These
two locations (Montpellier and Strasbourg)
constitute a cluster of local skills covering all major
aspects of the project. LIUPPA (Pau) includes
researchers specializing in Information Extraction (IE)
and Information Retrieval (IR). The main work of
this partner concerns the extraction and management of
geographical information. GREYC (Caen) brings
researchers in data mining (e.g. mining sequences in
order to discover relationships between spatial
entities) and NLP. For this aspect, a collaboration with
two other labs (LIPN and IRISA) has been set up.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The authors thank Midi Libre (French newspaper)
for its expertise on the corpus and all the partners
of the ANIMITEX project for their involvement. This
work is partially funded by the CNRS Mastodons grant
and the GEOSUD Equipex.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Alatrista-Salas</surname>
            <given-names>H.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Béchet</surname>
            <given-names>N.</given-names>
          </string-name>
          <article-title>Fouille de textes : une approche séquentielle pour découvrir des relations spatiales</article-title>
          .
          <source>In Cergeo Workshop - EGC</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Buscaldi</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bessagnet</surname>
            <given-names>M.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Royer</surname>
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Sallaberry</surname>
            <given-names>C.</given-names>
          </string-name>
          <article-title>Using the Semantics of Texts for Information Retrieval: A Concept and Domain Relation-Based Approach</article-title>
          .
          <source>Proceedings of ADBIS (2) - Advances in Databases and Information Systems</source>
          , pp.
          <fpage>257</fpage>
          -
          <lpage>266</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Cellier</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Charnois</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plantevit</surname>
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Crémilleux</surname>
            <given-names>B.</given-names>
          </string-name>
          <article-title>Recursive Sequence Mining to Discover Named Entity Relations</article-title>
          .
          <source>Symposium on Intelligent Data Analysis, LNCS</source>
          , pp.
          <fpage>30</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Forestier</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puissant</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wemmert</surname>
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gançarski</surname>
          </string-name>
          <article-title>Knowledge-based Region Labeling for Remote Sensing Image Interpretation</article-title>
          .
          <source>Computers, Environment and Urban Systems</source>
          , Vol.
          <volume>36</volume>
          (
          <issue>5</issue>
          ), pp.
          <fpage>470</fpage>
          -
          <lpage>480</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Frank</surname>
            <given-names>A. U.</given-names>
          </string-name>
          <article-title>Qualitative spatial reasoning with cardinal directions</article-title>
          .
          <source>In Seventh Austrian Conference on Artificial Intelligence</source>
          , volume
          <volume>287</volume>
          of Informatik-Fachberichte, pages
          <fpage>157</fpage>
          -
          <lpage>167</lpage>
          . Springer, Berlin Heidelberg,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Gaio</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sallaberry</surname>
            <given-names>C.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nguyen</surname>
            <given-names>V.T.</given-names>
          </string-name>
          <article-title>Typage de noms toponymiques à des fins d'indexation géographique</article-title>
          . TAL,
          <volume>53</volume>
          (
          <issue>2</issue>
          ):
          <fpage>143</fpage>
          -
          <lpage>176</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Kergosien</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laval</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roche</surname>
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Teisseire</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Are opinions expressed in land-use planning documents?</article-title>
          <source>International Journal of Geographical Information Science</source>
          , Vol.
          <volume>28</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>739</fpage>
          -
          <lpage>762</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Lesbegueries</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaio</surname>
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Loustau</surname>
            <given-names>P.</given-names>
          </string-name>
          <article-title>Geographical information access for non-structured data</article-title>
          .
          <source>In Proceedings of the 2006 ACM Symposium on Applied Computing, SAC '06</source>
          , pages
          <fpage>83</fpage>
          -
          <lpage>89</lpage>
          , New York, NY, USA,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Maurel</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friburger</surname>
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antoine</surname>
            <given-names>J.-Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eshkol-Taravella</surname>
            <given-names>I.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nouvel</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>CasEN: a transducer cascade to recognize French named entities</article-title>
          .
          <source>TAL</source>
          ,
          <volume>52</volume>
          (
          <issue>1</issue>
          ):
          <fpage>69</fpage>
          -
          <lpage>96</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Sallaberry</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gaio</surname>
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Lesbegueries</surname>
            <given-names>J.</given-names>
          </string-name>
          <article-title>Fuzzying GIS topological functions for GIR needs</article-title>
          . In
          <string-name>
            <surname>Jones</surname>
            <given-names>C. B.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Purves</surname>
            <given-names>R.</given-names>
          </string-name>
          , editors,
          <source>5th ACM Workshop On Geographic Information Retrieval</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>