=Paper=
{{Paper
|id=Vol-1376/paper05
|storemode=property
|title=On Redundancy in Linked Geospatial Data
|pdfUrl=https://ceur-ws.org/Vol-1376/LDQ2015_paper_05.pdf
|volume=Vol-1376
|dblpUrl=https://dblp.org/rec/conf/esws/SioutisLC15
}}
==On Redundancy in Linked Geospatial Data==
<pdf width="1500px">https://ceur-ws.org/Vol-1376/LDQ2015_paper_05.pdf</pdf>
<pre>
    On Redundancy in Linked Geospatial Data

         Michael Sioutis (1), Sanjiang Li (2), and Jean-François Condotta (1)

           (1) CRIL CNRS UMR 8188, Université d’Artois, Lens, France
                          {sioutis,condotta}@cril.fr
           (2) QCIS, University of Technology, Sydney, Australia
                             sanjiang.li@uts.edu.au


      Abstract. RCC8 is a constraint language that serves for qualitative spa-
      tial representation and reasoning by encoding the topological relations
      between spatial entities. As such, RCC8 has been recently adopted by
      GeoSPARQL in an effort to enrich the Semantic Web with qualitative
      spatial relations. We focus on the redundancy that these data might
      harbor, which can throttle graph related applications, such as storing,
      representing, querying, and reasoning. For a RCC8 network N a con-
      straint is redundant, if removing that constraint from N does not change
      the solution set of N . A prime network of N is a network which contains
      no redundant constraints, but has the same solution set as N . In this
      paper, we present a practical approach for obtaining the prime networks
      of RCC8 networks that originate from the Semantic Web, by exploiting
      the sparse and loosely connected structure of their constraint graphs,
      and, consequently, contribute towards offering Linked Geospatial Data
      of high quality. Experimental evaluation exhibits a vast decrease in the
      total number of non-redundant constraints that we can obtain from an
      initial network, while it also suggests that our approach significantly
      boosts the state-of-the-art approach.


1   Introduction
The Region Connection Calculus (RCC) is the dominant approach in Artificial
Intelligence for representing and reasoning about topological relations [10]. RCC
can be used to describe regions that are non-empty regular subsets of some
topological space by stating their topological relations to each other. RCC8 is the
constraint language formed by the following 8 binary topological base relations
of RCC: disconnected (DC), externally connected (EC), equal (EQ), partially
overlapping (PO), tangential proper part (TPP), tangential proper part inverse
(TPPi), non-tangential proper part (NTPP), and non-tangential proper part
inverse (NTPPi). These 8 relations are depicted in [10, Fig. 4].
    RCC8 has been recently adopted by GeoSPARQL [9], and there has been
an ever increasing interest in coupling qualitative spatial reasoning techniques
with Linked Geospatial Data that are constantly being made available [5, 7].
Thus, there is a real need for scalable implementations of constraint network
algorithms for qualitative and quantitative spatial constraints, as RDF stores
supporting Linked Geospatial Data are expected to scale to billions of triples [5,
7]. In this context, literature has mainly focused on the satisfiability problem of
a RCC8 network, which is deciding if there exists a solution of the network. To-
wards efficiently deciding the satisfiability of large real world RCC8 networks that
originate from the Semantic Web, there has already been a fair amount of work
carried out, presenting promising results, as described in [13, 12, 15]. Lately, the
important problem of deriving redundancy in a RCC8 network has been consid-
ered and well-established in [6]. For a RCC8 network N a constraint is redundant,
if removing that constraint from N does not change the solution set of N . A
prime network of N is a network which contains no redundant constraints, but
has the same solution set as N . Finding a prime network can be useful in many
applications, such as computing, storing, and compressing the relationships be-
tween spatial objects and, hence, saving space for storage and communication,
merging networks [1], aiding querying in spatially-enhanced databases [7, 9], and
adjusting geometrical objects to meet topological constraints [17]. Due to space
constraints, we refer the reader to [6] for a well-depicted real motivational ex-
ample and further application possibilities.
    In this paper, we propose a practical approach for obtaining the prime net-
works of RCC8 networks that have been harvested from the Semantic Web. In
particular, we exploit the sparse and loosely connected structure of their con-
straint graphs, by establishing results that allow us to build on the simple decom-
position scheme presented in [15]. The paper is organized as follows: in Section 2
we give some preliminaries concerning RCC8, redundant constraints, and the
notion of a prime network, in Section 3 we present our practical approach for
obtaining the prime networks of RCC8 networks that have been harvested from
the Semantic Web, in Section 4 we experimentally show that we can have a vast
decrease in the total number of non-redundant constraints that we can obtain
from an initial RCC8 network of the considered dataset, with significantly im-
proved performance over the state-of-the-art approach, and, finally, in Section 5
we conclude and make a connection with a relevant late breaking research effort.


2   Preliminaries

A (binary) qualitative constraint language [11] is based on a finite set B of jointly
exhaustive and pairwise disjoint (JEPD) relations defined on a domain D, called
the set of base relations. The base relations of set B of a particular qualitative
constraint language can be used to represent definite knowledge between any
two entities with respect to the given level of granularity. B contains the identity
relation Id, and is closed under the converse operation (^-1). Indefinite knowledge
can be specified by unions of possible base relations, and is represented by the set
containing them. Hence, 2^B represents the total set of relations. 2^B is equipped
with the usual set-theoretic operations union and intersection, the converse
operation, and the weak composition operation denoted by symbol ⋄ [11]. In the
case of RCC8 [10], as noted in Section 1, B is the set {DC, EC, PO, TPP, NTPP,
TPPi, NTPPi, EQ}, with EQ being relation Id. RCC8 networks can be viewed
as qualitative constraint networks (QCNs), defined as follows:
   [Figure: two networks on vertices v0, v1, v2. Left: N[v0, v1] = {EC},
    N[v1, v2] = {DC}, N[v0, v2] = {NTPPi}. Right: the same network with
    N[v1, v2] = B.]

             Fig. 1: A RCC8 network (left) and its prime network (right)

Definition 1. A QCN is a pair N = (V, C) where: V is a non-empty finite set
of variables; C is a mapping that associates a relation C(v, v′) ∈ 2^B to each pair
(v, v′) of V × V. C is such that C(v, v) = {Id} and C(v, v′) = (C(v′, v))^-1.

    In what follows, given a QCN N = (V, C) and v, v′ ∈ V, N[v, v′] will denote
the relation C(v, v′). N_[v,v′]/r, with r ∈ 2^B, is the QCN N′ defined by N′[v, v′] =
r, N′[v′, v] = r^-1, and N′[u, u′] = N[u, u′] ∀(u, u′) ∈ (V × V) \ {(v, v′), (v′, v)}. A
QCN N = (V, C) is said to be trivially inconsistent iff ∃v, v′ ∈ V with N[v, v′] =
∅. A solution of N is a mapping σ defined from V to the domain D, yielding a
valid configuration, such that for every pair (v, v′) of variables in V, (σ(v), σ(v′))
can be described by N[v, v′], i.e., there exists a base relation b ∈ N[v, v′] such
that the relation defined by (σ(v), σ(v′)) is b. Two QCNs are equivalent iff they
admit the same set of solutions. The constraint graph of a QCN N = (V, C) is the
graph (V, E), denoted by G(N), for which we have that (v, v′) ∈ E iff N[v, v′] ≠
B. ⋄(N) denotes the refined ⋄-consistent QCN of N, for which ∀v, v′, v″ ∈ V we have
that ⋄(N)[v, v′] ⊆ ⋄(N)[v, v″] ⋄ ⋄(N)[v″, v′]. A sub-QCN N′ of N = (V, C) is a
QCN (V, C′) such that N′[v, v′] ⊆ N[v, v′] ∀v, v′ ∈ V where N′[v, v′] ≠ B. Given
a QCN N = (V, C), N↓V′, with V′ ⊆ V, is QCN N restricted to V′. If b is a base
relation, then {b} is a singleton relation. A subclass of relations is a set A ⊆ 2^B
closed under converse, intersection, and weak composition. In what follows, all
the considered subclasses will contain the singleton relations of 2^B. Given three
relations r, r′, and r″, we say that weak composition distributes over intersection
if we have that r ⋄ (r′ ∩ r″) = (r ⋄ r′) ∩ (r ⋄ r″) and (r′ ∩ r″) ⋄ r = (r′ ⋄ r) ∩ (r″ ⋄ r).
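
The definitions above can be made concrete with a small data structure. The following Python sketch is only an illustration: the class name QCN, its method names, and the encoding of a relation as a frozenset of base-relation names are assumptions made for this example and do not come from the paper or from any existing library.

```python
from itertools import combinations

# The eight RCC8 base relations; the full set B is also the universal relation.
B = frozenset({"DC", "EC", "PO", "TPP", "NTPP", "TPPi", "NTPPi", "EQ"})

# Converse of each base relation: TPP/TPPi and NTPP/NTPPi swap, the rest are symmetric.
CONVERSE = {"DC": "DC", "EC": "EC", "PO": "PO", "EQ": "EQ",
            "TPP": "TPPi", "TPPi": "TPP", "NTPP": "NTPPi", "NTPPi": "NTPP"}


def converse(relation):
    """Converse of a relation given as a set of base relations."""
    return frozenset(CONVERSE[b] for b in relation)


class QCN:
    """Hypothetical container for a QCN N = (V, C); unset pairs default to B."""

    def __init__(self, variables):
        self.variables = list(variables)
        self._c = {}

    def set(self, v, w, relation):
        # Store both directions so that C(v, w) = (C(w, v))^-1 always holds.
        relation = frozenset(relation)
        self._c[(v, w)] = relation
        self._c[(w, v)] = converse(relation)

    def get(self, v, w):
        if v == w:
            return frozenset({"EQ"})  # C(v, v) = {Id}
        return self._c.get((v, w), B)

    def constraint_graph_edges(self):
        # (v, w) is an edge of G(N) iff N[v, w] is not the universal relation B.
        return {(v, w) for v, w in combinations(self.variables, 2)
                if self.get(v, w) != B}

    def is_trivially_inconsistent(self):
        # Trivially inconsistent iff some relation is the empty set.
        return any(not self.get(v, w)
                   for v, w in combinations(self.variables, 2))


# The network of Fig. 1 (left).
n = QCN(["v0", "v1", "v2"])
n.set("v0", "v1", {"EC"})
n.set("v1", "v2", {"DC"})
n.set("v0", "v2", {"NTPPi"})
```

Replacing a relation with B, as in N_[v1,v2]/B, amounts to calling n.set("v1", "v2", B), after which the pair (v1, v2) no longer appears among the edges of the constraint graph.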

Definition 2. A subclass A ⊆ 2^B is a distributive subclass if weak composition
distributes over non-empty intersections for all relations r, r′, r″ ∈ A. A subclass
A ⊆ 2^B is a maximal distributive subclass if there exists no other distributive
subclass A′ with A′ ⊃ A.

    Notably, RCC8 has two maximal distributive subclasses, namely, D^8_41 and
D^8_64 [6]. Given a QCN N = (V, C), we say that N entails a constraint r(v, v′) ∈
2^B, with v, v′ ∈ V, if for every solution σ of N, the relation defined by (σ(v), σ(v′))
is a base relation b such that b ∈ r(v, v′). Relation N[v, v′] is redundant if
network N_[v,v′]/B entails N[v, v′]. Note that by definition every universal relation B
in a QCN is redundant. Recalling the fact that the constraint graph of a QCN
involves all the non-universal relations, we can obtain the following lemma:

Lemma 1. Given a QCN N = (V, C) and its constraint graph G(N) = (V, E),
a relation N[v, v′], with v, v′ ∈ V, is redundant if (v, v′) ∉ E.

We now recall the following definition of a reducible and a prime QCN:
   [Figure: a graph G on vertices v0, ..., v6 (left) and its biconnected
    components G1, G2, G3, G4 (right).]

        Fig. 2: A graph G (left) with its biconnected components (right)

Definition 3 ([6]). A QCN N = (V, C) is reducible if it comprises a redun-
dant relation other than relation B, and irreducible otherwise. An equivalent
irreducible sub-QCN of N , is called a prime QCN of N . If a prime QCN of N is
also unique, it is denoted by Nprime .
   In Figure 1, a QCN N of RCC8 and its prime QCN are depicted. Relation
{DC} is redundant as it can be entailed by N_[v1,v2]/B and, thus, can be replaced
with relation B (denoting the lack of a constraint between two entities in a QCN).

Property 1 ([6]). Let N = (V, C) be a satisfiable QCN of RCC8. Then, N will be
said to satisfy the uniqueness property iff ∀u, v ∈ V, with u ≠ v, we have that
N does not entail a relation r ⊆ N[u, v] where r = {EQ}.
   The uniqueness property specifies that every region in a QCN of RCC8 should
be unique and not identical to any other region. This is a necessary property to
be able to obtain the unique prime network of a QCN [6] and will hold for all
the considered QCNs in what follows. We recall the following important lemma
to be used in the sequel:
Lemma 2 ([6]). Let N = (V, C) be a not trivially inconsistent and ⋄-consistent
QCN of RCC8 defined on one of the maximal distributive subclasses D^8_41 or D^8_64,
and having the uniqueness property. Then, a relation N[v, v′], with v, v′ ∈ V, is
non-redundant in N iff N[v, v′] ≠ ⋂{N[v, v″] ⋄ N[v″, v′] | v″ ∈ V \ {v, v′}}.
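
The test of Lemma 2 can be tried out on the network of Fig. 1. The Python sketch below is a minimal illustration under stated assumptions: the function names and the dictionary encoding of the network are hypothetical, the weak composition table is deliberately PARTIAL (it lists only the three RCC8 entries this example needs), and the preconditions of Lemma 2 (⋄-consistency, a distributive subclass, the uniqueness property) are assumed to hold rather than checked.

```python
from functools import reduce
from itertools import combinations

B = frozenset({"DC", "EC", "PO", "TPP", "NTPP", "TPPi", "NTPPi", "EQ"})

CONVERSE = {"DC": "DC", "EC": "EC", "PO": "PO", "EQ": "EQ",
            "TPP": "TPPi", "TPPi": "TPP", "NTPP": "NTPPi", "NTPPi": "NTPP"}

# PARTIAL weak composition table: only the entries needed for Fig. 1.
# A complete implementation would list all 64 pairs of base relations.
COMPOSE = {
    ("EC", "NTPPi"): {"DC"},
    ("EC", "DC"): {"DC", "EC", "PO", "TPPi", "NTPPi"},
    ("NTPPi", "DC"): {"DC", "EC", "PO", "TPPi", "NTPPi"},
}


def converse(relation):
    return frozenset(CONVERSE[b] for b in relation)


def weak_compose(r, s):
    """Union of the compositions of all pairs of base relations in r and s."""
    if r == B or s == B:
        return B  # composing with the universal relation yields B
    out = set()
    for a in r:
        for b in s:
            out |= COMPOSE[(a, b)]  # KeyError if the partial table lacks the entry
    return frozenset(out)


def relation(network, v, w):
    """N[v, w], with the converse and the default B filled in."""
    if (v, w) in network:
        return network[(v, w)]
    if (w, v) in network:
        return converse(network[(w, v)])
    return B


def non_redundant_pairs(variables, network):
    """Pairs (v, w) whose relation passes the non-redundancy test of Lemma 2."""
    chi = set()
    for v, w in combinations(variables, 2):
        r = relation(network, v, w)
        if r == B:
            continue  # universal relations are redundant by definition
        others = [u for u in variables if u not in (v, w)]
        meet = reduce(frozenset.intersection,
                      (weak_compose(relation(network, v, u), relation(network, u, w))
                       for u in others),
                      B)
        if r != meet:
            chi.add((v, w))
    return chi


fig1 = {("v0", "v1"): frozenset({"EC"}),
        ("v1", "v2"): frozenset({"DC"}),
        ("v0", "v2"): frozenset({"NTPPi"})}
chi = non_redundant_pairs(["v0", "v1", "v2"], fig1)
```

Here {DC} on (v1, v2) coincides with EC ⋄ NTPPi = {DC} and is therefore redundant, whereas {EC} and {NTPPi} differ from the corresponding compositions and are retained.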

    Finally, we have the following result with respect to the unique prime network
of a QCN N of RCC8, namely, Nprime :
Theorem 1 ([6]). Let N = (V, C) be a satisfiable QCN of RCC8 defined on one
of the maximal distributive subclasses D^8_41 or D^8_64, and having the uniqueness
property. Further, let χ be the set of non-redundant relations in ⋄(N). Then,
∀u, v ∈ V we have that Nprime[u, v] = (N[u, v] if ⋄(N)[u, v] ∈ χ else B).
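
Read operationally, Theorem 1 keeps the original relation of N wherever the corresponding relation of ⋄(N) is non-redundant and replaces every other relation with B. A minimal Python sketch follows, assuming that χ is given as a set of variable pairs and that the network is a dictionary keyed by such pairs; both encodings and the function name are hypothetical.

```python
B = frozenset({"DC", "EC", "PO", "TPP", "NTPP", "TPPi", "NTPPi", "EQ"})


def prime_network(network, chi):
    """Keep N[u, v] for pairs in chi; every other relation becomes B."""
    return {pair: (rel if pair in chi else B) for pair, rel in network.items()}


# Network of Fig. 1 (left) and the pairs found non-redundant for it.
fig1 = {("v0", "v1"): frozenset({"EC"}),
        ("v1", "v2"): frozenset({"DC"}),
        ("v0", "v2"): frozenset({"NTPPi"})}
chi = {("v0", "v1"), ("v0", "v2")}
prime = prime_network(fig1, chi)
```

The result corresponds to Fig. 1 (right): the relation between v1 and v2 becomes B, and the other two relations are unchanged.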

3    Towards Efficiently Characterizing Non-Redundant
     Relations in a Network
In this section we present a practical approach for characterizing non-redundant
relations in QCNs that have been harvested from the Semantic Web. In partic-
ular, we exploit the sparse and loosely connected structure of their constraint
graphs, by establishing results that allow building on the simple decomposition
scheme of [15]. We recall the following definition regarding biconnected graphs:
  Algorithm 1: Delphys+(N)
   in     : A satisfiable QCN N = (V, C) of RCC8 defined on D^8_41 or D^8_64.
   output : χ, the set of non-redundant relations in ⋄(N).
   begin
      χ ← ∅;
      foreach n ∈ Decomposer(N) [15] do
         χ ← χ ∪ Delphys(n) [6];
      return χ;

Definition 4 ([2]). A connected graph G = (V, E) is said to have an
articulation vertex u if there exist vertices v and v′ such that all paths connecting v and
v′ pass through u. A graph that has an articulation vertex is called separable,
and one that has none is called biconnected. A maximal biconnected subgraph is
called a biconnected component.
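
Definition 4 can be illustrated with a short Python sketch that computes biconnected components by depth-first search with an edge stack (the classical Hopcroft-Tarjan technique). The function names and the adjacency-dictionary input are assumptions made for this example; this is not the Decomposer of [15]. The recursive form is only suitable for small graphs, since large networks would exceed the interpreter's recursion limit.

```python
from collections import Counter


def biconnected_components(adj):
    """Vertex sets of the biconnected components of a simple undirected graph.

    adj maps each vertex to an iterable of its neighbours. Isolated vertices
    have no incident edge and therefore yield no component.
    """
    disc, low = {}, {}
    stack, components = [], []
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for w in adj[u]:
            if w not in disc:
                stack.append((u, w))          # tree edge
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= disc[u]:
                    # u separates the subtree of w: pop one whole component.
                    comp = set()
                    while True:
                        edge = stack.pop()
                        comp.update(edge)
                        if edge == (u, w):
                            break
                    components.append(comp)
            elif w != parent and disc[w] < disc[u]:
                stack.append((u, w))          # back edge to an ancestor
                low[u] = min(low[u], disc[w])

    for v in adj:
        if v not in disc:
            dfs(v, None)
    return components


def articulation_vertices(components):
    """A vertex is an articulation vertex iff it lies in more than one component."""
    counts = Counter(v for comp in components for v in comp)
    return {v for v, k in counts.items() if k > 1}


# Two triangles joined by the bridge 2-3 (an example graph, not the one of Fig. 2).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
comps = biconnected_components(graph)
cuts = articulation_vertices(comps)
```

In this example the bridge 2-3 forms a component of order 2 on its own, and its two endpoints are the articulation vertices.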

    Intuitively, an articulation vertex is any vertex whose removal increases the
number of connected components in a given graph. Figure 2 depicts a graph G,
along with its biconnected components. Vertices in grey are the articulation ver-
tices of G. The biconnected components of a graph G = (V, E) can be obtained
in O(|E|) time [2]. We recall the following result from [15]:

Proposition 1 ([15]). Let N be a QCN of RCC8, and {G1, ..., Gk} the
biconnected components of its constraint graph G(N). Then, N is satisfiable iff Ni is
satisfiable for every i ∈ {1, ..., k}, where Ni is N↓V(Gi).

Then, by Proposition 1 and Lemma 1 we can obtain the following result:

Proposition 2. Let N be a satisfiable QCN of RCC8, and {G1, ..., Gk} the
biconnected components of its constraint graph G(N). Then, a relation N[v, v′],
with v, v′ ∈ V, is non-redundant in N iff (v, v′) ∈ E(Gi) and N[v, v′] is
non-redundant in Ni, where Ni is N↓V(Gi), for some i ∈ {1, ..., k}.

Proof. By Lemma 1 we know that a relation N[v, v′] is redundant if (v, v′) ∉
E(Gi). Let us consider a relation N[v, v′] where (v, v′) ∈ E(Gi). Let N′ =
N_[v,v′]/B and N′i the restriction of N′ to V(Gi). Then, by Proposition 1 we have
that a mapping σ is a solution of N′ iff σ is a solution of N′i. On the other
hand, since Gi is a biconnected component, any solution of N′i is the restriction
of some solution of N′ to V(Gi). Thus, N′ entails N[v, v′] iff N′i entails N[v, v′],
and, consequently, N[v, v′] is redundant in N iff N[v, v′] is redundant in Ni. □


    An algorithm based on Lemma 2 to obtain the set of non-redundant relations
was provided in [6] with a time complexity of O(|V|^3) for a given QCN N = (V, C)
of RCC8, which we here call Delphys. Proposition 2 allows us to establish a time
complexity of O((|V|/c) · c^3) = O(|V| · c^2), where c ≤ |V| is the maximum order
among the biconnected components of constraint graph G(N), as it suggests
that we can consider the smaller biconnected components of G(N) instead of
the entire constraint graph when characterizing non-redundant relations in N.
This new approach, which we call Delphys+, is shown in Algorithm 1.^3

            Table 1: Biconnected components of real RCC8 networks

  network   # of components   max order   median order   min order   avg. order
  nuts                1 624          52              2           2            2
  adm1                   27      11 665              2           2          437
  gadm1                 712      19 864              2           2           61
  gadm2             113 097       2 371              2           2            3
  adm2                2 893      22 808            579           2          600

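
The control flow of Algorithm 1 can be written as a few lines of Python. This is a sketch only: the decomposer and delphys arguments are hypothetical callables standing in for Decomposer [15] and Delphys [6], whose actual implementations are not reproduced here, and the toy stand-ins below exist solely to exercise the loop.

```python
def delphys_plus(network, decomposer, delphys):
    """Union of the non-redundant relations found in each biconnected component."""
    chi = set()
    for sub_network in decomposer(network):
        chi |= set(delphys(sub_network))
    return chi


# Toy stand-ins (hypothetical): a "network" is a list of edge lists, one per
# biconnected component, and the stub "delphys" reports every edge of a
# component as non-redundant.
def toy_decomposer(network):
    return list(network)


def toy_delphys(sub_network):
    return set(sub_network)


toy_network = [[("a", "b"), ("b", "c"), ("a", "c")], [("c", "d")]]
chi = delphys_plus(toy_network, toy_decomposer, toy_delphys)
```

Note that the second toy component has order 2. Such single-edge components must also be passed to the redundancy check, which is why the decomposer used here must not discard them.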
    It remains to be seen if there is any significant difference between the order of
the constraint graph of a QCN and the maximum order among the biconnected
components of that graph, that is, if the value of the latter is significantly smaller
than the value of the former, so that Delphys+ can be considered as an
advancement over Delphys. We will see that this is indeed the case for the considered
QCNs of RCC8 that have been harvested from the Semantic Web (and originally
appeared in [8]), which we introduce as follows.
  – nuts: a RCC8 network that describes a nomenclature of territorial units and
     contains 2 235/3 176 nodes/edges.^4
  – adm1: a RCC8 network that describes the administrative geography of Great
     Britain [4] and contains 11 761/44 832 nodes/edges.
  – gadm1: a RCC8 network that describes the German administrative units and
     contains 42 749/159 600 nodes/edges.^4
  – gadm2: a RCC8 network that describes the world’s administrative areas and
     contains 276 727/589 573 nodes/edges (http://gadm.geovocab.org/).
  – adm2: a RCC8 network that describes the Greek administrative geography
     and contains 1 732 999/5 236 270 nodes/edges.^4
    The aforementioned QCNs are satisfiable^5, comprise relations that are
properly contained in any of the two maximal distributive subclasses D^8_41 and D^8_64
for RCC8, and originate from the Semantic Web, also called the Web of Data,
which is argued to be scale-free [16]. Graphs of scale-free structure are
relatively sparse [3], as can also be observed from the |E|/|V| ratio of the constraint
graphs of our real world QCNs; thus, we expect these constraint graphs to be
loosely connected and yield a high number of biconnected components. We can
view information regarding biconnected components of the constraint graphs of
our QCNs in Table 1. The findings are quite impressive, in the sense that the
maximum order among the biconnected components of a constraint graph is,
with the exception of adm1, significantly smaller than the order of that graph.
For example, the constraint graph of the biggest real RCC8 network, namely,
adm2, has an order of value 1 733 000, but the maximum order among its
biconnected components is only of value 22 808. Note also that, as the other
metrics suggest, the largest proportion of the biconnected components of a graph
have an order much closer to the minimum order than the maximum order among
the components of that graph.

    ^3 Check |V(g)| > 2 within algorithm Decomposer as it appears in [15] must be
removed for appropriate use of Decomposer in Delphys+.
    ^4 Retrieved from: http://www.linkedopendata.gr/
    ^5 As obtaining the prime network of a QCN requires that the QCN is satisfiable
(see Theorem 1), we fixed some inconsistencies with gadm1 and gadm2 that were
originally unsatisfiable. Also, identical regions were properly amalgamated to satisfy the
uniqueness property.


4   Experimental Evaluation
In this section, we compare the performance of Delphys+ with that of Delphys [6]
using the dataset presented in Section 3. Experimentation was carried out on a
PC with an Intel Core 2 Quad Q9400 processor, 8 GB RAM, and the Precise
Pangolin x86_64 OS. Both Delphys and Delphys+ were written in Python and
run with PyPy 2.4.0 (http://pypy.org/). Only one CPU core was used.

                Table 2: Performance comparison on CPU time

                 network     Delphys        Delphys+      speedup (%)
                  nuts        45.98s          0.26s          99.4%
                  adm1      30 917.23s     29 489.02s         4.6%
                  gadm1         ∞          151 295.62s      ∼ 100%
                  gadm2         ∞            12.05s         ∼ 100%
                  adm2          ∞               ∞              ?

    The results on the performance of Delphys+ and Delphys are shown in Table 2.
Note that symbol ∞ signifies that a reasoner hit the memory limit. The speedup
for Delphys+ reaches as high as nearly 100% for the cases where Delphys was
actually able to fully reason with the networks (e.g., nuts). Regarding adm1, the
limited speedup was expected, as the maximum order among the biconnected
components of the constraint graph of adm1 (11 665) is very close to the order of the
entire graph itself (11 761; see Table 1). We also note that despite the overall much
better performance of Delphys+, it was unable to fully reason with adm2. (We
will refer to a late breaking research effort regarding this issue in the closing
section.)
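
The pattern in Table 2 can be set against the two bounds from Section 3. The Python sketch below computes the ratio |V|^3 / (|V| · c^2) = (|V|/c)^2 from the node counts listed in Section 3 and the maximum component orders of Table 1. It is a rough, assumption-laden indication only: asymptotic bounds ignore constant factors and memory effects, so the ratios are not predictions of measured speedups, and the function name bound_ratio is hypothetical.

```python
# (|V|, c): order of the constraint graph and maximum order among its
# biconnected components, as reported in Section 3 and Table 1.
NETWORKS = {
    "nuts": (2235, 52),
    "adm1": (11761, 11665),
    "gadm1": (42749, 19864),
    "gadm2": (276727, 2371),
}


def bound_ratio(order, max_component_order):
    """Ratio of the bounds |V|^3 and |V| * c^2, which simplifies to (|V| / c)^2."""
    return (order / max_component_order) ** 2


ratios = {name: bound_ratio(v, c) for name, (v, c) in NETWORKS.items()}

for name, ratio in ratios.items():
    print(f"{name}: bound ratio {ratio:.2f}")
```

For adm1 the ratio stays close to 1, which is in line with the small speedup reported for that network, while for nuts and gadm2 it is several orders of magnitude larger.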

             Table 3: Effect on obtaining non-redundant relations

     network   initial # of relations   # of non-redundant relations   decrease (%)
      nuts               3 176                      2 249                 29.19%
      adm1              44 832                     44 601                  0.52%
      gadm1            159 600                    158 440                  0.73%
      gadm2            589 573                    292 331                 50.42%
      adm2           5 236 270                        ?                      ?

   Regarding redundancy, Table 3 shows the decrease in the number of constraints
that we can achieve by retaining only the non-redundant constraints of an
initial network, which allows one to construct sparse constraint graphs that
can boost various graph related tasks, such as storing, representing, querying,
and reasoning. Notably, for the biggest network that Delphys+ was able to fully
reason with, namely, gadm2, the decrease is more than 50%, yielding a number of
non-redundant constraints (292 331) which is almost linear in the number of its
vertices (276 727), confirming a similar observation in [6].


5    Conclusion
We focused on the redundancy that is harbored in RCC8 networks that origi-
nate from the Semantic Web, and proposed a practical approach for sanitizing
such networks of any redundancy, by obtaining the set of their non-redundant
constraints and, consequently, offering Linked Geospatial Data of high quality.
Experimental evaluation exhibited a vast decrease in the total number of non-
redundant constraints that we can obtain from an initial network, with signifi-
cantly improved performance over the state-of-the-art approach. A late breaking
research effort, presented in [14], builds on our approach and uses a particular
partial consistency to significantly boost its performance. Notably, it is able to
tackle even the largest of networks considered in our evaluation in this paper.

References
 1. Condotta, J., Kaci, S., Marquis, P., Schwind, N.: Merging Qualitative Constraint
    Networks in a Piecewise Fashion. In: ICTAI (2009)
 2. Dechter, R.: Constraint processing. Elsevier Morgan Kaufmann (2003)
 3. Del Genio, C.I., Gross, T., Bassler, K.E.: All Scale-Free Networks Are Sparse. Phys.
    Rev. Lett. 107, 178701 (2011)
 4. Goodwin, J., Dolbear, C., Hart, G.: Geographical Linked Data: The Administrative
    Geography of Great Britain on the Semantic Web. TGIS 12, 19–30 (2008)
 5. Koubarakis, M., et al.: Challenges for Qualitative Spatial Reasoning in Linked
    Geospatial Data. In: BASR@IJCAI (2011)
 6. Li, S., Long, Z., Liu, W., Duckham, M., Both, A.: On redundant topological con-
    straints. AIJ 225, 51–76 (2015), in press
 7. Nikolaou, C., Koubarakis, M.: Querying Incomplete Geospatial Information in
    RDF. In: SSTD (2013)
 8. Nikolaou, C., Koubarakis, M.: Fast Consistency Checking of Very Large Real-World
    RCC-8 Constraint Networks Using Graph Partitioning. In: AAAI (2014)
 9. Open Geospatial Consortium: OGC GeoSPARQL - A geographic query language
    for RDF data. OGC® Standard (2012)
10. Randell, D.A., Cui, Z., Cohn, A.: A Spatial Logic Based on Regions and Connec-
    tion. In: KR (1992)
11. Renz, J., Ligozat, G.: Weak Composition for Qualitative Spatial and Temporal
    Reasoning. In: CP (2005)
12. Sioutis, M.: Triangulation versus Graph Partitioning for Tackling Large Real World
    Qualitative Spatial Networks. In: ICTAI (2014)
13. Sioutis, M., Condotta, J.F.: Tackling large Qualitative Spatial Networks of scale-
    free-like structure. In: SETN (2014)
14. Sioutis, M., Li, S., Condotta, J.F.: Efficiently Characterizing Non-Redundant Con-
    straints in Large Real World Qualitative Spatial Networks. In: IJCAI (2015), to
    appear
15. Sioutis, M., Salhi, Y., Condotta, J.: A Simple Decomposition Scheme For Large
    Real World Qualitative Constraint Networks. In: FLAIRS (2015), to appear
16. Steyvers, M., Tenenbaum, J.B.: The Large-Scale Structure of Semantic Networks:
    Statistical Analyses and a Model of Semantic Growth. Cog. Sci. 29, 41–78 (2005)
17. Wallgrün, J.O.: Exploiting qualitative spatial reasoning for topological adjustment
    of spatial data. In: SIGSPATIAL (2012)

</pre>