RADON2: A buffered-Intersection Matrix Computing
    Approach To Accelerate Link Discovery Over Geo-Spatial
                   RDF Knowledge Bases
                                        OAEI2018 Results

      Abdullah Fathi Ahmed1 Mohamed Ahmed Sherif1,2 and Axel-Cyrille Ngonga Ngomo1,2
        1
            Paderborn University, Data Science Group, Pohlweg 51, D-33098 Paderborn, Germany
                                  E-mail: {firstname.lastname}@upb.de
            2
              Department of Computer Science, University of Leipzig, 04109 Leipzig, Germany
                              E-mail: {lastname}@informatik.uni-leipzig.de


       Abstract. Geospatial data is at the essence of the Semantic Web, where a knowledge base
       such as LinkedGeoData consists of more than 30 billions facts. Reasoning on these considerable
       amounts of geospatial data lacks efficient methods for the computation of links between the
       resources contained in these knowledge bases. In this paper, we present the participation of
       the extension of Radon algorithm (dubbed Radon2) in the OAEI 2018 campaign. The OAEI
       results show that Radon2 outperforms the other state of the art in most of the cases.


1     Presentation of the System

we present the extension of Radon algorithm [8, 6] (dubbed Radon2), where we, compute all
topological relations of DE9-IM in order to accelerate the topological relation discovery among
geospatial resources.


1.1   State, Purpose and General Statement

In the following, we start by formally defining the general link discovery problem. Thereafter, we
formally define the link discovery of topological relations problem, which we takeld by Radon2.

Link Discovery. Let K be a finite RDF knowledge base. K can be regarded as a set of triples
(s, p, o) ∈ (R ∪ B) × P × (R ∪ L ∪ B), where R is the set of all resources, B is the set of all blank
nodes, P the set of all predicates and L the set of all literals. The Link Discovery (LD) problem
can be expressed as follows: Given two sets of resources S and T (for example hotels and water
bodies) and a relation r (e.g., :touches), find all pairs (s, t) ∈ S × T such that r(s, t) holds. The
result is produced as a set of links called a mapping: MS ,T = {(si , r, tj )|si ∈ S, tj ∈ T }. Optionally,
a similarity score (sim ∈ [0, 1]) calculated by an LD tool can be added to the entries of mappings
to express assurance of a computed link. Finding solutions for the LD problem is challenging due
to the typically the large volume of current datasets as well as its semantic heterogeneity. The main
purpose of LD approaches is to meet the main requirements of (1) high effectiveness (i.e maximize
a fitness function such as F-measure) and (2) high efficiency (i.e., minimize runtime).
Link Discovery of Topological Relations. The Dimensionally Extended nine-Intersection Model (DE-
9IM) [3] is a topological model and a standard used to describe the spatial relations of two geometries
in two-dimensional space. Since the spatial relations expressed by DE-9IM are topological, they are
invariant to rotation, translation and scaling transformations [4]. The DE-9IM model is based on a
3 × 3 intersection matrix with the form:
                                                                                            
                              dim(I(g1 ) ∩ I(g2 )) dim(I(g1 ) ∩ B(g2 )) dim(I(g1 ) ∩ E(g2 ))
         DE9IM (g1 , g2 ) =  dim(B(g1 ) ∩ I(g2 )) dim(B(g1 ) ∩ B(g2 )) dim(B(g1 ) ∩ E(g2 ))      (1)
                              dim(E(g1 ) ∩ I(g2 )) dim(E(g1 ) ∩ B(g2 )) dim(E(g1 ) ∩ E(g2 ))
   where dim is the maximum number of dimensions of the intersection ∩ of the interior(I),
boundary(B), or exterior(E) of the two geometries g1 and g2 . The domain of dim is {−1, 0, 1, 2},
where −1 indicates no intersection, 0 stands for an intersection that results in a set of one or more
points, 1 indicates an intersection made up of lines and 2 stands for an intersection that results in
an area. A simplified binary version of dim(x) with the binary domain {true, f alse} is obtained
using the Boolean function β(dim(I(g)) = f alse iff dim(I(g)) = −1 and true otherwise. There is
only a subset of the topological relations obtainable through DE-9IM that reflects the semantics of
the English language [3] [2] including equals, within, contains, disjoint, touches, meets,
covers, coveredBy, intersects, crosses and overlaps.


1.2    Specific Techniques Used

in this section, we discuss the main idea behind our new extension of Radon.

Radon2 vs. Radon. The basic idea behind the original Radon approach [8] for topological relation
discovery is to provide an indexing method combined with space tiling that allows for efficient
computation of topological relations between geospatial resources. In particular, Radon presents a
novel sparse index for geospatial resources. Then, based on bounding boxes of the indexed geospatial
resources, Radon applies a strategy for discarding unnecessary computations of DE-9IM relations.
In Radon2, our concerns is focused on optimizing the computing of intersection matrix (IM)
used in DE9-IM standard. In the original Radon, the intersection matrix is computed for each
topological relation, while in Radon2 we compute the IM once for all relations among the same
pair of resources. We then apply the mask for each relation to the the computed IM. In particular,
we buffer the IM of each pair of geometries so that all topological relations of same pair can be
retrieved with no need to recompute their respective IM again. By applying this strategy, we can
save the time for recomputing the IM for each individual topological relation. Moreover, calculating
IM at once for each pair of geometries for all topological relations does not affect the completeness
of the linking result. i.e., the F-measure of Radon2 is the same as the F-measure of Radon, which
is always 1.


1.3    Adaptations Made for the Evaluation

No specific adaptations were made to the original Radon algorithm, we only provide a Java
SystemAdapter according to the campaign guidelines3 .
3
    https://project-hobbit.eu/challenges/om2017/om2017-tasks/
1.4    Link to the System

Both Radon and Radon2 are implemented in the link discovery framework Limes. Limes is
available under the GNU Affero General Public License v3.0 4 . Radon2 source code is available
online from the project website5 . The project web site also provide a user manual6 as well as a
developer manual7 .


2     Results

Radon2 has been evaluated only in the Hobbit Link Discovery Track Task 2 (Spatial). The basic
idea behind this task was to measure how well the systems can identify DE- 9IM (Dimensionally Ex-
tended nine-Intersection Model) topological relations. The supported spatial relations were: equals,
within, contains, disjoint, touches, meets, covers, coveredBy, intersects, crosses
and overlaps. The geospatial resources traces were represented in Well-known text (WKT) format
as LineStrings. The result is produced as a set of links called a mapping: MS ,T = {(si , r, tj )|si ∈
S, tj ∈ T }. All the systems were tested against two datasets: (1) the sandbox dataset, with a scale
of 10 instances, and (2) the mainbox dataset with a scale of 5K instances. The other participants
to this task in addition to Radon were Agreement Maker Light(AML) and Silk.
    The systems were judged on the basis of precision, recall, F-Measure and run time. The final
results are shown in Figures 1, 2, 3 and 4. Note that we are only presenting the time performance
and not precision, recall and F-Measure as all were equal to 1.0.
    From these results we can see that Radon2 outperforms the other systems in all relations for
the sandbox and mainbox (linestrings –polygons) (see Figures 3 and 4) dataset as well as the for
the the mainbox dataset (linestrings–linestrings) (Figure 2). For the sandbox dataset (linestrings–
linestrings) (Figure 1), Radon achives a better performance in most of the relations (e.g., overlaps,
crosses, covered by, covers, within, contains, disjoint and equal. Only for the touches
and intersects Aml was able to outperform Radon2 for the TomTom dataset of the sandbox
(linestrings–linestrings). The differences in performance between touches and intersects, where
AML outperforms Radon cannot be explained from an implementation point of view, as these
two relations share the exact optimizations. However, due to the datasets consisting exclusively of
LineStrings, it is apparent that touches and intersects are much more likely to hold between any
two geometries than other relations. Therefore, the benchmarks on these relations are the hardest
in this task.


3     Conclusions and Future Work

We present Radon2, a simple strategy for scaling the original Radon approach by computing the
intersection matrix for each pair of resources once and use it for computing all possible topological
relations associated with such resources at hand. The presented evaluation during the OAEI 2018
showed that, in addition to being complete and correct (i.e. achieving an F-Measure of 1.0), Radon2
also outperforms the other participating systems in most of the cases
4
  https://github.com/dice-group/LIMES/blob/master/LICENSE
5
  https://github.com/dice-group/LIMES
6
  https://dice-group.github.io/LIMES/user manual/
7
  https://dice-group.github.io/LIMES/developer manual/
Fig. 1: Runtime results of linestrings-linestrings Sandbox Dataset
Fig. 2: Runtime results of linestrings-linestrings M ailbox DataSet
Fig. 3: Runtime results of linestrings-polygons Sandbox Dataset
Fig. 4: Runtimes results of linestrings-polygons M ailbox DataSet
   In future work, we will apply this strategy on a larger datasets with more resources and more
points per resource, where we will implement more sophisticated parallelization techniques. For
enabling automatic configuration of Radon2, we will combine Radon2 with the machine learning
algorithm Wombat [7] implemented in Limes. Also, we will extend Radon2 for discovering spatial-
temporal relation by integrating it with [5]. Moreover, we intend to combine Radon2 with the
simplification algorithms introduced in [1] in order to achieve even better speedup.

Acknowledgments This work has been supported by Eurostars Project SAGE (GA no. E!10882),
the BMVI project the LIMBO (GA no. 19F2029C), the DFG project LinkingLOD (project no.
NG 105/3-2), the BMWI Project GEISER (project no. 01MD16014) as well as the H2020 projects
SLIPO (GA no. 731581) and HOBBIT (GA no. 688227).


References
1. A. F. Ahmed, M. Sherif, and A. Ngonga Ngomo. On the effect of geometries simplification on geo-spatial
   link discovery. In Proceedings of SEMANTiCS 2018, 2018.
2. E. Clementini, P. Di Felice, and P. Van Oosterom. A small set of formal topological relationships
   suitable for end-user interaction. In International Symposium on Spatial Databases, pages 277–295.
   Springer, 1993.
3. E. Clementini, J. Sharma, and M. J. Egenhofer. Modelling topological spatial relations: Strategies for
   query processing. Computers & graphics, 18(6):815–822, 1994.
4. M. J. Egenhofer and R. D. Franzosa. Point-set topological spatial relations. International Journal of
   Geographical Information System, 5(2):161–174, 1991.
5. K. Jha, M. Röder, and A.-C. Ngonga Ngomo. All That Glitters is not Gold – Rule-Based Curation of
   Reference Datasets for Named Entity Recognition and Entity Linking. In The Semantic Web. Latest
   Advances and New Domains: 14th International Conference, ESWC 2017, Proceedings. Springer Inter-
   national Publishing, 2017.
6. M. A. S. a.-C. N. Kevin Dreßler. Radon results for oaei 2017. In Proceedings of Ontology Matching
   Workshop 2017, 2017.
7. M. Sherif, A.-C. Ngonga Ngomo, and J. Lehmann. WOMBAT - A Generalization Approach for Auto-
   matic Link Discovery. In 14th Extended Semantic Web Conference, Portorož, Slovenia, 28th May - 1st
   June 2017. Springer, 2017.
8. M. A. Sherif, K. Dreßler, P. Smeros, and A.-C. Ngonga Ngomo. RADON - Rapid Discovery of Topological
   Relations. In Proceedings of The Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17),
   2017.