<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reviews of Geo LD 2018 accepted papers</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ali Khalili</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter van den Besselaar</string-name>
        </contrib>
      </contrib-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Review 1</title>
      <sec id="sec-1-1">
        <title>PC member: anonymous</title>
        <p>Reviewer's confidence: 4 (high)
Relevance: 5 (excellent)
Impact of ideas and results: 3 (fair)
Clarity and quality of writing: 2 (poor)
Related work: 1 (very poor)
Implementation and soundness: 4 (good)
Evaluation: 3 (fair)
Overall evaluation: -2 (reject)</p>
      </sec>
      <sec id="sec-1-2">
        <title>Review comments:</title>
        <p>
With 15 pages the paper exceeds the official maximum page count for long
papers (12 pages) in the CFP. This should lead to the rejection of this paper.
However, I read the paper anyway and want to provide my feedback on the
work.</p>
        <p>In their work, the authors address the delineation of functional urban areas
(FUAs) based on open data. In contrast to the rigid definition of FUAs
provided by the OECD and EC, they aim to provide means to adaptively
define FUAs by weighting relevant factors differently. In the motivation, the
authors claim that governments need a way to dynamically redefine different
types of urban areas. However, the citation provided does not support that
claim. The authors should either elaborate on that statement or provide a
source supporting the claim.</p>
        <p>A contribution mentioned by the authors is the reconstruction of FUAs
defined by the OECD based on open data sources. However, a presentation of
how this reconstruction is achieved is missing as well as an evaluation of how
well the OECD FUAs can be reconstructed by open data sources. It would be
interesting to see how and why these areas differ. Furthermore, the
authors argue that the access to the OECD shapefiles is limited and requires
negotiation, however, the data seems to be openly available at:
http://www.oecd.org/cfe/regional-policy/functionalurbanareasbycountry.htm .
Unfortunately, the paper does not present related work, which precludes
comparison with previous research endeavors. A good
starting point for a literature review in this area would be "The semantics of
populations: A city indicator perspective".</p>
        <p>The authors present the integration of different open geo datasets and the
linking between the data sets in detail. The datasets provide data on
administrative boundaries on several levels which should be used to
reconstruct FUAs. Furthermore, they present applications (SPARQL endpoint
and an API) to access the integrated data. The data is integrated by
transforming all datasets into GeoJSON and using a Mapping and Enrichment
Function to transform the data to RDF. Details on the Mapping and Ontology
used for the enrichment are not provided.</p>
        <p>The authors present two case studies. The first one evaluates the usability of a
tool which allows for geocoding addresses with the different administrative
boundaries from the datasets in a spreadsheet. The results of 10 participants
filling out the questionnaire revealed decent usability of the tool; however,
the relation to how this allows for dynamically defining FUAs is not obvious.
The second case study applies the tools to study the number of projects funded
by the Netherlands Enterprise Agency in different FUAs. For this purpose,
they use the OpenStreetMap data in combination with statistical data provided
by the Netherlands to define different adaptive FUAs (based on population,
businesses and both) and also use the FUAs defined by the OECD. For these
FUAs they compare the number of projects and show that the OECD FUAs do
not capture all relevant areas. The process by which the FUAs in the second use
case are derived from OSM and combined with the statistical data is not
presented thoroughly enough.</p>
        <p>Overall, the authors present a pipeline for integrating geodata from different
data sources and provide means to query the data. The data is used to define
FUAs based on the combination of different datasets on administrative areas
and statistical data. A detailed description of the methodologies for defining these
areas, and of which methodology applies to which use case, is not
provided. Furthermore, they present two use cases of the implementation and
investigate different research questions with the help of these tools. The
implementations and demos of them are provided online. Unfortunately, some
demos of the implementations are only available to registered users or with
a valid API key, which is somewhat at odds with making the data openly
available.</p>
        <p>Layout/Writing:
- Page count (15 pages) exceeds the Long Paper page limit of 12 pages
- Missing page count in the PDF
- Footnote spacing inconsistent (sometimes there is a space before the
footnote)
- Typos:
* p.3: shpapefiles
* p.6: Virtuosos
* p.10: Fig. 4 An screenshot
* p.11: seem --&gt; seems</p>
      </sec>
      <sec id="sec-1-3">
        <title>PC member:</title>
        <p>anonymous
5: (excellent)
3: (strong accept)</p>
        <p>This paper introduces a dynamic approach to classifying urban areas using
openly available spatial and non-spatial resources on the Web. The authors
argue against the current static definition of the Functional Urban Areas
(FUAs) notion and propose dynamically defining FUAs based on linked open
data.</p>
        <p>The paper is clear and reads well. The use of English is advanced and
highly scientific. The structure of the paper is well defined and helps the
reader go through the paper, even if they lack previous knowledge in the
domain.</p>
        <p>Their data discovery and collection step includes a brief but compact
overview of the datasets in use with the right citations. Their Data
Extraction &amp; Conversion step is very well written and to the point. Their
Data Storage &amp; Querying is also brief but well explained.</p>
        <p>Regarding the 4th step of Data Linking, I have some concerns. Even
though they describe well the query they used for linking, querying an
endpoint in order to perform linking is not the most suitable method when
it comes to link discovery. There are plenty of LD frameworks such as
LIMES and SILK that can perform geospatial data linking with high
accuracy and completeness. The remaining steps are also well described.
Finally, their In-Use Case Study and Evaluation is well written and
explained. I have no major comments regarding their evaluation procedure
and how they explained the results.</p>
        <p>This paper needs no further improvement from my side. The authors know
the problem at hand very well and their work is of high quality.
However, since the page limit is 12 pages, I would suggest that they
try to reduce the length of the paper.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Review 3</title>
      <p>2: (poor)
3: (fair)
1: (weak accept)
The paper proposes an implementation for managing and combining
different linked open geo-data sources together with non-spatial
resources on the Web and an approach (and implementation) for
identifying and delineating Functional Urban Areas (FUAs).
The framework for the management of linked geo-data covers all the
phases of the LOD Lifecycle (Auer et al.) and the paper is structured in
a way that follows these phases (or steps) too.</p>
      <p>An interesting contribution is the proposed approach for dynamically
defining FUAs based on open data such as Flickr, OSM, GADM, etc.
transformed into Linked Data and interlinked with LOD data sources
such as DBpedia and Wikidata.</p>
      <p>The applications built on top of this infrastructure are also quite
interesting and well designed, as shown in the screenshots provided and
in the link to the online demos.</p>
      <p>These are validated with two use case studies which are clearly described
at the end of the paper.</p>
      <p>The paper is well written and clear, despite some typos and grammatical
errors.</p>
      <p>The paper goes well beyond the page limit for this workshop, and
should be penalized for that. However, it is a very interesting
contribution for the workshop and its audience and should be accepted.
A short section comparing with SOTA approaches and projects is
missing and would be beneficial.</p>
      <p>In the usability evaluation section it is written that 20 users took part in
the study but only 10 completed the survey; why?
Also, in Fig. 5, the answers to questions such as the 3rd and the 9th
highlight some negative results; this should be explained.
Section 9 could be shortened, as could the dataset descriptions in
Section 2.</p>
      <sec id="sec-2-1">
        <title>Minor issues:</title>
        <p>Add a space before some reference numbers in the text.</p>
      </sec>
      <sec id="sec-2-2">
        <title>In Section 1:</title>
        <p>"OpenStreetMap3 is already interlinked" -&gt; "OpenStreetMap3 are
already interlinked"
"cloud and provides the opportunity" -&gt; "cloud and provide the
opportunity"
Section 2:
"shpapefiles" -&gt; "shapefiles"
Section 2.1: add spaces before the "(" characters.</p>
        <p>Section 3: "spacial" -&gt; "spatial"
Section 4: "Virtuosoś" -&gt; "Virtuoso"
"specific addres"-&gt;"specific address"
Section 8: "20 particiapants" -&gt; "20 participants"</p>
        <p>Footnote 43 is a duplicate.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Review 1</title>
      <p>1: (very poor)
0: (borderline paper)
This paper provides the description of an access control implementation for
Ordnance Survey Ireland (OSi). It describes an engineering solution for the
access control problem from OSi.</p>
      <p>There is no comparison with existing access control models nor critical
analysis of the implemented access control model. The source code is also not
made available for the general public. The main contribution of the paper is
the technical description of the use case, which the authors have implemented
for OSi. It is relevant to the topic of the workshop and is a borderline paper in my
opinion.</p>
      <p>The paper describes an authorization model for RDF data based on SPARQL
query templates. The templates have to be created by an administrator
depending on a license model. The access control solution proposed in
this work hides the SPARQL interface behind a custom RESTful API. This
takes away the benefits of using RDF from the users in favor of providing a
fine-granular security model for the data provider.</p>
      <p>The motivation simply comes from the necessity of OSi to have an
authorization model for its users. The benefits of such a system for other use
cases are not outlined. Could it be used in general for geospatial data
providers? If yes, then what are the benefits and drawbacks of such a
solution?
The authors provide a related work section, but do not compare their work
with the existing access control models. Moreover, they claim that "Our
access control approach is specialized for retrieving fine grain geospatial
instance data, which can exploit GeoSPARQL functions (if an administrator
creates a query to do so), while existing approaches do not offer such specific
access control to data." For example, the security mechanisms in Apache Rya
[1] allow fine-granular authorization on a per-triple basis. In this case, the user
can be restricted to access only a part of the graph by labeling the necessary set of
triples. Then, for the same SPARQL query, different users will have
different responses based on their rights. It would be good to see why existing
access control models were not used for this use case.</p>
      <p>The questions, which are not answered in the paper:
- How much time does it take to create templates based on a license or a user?
- What is the impact of such a system on the users in comparison to using
SPARQL directly?
Other remarks:
- Figure 2 should be visualized (the same as figure 1)
- on page 8: Figure 7 shows a high architecture... --&gt; should be Figure 5
- on page 8: example of a status call... --&gt; do not include protocol and domain
in the status call, i.e. /acon/status/{userID}
[1] Punnoose, Roshan, Adina Crainiceanu, and David Rapp. "SPARQL in the
cloud using Rya." Information Systems 48 (2015): 181-195.</p>
    </sec>
    <sec id="sec-4">
      <title>Review 2</title>
      <p>Matthias Wauer
2: (poor)
1: (weak accept)
What I would expect after the abstract:
The work provides three contributions to the problem of access control to
geospatial Linked Data:
1) an access control model including a vocabulary for describing a "licence"
defining what data can be accessed and a "template" for actually accessing
data,
2) an architecture for implementing this access control method using a proxy
service, and
3) a case study of a prototypical implementation</p>
      <sec id="sec-4-1">
        <title>What I expect after the introduction:</title>
        <p>The introduction clearly presents the research question. However, it fails to
motivate 1) why there is a need for fine grained data access control, and 2)
how a geospatial data retrieval scenario differs from generic data access.
Further, while motivating LD (enabling linking to other datasets, etc.), it is
uncertain at this point if customers using a RESTful API will still have this
benefit.</p>
        <p>Summary (following the procedure described in
https://violentmetaphors.com/2013/08/25/how-to-read-and-understand-a-scientific-paper-2):</p>
      </sec>
      <sec id="sec-4-2">
        <title>Identify the BIG QUESTION.</title>
        <p>→ How to enable fine-grained access control for geospatial Linked Data.
Summarize the background in five sentences or less.
→ The related work section presents many general access control approaches
and methods specific to RDF data. While not particularly experienced in this
specific field of research, many of the cited works appear to be in an early
state (several workshop and “Towards…” papers).</p>
      </sec>
      <sec id="sec-4-3">
        <title>Identify the SPECIFIC QUESTION(S). → How to model access rights to geospatial Linked Data. How to enforce access restrictions to geospatial Linked Data.</title>
      </sec>
      <sec id="sec-4-4">
        <title>Identify the approach. → A separate licence and access model. Apparently the access model is a combination of a “facade” (the template) in addition to licence based checks.</title>
      </sec>
      <sec id="sec-4-5">
        <title>Read the methods section.</title>
        <p>→ Section 4 provides a comprehensive explanation of the proposed approach.
Unfortunately the link to the vocabulary is not accessible. A few details are
left out, e.g., whether the geographical feature classes also support using
other geospatial data types, such as multipolygons.</p>
        <p>Read the results section. Write one or more paragraphs to summarize the
results.
→ The authors claim that the proposed architecture and vocabulary are able
to represent the access restrictions given by the requirements. The method by
which these requirements were gathered is not explained. A closed-source
prototypical implementation was done to realize the proposed concept.
While the authors stress the benefits of Linked Data in the introduction, they
propose access only through a REST API in their architecture. While the
result set to such a request may still include URIs that can be dereferenced,
I’m wondering if an approach like GraphQL might be more reasonable here.
Do the results answer the SPECIFIC QUESTION(S)? What do you think they
mean?
→ Question 1 is likely solved by the given approach within certain
limitations. For example, if a scenario requires a single query with two
distinct values of a certain type (e.g., radius) the licence model can’t
distinguish between those. Question 2 was already clarified in Section 2, and
appears to be answered sufficiently by the proposed concept.</p>
        <p>Now, go back to the beginning and read the abstract. Does it match what the
authors said in the paper? Does it fit with your interpretation of the paper?
→ Yes it does.</p>
        <p>Reasons to accept:
- novel approach to access control for geospatial linked data, relevant to the
workshop
- real use case and application
- comprehensive description of vocabulary and architecture
- well-structured paper</p>
      </sec>
      <sec id="sec-4-6">
        <title>Reasons to reject:</title>
        <p>- vocabulary URL not accessible
- very limited evaluation
- some implementation details omitted (e.g., any information about the result
format/content of the status call)
- paper clearly hasn't been re-read before submission (see suggestions)</p>
      </sec>
      <sec id="sec-4-7">
        <title>Improvement suggestions:</title>
        <p>Clarify that the licence AND template are both necessary to enforce access
control (in the example URI above Fig. 4 I first assumed the template ID
might be sufficient because the system could resolve itself which licence
would fit, but this is not necessarily the case).</p>
        <p>Discussion of potential alternative approaches and their drawbacks compared
to this licence+template approach would be interesting.</p>
        <p>Writing:
abstract:
"an implementation architecture ... which implements..."
introduction:
"ISU is ... and _are_ tasked"
next line: "...geospatial data -- ..." (remove long dash?)
"..., and hence _ an access control..." (missing word?)
"...and flexible enough to _be_ meet the (potential)..." (remove)
"The remained of this paper..." (remainder?)
"Section 2 presents _some_ geospatial access control ... from _a_ geospatial
data organization" (imprecise, better: be more specific)
requirements:
"For example, OSI's building... in the country." (this is not a sentence.)
"...construction companies _ utility companies _ etc." (missing commas)
related work:
"Role based access control [7]_,_ is..." (remove comma)
"Steyskal and Polleres [9] _examines_ ..." (plural)
"Such policies _specify specific_ ... to _specific_ datasets..." (repetition)
"Our approach models ... Our approach..." (repetition)
approach:
"... and then present _ proposed ..." (missing word?)
4.1:
URL not dereferenceable, and should be provided as a footnote instead of inline
"...a user will be able _ access." (missing word)
"...a floating point number of _what_ the permitted radius." (remove)
Fig. 2 line 16: "geohiveb:Building" looks like a very unusual URI with the
given prefix
4.2:
"...and _the_ a Query Processor..." (remove)
case study:
"...would correctly reject a query call _then_ when values... allowed _to_
according to..." (remove)
"...did not contain any _did_ data that..." (remove)
conclusion:
"...also allows _provides_ the specification..." (remove)
"...how well it would _fair_ in situations..." (fare?)
references:
"Ireland? s Authoritative..."
"21-Februrary-..."</p>
      </sec>
      <sec id="sec-4-8">
        <title>1: (very poor)</title>
        <p>0: (borderline paper)
The paper presents an approach for providing different levels of access
control to geographical data. In particular, the approach proposes a
vocabulary which represents different levels of licensing to indicate
different access controls for the users. In addition, they provide a template to
indicate how the access control can be provided to the users.</p>
        <p>I find this work interesting and quite relevant to the workshop although I
am used to seeing licensing as a process of associating one access
control with the data for all users. The idea of providing different types of
licenses to the same dataset for different users is a different way of
exploiting licensing.</p>
        <p>Except for the above observation, I have a concern that is related to the
experimentation. Although this is an initial work I would have expected
some initial results that can be relevant to be presented and discussed in
this workshop, which to me are missing.
Correct link: http://ontologies.geohive.ie/ -&gt;
http://ontologies.geohive.ie/osi/index.html. Why will it be hosted in the
future? Why is it not yet hosted in this domain?
Although the paper is relevant to the workshop, I believe that the paper is
not well written and the English needs to be revised. In the following, I
provide a few examples of minor comments which do not represent an
exhaustive list of errors, so please revise the whole paper along
similar lines.
straight forward -&gt; straightforward
valid up until -&gt; valid up to
retrieve data, but -&gt; retrieve data but
will be able access -&gt; will be able to access
for a templates variables -&gt; for templates variables
3. Finn Årup Nielsen, Daniel Mietchen and Egon Willighagen. Geospatial
Data and Scholia
2: (poor)
0: (borderline paper)
This paper presents an extension to Scholia, the authors previous work
presented in [5]. I like the topic of the paper. However, I have the
following concerns that need to be addressed in the final camera-ready
paper.
1. Motivation and story: The paper is missing a clear connecting story and
motivation.
2. Contributions: I was not able to detect what the exact
contributions of this paper are.</p>
        <sec id="sec-4-8-1">
          <title>Confidential remarks for the program committee:</title>
        </sec>
      </sec>
      <sec id="sec-4-9">
        <title>PC member:</title>
        <p>anonymous</p>
        <p>3. User stories are presented in Section 4. I think they can be handled by
the original Scholia as well. As such, what are the stories that can only be
handled due to the new extension?</p>
      </sec>
      <sec id="sec-4-10">
        <title>4. The paper needs a thorough proofreading. I encourage the authors to continue their good work in this direction. I am not an expert in this topic and am happy to change my score after discussion.</title>
        <p>1: (very poor)
1: (weak accept)
The paper presents Scholia, a web site that visualizes bibliographic and and
scientific datasets contained in WikiCite/Wikidata. A part of the visualization
instruments in Scholia regards geospatial aspects of the data and, thus, are
mapbased. The authors provide a good overview of what type of visualizations
Scholia can support. However, they could elaborate more on how these
visualizations can be achieved, i.e. how the underlying data can be queried or
browsed.</p>
        <p>Specifically, the authors just mention the WDQS functionality, but do not elaborate
on how this can be exploited (e.g. through https://query.wikidata.org/) to build
all the examples presented in the latter parts of the paper; all these examples are
just provided as fixed links, resembling pre-calculated and stored views for
selected queries, on the database contents. The respective SPARQL queries are
not present in the paper (of course expectedly, due to lack of space) but they are
also not visible at the respective web pages that comprise the examples of the
paper (at least after a quick look). Eventually, it is not clear for the reader how
easily a user can browse-construct the exemplary views in the paper, from
scratch.</p>
        <p>The authors should further elaborate a bit more on the architecture of the system
and its scalability potential. The paper (and the contained examples) leave the
impression that Wikidata (and all its affiliated initiatives) are still ongoing
efforts, currently including a small amount of data compared to other initiatives
(e.g. DBpedia). Even so, some of the contained links in the paper require
considerable time to fully load, despite the very small amount of data they
eventually visualize. The authors should elaborate on what are the
efficiency/scalability plans for the time when Wikidata gathers orders of
magnitude more data.</p>
        <p>Finally, I would like to have seen, in the related work, a small discussion as to
what is newly contributed by Wikidata (and the affiliated sites/software) compared
to existing/established initiatives.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Review 3</title>
      <sec id="sec-5-1">
        <title>PC member:</title>
        <p>anonymous
3: (fair)
0: (borderline paper)</p>
        <p>*General comments and Summary*
Scholia provides information about scientific works from Wikidata by querying
the Wikidata Query Service. The paper mostly describes a set of user stories
to illustrate the applicability of Scholia for the consumption of Geospatial
data using Wikidata. While the paper does not produce any technical
contribution per se, it presents interesting user stories illustrating the question
answering capability of Scholia on geospatial data.</p>
        <p>The system -- Scholia -- is built mostly (or predominantly) around handling
“nearby” queries using the geof:distance function. While the application of
the system is of great value, the coverage of the system in terms of
GeoSPARQL query features is a concern.</p>
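        <p>For illustration, the "nearby" pattern discussed above ranks items by distance from a point, which GeoSPARQL computes server-side with geof:distance. A rough client-side analogue is the haversine great-circle distance; this is only a sketch of the idea, not Scholia's actual implementation, and the place names and coordinates below are illustrative:</p>

```python
# Sketch: rank candidate places by great-circle distance from a reference
# point, the computation that a geof:distance-based "nearby" query performs
# on the server. Coordinates below are illustrative examples.
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Which candidate location lies nearest to Copenhagen (55.676 N, 12.568 E)?
copenhagen = (55.676, 12.568)
places = {"Lund": (55.705, 13.191), "Berlin": (52.520, 13.405)}
nearest = min(places, key=lambda name: haversine_km(*copenhagen, *places[name]))
```

        <p>A server-side geof:distance filter behaves analogously, with the unit supplied as the function's third argument as noted in the review above.</p>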
        <p>The presented system -- Scholia -- is novel; however, the method/technique
by which these queries (representing users' interests or information needs)
are generated is not mentioned clearly. Whether this process is automatic or
manual is also of interest to a reader. At this point the system appears to be
too static, or in an extremely early stage, to be of benefit for an interested user
(a user who is not an expert in SPARQL or GeoSPARQL, for that matter).
*Introduction*
*User stories*
Pg 1, Paragraph 2, Last line -&gt; “whereas, in GeoSPARQL, the function with
the same name takes a third argument for the unit.” =&gt; which one? And how
is it different from geof:distance?
While the user stories are narrative, the GeoSPARQL queries used to
generate the answers are not mentioned. This is not mission critical but
would be very interesting to see/have. Though one can find the query from
the mentioned links (from below the results page -&gt; edit this query). I wonder
whether these queries were handcrafted or generated automatically. In the
case of handcrafted queries, it would be near impossible for users who are
not experts in GeoSPARQL to use Scholia with any ease. If
these queries are automatically generated based on the user interests, it
should be mentioned how and where.</p>
      </sec>
      <sec id="sec-5-2">
        <title>No map is generated for story 1</title>
        <p>https://tools.wmflabs.org/scholia/country/Q33/topic/Q2539 which would be
ideal for the geospatial aspect of knowing from which city/area is a particular
researcher, or at least the location of the research lab/group. Also, the
co-citation graph is reported to be empty, but a clear reason is not
mentioned.</p>
        <p>No map is generated for story 2
https://tools.wmflabs.org/scholia/location/Q1748/topic/Q2539. Also, no
additional graphs or tables are reported apart from the list of authors in the
descending order of their scores. Is the scoring function to be assumed to be
based on the distance from the center of Copenhagen? In general, this
case inclines more towards a regular (non-GeoSPARQL) user story than a
geospatial one.</p>
      </sec>
      <sec id="sec-5-3">
        <title>No map is generated for story 3 (https://tools.wmflabs.org/scholia/location/Q3806/topic/Q52). Why is this? Also, a similar case to the previous user story. Furthermore, “links do not work” is reported in stories 2 and 3, which is worth investigating.</title>
        <p>
          User story 4 is quite interesting; however, it appears to be flawed. While the
story description states “relevant scientific meetings”, the results with respect to
WWW conference visitors report rather irrelevant points of interest in terms
of people and publications, such as (1) Alessandro Vespignani, Italian
physicist, (
          <xref ref-type="bibr" rid="ref1">2</xref>
          ) Luciano Floridi, Italian philosopher, and (3) Combining
Participatory Influenza Surveillance with Modeling and Forecasting: Three
Alternative Approaches, etc. Also, no map is generated highlighting in which
part of the city or nearby suburbs these events are taking place.
For story 5, the administrator would be more interested in observing the
patterns from Denmark to South Korea and not the reverse.
1: (very poor)
0: (borderline paper)
The paper highlights four lessons learned during the process of interlinking
Ordnance Survey Ireland (OSi) and DBpedia. The first lesson was
concerning the difficulty of ontology matching of the datasets to be
interlinked. The second lesson was regarding the incomplete results of
SPARQL endpoints. The third lesson was concerning finding proper
measures for comparing resources. Finally, the fourth lesson pertains to
the difficulty of finding a proper link specification.
        </p>
        <p>The paper is well structured and written in good English, which eases the
understanding of the paper. I do not completely agree with the authors
on some points they claim in the paper. Here, I list my points that
could help in extending this paper (maybe to a complete resource paper or
a dataset paper for SWJ):
Lesson 1: The problem here is well known in the literature as the ontology
matching problem. There is a massive amount of work dealing with this
problem. For example, see the DL-Learner paper [a].
Lesson 2: I think the problem here was that you tried to issue one SPARQL
query to get the whole result set in one go, which is only possible if your
result set is less than the internal endpoint limit (let us assume it is n). In case
your result set is greater than the endpoint internal limit n, you get only n
random resources. To overcome such a limit, you normally ask the endpoint
programmatically using paging techniques (using LIMIT and
OFFSET in your SELECT query). See
(https://github.com/SmartDataAnalytics/jena-sparql-api).</p>
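        <p>A minimal sketch of the paging technique described above, with the endpoint simulated by a local function (all names are illustrative, not taken from the paper or the jena-sparql-api library):</p>

```python
# Rather than one SELECT query that the endpoint truncates at its internal
# limit, repeated queries with LIMIT and OFFSET collect the full result set.
# The endpoint is simulated here; a real client would substitute an HTTP
# SPARQL request carrying the LIMIT/OFFSET modifiers.

PAGE_SIZE = 100  # keep this at or below the endpoint's internal limit n

def endpoint_query(offset, limit, dataset):
    """Stand-in for a SPARQL endpoint honouring LIMIT/OFFSET."""
    return dataset[offset:offset + limit]

def fetch_all(dataset):
    """Collect the complete result set page by page."""
    results, offset = [], 0
    while True:
        page = endpoint_query(offset, PAGE_SIZE, dataset)
        results.extend(page)
        if len(page) != PAGE_SIZE:  # a short (or empty) page ends the loop
            break
        offset += PAGE_SIZE
    return results

# 257 mock resources: three pages are needed (100 + 100 + 57)
resources = [f"resource_{i}" for i in range(257)]
assert fetch_all(resources) == resources
```

        <p>The page size must not exceed the endpoint's internal limit, and the results should be kept deterministic (e.g. with ORDER BY) so that successive pages do not overlap or skip rows.</p>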
        <p>Lesson 3 and 4: I do not agree with your claim in Section 4.2 “Techniques to
learn a link specification [7] proved to be unusable in our case study due to
the non-overlapping nature of the two datasets to be interlinked.”. Have you
tried to use machine learning in LIMES [c] to do your task? WOMBAT (the
most recent machine learning approach in LIMES) provides supervised,
active and unsupervised versions to do your task. I think even the
unsupervised version of WOMBAT should give you good results. If not, try
the supervised version of WOMBAT.
- In Section 2, it would be good to reference [b] together with the original
LIMES paper to emphasize the superiority of LIMES's performance.
- In Section 4.2, you reference [8], while no such reference exists.
[a] DL-Learner - A framework for inductive learning on the Semantic Web
by Lorenz Bühmann, Jens Lehmann, and Patrick Westphal in Web
Semantics: Science, Services and Agents on the World Wide Web
[b] RADON - Rapid Discovery of Topological Relations by Mohamed
Ahmed Sherif, Kevin Dreßler, Panayiotis Smeros, and Axel-Cyrille Ngonga
Ngomo in Proceedings of The Thirty-First AAAI Conference on Artificial
Intelligence (AAAI-17)
[c] WOMBAT - A Generalization Approach for Automatic Link Discovery
by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, and Jens
Lehmann in 14th Extended Semantic Web Conference, Portoroz, Slovenia,
28th May - 1st June 2017</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Review 2</title>
      <p>Kleanthi Georgala
2: (poor)
0: (borderline paper)
This paper gives an overview of the existing challenges of linking geospatial
datasets.</p>
      <p>The authors begin by explaining the problem at hand, focusing on the
preprocessing phase of Link Discovery (LD); thus the title of the paper itself does
not describe the content of their work well. They continue by describing the
issues involved in linking the OSi and DBpedia geospatial datasets. Then,
they give a detailed overview of how they identified the needed resources
from the two KBs and how they derived the datasets from the endpoints,
along with the challenges they faced. Finally, they describe how they selected
properties for matching and similarity measures to perform linking, along
with the challenges they faced when using a declarative LD framework.</p>
      <sec id="sec-6-1">
        <title>Comments for each section:</title>
        <p>Abstract
a case study in -&gt; a case study of
The word 'experience' is not a suitable word for research. Please replace it
throughout the whole document.</p>
        <p>The abstract reads well. However, the pre-processing phase of LD is far from
being overlooked. In order to identify matching properties between two
datasets, there has been plenty of related work in the domain of ontology
matching. Please take a look at
http://www.ontologymatching.org/relwork.html.</p>
        <p>Regarding the identification of LSs in order to link datasets, the framework
you used (LIMES) incorporates 3 machine learning algorithms (WOMBAT
simple and complete, EAGLE) that can produce appropriate LSs.</p>
      </sec>
      <sec id="sec-6-2">
        <title>Introduction</title>
        <p>Provide a citation in the first sentence.
described in two parts -&gt; divided into two parts
The introduction reads well. For further comments, re-read my comments for
your abstract.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Section 2 I have no comments for Section 2. The preliminaries are well described and so are the challenges the authors faced.</title>
      </sec>
      <sec id="sec-6-4">
        <title>Section 3.1</title>
        <p>Please try to avoid phrases like 'picking up' and 'figured out'. They are quite
informal for a scientific document.</p>
        <p>The 'Counties_of_the_Republic_of_Ireland' is not present in DBpedia. Its
equivalent is 'Counties_towns_in_the_Republic_of_Ireland'.</p>
        <p>You mention 5 different ways, however I read only 4. Also using Q_i is not
appropriate, since you are not asking questions, but making remarks.
Q_1: How did you deal with townlands that do not have the pattern in their
article category?
Q_2 and Q_4: How do you know that some resources are or aren't townlands?
My general comment for this section is that you tried to perform ontology
matching between predicates from your KBs. The whole idea seems
quite arbitrary to me, based solely on assumptions and insights. As I
mentioned above, you could have used an ontology mapping tool to carry out
this work for you.</p>
      </sec>
      <sec id="sec-6-5">
        <title>Section 3.2</title>
        <p>I completely disagree with Lesson 2.
1) SPARQL endpoints are not affected by which browser you use.
2) The reason you were getting different results sets has to do with how the
query plan is executed internally.
3) There is no ordering in the retrieved result set of a SPARQL query unless
you use the ORDER BY clause. This is not a problem of Virtuoso or any
other endpoint. This is how SPARQL queries operate.
4) It is not possible to return more than 1M rows in a SPARQL query result
set over HTTP when using Virtuoso. There is a setting ResultSetMaxRows in
virtuoso.ini, but even if you specify something huge, you won't get more than
1048576 rows, which is 2^20. I believe this number is very high compared to
the size of your retrieved datasets. If you run the same query from isql, you
will get all the rows. So it is not a bug; it is a feature with a very specific
purpose.
5) If you wanted to retrieve all results with one query, you could set up
Virtuoso locally and change the ResultSetMaxRows setting in the INI file.</p>
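<p>A minimal, illustrative excerpt of the setting referred to in point 5; the key lives in the [SPARQL] section of virtuoso.ini, and the value shown here is only an example:</p>

```ini
; virtuoso.ini -- illustrative excerpt, not a complete configuration
[SPARQL]
ResultSetMaxRows = 1048576  ; cap on rows returned per SPARQL query
```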
      </sec>
      <sec id="sec-6-6">
        <title>Section 4.1 The first paragraph contradicts Lesson 3.</title>
      </sec>
      <sec id="sec-6-7">
        <title>Section 4.2</title>
        <p>You used LIMES for linking. LIMES incorporates 3 machine learning
algorithms for learning LSs. The unsupervised versions do not require any
training dataset and are able to provide the owl:sameAs links that you
were targeting.</p>
        <p>Additionally, a very quick transformation of the WKT polygon to/from WKT
points could have aided your work.</p>
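<p>One such transformation can be sketched in a few lines of Python. This is a naive, illustrative reduction of a simple WKT polygon to a single representative WKT point (a plain vertex average, not a true area-weighted centroid); a geometry library would do this properly.</p>

```python
def polygon_to_point(wkt):
    """Reduce a simple WKT POLYGON (exterior ring only) to a WKT POINT.

    Naive sketch: averages the ring's vertices after dropping the
    repeated closing vertex, rather than computing a true centroid.
    """
    inner = wkt.strip()[len("POLYGON (("):-len("))")]
    coords = [tuple(map(float, pair.split())) for pair in inner.split(",")]
    if coords[0] == coords[-1]:  # WKT rings repeat the first vertex
        coords = coords[:-1]
    x = sum(c[0] for c in coords) / len(coords)
    y = sum(c[1] for c in coords) / len(coords)
    return f"POINT ({x} {y})"

square = "POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))"
# polygon_to_point(square) yields "POINT (2.0 2.0)"
```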
      </sec>
      <sec id="sec-6-8">
        <title>Section 4.3</title>
        <p>LIMES has a detailed manual explaining how a user can use the pre-processing
functions
(http://dicegroup.github.io/LIMES/user_manual/configuration_file/data_sources.html).
As you can see, there are plenty of examples of how to use the pre-processing
functions. So your comment about not knowing that the quotes should be
omitted does not hold if you check the manual.</p>
        <p>Additionally, each atomic LS, which consists of one similarity function, will
return a similarity between 0 and 1, so:
0 &lt;= similarity_1 &lt;= 1
0 &lt;= similarity_2 &lt;= 1
Adding these gives:
0 &lt;= similarity_1 + similarity_2 &lt;= 2
If you instead multiply each of the first two lines by 0.5, you get:
0 &lt;= 0.5*similarity_1 &lt;= 0.5
0 &lt;= 0.5*similarity_2 &lt;= 0.5
0 &lt;= 0.5*similarity_1 + 0.5*similarity_2 &lt;= 1
Thus, if you have used the AND function in the correct way
(http://dicegroup.github.io/LIMES/user_manual/configuration_file/metric/metric_operations.html),
then your similarity will be bounded to [0,1].</p>
        <p>My final comment on the paper is that the authors tried to address a very
interesting problem in linking geospatial datasets; however, their work has not
considered several of the main points described in my comments above. It is
good initial work, but it lacks insight into the problem at hand.</p>
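<p>The [0,1] bound argued in the comments above is easy to confirm numerically; a minimal Python sketch (the function name weighted_and is illustrative, not part of the LIMES API):</p>

```python
import itertools

def weighted_and(sim1, sim2, w1=0.5, w2=0.5):
    """Combine two [0,1] similarities with weights that sum to 1."""
    return w1 * sim1 + w2 * sim2

# Sample the unit square of similarity pairs and confirm the [0,1] bound.
grid = [i / 10 for i in range(11)]
scores = [weighted_and(s1, s2) for s1, s2 in itertools.product(grid, grid)]
assert 0.0 <= min(scores) and max(scores) <= 1.0
```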
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Review 3</title>
      <sec id="sec-7-1">
        <title>PC member:</title>
        <p>Reviewer's
anonymous
4: (high)
confidence:
Relevance:
Impact of ideas
and results:
Clarity and
quality of
writing:
Related work:
Implementation
and soundness:
Evaluation:
Overall
evaluation:
2: (poor)
3: (fair)
1: (weak accept)
The paper describes a practical case of interlinking a subset of geospatial data
from Ordnance Survey Ireland (OSi) with a reference public LOD dataset
(DBpedia). The subset included entities of two types: counties and townlands.
The authors describe the challenges arising with the selection of relevant data
subsets to match, appropriate similarity measures, and special configuration
parameters of the chosen link generation engine (LIMES).</p>
        <p>Overall, the paper does not provide a novel research contribution.</p>
        <p>However, it is interesting from the point of view of a practitioner’s experience
with methods and tools developed by the research community. One aspect that
is not fully clear to me is why the authors did not try the active learning
extensions available for data interlinking tools (both LIMES and SILK as the
better known ones). That could potentially save the effort of picking an
appropriate string similarity measure. Overall, however, the paper shows well
the issues arising with the usage of existing tools originating in the Semantic
Web research community and highlights the need for improved usability
of tools.
5. Matthias Wauer and Axel-Cyrille Ngonga Ngomo. Towards a Semantic
Message-driven Microservice Platform for Geospatial and Sensor Data</p>
      </sec>
      <sec id="sec-7-2">
        <title>There are some remarks on the paper:</title>
      </sec>
      <sec id="sec-7-3">
        <title>1. Keywords are missing.</title>
        <p>2. Last paragraph of section 2 the mention of section 3 is missing
The code snippets are not self-explaining. Can you reference some
3rd party publication or give some short explanation?
Section 3: Do you have prove that it works? Did you already test the
chain of services as explained in sec. 2.4?
Section Acknowledgement - Please check regulations on publications
for the funding program (Kommunikations-Toolbox)</p>
      </sec>
      <sec id="sec-7-4">
        <title>Spelling: Page 6: "value AMQP value" Page 9, section 4.2, 2nd paragraph: readonable -&gt; reasonable</title>
        <sec id="sec-7-4-1">
          <title>Overall evaluation:</title>
          <p>2: (poor)
1: (weak accept)
The paper presents a funded research project called GEISER that aims at
developing a flexible and scalable platform for managing geospatial and
sensor data.</p>
          <p>The proposed platform is based on a microservices architecture and
integrates data using semantic technologies. It is based on the RabbitMQ
implementation of the AMQP standard.</p>
          <p>The project is still at an early stage of development and the software
implementation is still not complete. Hence, a proper evaluation of the
platform is still not available.</p>
          <p>However, the paper and the project are definitely relevant and interesting
for the workshop and its audience.</p>
          <p>The architecture of the platform is clearly described and some details of its
current implementation are provided. The paper is well written and clear.
An issue of the paper is that some sentences and claims are not properly
backed up by a reference; e.g., in the Introduction, the claims in the 2nd and 4th
paragraphs should be supported by references. Similarly, in Section 2.3,
3rd paragraph, the choice of RabbitMQ as compared to Apache Kafka and
other approaches is not fully explained or supported by objective evidence.
As regards the writing style, parts of Section 2 in particular feel closer to
project-proposal writing than to scientific-paper writing. This could be
easily tuned by adding more references or justifications for the design
choices. Requirements for the platform have been collected by the authors and
project partners, but they are not clearly discussed in the paper.
The evaluation section is very short and only provides some pointers to
papers where the adopted tools have been evaluated independently. This
section should rather be titled "Validation" and could be either extended a
little or even included in Section 2 as a subsection.</p>
          <p>I am not sure the readers would be very interested in the Docker Compose
configuration in Listing 1.3. I would say that probably they would be more
interested in getting to know about the detailed requirements collected for
the platform and how they are reflected into the design decisions. This
could be interesting for the presentation at the workshop.</p>
        </sec>
      </sec>
      <sec id="sec-7-5">
        <title>Minor issues:</title>
        <p>- UnifiedViews has been replaced by Linked Pipes ETL, so its reference
and comparison in Section 4.2 could be updated.
- The end of the Introduction should also mention the "Evaluation" section and
Section 3.
- The numbering of the Listings should be revised, not 1.x.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Review 3</title>
      <sec id="sec-8-1">
        <title>PC member:</title>
        <p>Reviewer's
confidence:
Relevance:
Impact of ideas
anonymous
4: (high)
and results:
Clarity and
quality of writing:
Related work:
Implementation
and soundness:
Evaluation:
3: (fair)
2: (poor)
1: (weak accept)
The paper addresses the need for a flexible and scalable platform for
handling integration of geospatial and sensor data. This work is an ongoing
research project called GEISER for the creation of a platform for extraction,
transformation, interlinking and fusion of such data. I understand that the
project is in its initial stage, but I would still appreciate some details on how
the authors intend to extract, transform and link geospatial and sensor data,
as well as how they will evaluate such a platform.</p>
      </sec>
      <sec id="sec-8-2">
        <title>Strong points:</title>
        <p>1. The goal of the project is very sound and important to the community
2. The authors present three important use cases.</p>
        <p>Weak Points:
1. I would like to see more information about use cases such as; how do they
intend to integrate geographic information. What kind of data will they
integrate for the geomarketing? What kind of industrial data do they have /
are willing to have and how?
2. Authors state that due to the early stage of the project, have focused only
on functional evaluation according to the requirements gathered from the use
cases. However, as there is still space a reader would like to know what are
the requirements.
3. What was missing in the paper is a general workflow on how the authors
intend to integrate and process the data on the fly. Maybe with an example
its easer for the reader to understand the workflow. Are they willing to use
Google data or Geoname / Dbpedia? What kind of social networking data
will they use?
4. In the Related Work, you may also cite the EW-SHOPP project which has
a similar objective as GEISER project. You can find more information on
www.ew-shop.eu
5. Authors do not provide any additional information about how scalable this
platform will be and how are they willing to evaluate it.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>2. Alan Meehan, Kaniz Fatema, Rob Brennan, Eamonn Clinton, Lorraine McNerney and Declan O'Sullivan. License and Template Access Control for Geospatial Linked Data.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>4. Peru Bhardwaj, Christophe Debruyne and Declan O'Sullivan. On the Overlooked Challenges of Link Discovery.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>