Reviews of GeoLD 2018 accepted papers
Part of the joint proceedings of GeoLD-QuWeDa-2018, published at CEUR-WS.

1. Ali Khalili, Peter van den Besselaar and Klaas Andries de Graaf. Using Linked Open Geo Boundaries for Adaptive Delineation of Functional Urban Areas

Review 1
PC member: anonymous
Reviewer's confidence: 4 (high)
Relevance: 5 (excellent)
Impact of ideas and results: 3 (fair)
Clarity and quality of writing: 2 (poor)
Related work: 1 (very poor)
Implementation and soundness: 4 (good)
Evaluation: 3 (fair)
Overall evaluation: -2 (reject)

With 15 pages, the paper exceeds the official maximum page count for long papers (12 pages) in the CFP. This should lead to the rejection of this paper. However, I read the paper anyway and want to provide my feedback on the work.

In their work, the authors address the delineation of functional urban areas (FUAs) based on open data. In contrast to the rigid definition of FUAs provided by the OECD and EC, they aim to provide means to adaptively define FUAs by weighting relevant factors differently. In the motivation, the authors claim that governments need a way to dynamically redefine different types of urban areas. However, the citation provided does not support that claim. The authors should either elaborate on that statement or provide a source supporting the claim.

A contribution mentioned by the authors is the reconstruction of the FUAs defined by the OECD based on open data sources. However, a presentation of how this reconstruction is achieved is missing, as is an evaluation of how well the OECD FUAs can be reconstructed from open data sources. It would be interesting to see how these areas differ and why they differ. Furthermore, the authors argue that access to the OECD shapefiles is limited and requires negotiation; however, the data seems to be openly available at http://www.oecd.org/cfe/regional-policy/functionalurbanareasbycountry.htm.

Unfortunately, the paper does not present related work, which does not allow for a comparison of the work to previous research endeavors. A good starting point for a literature review in this area would be "The semantics of populations: A city indicator perspective".

The authors present the integration of different open geo datasets and the linking between the datasets in detail. The datasets provide data on administrative boundaries on several levels, which should be used to reconstruct FUAs. Furthermore, they present applications (a SPARQL endpoint and an API) to access the integrated data. The data is integrated by transforming all datasets into GeoJSON and using a mapping and enrichment function to transform the data to RDF. Details on the mapping and the ontology used for the enrichment are not provided.

The authors present two case studies. The first one evaluates the usability of a tool which allows for geocoding addresses with the different administrative boundaries from the datasets in a spreadsheet. The results of 10 participants filling out the questionnaire revealed decent usability of the tool; however, how this allows for dynamically defining FUAs is not obvious. The second case study applies the tools to study the number of projects funded by the Netherlands Enterprise Agency in different FUAs. For this purpose, they use the OpenStreetMap data in combination with statistical data provided by the Netherlands to define different adaptive FUAs (based on population, businesses and both) and also use the FUAs defined by the OECD. For these FUAs they compare the number of projects and show that the OECD FUAs do not capture all relevant areas. The process by which the FUAs in the second use case are derived from the OSM data and combined with the statistical data is not presented thoroughly enough.
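[Editor's note: the GeoJSON-to-RDF mapping-and-enrichment step summarized above is only sketched in the paper. The following is a minimal illustrative sketch of such a step using rdflib; the ex: vocabulary terms and the sample feature are invented for illustration and are not the authors' actual mapping or ontology. Only geo:asWKT follows the GeoSPARQL vocabulary.]

```python
# Minimal sketch of a GeoJSON-to-RDF "mapping and enrichment" step,
# assuming a hypothetical target vocabulary (ex:*); the paper's actual
# mapping and ontology are not described in detail.
import json
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/fua/")            # hypothetical ontology
GEO = Namespace("http://www.opengis.net/ont/geosparql#")

feature = json.loads("""{
  "type": "Feature",
  "properties": {"name": "Amsterdam", "population": 821752},
  "geometry": {"type": "Point", "coordinates": [4.9, 52.37]}
}""")

g = Graph()
g.bind("ex", EX)
g.bind("geo", GEO)

area = EX["area/amsterdam"]
g.add((area, RDF.type, EX.AdministrativeArea))
g.add((area, EX.name, Literal(feature["properties"]["name"])))
g.add((area, EX.population, Literal(feature["properties"]["population"])))

# Serialize the geometry as a GeoSPARQL WKT literal.
lon, lat = feature["geometry"]["coordinates"]
g.add((area, GEO.asWKT, Literal(f"POINT({lon} {lat})", datatype=GEO.wktLiteral)))

print(g.serialize(format="turtle"))
```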
Overall, the authors present a pipeline for integrating geodata from different data sources and provide means to query the data. The data is used to define FUAs based on the combination of different datasets on administrative areas and statistical data. A detailed description of methodologies for defining these areas, and of which methodology applies to which use case, is not provided. Furthermore, they present two use cases of the implementation and investigate different research questions with the help of these tools. The implementations and their demos are provided online. Unfortunately, some demos of the implementations are only available to registered users or with a valid API key, which is somewhat at odds with making the data openly available.

Layout/Writing:
- Page count (15 pages) exceeds the long-paper page limit of 12 pages
- Missing page numbers in the PDF
- Footnote spacing is inconsistent (sometimes there is a space before the footnote)
- Typos:
  * p.3: shpapefiles
  * p.6: Virtuosos
  * p.10: Fig. 4 An screenshot
  * p.11: seem --> seems

Review 2
PC member: Kleanthi Georgala
Reviewer's confidence: 4 (high)
Relevance: 5 (excellent)
Impact of ideas and results: 5 (excellent)
Clarity and quality of writing: 5 (excellent)
Related work: 5 (excellent)
Implementation and soundness: 5 (excellent)
Evaluation: 5 (excellent)
Overall evaluation: 3 (strong accept)

This paper introduces a dynamic approach to classifying urban areas using openly available spatial and non-spatial resources on the Web. The authors argue against the current static factors in the notion of Functional Urban Areas (FUAs) and propose dynamically defining FUAs based on linked open data. The paper is clear and reads well. The use of English is advanced and highly scientific. The structure of the paper is well defined and helps the reader go through the paper, even if they lack previous knowledge of the domain. The data discovery and collection step includes a brief but compact overview of the datasets in use, with the right citations. The Data Extraction & Conversion step is very well written and to the point. Data Storage & Querying is also brief but well explained. Regarding the 4th step, Data Linking, I have some concerns. Even though the authors describe well the query they used for linking, querying an endpoint in order to perform linking is not the most suitable method when it comes to link discovery. There are plenty of LD frameworks, such as LIMES and SILK, that can perform geospatial data linking with high accuracy and completeness. The remaining steps are also well described. Finally, the In-Use Case Study and Evaluation is well written and explained. I have no major comments regarding the evaluation procedure and how the results are explained. This paper needs no further improvement from my side. The authors know the problem at hand very well and their work is of high quality. However, since the page limit is 12 pages, I would suggest that they try to reduce the length of the paper.
Review 3
PC member: anonymous
Reviewer's confidence: 4 (high)
Relevance: 5 (excellent)
Impact of ideas and results: 4 (good)
Clarity and quality of writing: 3 (fair)
Related work: 2 (poor)
Implementation and soundness: 4 (good)
Evaluation: 3 (fair)
Overall evaluation: 1 (weak accept)

The paper proposes an implementation for managing and combining different linked open geo-data sources together with non-spatial resources on the Web, and an approach (and implementation) for identifying and delineating Functional Urban Areas (FUAs). The framework for the management of linked geo-data covers all the phases of the LOD lifecycle (Auer et al.), and the paper is structured in a way that follows these phases (or steps) too. An interesting contribution is the proposed approach for dynamically defining FUAs based on open data such as Flickr, OSM, GADM, etc., transformed into Linked Data and interlinked with LOD data sources such as DBpedia and Wikidata. The applications built on top of this infrastructure are also quite interesting and well designed, as shown in the screenshots provided and in the links to the online demos. These are validated with two use case studies, which are clearly described at the end of the paper. The paper is well written and clear, despite some typos and grammatical errors. The paper goes well beyond the page limit for this workshop and should be penalized for that. However, it is a very interesting contribution for the workshop and its audience and should be accepted. A short section comparing with SOTA approaches and projects is missing and would be beneficial. In the usability evaluation section it is written that 20 users took part in the study but only 10 completed the survey; why? Also, in Fig. 5, the answers to questions such as the 3rd and the 9th highlight some negative results; this should be explained. Section 9 could be shortened, as well as the dataset descriptions in Section 2.

Minor issues:
- Add a space before some reference numbers in the text.
- Section 1: "OpenStreetMap3 is already interlinked" -> "OpenStreetMap3 are already interlinked"; "cloud and provides the opportunity" -> "cloud and provide the opportunity"
- Section 2: "shpapefiles" -> "shapefiles"
- Section 2.1: add spaces before the "(" characters.
- Section 3: "spacial" -> "spatial"
- Section 4: "Virtuosoś" -> "Virtuoso"; "specific addres" -> "specific address"
- Section 8: "20 particiapants" -> "20 participants"
- Footnote 43 is a duplicate.

2. Alan Meehan, Kaniz Fatema, Rob Brennan, Eamonn Clinton, Lorraine McNerney and Declan O'Sullivan. License and Template Access Control for Geospatial Linked Data

Review 1
PC member: Ivan Ermilov
Reviewer's confidence: 4 (high)
Relevance: 5 (excellent)
Impact of ideas and results: 3 (fair)
Clarity and quality of writing: 4 (good)
Related work: 2 (poor)
Implementation and soundness: 4 (good)
Evaluation: 1 (very poor)
Overall evaluation: 0 (borderline paper)

This paper provides the description of an access control implementation for Ordnance Survey Ireland (OSi). It describes an engineering solution for the access control problem from OSi. There is no comparison with existing access control models, nor a critical analysis of the implemented access control model. The source code is also not made available to the general public. The main contribution of the paper is the technical description of the use case, which the authors have implemented for OSi. It is relevant to the topic of the workshop and is a borderline paper in my opinion.
The paper describes an authorization model for RDF data based on SPARQL query templates. The templates have to be created by an administrator depending on a license model. The access control solution proposed in this work hides the SPARQL interface behind a custom RESTful API. This takes away the benefits of using RDF from the users in favor of providing a fine-grained security model for the data provider. The motivation simply comes from the necessity of OSi to have an authorization model for its users. The benefits of such a system for other use cases are not outlined. Could it be used in general by geospatial data providers? If yes, then what are the benefits and drawbacks of such a solution?

The authors provide a related work section, but do not compare their work with the existing access control models. Moreover, they claim that "Our access control approach is specialized for retrieving fine grain geospatial instance data, which can exploit GeoSPARQL functions (if an administrator creates a query to do so), while existing approaches do not offer such specific access control to data." For example, the security mechanisms in Apache Rya [1] allow fine-granular authorization on a per-triple basis. In this case, a user can be restricted to accessing only a part of the graph by labeling the necessary set of triples. Then, for the same SPARQL query, different users will have different responses based on their rights. It would be good to see why existing access control models were not used for this use case.

Questions which are not answered in the paper:
- How much time does it take to create templates based on a license or a user?
- What is the impact of such a system on the users in comparison to using SPARQL?

Other remarks:
- Figure 2 should be visualized (the same as Figure 1)
- on page 8: "Figure 7 shows a high architecture..." --> should be Figure 5
- on page 8: "example of a status call..." --> do not include protocol and domain in the status call, i.e. /acon/status/{userID}

[1] Punnoose, Roshan, Adina Crainiceanu, and David Rapp. "SPARQL in the cloud using Rya." Information Systems 48 (2015): 181-195.
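[Editor's note: to make the licence-plus-template model discussed above concrete, here is a minimal sketch of how such an approach might work. The licence structure, template text, function names and prefixes are illustrative assumptions, not the paper's actual vocabulary or implementation.]

```python
# Illustrative sketch of licence + template access control as described in
# the review above: an administrator-defined SPARQL template with a bounded
# parameter, checked against a licence before instantiation. All names and
# the licence shape are hypothetical.
from string import Template

# Administrator-defined SPARQL template (prefixes omitted for brevity).
TEMPLATE = Template("""
SELECT ?building ?wkt WHERE {
  ?building a ex:Building ;                  # hypothetical class
            geo:asWKT ?wkt .
  FILTER(geof:distance(?wkt, $center, uom:metre) <= $radius)
}
""")

# A licence restricting which parameter values a user may supply.
LICENCE = {"max_radius_metres": 5000}

def build_query(center_wkt: str, radius: float) -> str:
    """Enforce the licence, then instantiate the template."""
    if radius > LICENCE["max_radius_metres"]:
        raise PermissionError("radius exceeds the licenced maximum")
    return TEMPLATE.substitute(center=f'"{center_wkt}"^^geo:wktLiteral',
                               radius=radius)

print(build_query("POINT(-6.26 53.35)", 1000))   # allowed by the licence
# build_query("POINT(-6.26 53.35)", 99999)       # raises PermissionError
```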
Review 2
PC member: Matthias Wauer
Reviewer's confidence: 4 (high)
Relevance: 4 (good)
Impact of ideas and results: 4 (good)
Clarity and quality of writing: 3 (fair)
Related work: 3 (fair)
Implementation and soundness: 3 (fair)
Evaluation: 2 (poor)
Overall evaluation: 1 (weak accept)

What I would expect after the abstract: The work provides three contributions to the problem of access control to geospatial Linked Data: 1) an access control model, including a vocabulary for describing a "licence" defining what data can be accessed and a "template" for actually accessing data, 2) an architecture for implementing this access control method using a proxy service, and 3) a case study of a prototypical implementation.

What I expect after the introduction: The introduction clearly presents the research question. However, it fails to motivate 1) why there is a need for fine-grained data access control, and 2) how a geospatial data retrieval scenario differs from generic data access. Further, while motivating LD with enabling linking to other datasets etc., it is uncertain at this point whether customers using a RESTful API will still have this benefit.

Summary (following the procedure described in https://violentmetaphors.com/2013/08/25/how-to-read-and-understand-a-scientific-paper-2):

Identify the BIG QUESTION. → How to enable fine-grained access control for geospatial Linked Data.

Summarize the background in five sentences or less. → The related work section presents many general access control approaches and methods specific to RDF data. While I am not particularly experienced in this specific field of research, many of the cited works appear to be in an early state (several workshop and "Towards..." papers).

Identify the SPECIFIC QUESTION(S). → How to model access rights to geospatial Linked Data. How to enforce access restrictions to geospatial Linked Data.

Identify the approach. → A separate licence and access model. Apparently the access model is a combination of a "facade" (the template) in addition to licence-based checks.

Read the methods section. → Section 4 provides a comprehensive explanation of the proposed approach. Unfortunately, the link to the vocabulary is not accessible. A few details are left out, e.g., whether the geographical feature classes also support using other geospatial data types, such as multipolygons.

Read the results section. Write one or more paragraphs to summarize the results. → The authors claim that the proposed architecture and vocabulary are able to represent the access restrictions given by the requirements. The method by which these requirements were gathered is not explained. A closed-source prototypical implementation was done to realize the proposed concept. While the authors stress the benefits of Linked Data in the introduction, they propose access only through a REST API in their architecture. While the result set of such a request may still include URIs that can be dereferenced, I wonder if an approach like GraphQL might be more reasonable here.

Do the results answer the SPECIFIC QUESTION(S)? What do you think they mean? → Question 1 is likely solved by the given approach, within certain limitations. For example, if a scenario requires a single query with two distinct values of a certain type (e.g., radius), the licence model can't distinguish between those. Question 2 was already clarified in Section 2, and appears to be answered sufficiently by the proposed concept.

Now, go back to the beginning and read the abstract. Does it match what the authors said in the paper? Does it fit with your interpretation of the paper? → Yes, it does.

Reasons to accept:
- novel approach to access control for geospatial linked data, relevant to the workshop
- real use case and application
- comprehensive description of vocabulary and architecture
- well-structured paper

Reasons to reject:
- vocabulary URL not accessible
- very limited evaluation
- some implementation details omitted (e.g., any information about the result format/content of the status call)
- paper clearly hasn't been re-read before submission (see suggestions)

Improvement suggestions: Clarify that the licence AND template are both necessary to enforce access control (in the example URI above Fig. 4, I first assumed the template ID might be sufficient because the system could resolve by itself which licence would fit, but this is not necessarily the case). A discussion of potential alternative approaches and their drawbacks compared to this licence+template approach would be interesting.

Writing:
- abstract: "an implementation architecture ... which implements..."
- introduction: "ISU is ... and _are_ tasked"; next line: "...geospatial data -- ..." (remove long dash?); "..., and hence _ an access control..." (missing word?); "...and flexible enough to _be_ meet the (potential)..." (remove); "The remained of this paper..." (remainder?)
"Section 2 presents _some_ geospatial access control ... from _a_ geospatial data organization" (imprecise, better: be more specific) requirements: "For example, OSI's building... in the country." (this is not a sentence.) "...construction companies _ utility companies _ etc." (missing commas) related work: "Role based access control [7]_,_ is..." (remove comma) "Steyskal and Polleres [9] _examines_ ..." (plural) "Such policies _specify specific_ ... to _specific_ datasets..." (repetition) "Our approach models ... Our approach..." (repetition) approach: "... and then present _ proposed ..." (missing word?) 4.1: URL not dereferencable, and should be provided as footnote instead of inline "...a user will be able _ access." (missing word) "...a floating point number of _what_ the permitted radius." (remove) Fig. 2 line 16: "geohiveb:Building" looks like a very unusual URI with the given prefix 4.2: "...and _the_ a Query Processor..." (remove) case study: "...would correctly reject a query call _then_ when values... allowed _to_ according to..." (remove) "...did not contain any _did_ data that..." (remove) conclusion: "...also allows _provides_ the specification..." (remove) "...how well it would _fair_ in situations..." (fare?) references: "Ireland? s Authoritative..." "21-Februrary-..." Review 3 PC member: anonymous Reviewer's 4: (high) confidence: Relevance: 4: (good) Impact of ideas and 3: (fair) results: Clarity and quality 3: (fair) of writing: Related work: 4: (good) Implementation and 2: (poor) soundness: Evaluation: 1: (very poor) Overall evaluation: 0: (borderline paper) The paper presents an approach for providing different levels of access control to geographical data. In particular, the approach propose a vocabulary which represents different levels of licensing to indicate different access control to the users. In addition, they provide a template to indicate how the access control can be provided to the users. I find this work interesting and quite relevant to the workshop although I have been used to see licensing as a process of associating one access control to the data for all users. The idea of providing different types of licenses to the same dataset for different users is a different way of exploiting licensing. Except for the above observation, I have a concern that is related to the experimentation. Although this is an initial work I would have expected some initial results that can be relevant to be presented and discussed in this workshop which to me are missing. correct link http://ontologies.geohive.ie/ -> http://ontologies.geohive.ie/osi/index.html. why will it be hosted in the future? why is it not yet hosted in this domain? Although the paper is relevant to the workshop, I believe that the paper is not well written and English need to be revised. In the following, I will provide a few examples of minor comments which do not represent an exhaustive list of errors. So please revise the whole paper according to similar examples. straight forward -> straightforward valid up until -> valid up to retrieve data, but -> retrieve data but will be able access -> will be able to access for a templates variables -> for templates variables 3. Finn Årup Nielsen, Daniel Mietchen and Egon Willighagen. 
Geospatial Data and Scholia

Review 1
PC member: anonymous
Reviewer's confidence: 3 (medium)
Relevance: 5 (excellent)
Impact of ideas and results: 2 (poor)
Clarity and quality of writing: 2 (poor)
Related work: 3 (fair)
Implementation and soundness: 3 (fair)
Evaluation: 2 (poor)
Overall evaluation: 0 (borderline paper)

This paper presents an extension to Scholia, the authors' previous work presented in [5]. I like the topic of the paper. However, I have the following concerns that need to be addressed in the final camera-ready paper.
1. Motivation and story: The paper is missing a clear connecting story and motivation.
2. Contributions: I was not able to detect the exact contributions of this paper.
3. User stories are presented in Section 4. I think they can be handled by the original Scholia as well? If so, what are the stories that can only be handled thanks to the new extension?
4. The paper needs a thorough proofreading.
I encourage the authors to continue their good work in this direction.

Confidential remarks for the program committee: I am not an expert in this topic and am happy to change my score after the discussion.

Review 2
PC member: anonymous
Reviewer's confidence: 3 (medium)
Relevance: 4 (good)
Impact of ideas and results: 3 (fair)
Clarity and quality of writing: 4 (good)
Related work: 3 (fair)
Implementation and soundness: 4 (good)
Evaluation: 1 (very poor)
Overall evaluation: 1 (weak accept)

The paper presents Scholia, a web site that visualizes bibliographic and scientific datasets contained in WikiCite/Wikidata. A part of the visualization instruments in Scholia concerns geospatial aspects of the data and is thus map-based. The authors provide a good overview of what types of visualization Scholia can support. However, they could elaborate more on how these visualizations can be achieved, i.e., how the underlying data can be queried or browsed. Specifically, the authors just mention the WDQS functionality, but do not elaborate on how this can be exploited (e.g. through https://query.wikidata.org/) to build all the examples presented in the latter parts of the paper; all these examples are just provided as fixed links, resembling pre-calculated and stored views of selected queries on the database contents. The respective SPARQL queries are not present in the paper (expectedly, of course, due to lack of space), but they are also not visible at the respective web pages that comprise the examples of the paper (at least after a quick look). Eventually, it is not clear to the reader how easily a user can browse and construct the exemplary views in the paper from scratch. The authors should further elaborate a bit more on the architecture of the system and its scalability potential. The paper (and the contained examples) leave the impression that Wikidata (and all its affiliated initiatives) are still ongoing efforts, currently including a small amount of data compared to other initiatives (e.g. DBpedia). Even so, some of the contained links in the paper require considerable time to fully load, despite the very small amount of data they eventually visualize. The authors should elaborate on the efficiency/scalability plans for the time when Wikidata gathers orders of magnitude more data. Finally, I would like to have seen, in the related work, a small discussion of what new is contributed by Wikidata (and the affiliated sites/software) compared to existing/established initiatives.
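[Editor's note: on the reviewer's point that the paper does not show how WDQS can be exploited, the following is an illustrative sketch of a generic WDQS proximity query of the kind behind map-based views, sent programmatically with SPARQLWrapper. It is a standard wikibase:around example, not one of Scholia's actual queries; the center point, radius and user agent are invented.]

```python
# Generic WDQS proximity query (wikibase:around service), of the kind that
# map-based Wikidata views rely on. Not one of Scholia's actual queries.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
SELECT ?item ?itemLabel ?coord WHERE {
  SERVICE wikibase:around {                 # WDQS geospatial search service
    ?item wdt:P625 ?coord .                 # P625 = coordinate location
    bd:serviceParam wikibase:center "Point(12.57 55.68)"^^geo:wktLiteral ;
                    wikibase:radius "5" .   # kilometres around Copenhagen
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10
"""

sparql = SPARQLWrapper("https://query.wikidata.org/sparql",
                       agent="review-illustration-sketch/0.1")
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["itemLabel"]["value"], row["coord"]["value"])
```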
Review 3
PC member: anonymous
Reviewer's confidence: 5 (expert)
Relevance: 5 (excellent)
Impact of ideas and results: 3 (fair)
Clarity and quality of writing: 2 (poor)
Related work: 5 (excellent)
Implementation and soundness: 2 (poor)
Evaluation: 3 (fair)
Overall evaluation: 0 (borderline paper)

*General comments and Summary*
Scholia serves information about scientific works from Wikidata by querying the Wikidata Query Service. The paper mostly describes a set of user stories to illustrate the applicability of Scholia for the consumption of geospatial data using Wikidata. While the paper does not present any technical contribution per se, it presents interesting user stories illustrating the question answering capability of Scholia on geospatial data. The system -- Scholia -- is built mostly (or dominantly) around handling "nearby" queries using the geof:distance function. While the application of the system is of great value, the coverage of the system in terms of GeoSPARQL query features is a concern. The presented system is novel; however, the method/technique by which these queries (representing user interests or information needs) are generated is not clearly described. Whether this process is automatic or manual is also of interest to a reader. At this point the system appears to be too static, or at an extremely early stage, to be of benefit to an interested user (a user who is not an expert in SPARQL, or GeoSPARQL for that matter).

*Introduction*
Pg 1, Paragraph 2, last line -> "whereas, in GeoSPARQL, the function with the same name takes a third argument for the unit." => Which one? And how is it different from geof:distance?

*User stories*
While the user stories are narrative, the GeoSPARQL queries used to generate the answers are not given. This is not mission-critical but would be very interesting to see/have, though one can find the query via the mentioned links (below the results page -> "edit this query"). I wonder whether these queries were handcrafted or generated automatically. In the case of handcrafted queries, it would be near impossible for users who are not experts in GeoSPARQL to use Scholia with any ease. If these queries are automatically generated based on the user's interests, it should be mentioned how and where.

No map is generated for story 1 - https://tools.wmflabs.org/scholia/country/Q33/topic/Q2539 - which would be ideal for the geospatial aspect of knowing which city/area a particular researcher is from, or at least the location of the research lab/group. Also, the co-citation graph is reported to be empty, but a clear reason is not given.

No map is generated for story 2 - https://tools.wmflabs.org/scholia/location/Q1748/topic/Q2539. Also, no additional graphs or tables are reported apart from the list of authors in descending order of their scores. The scoring function, one has to assume, is based on the distance from the center of Copenhagen? In general, this case inclines more towards a regular (non-GeoSPARQL) user story than a geospatial one.

No map is generated for story 3 - https://tools.wmflabs.org/scholia/location/Q3806/topic/Q52. Why is this? It is also a similar case to user story 2. Furthermore, "links do not work" is reported in stories 2 and 3, which is worth investigating.

User story 4 is quite interesting; however, it appears to be flawed.
While the story description states "relevant scientific meetings", the results with respect to WWW conference visitors report rather irrelevant points of interest in terms of people and publications, such as (1) Alessandro Vespignani, Italian physicist, (2) Luciano Floridi, Italian philosopher, and (3) "Combining Participatory Influenza Surveillance with Modeling and Forecasting: Three Alternative Approaches", etc. Also, no map is generated highlighting in which part of the city or nearby suburbs these events take place. For story 5, the administrator would be more interested in observing the patterns from Denmark to South Korea, and not the reverse.

4. Peru Bhardwaj, Christophe Debruyne and Declan O'Sullivan. On the Overlooked Challenges of Link Discovery

Review 1
PC member: Mohamed Sherif
Reviewer's confidence: 5 (expert)
Relevance: 4 (good)
Impact of ideas and results: 3 (fair)
Clarity and quality of writing: 5 (excellent)
Related work: 2 (poor)
Implementation and soundness: 3 (fair)
Evaluation: 1 (very poor)
Overall evaluation: 0 (borderline paper)

The paper highlights four lessons learned during the process of interlinking Ordnance Survey Ireland (OSi) and DBpedia. The first lesson concerns the difficulty of ontology matching for the datasets to be interlinked. The second lesson concerns the incomplete results of SPARQL endpoints. The third lesson concerns finding proper measures for comparing resources. Finally, the fourth lesson pertains to the difficulty of finding a proper link specification. The paper is well structured and written in good English, which eases the understanding of the paper. I do not completely agree with some of the points the authors claim in the paper. Here, I list my points, which could help in extending this paper (maybe to a full resource paper or a dataset paper for SWJ):

Lesson 1: The problem here is well known in the literature as the ontology matching problem. There is a massive amount of work dealing with this problem; for example, see the DL-Learner paper [a].

Lesson 2: I think the problem here was that you tried to issue one SPARQL query to get the whole result set in one go, which is only possible if your result set is smaller than the endpoint's internal limit (let us assume it is n). In case your result set is greater than the endpoint's internal limit n, you get only n arbitrary resources. To overcome such a limit, you normally query the endpoint programmatically using paging techniques (using LIMIT and OFFSET in your SELECT query); see https://github.com/SmartDataAnalytics/jena-sparql-api and the sketch below.

Lessons 3 and 4: I do not agree with your claim in Section 4.2: "Techniques to learn a link specification [7] proved to be unusable in our case study due to the non-overlapping nature of the two datasets to be interlinked." Have you tried using the machine learning in LIMES [c] for your task? WOMBAT (the most recent machine learning approach in LIMES) provides supervised, active and unsupervised versions. I think even the unsupervised version of WOMBAT should give you good results; if not, try the supervised version.

Table 1: The number of returned links has no meaning without evaluating them. I know that evaluating all the returned links would be a tedious task. Instead, you could pick a random sample of 100 links to evaluate and compute the accuracy for each measure. The same holds if you use WOMBAT and want to evaluate its results.
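[Editor's note: a minimal sketch of the LIMIT/OFFSET paging technique suggested under Lesson 2, here using SPARQLWrapper rather than the jena-sparql-api linked above. The endpoint, graph pattern, class URI and page size are illustrative assumptions.]

```python
# Minimal sketch of SPARQL result paging with LIMIT/OFFSET, as suggested
# under Lesson 2. Endpoint, pattern and page size are illustrative.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://dbpedia.org/sparql"
PAGE_SIZE = 10000   # keep each page below the endpoint's internal row cap

def fetch_all(pattern: str):
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setReturnFormat(JSON)
    offset, results = 0, []
    while True:
        # ORDER BY gives a stable ordering, so pages do not overlap.
        sparql.setQuery(f"""
            SELECT ?s WHERE {{ {pattern} }}
            ORDER BY ?s LIMIT {PAGE_SIZE} OFFSET {offset}
        """)
        page = sparql.query().convert()["results"]["bindings"]
        if not page:
            break
        results.extend(page)
        offset += PAGE_SIZE
    return results

# Example: all resources typed with a (hypothetical) townland class.
rows = fetch_all("?s a <http://example.org/ontology/Townland> .")
print(len(rows))
```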
Other minor comments:
- In Section 2, it would be good to reference [b] together with the original LIMES paper to emphasize the superior performance of LIMES.
- In Section 4.2, you reference [8], while no such reference exists.

[a] DL-Learner - A framework for inductive learning on the Semantic Web, by Lorenz Bühmann, Jens Lehmann, and Patrick Westphal, in Web Semantics: Science, Services and Agents on the World Wide Web.
[b] RADON - Rapid Discovery of Topological Relations, by Mohamed Ahmed Sherif, Kevin Dreßler, Panayiotis Smeros, and Axel-Cyrille Ngonga Ngomo, in Proceedings of The Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17).
[c] WOMBAT - A Generalization Approach for Automatic Link Discovery, by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, and Jens Lehmann, in 14th Extended Semantic Web Conference, Portoroz, Slovenia, 28th May - 1st June 2017.

Review 2
PC member: Kleanthi Georgala
Reviewer's confidence: 5 (expert)
Relevance: 4 (good)
Impact of ideas and results: 2 (poor)
Clarity and quality of writing: 3 (fair)
Related work: 2 (poor)
Implementation and soundness: 3 (fair)
Evaluation: 2 (poor)
Overall evaluation: 0 (borderline paper)

This paper gives an overview of the existing challenges in linking geospatial datasets. The authors begin by explaining the problem at hand, focusing on the pre-processing phase of Link Discovery (LD); thus the title of the paper does not describe the content of their work well. They continue by describing the issues involved in linking the OSi and DBpedia geospatial datasets. Then, they give a detailed overview of how they identified the needed resources from the two KBs and how they derived the datasets from the endpoints, along with the challenges they faced. Finally, they describe how they selected properties for matching and similarity measures to perform linking, along with the challenges they faced when using a declarative LD framework.

Comments for each section:

Abstract: "a case study in" -> "a case study of". The word "experience" is not a suitable word for research; please replace it throughout the whole document. The abstract reads well. However, the pre-processing phase of LD is far from being overlooked. In order to identify matching properties between two datasets, there has been plenty of related work in the domain of ontology matching; please take a look at http://www.ontologymatching.org/relwork.html. Regarding the identification of LSs in order to link datasets, the framework you used (LIMES) incorporates three machine learning algorithms (WOMBAT simple and complete, EAGLE) that can produce appropriate LSs.

Introduction: Provide a citation for the first sentence. "described in two parts" -> "divided in two parts". The introduction reads well. For further comments, re-read my comments on your abstract.

Section 2: I have no comments for Section 2. The preliminaries are well described, and so are the challenges the authors faced.

Section 3.1: Please try to avoid phrases like "picking up" and "figured out"; they are quite informal for a scientific document. The "Counties_of_the_Republic_of_Ireland" category is not present in DBpedia; its equivalent is "Counties_towns_in_the_Republic_of_Ireland". You mention 5 different ways, but I read only 4. Also, using Q_i is not appropriate, since you are not asking questions but making remarks. Q_1: How did you deal with townlands that do not have the pattern in their article category? Q_2 and Q_4: How do you know that some resources are or aren't townlands?
My general comment for this section is that you tried to perform ontology matching between predicates from your KBs. The whole idea seems quite arbitrary to me, based solely on assumptions and insights. As I mentioned above, you could have used an ontology matching tool to carry out this work for you.

Section 3.2: I completely disagree with Lesson 2.
1) SPARQL endpoints are not affected by which browser you use.
2) The reason you were getting different result sets has to do with how the query plan is executed internally.
3) There is no ordering in the retrieved result set of a SPARQL query unless you use the ORDER BY clause. This is not a problem of Virtuoso or any other endpoint; this is how SPARQL queries operate.
4) It is not possible to return more than 1M rows in a SPARQL query result set over HTTP when using Virtuoso. There is a ResultSetMaxRows setting in virtuoso.ini, but even if you specify something huge, you won't get more than 1048576 rows, which is 2^20. I believe this number is very high compared to the size of your retrieved datasets. If you run the same query from isql, you will get all the rows. So it is not a bug; it is a feature with a very specific purpose.
5) If you wanted to retrieve all results with one query, you could set up Virtuoso locally and change the ResultSetMaxRows setting in the INI file.

Section 4.1: The first paragraph contradicts Lesson 3.

Section 4.2: You used LIMES for linking. LIMES incorporates three machine learning algorithms for learning LSs. Their unsupervised versions do not require any training dataset and are able to provide the owl:sameAs links you were targeting. Additionally, a very quick transformation of the WKT polygons to/from WKT points could have aided your work.

Section 4.3: LIMES has a detailed manual on how a user can use the pre-processing functions (http://dice-group.github.io/LIMES/user_manual/configuration_file/data_sources.html). As you can see, there are plenty of examples of how to use the pre-processing functions, so your comment about not knowing that the quotes should be omitted is not reasonable if you had checked the manual. Additionally, each atomic LS consisting of one similarity function returns a similarity between 0 and 1, so:

  0 <= similarity_1 <= 1
  0 <= similarity_2 <= 1
  (adding) 0 <= similarity_1 + similarity_2 <= 2

If you multiply the first two lines by 0.5, you get:

  0 <= 0.5 * similarity_1 <= 0.5
  0 <= 0.5 * similarity_2 <= 0.5
  (adding) 0 <= 0.5 * similarity_1 + 0.5 * similarity_2 <= 1

Thus, if you use the AND function in the correct way (http://dice-group.github.io/LIMES/user_manual/configuration_file/metric/metric_operations.html), your similarity will be bounded to [0,1].

My final comment on the paper is that the authors tackled a very interesting problem in linking geospatial datasets; however, their work does not consider several of the main points described in my comments above. It is good initial work, but it lacks insight into the problem at hand.

Review 3
PC member: anonymous
Reviewer's confidence: 4 (high)
Relevance: 4 (good)
Impact of ideas and results: 2 (poor)
Clarity and quality of writing: 4 (good)
Related work: 3 (fair)
Implementation and soundness: 3 (fair)
Evaluation: 3 (fair)
Overall evaluation: 1 (weak accept)

The paper describes a practical case of interlinking a subset of geospatial data from Ordnance Survey Ireland (OSi) with a reference public LOD dataset (DBpedia). The subset included entities of two types: counties and townlands.
The authors describe the challenges arising with the selection of relevant data subsets to match, appropriate similarity measures, and special configuration parameters of the chosen link generation engine (LIMES). Overall, the paper does not provide a novel research contribution. However, it is interesting from the point of view of a practitioner's experience with methods and tools developed by the research community. One aspect that is not fully clear to me is why the authors did not try the active learning extensions available for data interlinking tools (both LIMES and SILK, as the better-known ones). That could potentially save the effort of picking an appropriate string similarity measure. Overall, however, the paper shows well the issues arising with the usage of existing tools originating in the Semantic Web research community and highlights the need for improved usability of tools.

5. Matthias Wauer and Axel-Cyrille Ngonga Ngomo. Towards a Semantic Message-driven Microservice Platform for Geospatial and Sensor Data

Review 1
PC member: anonymous
Reviewer's confidence: 4 (high)
Relevance: 5 (excellent)
Impact of ideas and results: 4 (good)
Clarity and quality of writing: 5 (excellent)
Related work: 4 (good)
Implementation and soundness: 4 (good)
Evaluation: 4 (good)
Overall evaluation: 2 (accept)

A profound description of the approach, with high relevance for geo linked data. The evaluation of the approach is still weak due to the early stage of the project. Some remarks on the paper:
1. Keywords are missing.
2. In the last paragraph of Section 2, the mention of Section 3 is missing.
3. The code snippets are not self-explanatory. Can you reference some 3rd-party publication or give a short explanation?
4. Section 3: Do you have proof that it works? Did you already test the chain of services as explained in Sec. 2.4?
5. Section Acknowledgement: please check the regulations on publications for the funding program (Kommunikations-Toolbox).
Spelling:
- Page 6: "value AMQP value"
- Page 9, Section 4.2, 2nd paragraph: readonable -> reasonable

Review 2
PC member: anonymous
Reviewer's confidence: 3 (medium)
Relevance: 4 (good)
Impact of ideas and results: 4 (good)
Clarity and quality of writing: 4 (good)
Related work: 3 (fair)
Implementation and soundness: 2 (poor)
Evaluation: 2 (poor)
Overall evaluation: 1 (weak accept)

The paper presents a funded research project called GEISER that aims at developing a flexible and scalable platform for managing geospatial and sensor data. The proposed platform is based on a microservices architecture and integrates data using semantic technologies. It is based on the RabbitMQ implementation of the AMQP standard. The project is still at an early stage of development and the software implementation is not yet complete. Hence, a proper evaluation of the platform is not yet available. However, the paper and the project are definitely relevant and interesting for the workshop and its audience. The architecture of the platform is clearly described and some details of its current implementation are provided. The paper is well written and clear. An issue of the paper is that some sentences and claims are not properly backed up by a reference; e.g., in the Introduction, the claims in the 2nd and 4th paragraphs should be supported by references. Similarly, in Section 2.3, 3rd paragraph, the choice of RabbitMQ as compared to Apache Kafka and other approaches is not fully explained or supported by objective evidence. As regards
the writing style, parts of Section 2 in particular feel closer to project proposal writing than to scientific paper writing. This could easily be tuned by adding more references or justifications for the design choices. Requirements for the platform have been collected by the authors and project partners, but they are not clearly discussed in the paper. The evaluation section is very short and only provides some pointers to papers where the adopted tools have been evaluated independently. This section should rather be titled "Validation" and could be either extended a little or even included in Section 2 as a subsection. I am not sure the readers would be very interested in the Docker Compose configuration in Listing 1.3. I would say that they would probably be more interested in getting to know the detailed requirements collected for the platform and how they are reflected in the design decisions. This could be interesting for the presentation at the workshop.

Minor issues:
- UnifiedViews has been replaced by LinkedPipes ETL, so its reference and comparison in Section 4.2 could be updated.
- The end of the Introduction should also mention the "Evaluation" section and Section 3.
- The numbering of the listings should be revised, not 1.x.

Review 3
PC member: anonymous
Reviewer's confidence: 4 (high)
Relevance: 5 (excellent)
Impact of ideas and results: 4 (good)
Clarity and quality of writing: 4 (good)
Related work: 3 (fair)
Implementation and soundness: 3 (fair)
Evaluation: 2 (poor)
Overall evaluation: 1 (weak accept)

The paper addresses the need for a flexible and scalable platform for handling the integration of geospatial and sensor data. This work is an ongoing research project called GEISER for the creation of a platform for the extraction, transformation, interlinking and fusion of such data. I understand that the project is in its initial stage, but I would still appreciate some details on how the authors intend to extract, transform and link geospatial and sensor data, as well as how they will evaluate the platform.

Strong points:
1. The goal of the project is very sound and important to the community.
2. The authors present three important use cases.

Weak points:
1. I would like to see more information about the use cases, such as: How do they intend to integrate geographic information? What kind of data will they integrate for the geomarketing? What kind of industrial data do they have or are willing to have, and how?
2. The authors state that, due to the early stage of the project, they have focused only on a functional evaluation according to the requirements gathered from the use cases. However, as there is still space, a reader would like to know what the requirements are.
3. What is missing in the paper is a general workflow for how the authors intend to integrate and process the data on the fly. Maybe with an example it is easier for the reader to understand the workflow. Are they willing to use Google data or GeoNames/DBpedia? What kind of social networking data will they use?
4. In the related work, you may also cite the EW-SHOPP project, which has a similar objective to the GEISER project. You can find more information at www.ew-shop.eu.
5. The authors do not provide any additional information about how scalable this platform will be and how they are willing to evaluate it.
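[Editor's note: to illustrate the message-driven microservice pattern the GEISER reviews describe (services exchanging data over RabbitMQ/AMQP), here is a minimal sketch using the pika client. The queue name and message shape are invented for illustration and are not taken from the paper.]

```python
# Minimal sketch of the message-driven microservice pattern described in
# the GEISER reviews: one service publishes a geocoded sensor reading to
# RabbitMQ, another consumes it. Queue name and message shape are invented.
import json
import pika

QUEUE = "sensor.geocoded"   # hypothetical queue name

def publish(reading: dict) -> None:
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.basic_publish(exchange="", routing_key=QUEUE,
                          body=json.dumps(reading))
    conn.close()

def consume() -> None:
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    channel.queue_declare(queue=QUEUE, durable=True)

    def on_message(ch, method, properties, body):
        reading = json.loads(body)
        print("received:", reading["sensor"], reading["wkt"])

    channel.basic_consume(queue=QUEUE, on_message_callback=on_message,
                          auto_ack=True)
    channel.start_consuming()

if __name__ == "__main__":
    publish({"sensor": "s-42", "wkt": "POINT(13.40 52.52)", "value": 21.5})
```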