Aligning Unions of Concepts in Ontologies of Geospatial Linked Data

Rahul Parundekar, José Luis Ambite, and Craig A. Knoblock
Information Sciences Institute and Department of Computer Science
University of Southern California
4676 Admiralty Way, Suite 1001, Marina del Rey, CA 90292
{parundek,ambite,knoblock}@usc.edu

Abstract. It is evident from the recent growth in Geospatial Linked Data that even though the number of instances being generated and linked has increased drastically, the ontologies behind these sources remain disconnected. Though we can agree that the instances being linked are equivalent, the alignments that are extrapolated from these links between the concepts may or may not agree with our intuitions. It is important to investigate how the concepts in the sources are actually aligned. Our previous work was successful in finding alignments, such as equivalence and subset relations, between concepts of two sources, using the instances that are linked as equal. Such alignments need not be trivial, however, as a concept in the ontology might not have an exact equivalent class in the other source. In this paper we propose a method that uses the subset and equivalence relations between restriction classes found by our previous work to find new alignments, where one (larger) concept of a source is aligned to the union of multiple (smaller) concepts from another source. We also show that we can use these alignments to find inconsistencies and use them to identify the instances that may be erroneously aligned.

1 Introduction

The Web of Linked Data has seen huge growth in the past few years.
As of September 2010, the size of the Linked Open Data Cloud was about 28.5 billion triples, with around 20.6% of the triples belonging to the geospatial domain.¹ As of June 2009, the cloud had recorded an overall growth of about 300%, with 91% growth in the geospatial domain.² The 16 geospatial data sources covered in the September 2010 count have around 16.5 million outgoing links to other sources. The sources of Geospatial Linked Data are most popularly connected using the owl:sameAs property, linking instances that are the same. As more alignments are generated in the Web of Linked Data at the instance level, a pattern of inter-linked data arises where the ontologies behind the sources remain unlinked.

¹ http://www4.wiwiss.fu-berlin.de/lodcloud/state/
² http://events.linkeddata.org/ldow2011/slides/ldow2011-slides-intro.pdf

As described in our previous papers on Linking and Building Ontologies of Linked Data [7] and Aligning Ontologies of Geospatial Linked Data [6], an extensional technique can be used to generate alignments between the ontologies behind these sources. In these papers, we introduced the concept of restriction classes, which is similar to that of single-value constraints on property restrictions in the Web Ontology Language (OWL), to increase the expressivity of sources with a rudimentary ontology. By looking at the set containment relationships of the instance sets of these restriction classes, we find equivalent and subset alignments between the two sources. Though the equivalent alignments found are precise in finding similar concepts between the two sources, the subset relations found, though informative, are too numerous to be effectively used. Reviewing these subset relations, we discovered that there are potential equivalent alignments not found by our previous work, linking a larger concept to a union or aggregation of one or more of its subsets.
Using this as motivation, the work described in this paper builds on the ontology alignment method of [7]. The approach uses the subset relations as hints to create a union of smaller restriction classes. The aggregation is guided by two constraints: the smaller restriction classes each contain only a single property-value pair, and they share a common property. We then perform set containment operations between this union and the larger restriction class from the other source. Using this method, we explore three Geospatial Linked Data sources (GeoNames, DBpedia, and LinkedGeoData) and try to find new alignments between GeoNames and DBpedia and between LinkedGeoData and DBpedia, where a larger subsuming restriction class from one source can be explained by an aggregation of smaller restriction classes from the other source.

The scope of this paper is the domain of Geospatial Linked Data, where we find alignments between three sources: GeoNames, DBpedia and LinkedGeoData. We first find equivalences and subset relations as described in our previous work, and then use these to find the new union alignments. The nature of each of the three sources is briefly mentioned here; they are described in more detail in [7]. GeoNames is a geographic source with a flat-file-like ontology where all instances belong to a single concept, Feature, and have associated Feature Class and Feature Code properties that identify the instances as mountains, lakes, etc. Although DBpedia is a Linked Data source that covers domains other than the geospatial domain, a large number of instances from GeoNames are linked to those in DBpedia using the owl:sameAs property. We also try to find alignments between the ontologies behind LinkedGeoData and DBpedia. RDF data in LinkedGeoData is derived from the Open Street Map initiative and has links to DBpedia.³

This paper is organized as follows.
We first briefly describe our alignment algorithm from [7], along with the limitations of the results that were generated. We then explain our approach to finding alignments between a larger concept from one source and the union of multiple smaller concepts from the other source. This is followed by identifying the outliers of these alignments, which highlight the inconsistencies and the instances that are erroneously linked. We then describe the experimental results that contain the new alignments discovered in these data sources, along with their outliers. Finally, we describe other related work and conclude with our observations and future work.

³ http://linkedgeodata.org/Datasets

2 Aligning geospatial ontologies on the Web of Linked Data

The work described in this paper follows our previous work on aligning ontologies of Linked Open Data, which uses an extensional approach to find alignments between restriction classes in two different sources. Though our previous algorithm found equivalent alignments between the two sources, it also found a large number of subset alignments. A pattern was observed in these results, where a group of concepts from one source were subsets of the same larger concept from the other source. In many cases these smaller concepts, taken together, were able to completely explain the larger concept. We used this insight as motivation for consuming the subset relations, which were too numerous to be useful by themselves, to find alignments between the larger concept and the union of the group of concepts. Our approach uses this group of smaller concepts and introduces a disjunction operator on these subsets to try to define the common subsuming concept.

2.1 Our previous work on linking and building ontologies of Linked Data

Ontologies of Linked Data sources can be quite rudimentary.
For example, GeoNames only has a single concept (Feature) to which all of its instances belong. On the other hand, in DBpedia, we find a rich ontology with a hierarchy of concepts and well-defined properties. In the traditional sense of ontology alignment, we would have found at most a single alignment between Feature on the GeoNames side and a similar broad concept from DBpedia. In order to get a richer set of alignments, we introduced the concept of a restriction class. A restriction class is a concept that is derived extensionally and defined by the set of instances obtained by restricting a single property to a single value (called a property-value pair and represented by $(p_i = v_i)$) in a source. For example, a restriction class for schools can be constructed in GeoNames by forming a set of instances that have their geonames:featureCode restricted to ‘S.SCH’. This restriction class is represented as {geonames:featureCode=S.SCH}. The scope of the definition of a restriction class includes the conjunction operator, which produces a more specialized set of instances, constructed using two or more restriction classes. Thus, a restriction class {geonames:featureCode=S.SCH & geonames:countryCode=US}, built from the restriction classes {geonames:featureCode=S.SCH} and {geonames:countryCode=US}, can be defined by the intersection of the two sets and forms a concept extensionally described by the set of schools in the US in GeoNames.

Our algorithm aligns restriction classes from two sources, using an extensional technique, as follows. A pre-processing step first performs an inner join on the two sources to be aligned, based on an instance equivalence property like owl:sameAs. As inverse functional properties can only result in restriction classes with a single instance belonging to them, the pre-processing step eliminates them.
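The restriction-class definition above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory `Source` layout, the three toy instances and the helper names `restriction_class` and `conjunction` are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Hypothetical toy source: instance id -> set of (property, value) pairs.
Source = Dict[str, Set[Tuple[str, str]]]

geonames: Source = {
    "g1": {("geonames:featureCode", "S.SCH"), ("geonames:countryCode", "US")},
    "g2": {("geonames:featureCode", "S.SCH"), ("geonames:countryCode", "ES")},
    "g3": {("geonames:featureCode", "S.UNIV"), ("geonames:countryCode", "US")},
}


def restriction_class(source: Source, prop: str, value: str) -> FrozenSet[str]:
    """Instances of `source` whose property `prop` is restricted to `value`."""
    return frozenset(i for i, pairs in source.items() if (prop, value) in pairs)


def conjunction(*classes: FrozenSet[str]) -> FrozenSet[str]:
    """Conjunction of restriction classes: the intersection of their instance sets."""
    return frozenset.intersection(*classes)


# {geonames:featureCode=S.SCH}
schools = restriction_class(geonames, "geonames:featureCode", "S.SCH")
# {geonames:countryCode=US}
in_us = restriction_class(geonames, "geonames:countryCode", "US")
# {geonames:featureCode=S.SCH & geonames:countryCode=US}
us_schools = conjunction(schools, in_us)
```

Representing each class as a frozen set of instance identifiers keeps the extensional view explicit: two classes are compared only through the instances they contain.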
The crux of the algorithm uses a top-down tree exploration of the space of alignment hypotheses. At the topmost level, a seed hypothesis is generated by aligning a restriction class with one property-value pair from the first source with another restriction class with one property-value pair from the second source. At each level in the search space, a new restriction class is formed from one restriction class of one of the sources by adding another property-value pair constraint on that restriction class. A new alignment hypothesis is thus constructed from the new restriction class and the restriction class from the other source. Each alignment hypothesis is tested for set containment relations using the intersection of the instance sets of the restriction classes from both sources. This is done with the help of two scoring functions, P and R. If $r_1$ and $r_2$ are the two restriction classes in the alignment hypothesis, we first define $Img(r_1)$ as the set of instances in the second source that instances of $r_1$ are linked to. We then define $P = \frac{|Img(r_1) \cap r_2|}{|r_2|}$ and $R = \frac{|Img(r_1) \cap r_2|}{|Img(r_1)|}$. We mark the relation of the alignment hypothesis as either i) equivalent ($P = 1$, $R = 1$), ii) subset, with the restriction class from the first source extensionally subsuming the restriction class from the second source ($P = 1$, i.e. every instance of $r_2$ lies in $Img(r_1)$), iii) subset, with the restriction class from the second source extensionally subsuming the restriction class from the first source ($R = 1$, i.e. every instance of $Img(r_1)$ lies in $r_2$), or iv) no relation between the two restriction classes. To compensate for missing and misaligned instances, we relax our subset scores by defining $P'$ and $R'$, which reduce the required fraction of support to be greater than 0.9 instead of equal to 1.
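As a hedged illustration of the scoring just described, the sketch below computes P and R for one alignment hypothesis and reads the relation directly off the two definitions. The `links` dictionary (a stand-in for owl:sameAs links, assumed here to map each instance to at most one instance), the function names and the toy identifiers are all hypothetical.

```python
from typing import Dict, Set, Tuple


def image(r1: Set[str], links: Dict[str, str]) -> Set[str]:
    """Img(r1): instances of the second source that instances of r1 link to."""
    return {links[i] for i in r1 if i in links}


def scores(r1: Set[str], r2: Set[str], links: Dict[str, str]) -> Tuple[float, float]:
    """Return (P, R) = (|Img(r1) & r2| / |r2|, |Img(r1) & r2| / |Img(r1)|)."""
    img = image(r1, links)
    common = img & r2
    p = len(common) / len(r2) if r2 else 0.0
    r = len(common) / len(img) if img else 0.0
    return p, r


def relation(p: float, r: float, threshold: float = 0.9) -> str:
    """Relaxed classification: a score counts as 'full' when it exceeds 0.9."""
    if p > threshold and r > threshold:
        return "equivalent"
    if p > threshold:
        # (Almost) every instance of r2 lies in Img(r1).
        return "r2 subset of Img(r1)"
    if r > threshold:
        # (Almost) every instance of Img(r1) lies in r2.
        return "Img(r1) subset of r2"
    return "no relation"


# Toy hypothesis: three linked instances, two of which fall in r2.
links = {"a1": "b1", "a2": "b2", "a3": "b3"}
r1 = {"a1", "a2", "a3"}
r2 = {"b1", "b2"}
p, r = scores(r1, r2, links)
label = relation(p, r)
```

In this toy case all of `r2` is covered by the image of `r1`, while only two thirds of the image falls inside `r2`, so the hypothesis is labelled as a subset relation rather than an equivalence.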
For an optimal exploration of the search tree, we employ certain pruning mechanisms that include i) using ordered exploration to avoid exploring a node twice, ii) pruning a node if the intersection set of the restriction classes of the hypothesis has size less than a minimum support size (we used 10 in our experiments), iii) pruning a node if the added restriction class does not change the set of instances, etc. After the brute-force exploration of the search space of alignment hypotheses, we use a post-processing step on the results generated, which removes redundant assertions by virtue of set containment of instances of two hypotheses where one is the immediate parent of the other in the search tree.

At the end of the above three steps of processing, the algorithm was able to find equivalent relations between restriction classes from two sources as well as subset relations in either direction. As this algorithm was not specific to any particular domain, we explored candidate sources for alignments in three domains: Geospatial, Genetics and Zoology. In these three domains, our algorithm found alignments for five pairs of sources. For example, we were able to find alignments between GeoNames and DBpedia in the Geospatial domain. One such alignment was the equivalent relation between {geonames:countryCode=ES} and {dbpedia:country=Spain} (i.e. correctly aligning the concepts for the country Spain). We also found subset relations like {geonames:featureCode=S.SCH} subset of {rdf:type=dbpedia:EducationalInstitution}. More such results are described in [7].

Limitations

The approach above produced a large number of equivalent alignments that gave an exact mapping between the two restriction classes from the two sources. It also, however, produced a large number of subset relations that were not as useful.
This was mainly because the subset relations, by themselves, did not contribute to a useful equivalence alignment between two classes. In all, in the GeoNames and DBpedia alignment, there were 1647 subset relations found. Though it is understandable that in many cases there might never exist an exact equivalence between two restriction classes, because they were auto-generated using property-value pairs, we decided to look for additional useful alignments, if any, that these subset relations might be able to provide. For example, in the GeoNames and DBpedia alignment, we found that {geonames:featureCode=S.SCH}, {geonames:featureCode=S.SCHC} and {geonames:featureCode=S.UNIV} (i.e. Schools, Colleges and Universities from GeoNames) are all subsets of {rdf:type=dbpedia:EducationalInstitution}. Taken individually, though each of these alignments is correct and insightful, they are not particularly useful in understanding the relationships between GeoNames and DBpedia. Taken together, however, we found that the union of these three restriction classes completely defines {rdf:type=dbpedia:EducationalInstitution}. The limitation of our approach was in the expressivity of our restriction classes. Though it included restriction classes containing single property-value pairs and the conjunction operator on those restriction classes, it did not include a disjunction operator and hence was unable to make use of the subset relations.

2.2 Identifying spatial concept coverings

As explained above, we were able to identify a pattern where a group of restriction classes from one source were aligned as subsets of a common concept from the other source. By using these alignments as hints, we were able to construct the union of the smaller restriction classes and detect if the union was able to define the larger class entirely. The following section describes this method in detail.
In those cases where we are not able to define the larger class entirely, our approach is also able to find and explain the missing instances (outliers).

Mapping a restriction class from one source to a union of smaller restriction classes from the other source

Since the problem of finding alignments with conjunctions and disjunctions of property-value pairs of restriction classes is combinatorial in nature, we focus only on subset relations where both restriction classes have a single property-value pair and where one is a subset of the other. This helps us find the simplest definitions of concepts and also makes the problem tractable. Alignments generated by our previous work that satisfy the single property-value pair constraint are first grouped according to the subsuming restriction classes. We then identify a strategy for selecting the smaller restriction classes from within such a group to form the union that best describes the larger restriction class. Since restriction classes are constructed by forming a set of instances that have one of the properties restricted to a single value, aggregating restriction classes from the group according to their properties builds a more intuitive definition of the union.

We can now define the disjunction operator that constructs the union concept from the smaller restriction classes in these sub-groups. The disjunction operator is defined for restriction classes such that i) the concept formed by the disjunction of the restriction classes represents the union of their sets of instances, ii) each of the restriction classes that are aggregated contains only a single property-value pair and iii) the property is the same for all those property-value pairs.
We then try to find the alignment between the larger common restriction class and a set of restriction classes from the other source that are aggregated by the disjunction operator, using an extensional approach similar to our previous paper. We call such an alignment a union alignment.

We first build candidates for aggregation using the results from our previous algorithm as hints. We group alignments by the larger common restriction class. Grouping the subset relations is trivial. Equivalence relationships are subsets in both directions and thus are easily integrated into the groups. For each alignment, {$p_1 = v_1$} is the $r_1$ part and {$p_2 = v_2$} forms the $r_2$ part (each with a single property-value pair), as explained in the previous section. Sub-groups are formed by aggregating according to the property of the property-value pairs of the smaller restriction classes. Such a sub-group is identified by {property of the larger restriction class ($p_1$), value of the larger restriction class ($v_1$), property of the smaller restriction classes ($p_2$)}. The values of the different smaller restriction classes can be denoted by a list, List($v_2$). The disjunction of the smaller restriction classes creates a set of instances that extensionally identifies the union concept.

We can now either confirm or refute the hypothesis that the larger restriction class is equivalent to the union concept. We can do this by using a scoring mechanism similar to the use of P and R in our previous paper. Using the same terminology, $U_A$ is defined as the set of disjunctive instances (i.e. Union($Img(r_1) \cap r_2$)), $U_L$ is defined as the set of instances of the larger class taken by itself (i.e. $Img(r_1)$) and $U_S$ is defined as the set of instances that is the union of the individual smaller restriction classes (i.e. Union($r_2$)). The scoring mechanism defines $P'_U$ as $\frac{|U_A|}{|U_S|}$ and $R'_U$ as $\frac{|U_A|}{|U_L|}$. $P'_U$ and $R'_U$ are fractions with relaxed scoring assumptions similar to $P'$ and $R'$ from our previous paper.
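The grouping and scoring steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: alignments are assumed to arrive as pairs of (property, value) tuples with the larger class first, and the function names are hypothetical. In the usage lines, the 396 and 404 counts are those reported for the educational institutions example, but the way the 396 instances are split across the three feature codes is synthetic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (property, value)


def group_subsets(alignments: Iterable[Tuple[Pair, Pair]]) -> Dict[Tuple[str, str, str], List[str]]:
    """Group subset alignments into sub-groups keyed by (p1, v1, p2).

    Each alignment is ((p1, v1), (p2, v2)), meaning {p2=v2} is a subset of {p1=v1}.
    The value of each key is List(v2).
    """
    groups: Dict[Tuple[str, str, str], List[str]] = defaultdict(list)
    for (p1, v1), (p2, v2) in alignments:
        groups[(p1, v1, p2)].append(v2)
    return dict(groups)


def union_scores(img_r1: Set[str], smaller: List[Set[str]]) -> Tuple[float, float]:
    """Return (P'_U, R'_U) = (|U_A| / |U_S|, |U_A| / |U_L|)."""
    u_s = set().union(*smaller)  # union of the smaller restriction classes
    u_l = img_r1                 # instances of the larger class
    u_a = u_l & u_s              # equals the union of (Img(r1) & r2) over all r2
    p_u = len(u_a) / len(u_s) if u_s else 0.0
    r_u = len(u_a) / len(u_l) if u_l else 0.0
    return p_u, r_u


larger = ("rdf:type", "dbpedia:EducationalInstitution")
alignments = [
    (larger, ("geonames:featureCode", "S.SCH")),
    (larger, ("geonames:featureCode", "S.SCHC")),
    (larger, ("geonames:featureCode", "S.UNIV")),
]
groups = group_subsets(alignments)

# Synthetic instance ids: 404 in the larger class, 396 covered by the union.
img_r1 = {f"e{i}" for i in range(404)}
smaller = [
    {f"e{i}" for i in range(0, 200)},
    {f"e{i}" for i in range(200, 300)},
    {f"e{i}" for i in range(300, 396)},
]
p_u, r_u = union_scores(img_r1, smaller)
is_equivalent = p_u > 0.9 and r_u > 0.9
```

Computing $U_A$ as a single intersection relies on intersection distributing over union, so it yields the same set as intersecting each smaller class with $Img(r_1)$ and then taking the union.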
For example, our previous algorithm finds that {geonames:featureCode = S.SCH}, {geonames:featureCode = S.SCHC} and {geonames:featureCode = S.UNIV} are subsets of {rdf:type = dbpedia:EducationalInstitution}. In this case, the sub-group can be identified as {rdf:type, dbpedia:EducationalInstitution, geonames:featureCode} and the list as (S.SCH, S.SCHC, S.UNIV). As can be seen in the Venn diagram of Figure 1, $U_L$ is Img({rdf:type = dbpedia:EducationalInstitution}), $U_S$ is {geonames:featureCode = S.SCH} ∪ {geonames:featureCode = S.SCHC} ∪ {geonames:featureCode = S.UNIV} and $U_A$ is:

{Img({rdf:type = dbpedia:EducationalInstitution}) ∩ {geonames:featureCode = S.SCH}}
∪ {Img({rdf:type = dbpedia:EducationalInstitution}) ∩ {geonames:featureCode = S.SCHC}}
∪ {Img({rdf:type = dbpedia:EducationalInstitution}) ∩ {geonames:featureCode = S.UNIV}}

Ideally, for an exact equivalence alignment, $P'_U$ and $R'_U$ should both be 1.0, which happens when the larger restriction class covers the union of the smaller restriction classes completely and vice versa. However, similar to the relaxed score assumption from our previous paper to accommodate errors in the dataset, we consider it a complete coverage when the score is greater than a relaxed score of 0.9 (i.e. the union alignment is considered to be equivalent if $P'_U > 0.9$ and $R'_U > 0.9$). Because every smaller restriction class in the union already satisfies the relaxed subset score from our previous paper, we are assured that $\frac{|U_A|}{|U_S|}$, i.e. $P'_U$, is always going to be greater than 0.9.⁴ Thus, we can say that a union alignment is equivalent if $R'_U > 0.9$. With the educational institutions example, $R'_U$ for the alignment of dbpedia:EducationalInstitution to the union of S.SCH, S.SCHC and S.UNIV is 0.98. We can thus confirm the hypothesis and consider this union alignment equivalent. The scores for other union alignments found are described in the results section.
[Figure 1: Venn diagram. Key: Img(r1) = Educational Institutions from DBpedia; Union(r2) = Schools, Colleges and Universities from GeoNames; S.SCH = Schools from GeoNames; S.SCHC = Colleges from GeoNames; S.UNIV = Universities from GeoNames; Outliers.]

Fig. 1. Spatial covering of Educational Institutions from DBpedia

⁴ It should also be noted that each of the smaller subsets also satisfies the minimum support size of 10 instances.

Using mappings to identify outliers

As mentioned above, the score for the alignment of {rdf:type = dbpedia:EducationalInstitution} to the union of {S.SCH, S.SCHC & S.UNIV} is approximately 0.98. For {rdf:type = dbpedia:EducationalInstitution}, 396 instances out of the 404 Educational Institutions were accounted for as having their geonames:featureCode as one of S.SCH, S.SCHC or S.UNIV to give this score. An interesting question to pose then is: how are the remaining 2% of the dbpedia:EducationalInstitution instances (i.e. 8 instances) classified in GeoNames?

While calculating the disjuncted restriction classes, we also keep track of other restriction classes with the same {$p_1$, $v_1$, $p_2$} that were not previously considered as subsets. These had been pruned in the exploration stage as they either had a size of less than the minimum support size constraint of ten instances or had $P'$ less than 0.9. Among the first type of restriction classes, those with low support size but $P'$ greater than 0.9 are now re-classified as subsets. The re-classification of the relation as a subset can now be justified by the increased evidence for subsumption, as other values for the same property are also aligned as subsets of the larger restriction class from the first source. The second type of restriction classes, which had $P'$ less than 0.9, along with the ones that were not re-classified above (i.e. with less than 10 instances and $P'$ less than 0.9), form the outliers.
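The re-classification rule above can be sketched as a single pass over the candidate values of $p_2$. This is an illustrative sketch, not the authors' procedure: the function name, the dictionary layout and the instance identifiers are assumptions, and `S.TINY` is a made-up feature code standing in for a low-support class. The 1-of-31 split for S.HSP mirrors the fraction reported for that code.

```python
from typing import Dict, Set, Tuple

Counts = Tuple[int, int]  # (|Img(r1) & r2|, |r2|)


def split_candidates(
    img_r1: Set[str],
    classes_by_value: Dict[str, Set[str]],
    threshold: float = 0.9,
) -> Tuple[Dict[str, Counts], Dict[str, Counts]]:
    """Split the values v2 of one sub-group (p1, v1, p2) into members and outliers.

    A value joins the union when its P' exceeds the threshold, regardless of
    support size; any other value that shares instances with Img(r1) is an outlier.
    """
    members: Dict[str, Counts] = {}
    outliers: Dict[str, Counts] = {}
    for v2, r2 in classes_by_value.items():
        hits = img_r1 & r2
        if not hits:
            continue  # shares no instance with the larger class
        counts = (len(hits), len(r2))
        if len(hits) / len(r2) > threshold:
            members[v2] = counts
        else:
            outliers[v2] = counts
    return members, outliers


# Synthetic data: 12 schools, a 3-instance class below the support size, one hospital.
img_r1 = {f"s{i}" for i in range(12)} | {"t0", "t1", "t2"} | {"h0"}
classes_by_value = {
    "S.SCH": {f"s{i}" for i in range(12)},
    "S.TINY": {"t0", "t1", "t2"},           # hypothetical low-support code
    "S.HSP": {f"h{i}" for i in range(31)},  # 1 of 31 linked to the larger class
    "S.OTHER": {"z0"},                      # hypothetical, no overlap
}
members, outliers = split_candidates(img_r1, classes_by_value)
```

Keeping the (hits, size) counts alongside each value makes it straightforward to report outliers as fractions such as 1/31, in the style used by the result tables.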
For example, as mentioned before, schools, colleges and universities from GeoNames make up 396 out of 404 Educational Institutions from DBpedia. Of the other eight instances, seven have feature codes S.BLDG (3 buildings), S.EST (1 establishment), S.HSP (1 hospital), S.LIBR (1 library) or S.MUS (1 museum). The eighth instance does not have a geonames:featureCode property asserted. The $P'$ score of these restriction classes is less than 0.9. One of the instances classified as dbpedia:EducationalInstitution in DBpedia is linked to an instance in GeoNames that has geonames:featureCode ‘S.HSP’.⁵ There are 31 instances in {geonames:featureCode=S.HSP}, however, and because this restriction class does not meet the relaxed subset score threshold, it cannot be considered in the union of restriction classes.

Another example of outliers was found in the {dbpedia:country = Spain} ≡ {geonames:countryCode = ES} alignment. This equality was found using the relaxed subset assumption, where 3917 of the 3918 instances of dbpedia:country=Spain were accounted for as having geonames:countryCode=ES, resulting in a subset score of 0.9997. The one instance not having country code ES was actually classified as having country code IT (Italy). This single instance needs to be inspected further to determine whether the owl:sameAs link is correct.

It is evident from the above examples that the outliers help in understanding the nature of the sources more explicitly, showing why the alignments failed to completely describe the larger restriction class. These, along with a few other examples, are described in detail in the next section.

3 Experimental Results

From the approach described in Section 2.2, we were able to get a total of 752 union alignments for the GeoNames-DBpedia alignment and 5843 for the LinkedGeoData-DBpedia alignment.
Of the 752 in GeoNames-DBpedia, 318 are such that the larger restriction class is from DBpedia, while the other 434 have the larger restriction class from GeoNames. Similarly, 3097 of the 5843 union alignments in LinkedGeoData-DBpedia have the larger restriction class from DBpedia, while the other 2746 have the larger restriction class from LinkedGeoData. Tables 1, 2, 3 and 4 list a few interesting examples of these union alignments between GeoNames-DBpedia and LinkedGeoData-DBpedia (in either direction), which we describe here.

⁵ Intuitively, it would make sense to the reader that this instance might perhaps be a hospital of a medical school.

The tables are organized as follows. Column 2 describes the sub-group, i.e. ($p_1$, $v_1$, $p_2$). Column 3 contains the list of the value part of the property-value pairs in the restriction classes of the smaller sets (i.e. List($v_2$)). The score of the union is noted in column 4 ($R'_U = \frac{|U_A|}{|U_L|}$), followed by $|U_A|$ and $|U_L|$ in columns 5 and 6. Column 7 describes the outliers, i.e. values of $v_2$ that form restriction classes that are not direct subsets of the larger restriction class. Each of these values is also shown with a fraction: the number of its instances that do belong to the larger restriction class over the total number of instances of the restriction class (i.e. $\frac{|Img(r_1) \cap r_2|}{|r_2|}$). It can be seen that the fraction is less than our relaxed subset score. If the value of this fraction were greater than the relaxed subset score (i.e. 0.9), the set would have been included in column 3 instead. The last column mentions how many of the total $U_L$ instances we were able to explain using $U_A$ and the outliers. For example, union alignment #1 is the Educational Institution example described before. It shows how educational institutions from DBpedia can be explained by schools, colleges and universities in GeoNames.
Columns 4, 5 and 6 give the alignment score $R'_U$ (0.98), the size of $U_A$ (396) and the size of $U_L$ (404). Seven of the eight outliers found (S.BLDG, S.EST, S.LIBR, S.MUS, S.HSP) are mentioned along with their $P'$ fractions in column 7. We also found some other interesting alignments. #2 shows the details of the Spain example mentioned briefly in Section 2.2. #3 shows a union alignment that aligns smaller sets or parts from GeoNames to a complete set. The region of Basse-Normandie in France is made up of three departments. The restriction classes of these three departments are constrained by the geonames:parentADM2 property. #4 shows that Airports and Airbases make up 99% of the airports in DBpedia. From its outliers, one might argue that Airfields (S.AIRF) should also be included, but it was not, as its $P'$ score was lower than the threshold. Outliers also show that there is a Hill in GeoNames that has been classified as an airport. Even though this instance may be an airport in the hills, ontologically it does not make sense that a hill can be an airport. A similar case is observed in #8, where we find that there is at least one water tower in LinkedGeoData that is aligned with an Educational Institution in DBpedia.

The union alignment #5 should have been as straightforward as alignment #2. Our approach was able to detect a pattern, however, that might have been overlooked when looking at individual instances. Netherlands from GeoNames, for example, should be aligned with the country Netherlands from DBpedia. However, we have possible alias names, such as The Netherlands and Kingdom of Netherlands, as well as a possible linkage error to Flag of the Netherlands.svg generated while importing Wikipedia data into DBpedia (the error seems systematic, see Jordan in #6).

Alignment #7 was able to explain 8 of the 10 license plate codes in the state (bundesland) of Saarland.⁶ The ones that it missed were Ottweiler (OTW) and the police vehicle codes (SAL).
Since the vehicle code SAL is not associated with any populated places in Saarland, it is quite possible that it does not get mentioned in LinkedGeoData. Our approach thus provides a deeper insight into the nature of the sources. #9 tries to find the composition of the state of New Jersey. 100% of the instances in New Jersey from LinkedGeoData can be accounted for in the 9 counties. New Jersey actually has 21 counties.⁷ This suggests that the instances in New Jersey in LinkedGeoData that are linked to DBpedia are not a complete representation of the state, even though they result in an equivalent alignment. The quality of the results generated by our extensional approach is tied to the quality of the instances in the dataset. We find, however, that such alignments, even though they might be partially incorrect, give an accurate representation of the actual instances in the dataset and highlight the practical quality of the links in the Web of Linked Data.⁸ Finally, alignment #10 describes how the concept Waterways in LinkedGeoData can be defined as the union concept of Streams and Rivers in DBpedia. The complete set of alignments discovered by our algorithm is available on our group page.⁹

⁶ http://www.europlates.com/publish/euro-plate-info/german-city-codes

4 Related Work

Ontology alignment has been a well explored area of research since the early days of ontologies. It has received renewed interest in recent years with the rise of the Semantic Web. Euzenat & Shvaiko [3] provide a comprehensive discussion on Ontology Matching approaches. A closely related area of study to ontology alignment is schema matching. Bernstein et al. [1] summarize the developments in this field in the past ten years. Though most work done in the Web of Linked Data is on linking instances across different sources, an increasing number of authors have looked into aligning the sources' ontologies in the past couple of years. Jain et al.
[4] describe the BLOOMS approach which uses a central forest of concepts derived from topics in Wikipedia. An update to this is the BLOOMS+ approach [5] that aligns Linked Open Data ontologies with an upper-level ontology called Proton. Though we employ a simple set subsump- tion technique to identifying alignments, our use of restriction classes is able to find a large set of alignments in cases like aligning GeoNames with DBpedia or Proton, while BLOOMS & BLOOMS+ are unable to find alignments because of the small number of classes in GeoNames that have vague declarations. Cruz et al. [2] describe a dynamic ontology mapping approach called AgreementMaker that uses similarity measures along with a mediator ontology to find mappings using the labels of the classes. Building ontologies of Linked Data sources using a statistical method has also been described in Völker et al. [8]. This work induces schemas for RDF data sources by generating OWL 2 axioms using intermediate associativity table of instances and concepts (called transaction datasets) and mining associativity rules from it. 7 http://en.wikipedia.org/wiki/List of counties in New Jersey 8 In [7] we compared the extensional versus intensional perspective on ontology align- ment. In a nutshell, the extensional alignment gives a precise characterization of the current relationship between the data in the sources, regardless of the intended meaning of the concept definitions. For example, a source may define instances as universities, but linkage can show that it only contains American universities. 9 http://www.isi.edu/integration/data/UnionAlignments Table 1. 
Example alignments from the GeoNames and DBpedia datasets, with larger sets from DBpedia and smaller sets from GeoNames

| # | Sub-group {p1, v1, p2} | List(v2) | $R'_U = \lvert U_A \rvert / \lvert U_L \rvert$ | $\lvert U_A \rvert$ | $\lvert U_L \rvert$ | Outliers | # Explained Instances |
|---|---|---|---|---|---|---|---|
| 1 | {rdf:type, dbpedia:EducationalInstitution, geonames:featureCode} | S.SCH, S.SCHC, S.UNIV | 0.9801 | 396 | 404 | S.BLDG (3/122), S.EST (1/13), S.LIBR (1/7), S.HSP (1/31), S.MUS (1/43) | 403 |
| 2 | {dbpedia:country, dbpedia:Spain, geonames:countryCode} | ES | 0.9997 | 3917 | 3918 | IT (1/7635) | 3918 |
| 3 | {dbpedia:region, dbpedia:Basse-Normandie, geonames:parentADM2} | geonames:2989247, geonames:2996268, geonames:3029094 | 1.0 | 754 | 754 | | 754 |
| 4 | {rdf:type, dbpedia:Airport, geonames:featureCode} | S.AIRB, S.AIRP | 0.9924 | 1981 | 1996 | S.AIRF (9/22), S.FRMT (1/5), S.SCH (1/404), S.STNB (2/5), S.STNM (1/36), T.HLL (1/61) | 1996 |

Table 2. Example alignments from the DBpedia and GeoNames datasets, with larger sets from GeoNames and smaller sets from DBpedia

| # | Sub-group {p1, v1, p2} | List(v2) | $R'_U = \lvert U_A \rvert / \lvert U_L \rvert$ | $\lvert U_A \rvert$ | $\lvert U_L \rvert$ | Outliers | # Explained Instances |
|---|---|---|---|---|---|---|---|
| 5 | {geonames:countryCode, NL, dbpedia:country} | dbpedia:Netherlands, dbpedia:The_Netherlands, dbpedia:Flag_of_the_Netherlands.svg | 0.9802 | 1939 | 1978 | dbpedia:Kingdom_of_the_Netherlands | 1940 |
| 6 | {geonames:countryCode, JO, dbpedia:country} | dbpedia:Jordan, dbpedia:Flag_of_Jordan.svg | 0.95 | 19 | 20 | | 20 |

Table 3. Example alignments from the LinkedGeoData and DBpedia datasets, with larger sets from DBpedia and smaller sets from LinkedGeoData

| # | Sub-group {p1, v1, p2} | List(v2) | $R'_U = \lvert U_A \rvert / \lvert U_L \rvert$ | $\lvert U_A \rvert$ | $\lvert U_L \rvert$ | Outliers | # Explained Instances |
|---|---|---|---|---|---|---|---|
| 7 | {dbpedia:bundesland, Saarland, lgd:OpenGeoDBLicensePlateNumber} | HOM, IGB, MZG, NK, SB, SLS, VK, WND | 0.93 | 46 | 49 | | 46 |
| 8 | {rdf:type, dbpedia:EducationalInstitution, rdf:type} | lgd:Amenity, lgd:K2543, lgd:School, lgd:University, lgd:WaterTower | 0.9901 | 2609 | 2610 | | 2609 |

Table 4.
Example alignments from the LinkedGeoData and DBpedia datasets, with larger sets from LinkedGeoData and smaller sets from DBpedia

| # | Sub-group {p1, v1, p2} | List(v2) | $R'_U = \lvert U_A \rvert / \lvert U_L \rvert$ | $\lvert U_A \rvert$ | $\lvert U_L \rvert$ | Outliers | # Explained Instances |
|---|---|---|---|---|---|---|---|
| 9 | {lgd:gnisST_alpha, NJ, dbpedia:subdivisionName} | Atlantic, Burlington, Cape May, Hudson, Hunterdon, Monmoth, New Jersey, Ocean, Passaic | 1.0 | 214 | 214 | | 214 |
| 10 | {rdf:type, lgd:Waterway, rdf:type} | dbpedia:Stream, dbpedia:River | 0.97 | 33 | 34 | dbpedia:Place (1/94989) | 34 |

5 Conclusions and Future Work

We described an approach to identifying union alignments in geospatial data sources on the Web of Linked Data. By extending our definition of restriction classes with the disjunction operator, we were able to find alignments of union concepts from one source to larger concepts from the other source. The resulting union alignments show that concepts at different levels in the ontologies of two sources can be mapped even when there is no direct equivalence. By looking at the alignment pattern, we were also able to find outliers that help identify inconsistencies in the instances that are linked. The results provide deeper insight into the nature of the alignments of Geospatial Linked Data.

Though the scope of this paper is the geospatial domain, our algorithm can be used in other domains as well. Our next step is to explore other domains, such as zoology and genetics, for union alignments. Other possible future work is in the mapping and understanding of the properties in the sources. Our preliminary findings show that the results of this paper can be used to find patterns in the properties. For example, the countryCode property in GeoNames is closely associated with the country property in DBpedia, though their ranges are not exactly equal.
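The union-alignment check summarized in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the function name `union_alignment`, the threshold `min_subset_ratio`, and the toy instance identifiers are all hypothetical. The coverage ratio is computed as the fraction of the larger class's instances covered by the union of the accepted smaller classes, in the spirit of the ratio column reported in the tables.

```python
from typing import Dict, List, Set, Tuple


def union_alignment(
    larger: Set[str],
    smaller: Dict[str, Set[str]],
    min_subset_ratio: float = 0.9,  # hypothetical threshold, not from the paper
) -> Tuple[List[str], float, Dict[str, Tuple[int, int]]]:
    """Align one larger class with a union of smaller classes.

    larger:  linked instances of the larger restriction class.
    smaller: candidate smaller classes, keyed by their value v2.
    Returns (accepted v2 values, coverage ratio, outliers), where each
    outlier maps v2 -> (instances inside larger, total instances).
    """
    members: Dict[str, Set[str]] = {}
    outliers: Dict[str, Tuple[int, int]] = {}
    for value, instances in smaller.items():
        overlap = len(instances & larger)
        if overlap == 0:
            continue  # unrelated class: shares nothing with the larger class
        if overlap / len(instances) >= min_subset_ratio:
            members[value] = instances  # (approximately) a subset: accept
        else:
            outliers[value] = (overlap, len(instances))  # partial overlap only

    # Instances of the larger class explained by the union of accepted classes.
    covered = set().union(*members.values()) & larger
    coverage = len(covered) / len(larger) if larger else 0.0
    return sorted(members), coverage, outliers


# Toy data loosely modelled on alignment #10 (identifiers are made up).
larger = {"w1", "w2", "w3", "w4", "w5"}          # e.g. lgd:Waterway
smaller = {
    "dbpedia:Stream": {"w1", "w2"},
    "dbpedia:River": {"w3", "w4"},
    "dbpedia:Place": {"w5", "p1", "p2", "p3"},   # mostly outside the larger class
}
members, coverage, outliers = union_alignment(larger, smaller)
print(members, coverage, outliers)
```

On this toy input, the two accepted classes together cover four of the five instances of the larger class, while dbpedia:Place is reported as an outlier because only one of its four instances falls inside the larger class. Outliers of this kind are the candidates one would inspect for erroneous instance links.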
We believe that an in-depth analysis of the alignment of the ontologies of sources is warranted given the recent rise in the number of links in the Linked Data cloud. This is an extremely important step for the grand Semantic Web vision.

Acknowledgements

This research is based upon work supported in part by the National Science Foundation under award number IIS-1117913.

References

1. Bernstein, P., Madhavan, J., Rahm, E.: Generic schema matching, ten years later. Proceedings of the VLDB Endowment 4(11) (2011)
2. Cruz, I., Palmonari, M., Caimi, F., Stroe, C.: Towards on the go matching of linked open data ontologies. In: Workshop on Discovering Meaning On The Go in Large Heterogeneous Data. p. 37 (2011)
3. Euzenat, J., Shvaiko, P.: Ontology matching. Springer-Verlag (2007)
4. Jain, P., Hitzler, P., Sheth, A., Verma, K., Yeh, P.: Ontology alignment for linked open data. The Semantic Web–ISWC 2010 pp. 402–417 (2010)
5. Jain, P., Yeh, P., Verma, K., Vasquez, R., Damova, M., Hitzler, P., Sheth, A.: Contextual ontology alignment of lod with an upper ontology: A case study with proton. The Semantic Web: Research and Applications pp. 80–92 (2011)
6. Parundekar, R., Knoblock, C.A., Ambite, J.L.: Aligning geospatial ontologies on the linked data web. In: Proceedings of the GIScience Workshop on Linked Spatiotemporal Data. Zurich, Switzerland (2010)
7. Parundekar, R., Knoblock, C.A., Ambite, J.L.: Linking and building ontologies of linked data. In: Proceedings of the 9th International Semantic Web Conference (ISWC 2010). Shanghai, China (2010)
8. Völker, J., Niepert, M.: Statistical schema induction. The Semantic Web: Research and Applications pp. 124–138 (2011)