=Paper=
{{Paper
|id=Vol-1317/om2014_poster9
|storemode=property
|title=Enabling semantic search for EO products: an ontology matching approach
|pdfUrl=https://ceur-ws.org/Vol-1317/om2014_poster9.pdf
|volume=Vol-1317
|dblpUrl=https://dblp.org/rec/conf/semweb/KarpathiotakiDK14
}}
==Enabling semantic search for EO products: an ontology matching approach==
M. Karpathiotaki, K. Dogani, and M. Koubarakis

National and Kapodistrian University of Athens, Greece

{mkarpat,kallirroi,koubarak}@di.uoa.gr

This work was supported by the Prod-Trees project funded by ESA ESRIN.

Access to Earth Observation (EO) products remains difficult for end users. To address this, we developed the Prod-Trees platform [2], a semantically enabled search engine for EO products (a video demonstrating the functionalities of the Prod-Trees platform is available at http://bit.ly/ProdTreesPlatform). Users guide their search through a number of ontologies related to the EO domain. To help users find the terms that best fit their needs, we created mappings between these ontologies. In this paper, we present Pythia, an ontology matching system that combines various matching techniques [1,3,4] to create mappings between two ontologies.

Pythia combines a string-based technique that uses Apache Lucene’s features, a language-based technique based on WordNet, and a graph-based technique that uses the structure of the ontology and the mappings produced by the two previous techniques. The system supports SKOS ontologies. Therefore, the mappings are also expressed in SKOS, using the properties defined for matching concepts: skos:exactMatch, skos:relatedMatch, skos:broadMatch, and skos:narrowMatch. Based on these, we create four different types of mappings.

A terminological matcher implements the string-based and language-based techniques, both applied to the concept labels (skos:prefLabel, skos:altLabel, and skos:hiddenLabel). The mappings created by this component can be either skos:exactMatch or skos:relatedMatch.

The string-based technique uses Lucene for indexing and searching. With Lucene, one can create documents and add fields of a specific type to these documents. When searching the documents, the user can specify which field to search. Taking advantage of these capabilities, the terminological matcher indexes the target ontology. A new document is created for each concept, and each available property of the concept is added as a new field. String normalization functions are applied to the field, and stop words are removed. When searching for concepts similar to concept A (from the source ontology), the prefLabel, altLabel, and hiddenLabel fields of the indexed ontology are searched using the prefLabel of concept A. The returned results are ranked according to the string similarity of the compared strings (e.g., the skos:prefLabel of A and the prefLabel field of a document). This is feasible due to the string similarity functions implemented in Lucene. Also, since each field is indexed, only the index of the specified field is searched, and not all the concepts. Lucene returns multiple related results. If the two strings are the same, a skos:exactMatch is created between A and the corresponding concept from the target ontology. Otherwise, and only if one string is a substring of the other (e.g., “Elevation” and “Digital Elevation Model”), a skos:relatedMatch is created.

The language-based technique uses WordNet, a lexical database for English. The technique is optional and can be bypassed, as it adds noise to the results. Using WordNet, a new field, called relLabel, is created in the Lucene document of each concept. relLabel enriches each concept’s labels by adding synonyms and other related words found in WordNet. During the search, the relLabel fields of the documents are searched, and if a similarity is discovered, a skos:relatedMatch relation is created between the corresponding concepts.

In case there are concepts from the source ontology with no skos:exactMatch mappings, a structural matcher is invoked.
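
To make the terminological matcher's decision rule concrete, the following is a minimal Python sketch of the string-based step only. It is not Pythia's implementation and does not use Lucene: the normalization, the stop-word list, and the names normalize, classify, and match_concept are assumptions made for illustration, and Lucene's indexing and ranking of results are not modelled. The sketch only applies the rule stated in the text: identical normalized labels yield skos:exactMatch, and a substring relation yields skos:relatedMatch.

```python
import re

# Hypothetical stop-word list; the paper does not say which words are removed.
STOP_WORDS = {"a", "an", "the", "of", "for", "and"}


def normalize(label):
    """Lower-case, strip punctuation, and drop stop words (assumed normalization)."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def classify(source_label, target_label):
    """Return the SKOS mapping property for two labels, or None if unrelated."""
    s, t = normalize(source_label), normalize(target_label)
    if not s or not t:
        return None
    if s == t:
        return "skos:exactMatch"
    # Plain substring test, as in the "Elevation" / "Digital Elevation Model" example.
    if s in t or t in s:
        return "skos:relatedMatch"
    return None


def match_concept(source_pref_label, target_labels):
    """Compare one source prefLabel against every label of every target concept.

    target_labels maps a target concept id to the list of its labels
    (prefLabel, altLabel, hiddenLabel). An exact match on any label wins
    over a related match for the same target concept.
    """
    mappings = {}
    for concept_id, labels in target_labels.items():
        results = {classify(source_pref_label, label) for label in labels}
        if "skos:exactMatch" in results:
            mappings[concept_id] = "skos:exactMatch"
        elif "skos:relatedMatch" in results:
            mappings[concept_id] = "skos:relatedMatch"
    return mappings
```

For instance, calling match_concept with the source label "Elevation" and a target concept labelled "Digital Elevation Model" produces a skos:relatedMatch for that concept, while a target concept labelled "Elevation" produces a skos:exactMatch.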
The structural matcher implements a graph-based technique, creating either skos:narrowMatch or skos:broadMatch mappings. Taking as input a concept A from the source ontology, the matcher finds all the broader and narrower concepts of A. Afterwards, it checks whether a skos:exactMatch was created by the terminological matcher for one of these concepts. If so, a new mapping can be derived. For example, if a skos:exactMatch exists between concept B (which is a broader concept of A) and concept B’ (from the target ontology), then it can be derived that B’ is a skos:broadMatch of A. Similarly, we can create a skos:narrowMatch.

The matcher also checks whether the concepts B and N (a broader and a narrower concept of A, respectively) hold skos:narrowMatch or skos:broadMatch relations with concepts from the target ontology. If a skos:broadMatch exists between B and a concept B”, then it is safe to conclude that B” is also a skos:broadMatch of A. This means that when a skos:broadMatch exists between a concept B from the source ontology and a concept B” from the target ontology, this relation can be propagated to the narrower concepts of B. Similarly, a skos:narrowMatch between a concept N and a concept N” can be propagated to the broader concepts of N. In any other case, no mappings can be derived. When all the concepts have been examined, if new mappings were created by the structural matcher, the described process is repeated. Otherwise, Pythia proceeds to export the mappings to RDF.

Despite the simplicity of the techniques, the results are quite satisfying, especially the performance of the language-based technique, which allows WordNet to be tuned: by specifying the types of relations WordNet discovers for a given word, the user gains control over the percentage of valid mappings. A higher degree of trust in the final results can be gained with extensions such as a user-evaluation process and the use of domain-specific vocabularies coupled with WordNet.

===References===

1. Euzenat, J., Shvaiko, P.: Ontology Matching (2007)

2. Karpathiotaki, M., et al.: Prod-Trees: Semantic Search for Earth Observation Products. In: ESWC. LNCS, Springer (2014)

3. Nagy, M., Vargas-Vera, M.: Towards an Automatic Semantic Data Integration: Multi-agent Framework Approach. In: Semantic Web. InTech (2010)

4. Pirro, G., Talia, D.: An approach to Ontology Mapping based on the Lucene search engine library. DEXA ’07 (2007)
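
As an illustration of the structural matcher described in the text, the following Python sketch derives skos:broadMatch and skos:narrowMatch mappings from existing skos:exactMatch mappings and repeats until no new mapping appears. It is a sketch under stated assumptions, not Pythia's code: the function name structural_match, the dictionary-based inputs, and the choice to examine only concepts without a skos:exactMatch are assumptions made for illustration.

```python
from collections import defaultdict


def structural_match(concepts, broader, exact):
    """Derive broadMatch/narrowMatch mappings until a fixpoint is reached.

    concepts: iterable of source concept ids
    broader:  dict, source concept -> set of its direct broader source concepts
    exact:    dict, source concept -> set of target concepts (skos:exactMatch)
    """
    concepts = list(concepts)

    # Invert the broader relation to obtain the narrower concepts.
    narrower = defaultdict(set)
    for child, parents in broader.items():
        for parent in parents:
            narrower[parent].add(child)

    broad_match = defaultdict(set)
    narrow_match = defaultdict(set)

    changed = True
    while changed:  # repeat the pass while new mappings keep appearing
        changed = False
        for a in concepts:
            if exact.get(a):
                continue  # assumption: only concepts without an exactMatch are examined
            new_broad = set()
            for b in broader.get(a, ()):
                # exactMatch or broadMatch of a broader concept -> broadMatch of a
                new_broad |= exact.get(b, set()) | broad_match[b]
            new_narrow = set()
            for n in narrower.get(a, ()):
                # exactMatch or narrowMatch of a narrower concept -> narrowMatch of a
                new_narrow |= exact.get(n, set()) | narrow_match[n]
            if not new_broad <= broad_match[a] or not new_narrow <= narrow_match[a]:
                broad_match[a] |= new_broad
                narrow_match[a] |= new_narrow
                changed = True

    # Drop the empty entries created by defaultdict lookups.
    return (
        {c: m for c, m in broad_match.items() if m},
        {c: m for c, m in narrow_match.items() if m},
    )
```

The loop terminates because the mapping sets only grow and the number of concepts is finite. The second pass is what carries a skos:broadMatch from a broader concept down to its own narrower concepts, which corresponds to the repetition step described in the text.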