=Paper= {{Paper |id=Vol-1220/paper2 |storemode=property |title=Integrating Distributed Configurations With RDFS and SPARQL |pdfUrl=https://ceur-ws.org/Vol-1220/02_confws2014_submission_3.pdf |volume=Vol-1220 |dblpUrl=https://dblp.org/rec/conf/confws/Schenner0PS14 }} ==Integrating Distributed Configurations With RDFS and SPARQL== https://ceur-ws.org/Vol-1220/02_confws2014_submission_3.pdf
                        Integrating Distributed Configurations
                              with RDFS and SPARQL
                 Gottfried Schenner1 and Stefan Bischof1 and Axel Polleres2 and Simon Steyskal1,2


Abstract. Large interconnected technical systems (e.g. railway networks,
power grid, computer networks) are typically configured with the help of
multiple configurators, which store their configurations in separate
databases based on heterogeneous domain models (ontologies). In practice
users often want to ask queries over several distributed configurations.
In order to reason over these distributed configurations in a uniform
manner a mechanism for ontology alignment and data integration is
required. In this paper we describe our experience with using standard
Semantic Web technologies (RDFS and SPARQL) for data integration and
reasoning.

1     INTRODUCTION
Product configuration [9] is the task of assembling a system from
predefined components satisfying the customer requirements. Large
technical systems are typically configured with the help of multiple
configuration tools. These configurators are often specific to a
technology or vendor and therefore use heterogeneous domain models
(ontologies).
   For large interconnected systems (e.g. railway networks, power grid)
the configuration of the overall system may be stored across separate
databases, each database containing only the information for a
sub-system.
   The domain models and databases of these configurators are a valuable
source of information about the deployed system. But there must be a way
to access the information in a uniform and integrated manner in order to
exploit this.
   Figure 1 shows a typical scenario from the railway domain. The
individual stations of a network are built by different vendors (A, B,
C). Vendors A and B use proprietary configurators (A, B) and store the
configurations of these stations in separate projects. Vendor C does not
use a configurator; therefore there is no (digital) data available to
integrate.
   In the railway scenario the railway company owning the railway network
wants to obtain information about the whole network in a
vendor-independent way. To achieve this, some form of ontology and data
integration is necessary. We can identify three steps: (i) create a
vendor-independent ontology, (ii) map or align the vendor-specific
ontologies or schemas to the vendor-independent ontology, and (iii)
provide the vendor-specific data in terms of the vendor-independent
ontology.
   This paper investigates how to use standard Semantic Web technologies
(RDFS, SPARQL and OWL) for data integration. Our approach uses SPARQL
CONSTRUCT queries to generate a linked system view of the distributed
configurations as depicted in Figure 2. This system view can then (i) be
queried in a uniform manner, (ii) be checked for constraint violations
taking all relevant configurations into account and (iii) be used for
reasoning and general consistency checks (cf. Figure 3).




Figure 1: Data integration approach

Figure 2: Integrating configurations with SPARQL CONSTRUCT queries into a
linked system view.

1 Siemens AG Österreich, Siemensstrasse 90, 1210 Vienna, Austria
  {gottfried.schenner|bischof.stefan}@siemens.com
2 Vienna University of Economics & Business, 1020 Vienna, Austria
  {axel.polleres|simon.steyskal}@wu.ac.at

   The remainder of this paper is structured as follows: Chapter 2
discusses the preliminaries of this paper, especially the used Semantic
Web technologies. Chapter 3 introduces the working example of this paper,
Chapter 4 shows how to derive an integrated view of the system from the
individual configurator specific databases, in Chapter 5 we
2.2      Querying with SPARQL
SPARQL Protocol And RDF Query Language (SPARQL) [14] is the standard
query language for RDF, which became a W3C Recommendation in version 1.1
in 2013. Its syntax is highly influenced by the previously introduced RDF
serialization format Turtle [1] and by SQL [4], a query language for
relational data (see footnote 3).
   Besides basic query operations such as union of queries, filtering,
sorting and ordering of results as well as optional query parts, version
1.1 extended SPARQL's portfolio by aggregate functions (SUM, AVG, MIN,
MAX, COUNT, ...), the possibility to use subqueries, perform update
actions via SPARQL Update and several other heavily requested missing
features [23].
   Furthermore, it is possible to create entirely new RDF graphs based on
the variable bindings constituted in graph patterns which are matched
against one or more input graphs, using SPARQL CONSTRUCT queries. Using
such CONSTRUCT queries offers the possibility to easily define
transformations between two or more RDF graphs/ontologies, which serves
as a basic building block for the present paper.
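To make the idea of a CONSTRUCT-style graph transformation more tangible, the following minimal Python sketch mimics its core mechanism: match a pattern against input triples, collect variable bindings, and instantiate a template. It is an illustration only and not a SPARQL engine. It handles a single triple pattern, uses plain tuples instead of an RDF library, and the helper names `match_pattern` and `construct` as well as the sample correspondence are assumptions made for this example.

```python
# Triples are (subject, predicate, object) tuples of strings.
# Strings starting with "?" are variables (assumption of this sketch).

def match_pattern(pattern, triple):
    """Return variable bindings if `triple` matches `pattern`, else None."""
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must be bound to the same value everywhere.
            if bindings.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return bindings


def construct(template, pattern, graph):
    """Instantiate `template` once for every match of `pattern` in `graph`."""
    result = set()
    for triple in graph:
        bindings = match_pattern(pattern, triple)
        if bindings is None:
            continue
        for t in template:
            result.add(tuple(bindings.get(term, term) for term in t))
    return result


source_graph = {
    ("ontoA:A1", "rdf:type", "ontoA:Device"),
    ("ontoA:A3", "rdf:type", "ontoA:Device"),
    ("ontoA:A1", "ontoA:Device_address", "1"),
}

# Hypothetical correspondence: every ontoA:Device is an ontoSys:Computer.
target_graph = construct(
    template=[("?d", "rdf:type", "ontoSys:Computer")],
    pattern=("?d", "rdf:type", "ontoA:Device"),
    graph=source_graph,
)
```

Because the result is a set of triples, running the same transformation twice yields the same target graph, which is one reason why such transformations compose well.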


discuss how to reason about the overall system with SPARQL queries and we
discuss related work in Chapter 6. Finally, we conclude our paper in
Chapter 7.

Figure 3: Using a linked system view for querying and reasoning over
distributed configurations.

2     PRELIMINARIES
The proposed approach builds heavily on Semantic Web standards and
technologies. Instance data is represented as RDF triples, domain models
are mapped to domain dependent ontologies/vocabularies and queries are
formulated in SPARQL.

2.1    Data representation with RDF

Figure 4: A simple RDF triple.

   The Resource Description Framework (RDF) [15] is a framework for
describing and representing information about resources and is both
human-readable and machine-processable. These abilities offer the
possibility to easily exchange information in a lightweight manner among
different applications.
   In RDF every resource is identified by its URI and represented as
subject - predicate - object triples, where subjects and predicates are
URIs and objects can either be literals (strings, integers, ...) or URIs
as shown in Figure 4. Additionally, subjects or objects can be defined as
blank nodes; these blank nodes do not have a corresponding URI and are
mainly used to describe special types of resources without explicitly
naming them. For example the concept mother could be represented as a
female person having at least one child.

2.3      Semantic heterogeneity
In order to be able to integrate two or more ontologies into one
integrated knowledge base, it is mandatory to define correspondences
between the elements of those ontologies to reduce semantic heterogeneity
among the integrated ontologies [8].
   The problem of semantic heterogeneity can be caused by several
factors, e.g. that different ontologies model the same domain in
different levels of precision or use different terms for the same
concepts [26] (e.g. a concept Computer is equivalent to another concept
Device). Such "simple" differences can be detected by most of the current
state-of-the-art ontology matching systems like YAM++ [21] or LogMap
[18]. However, more complex heterogeneities (e.g. a concept Subnet is
equivalent to the union of the concepts Computer and Switch; or a
property hasPort, which links a Computer to its Port, is equivalent to an
attribute ownsPort, which contains the respective port as string
representation) are not only more difficult to detect but also not
supported by the majority of ontology matching tools [13, 27], although a
few approaches to tackle those problems exist [5, 6, 26]. A slightly
different approach was followed by [24], where the authors propose a
framework which defines executable semantic mappings between ontologies
based on SWRL [16] rules and string similarity.
   Nevertheless, based on the absence of ontology matching tools which
are capable of detecting such complex correspondences, we assume the
presence of already known correspondences between entities of the
ontologies for our integration scenario.

3     WORKING EXAMPLE
As a working example (see footnote 4) a fictitious computer network is
used and represented as UML class diagrams. Figure 5 shows the customer
view (system view) of the network.
   The following additional constraints hold for the system view:
• In the computer network every computer has a unique address
• A computer can be part of 1-2 subnets
• A computer is part of exactly one project

3 All listings within this paper are serialized in Turtle syntax.
4 The example ontologies and queries are available upon request from the
  first author.
Table 1: Convert object-oriented data models to ontologies

UML                      RDF/OWL

class C                  URI(C) rdf:type owl:Class .
C1 extends C             URI(C1) rdfs:subClassOf URI(C) .
attribute A              URI(A) rdf:type owl:DatatypeProperty ,
                            owl:FunctionalProperty ;
                            rdfs:domain URI(C) ; rdfs:range TYPE(A) .
assoc A(C1,C2)           URI(A) rdf:type owl:ObjectProperty ;
                            rdfs:range URI(C1) ; rdfs:domain URI(C2) .
object O of class C      URI(O) rdf:type URI(C) .
attributevalue A         URI(O) URI(A) VALUE(A) .
for every tuple(O1,O2)   URI(O1) URI(A) URI(O2) .
in assoc A

Figure 5: System Ontology
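The mapping of Table 1 can be sketched as a small function that emits triples for a toy object model. This is an illustration under stated assumptions, not the authors' implementation: the dictionary layout of `model`, the function name `to_triples`, and the use of prefixed names and untyped literal values as plain strings are all hypothetical. Associations follow the domain/range assignment exactly as printed in Table 1.

```python
def to_triples(model, prefix):
    """Emit (s, p, o) triples for a toy object model, following Table 1."""
    def uri(name):
        return f"{prefix}:{name}"

    triples = []
    # class C / C1 extends C
    for cls, parent in model["classes"].items():
        triples.append((uri(cls), "rdf:type", "owl:Class"))
        if parent is not None:
            triples.append((uri(cls), "rdfs:subClassOf", uri(parent)))
    # attribute A of class C with datatype TYPE(A)
    for attr, (cls, datatype) in model["attributes"].items():
        triples += [
            (uri(attr), "rdf:type", "owl:DatatypeProperty"),
            (uri(attr), "rdf:type", "owl:FunctionalProperty"),
            (uri(attr), "rdfs:domain", uri(cls)),
            (uri(attr), "rdfs:range", datatype),
        ]
    # assoc A(C1, C2), range/domain as printed in Table 1
    for assoc, (c1, c2) in model["associations"].items():
        triples += [
            (uri(assoc), "rdf:type", "owl:ObjectProperty"),
            (uri(assoc), "rdfs:range", uri(c1)),
            (uri(assoc), "rdfs:domain", uri(c2)),
        ]
    # objects, their attribute values and association tuples
    for obj, (cls, values, links) in model["objects"].items():
        triples.append((uri(obj), "rdf:type", uri(cls)))
        triples += [(uri(obj), uri(a), v) for a, v in values.items()]
        triples += [(uri(obj), uri(a), uri(o2)) for a, o2 in links]
    return triples


model = {
    "classes": {"Device": None, "InternalDevice": "Device"},
    "attributes": {"Device_address": ("Device", "xsd:unsignedInt")},
    "associations": {"Device_slot1Connected": ("Device", "Device")},
    "objects": {
        "A1": ("InternalDevice", {"Device_address": "1"},
               [("Device_slot1Connected", "A3")]),
        "A3": ("InternalDevice", {"Device_address": "3"},
               [("Device_slot1Connected", "A1")]),
    },
}
triples = to_triples(model, "ontoA")
```

A real converter would additionally mint full URIs and typed literals; the sketch keeps prefixed strings to stay close to the notation of the table.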


• A project is some arbitrary subdivision of the whole network (e.g.
  building)
• A subnet can be part of multiple projects

In the example there are 2 vendors (A and B), each providing their own
configurator. A project can be configured either with configurator A or
configurator B. In both cases there is one configurator database for
every project. None of the domain models contains the concept of a subnet
as found in the system view.
   Figure 6a shows the domain model of configurator A. In the domain
model of configurator A computers are called devices. Internal devices
are the devices configured in the current project. External devices are
devices of other projects that are directly connected to an internal
device. These are needed to configure the network cards of the internal
device.
   Figure 7a shows the domain model of configurator B. Vendor B realizes
the computer network with switches. Computers can have 1 or 2 ports,
which must be connected to a port of a switch. The attribute external is
set to 'true' for elements that are external to the current project.

3.1    Converting object-oriented models to ontologies
Although using Description Logics for configuration has a long history
[10, 20, 28], in our experience large scale industrial configurators
mostly use some form of UML-like object-oriented formalisms. For this
paper we use the approach for converting object-oriented data models and
their instance data into RDF/OWL shown in Table 1. Because of the clear
correspondence between UML class diagrams and OWL ontologies we depict
ontologies also as UML class diagrams.
   This conversion captures the bare minimum that is required for our
data integration approach. See [29] for a more elaborate approach for
representing product configurator knowledge bases in OWL.
   Listing 1 shows a fragment of the class model of Figure 6a and the
instance data of Figure 6b in RDF & OWL (see footnote 5).

Listing 1: Ontology A with instance data
# object model
ontoA:Device rdf:type owl:Class .

ontoA:InternalDevice rdf:type owl:Class ;
   rdfs:subClassOf ontoA:Device .

ontoA:Device_address rdf:type
      owl:DatatypeProperty ,
      owl:FunctionalProperty ;
   rdfs:domain ontoA:Device ;
   rdfs:range xsd:unsignedInt .

ontoA:Device_slot1Connected rdf:type
      owl:ObjectProperty ;
   rdfs:range ontoA:Device ;
   rdfs:domain ontoA:Device .

# instance data
ontoA:A1 rdf:type ontoA:InternalDevice ;
   ontoA:Device_address "1"^^xsd:unsignedInt ;
   ontoA:Device_slot1Connected
         ontoA:A2 , ontoA:A3 ;
   ontoA:Device_slot2Connected
         ontoA:B3 , ontoA:B4 .

ontoA:A3 rdf:type ontoA:InternalDevice ;
   ontoA:Device_address "3"^^xsd:unsignedInt ;
   ontoA:Device_slot1Connected
         ontoA:A1 , ontoA:A2 .

ontoA:B1 rdf:type ontoA:ExternalDevice ;
   ontoA:Device_address "4"^^xsd:unsignedInt ;
   ontoA:Device_slot1Connected
         ontoA:B2 , ontoA:A1 .

ontoA:B2 rdf:type ontoA:ExternalDevice ;
   ontoA:Device_address "5"^^xsd:unsignedInt ;
   ontoA:Device_slot1Connected
         ontoA:B1 , ontoA:A1 .

5 For the sake of simplicity, we omitted owl:DatatypeProperty and
  respective project definitions.

3.2     Unique Name Assumption and Closed World Assumption
When converting the instance data of a configurator to RDF an identifier
(URI) for every object must be generated. Most product configurators
impose the Unique Name Assumption, i.e. objects with different object-ID
refer to different objects of the domain. In the example above we
therefore know that ontoA:A1 and ontoA:A2 refer to different Devices.
   RDF/OWL does not impose the Unique Name Assumption. This is a
desirable feature when reasoning about linked data. If one wants to
integrate instance data from different sources using heterogeneous
ontologies, these ontologies will often refer to the same entity under
different URIs. The same can happen when we integrate multiple
interconnected configurations into one configuration.
   Figures 6b and 7b show the configurations of two projects (A and B).
Although every computer/device is only represented once in each
configuration, some computers/devices are known in both projects
                          (a) Ontology A                                                    (b) Instance data of Project A

                                       Figure 6: Ontology and instance data of Project A (Ontology A)




                          (a) Ontology B                                                    (b) Instance data of Project B

                                       Figure 7: Ontology and instance data of Project B (Ontology B)

i.e. the ExternalDevice ontoA:B1 and the Computer ontoB:B1 are referring
to the same real world object under different URIs.
   As a pragmatic solution for the Unique Name Assumption, for this paper
all URIs are treated as different, unless explicitly stated by
owl:sameAs.
   Similar considerations apply to the Closed World Assumption. In a
configurator database one assumes that all components relevant to the
current context are known. For instance in our example all the computers
in the current project are known and one can use the Closed World
Assumption to conclude that there are no other internal computers. The
same applies to external computers that are directly connected to an
internal computer. But we cannot apply the Closed World Assumption to the
whole computer network, since we have no information about how many
projects and computers there are in total.

4   DATA INTEGRATION WITH SPARQL
We followed an approach proposed in [7] which motivates the use of SPARQL
CONSTRUCT queries to perform data integration (i.e. based on known
correspondences between ontologies, we are able to translate their
instance data to conform to the structure of the integrated ontology).

4.1     Creation of the system view
As a first step in our data integration approach a system view of the
configurator specific instance data is created. This system view reflects
the view of the owner of the configured system and is completely
self-contained, i.e. does not contain any URIs of the domain specific
ontologies. To derive the system view from the proprietary configurator
data we use SPARQL CONSTRUCT queries. Figure 6b shows a configuration of
configurator A, Figure 7b shows a configuration of configurator B. The
projects of the two configurations are connected via the subnet
containing A1(C1), B1(C4) and B2(C5).

4.1.1    Creating instances
To map an instance of the source ontology to a new instance of the target
ontology we can either generate a new URI in the namespace of the target
ontology or use blank nodes.
   The following example (cf. Listing 2) creates a computer in the system
ontology for every device of the source ontology A by creating a new
unique URI using a unique identifier of the target object (in this case
the attribute address).
   One advantage of using that approach is that for every instance only
one URI will be created in the instance data and the order of
executing the CONSTRUCT queries does not matter.

Figure 8: Equivalence relations of subnets derived from Ontology A

Figure 9: Instance data of Project A and Project B (System Ontology)

Listing 2: Instance creation with new URI
CONSTRUCT {
  ?computer rdf:type ontoSys:Computer .
  ?computer ontoSys:Computer_address ?address .
}
WHERE {
  ?device ontoA:Device_address ?address .
  BIND(URI(CONCAT(URISYS, STR(?address)))
        AS ?computer)
}

   If in contrast blank nodes are used, every CONSTRUCT query generates a
new blank node for a source object. Therefore we use blank nodes only
when it is not possible or inconvenient to create a unique URI for an
instance. In our example, since there is no identifier for projects in
the source ontology, new projects can be created with the CONSTRUCT query
shown in Listing 3.

Listing 3: Instance creation with blank node
CONSTRUCT {
    _:p rdf:type ontoSys:Project .
    _:p ontoSys:origin ?project .
}
WHERE {
    ?project rdf:type ontoA:Project .
}

   By using the special object-property ontoSys:origin, we can keep track
of what led to the construction of the blank node. This information can
then be reused in subsequent CONSTRUCT queries.

4.1.2    Complex mapping
Sometimes it is more convenient to use multiple URIs for the same
instance, especially if there is no explicit representation of the
concept of the object in the source ontology. These multiple URIs will
then be related using owl:sameAs.
   In our example the concept of a subnet is not directly represented in
ontology A. To create the subnets for instance data of ontology A a more
complex query is necessary as depicted in Listing 4.

Listing 4: Creating subnets from instance data
# C = abbreviation for URI of Computer
# SIRI = abbreviation for URI of subnets
CONSTRUCT {
  ?sub1 ontoSys:Subnet_computers ?c1 .
  ?sub2 ontoSys:Subnet_computers ?c2 .
  ?sub1 rdf:type ontoSys:Subnet .
  ?sub2 rdf:type ontoSys:Subnet .
  ?sub1 owl:sameAs ?sub2 .
}
WHERE {
  { ?d1 ontoA:Device_slot1Connected ?d2 .
    ?d1 ontoA:Device_address ?a1 .
    BIND(CONCAT(STR(?a1), "_1") AS ?sid1)
  } UNION {
    ?d1 ontoA:Device_slot2Connected ?d2 .
    ?d1 ontoA:Device_address ?a1 .
    BIND(CONCAT(STR(?a1), "_2") AS ?sid1) . }
  { ?d2 ontoA:Device_slot1Connected ?d1 .
    ?d2 ontoA:Device_address ?a2 .
    BIND(CONCAT(STR(?a2), "_1") AS ?sid2)
  } UNION {
    ?d2 ontoA:Device_slot2Connected ?d1 .
    ?d2 ontoA:Device_address ?a2 .
    BIND(CONCAT(STR(?a2), "_2") AS ?sid2) . }
  BIND(URI(CONCAT(C, STR(?a1))) AS ?c1)
  BIND(URI(CONCAT(C, STR(?a2))) AS ?c2)
  BIND(URI(CONCAT(SIRI, STR(?sid1))) AS ?sub1)
  BIND(URI(CONCAT(SIRI, STR(?sid2))) AS ?sub2)
}

5   USING THE INTEGRATED MODEL
The data of the different systems is available and expressed in terms of
a common ontology. We can now access the data in a uniform manner and
perform different kinds of operations. This section presents two classes
of use cases, namely posing queries over the whole system and checking
constraints concerning several systems.
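Listing 4 produces one subnet URI per connected slot and relates URIs that denote the same subnet via owl:sameAs. Any consumer of the integrated data then has to collapse these links into equivalence classes. The following Python sketch shows one possible way to do this outside of SPARQL with a union-find structure, using the lexicographically smallest URI of each class as its representative. The function name `representatives`, the input format, and the sample URI `ontoSys:S41` are assumptions made for illustration.

```python
def representatives(nodes, same_as):
    """Map every node to the smallest member of its owl:sameAs class.

    `nodes` is an iterable of URIs, `same_as` an iterable of (a, b) pairs.
    Every URI used in `same_as` must also occur in `nodes`.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in same_as:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller root, so the root of a
            # class is always its smallest member.
            lo, hi = sorted((ra, rb))
            parent[hi] = lo
    return {n: find(n) for n in nodes}


subnets = ["ontoSys:S11", "ontoSys:S22", "ontoSys:S31", "ontoSys:S41"]
links = [("ontoSys:S11", "ontoSys:S22"), ("ontoSys:S31", "ontoSys:S22")]

rep = representatives(subnets, links)
distinct_subnets = set(rep.values())
```

Since owl:sameAs is symmetric and transitive, the direction and order of the pairs in `links` do not affect the resulting classes.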
5.1    Queries                                                            of subnet-URIs creates a URI for every connected port of a com-
                                                                          puter (Figure 8). A naive SPARQL query would count all distinct
After the data integration the formerly heterogeneous data can now        URIs that refer to the same subnet (ontoSys:S11, ontoSys:S22,
be queried in a uniform manner using only concepts of the system          ontoSys:S31), i.e. resulting in 3 instead of the expected answer 1.
ontology.                                                                 To fix this, one has to choose one representative for every element
           Listing 5: Example Querying the system model                   equivalence class induced by owl:sameAs and count the number of
                                                                          representatives. In our approach this is done by choosing the lexico-
# return all the addresses used in project
SELECT ? p ? address                                                      graphically smallest element.
WHERE {
  ? p ontoSys : Project_computers ? c .                                                 Listing 8: Example counting predicates
  ? c ontoSys : Computer_address ? address .                              # query without special treatment of sameAs
}                                                                         SELECT ( COUNT ( DISTINCT ? subnet ) AS ? numberofsubnets )
                                                                          WHERE {
                                                                            ? subnet a ontoSys : Subnet .
                                                                          }
5.2    Checking constraints                                               # result : numberofsubnets = 6
If one wants to query information specific to a domain ontology, this     # query with special sameas treatment
data is still accessible via the ontoSys:origin link. One use case for    # chooses the lexicographic first element
using the ontoSys:origin property, is to detect inconsistencies in        # as representation of the equivalence class
                                                                          SELECT ( COUNT ( DISTINCT ? first ) AS ? numberofsubnets )
the source data. For example if a subnet is part of two projects, for     WHERE {
every computer in that subnet, there must be two representations in the     ? subnet a ontoSys : Subnet .
source ontologies (In one of these projects the computer is external).      # first subquery
The following query checks this property.                                   { SELECT ? subnet ? first
                                                                               WHERE {
                                                                                ? subnet (( owl : sameAs |^ owl : sameAs )*) ? first .
                  Listing 6: Checking constraints                              OPTIONAL {
SELECT ? c ? o                                                                  ? notfirst (( owl : sameAs |^ owl : sameAs )*) ? first .
WHERE {                                                                         FILTER ( STR (? notfirst ) < STR (? first ))}
  ? project ontoSys : Project_computers ? c .                                   FILTER (! BOUND (? notfirst ))}
  ? sub ontoSys : Subnet_computers ? c .                                    }
  ? sub (( owl : sameAs |^ owl : sameAs )*) ? other .                     }
  ? project2 ontoSys : Project_subnets ? other .                          # result : numberofsubset = 2
  FILTER (? project !=? project2 )
  {                                                                          An alternative approach would be to replace all cliques of the RDF-
     ? c ontoSys : origin ? o .                                           graph linked by owl:sameAs with a new unique URI. We did not
  } MINUS {
     ? c ontoSys : origin ? o1 .                                          consider that because it requires a proprietary implementation and by
     ? c ontoSys : origin ? o2 .                                          replacing URIs, one loses information about the source of information.
     FILTER (? o1 !=? o2 )                                                For instance, if the instance data of two different configurators refer
  }
                                                                          to the same real-world object but have conflicting data-values for that
}
                                                                          object, both values and their sources must be communicated to the
  So far we checked the integrity of the instance data by writ-           end-user.
ing special SPARQL queries. Whenever these queries are not
empty a constraint violation is detected. Alternatively SPARQL            6   RELATED WORK
CONSTRUCT queries can be used to derive a special property
ontoSys:constraintviolation and record the reason for the in-             In order to successfully perform data or information integration using
consistencies.                                                            Semantic Web technologies two main issues have to be addressed,
                                                                          namely:
                  Listing 7: Constraint violations
CONSTRUCT {                                                               Ontology Mapping Tackling the difficulties of Ontology Mapping
  _ : cv ontoSys : constraintviolation ? c .                                (i.e. defining alignments between ontologies) extensive studies have
  _ : cv ontoSys : description                                              been taken out over the last couple of years [2, 11, 12, 19], mainly
      " inconsistent data " .                                               focusing on resolving heterogeneity among different ontologies or
}
...                                                                         data sources by detecting similarities amongst them.
                                                                          Ontology Integration Two main approaches can be identified for
                                                                            integrating different ontologies [22], (i) define an upper ontology
                                                                            which contains general concepts and properties for those in the
5.3    Special treatment of owl:sameAs
                                                                            underlying more specific ones and define mappings between them
As discussed in Chapter 3.2 OWL does not impose the unique name             and (ii) define alignments directly between underlying ontologies
assumpion (UNA). Therefore it is common to have different names             and use query rewriting for query support [3, 25].
(URIs) refer to the same real-world object. In that case they can be
linked via owl:sameAs. SPARQL is unaware of the special seman-               With its W3C Recommendation for version 1.1 in 2013 [14], in-
tics of owl:sameAs. This can be a problem, especially when using          troducing e.g. UPDATE queries and a revised entailment regime,
counting aggregates, since one usually wants to count the number          SPARQL has become more feasible to be used within information
of real-objects and not the number of URIs referring to it. Take for      integration scenarios and not only as query language for RDF data.
example a query counting the number of subnets. Our construction
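The representative-counting idea behind Listing 8 in Section 5.3 can also be illustrated outside SPARQL. The following is a minimal Python sketch, not the implementation used in this paper: it groups URIs into the equivalence classes induced by owl:sameAs links and counts one representative (the lexicographically smallest URI) per class. The function names and the URI ontoSys:S12 are hypothetical and introduced only for illustration; ontoSys:S11, ontoSys:S22 and ontoSys:S31 are the three URIs of the same subnet mentioned in Section 5.3.

```python
from collections import defaultdict


def same_as_classes(uris, same_as_links):
    """Group URIs into the equivalence classes induced by owl:sameAs links."""
    # owl:sameAs is symmetric, so store every link in both directions.
    neighbours = defaultdict(set)
    for a, b in same_as_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    classes = []
    for start in uris:
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable via sameAs,
        # mirroring the property path (owl:sameAs|^owl:sameAs)*.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        classes.append(component)
    return classes


def count_real_world_objects(uris, same_as_links):
    """Count one representative per class: the lexicographically smallest URI."""
    representatives = {min(c) for c in same_as_classes(uris, same_as_links)}
    return len(representatives)


# Three URIs for one subnet (from Section 5.3) plus one hypothetical subnet.
subnets = ["ontoSys:S11", "ontoSys:S22", "ontoSys:S31", "ontoSys:S12"]
links = [("ontoSys:S11", "ontoSys:S22"), ("ontoSys:S22", "ontoSys:S31")]

print(len(set(subnets)))                         # naive URI count: 4
print(count_real_world_objects(subnets, links))  # representative count: 2
```

Like the SPARQL version, the sketch keeps all original URIs and only selects a representative at counting time, so information about the source of each URI is not lost.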
Our approach can be used for data integration of distributed configurations
and for reasoning about the consistency of the integrated system. It cannot be
used to solve (distributed) configuration problems. For a CSP-based approach
to solving distributed configuration problems see [17].


7   CONCLUSIONS

When we started out writing this paper, we were looking for a lightweight
approach to data integration for distributed configurations using standard
Semantic Web technologies.

In the present paper we show that SPARQL and RDFS alone are sufficient for an
approach that relies only on standards and makes it easy to introduce new
concepts and individuals on the fly using SPARQL queries. This is especially
important for practical use cases, where it is unpredictable which information
a customer will request about the configured system.

We tested our approach with real-world data. On a standard Windows-7 laptop
with 8 GB, using the SPARQL-API of JENA 2.11.1, a large database (>50000
instances) can be integrated in less than 5 minutes, resulting in an RDF graph
with more than 500000 triples.

We also considered using OWL reasoners but could not find a solver-independent
way of creating new individuals. Nevertheless, for future work we plan to look
into using richer OWL ontologies, which would offer the possibility to use
configurator-specific concepts such as part-subpart, resource,
(hardware-)component etc.

As can be seen in the example SPARQL queries in this paper, some queries
(especially the ones that take owl:sameAs properties into account) are
understandable only to a SPARQL expert. One approach to making queries more
accessible to a SPARQL beginner would be to hide the special treatment of
owl:sameAs from the inexperienced user by using query rewriting.


ACKNOWLEDGEMENTS

Stefan Bischof and Simon Steyskal have been partially funded by the Vienna
Science and Technology Fund (WWTF) through project ICT12-015.

Simon Steyskal has been partially funded by ZIT, the Technology Agency of the
City of Vienna (Austria), in the programme ZIT13 plus, within the project
COSIMO (Collaborative Configuration Systems Integration and Modeling) under
grant number 967327.


REFERENCES

 [1] David Beckett, Tim Berners-Lee, Eric Prud’hommeaux, and Gavin Carothers.
     Turtle – Terse RDF Triple Language. W3C Candidate Recommendation,
     February 2013. http://www.w3.org/TR/2013/CR-turtle-20130219/.
 [2] Namyoun Choi, Il-Yeol Song, and Hyoil Han, ‘A survey on ontology
     mapping’, ACM Sigmod Record, 35(3), 34–41, (2006).
 [3] Gianluca Correndo, Manuel Salvadores, Ian Millard, Hugh Glaser, and Nigel
     Shadbolt, ‘Sparql query rewriting for implementing data integration over
     linked data’, in Proceedings of the 2010 EDBT/ICDT Workshops, p. 4. ACM,
     (2010).
 [4] Chris J Date and Hugh Darwen, SQL. Der Standard.: SQL/92 mit den
     Erweiterungen CLI und PSM., Pearson Deutschland GmbH, 1998.
 [5] Robin Dhamankar, Yoonkyong Lee, AnHai Doan, Alon Halevy, and Pedro
     Domingos, ‘imap: discovering complex semantic matches between database
     schemas’, in Proceedings of the 2004 ACM SIGMOD international conference
     on Management of data, pp. 383–394. ACM, (2004).
 [6] AnHai Doan and Alon Y Halevy, ‘Semantic integration research in the
     database community: A brief survey’, AI magazine, 26(1), 83, (2005).
 [7] Jérôme Euzenat, Axel Polleres, and François Scharffe, ‘Processing
     ontology alignments with sparql’, in Complex, Intelligent and Software
     Intensive Systems, 2008. CISIS 2008. International Conference on, pp.
     913–917. IEEE, (2008).
 [8] Jérôme Euzenat, Pavel Shvaiko, et al., Ontology matching, volume 18,
     Springer, 2007.
 [9] A. Felfernig, L. Hotz, C. Bagley, and J. Tiihonen, Knowledge-based
     Configuration: From Research to Business Cases, Elsevier Science, 2014.
[10] Alexander Felfernig, Gerhard Friedrich, Dietmar Jannach, Markus
     Stumptner, and Markus Zanker, ‘Configuration knowledge representations
     for semantic web applications’, AI EDAM, 17(1), 31–50, (2003).
[11] Chiara Ghidini and Luciano Serafini, ‘Mapping properties of heterogeneous
     ontologies’, in Artificial Intelligence: Methodology, Systems, and
     Applications, 181–193, Springer, (2008).
[12] Chiara Ghidini, Luciano Serafini, and Sergio Tessaris, ‘On relating
     heterogeneous elements from different ontologies’, in Modeling and Using
     Context, 234–247, Springer, (2007).
[13] Bernardo Cuenca Grau, Zlatan Dragisic, Kai Eckert, Jérôme Euzenat, Alfio
     Ferrara, Roger Granada, Valentina Ivanova, Ernesto Jiménez-Ruiz, Andreas
     Oskar Kempf, Patrick Lambrix, et al., ‘Results of the ontology alignment
     evaluation initiative 2013’, in Proc. 8th ISWC workshop on ontology
     matching (OM), pp. 61–100, (2013).
[14] Steve Harris and Andy Seaborne, ‘Sparql 1.1 query language’, W3C
     Recommendation, 14, (2013).
[15] Patrick Hayes and Brian McBride. Rdf semantics. W3C Recommendation,
     February 2004. http://www.w3.org/TR/rdf-mt/.
[16] Ian Horrocks, Peter F Patel-Schneider, Harold Boley, Said Tabet, Benjamin
     Grosof, Mike Dean, et al., ‘Swrl: A semantic web rule language combining
     owl and ruleml’, W3C Member submission, 21, 79, (2004).
[17] Dietmar Jannach and Markus Zanker, ‘Modeling and solving distributed
     configuration problems: A csp-based approach’, Knowledge and Data
     Engineering, IEEE Transactions on, (99), 1–1, (2011).
[18] Ernesto Jiménez-Ruiz and Bernardo Cuenca Grau, ‘Logmap: Logic-based and
     scalable ontology matching’, in The Semantic Web–ISWC 2011, 273–288,
     Springer, (2011).
[19] Yannis Kalfoglou and Marco Schorlemmer, ‘Ontology mapping: the state of
     the art’, The knowledge engineering review, 18(01), 1–31, (2003).
[20] Deborah L. McGuinness and Jon R. Wright, ‘An industrial strength
     description logics-based configurator platform’, IEEE Intelligent
     Systems, 13(4), 69–77, (July 1998).
[21] DuyHoa Ngo and Zohra Bellahsene, ‘Yam++: a multi-strategy based approach
     for ontology matching task’, in Knowledge Engineering and Knowledge
     Management, 421–425, Springer, (2012).
[22] Natalya F Noy, ‘Semantic integration: a survey of ontology-based
     approaches’, ACM Sigmod Record, 33(4), 65–70, (2004).
[23] Axel Polleres, ‘Sparql1.1: New features and friends (owl2, rif)’, in Web
     Reasoning and Rule Systems, 23–26, Springer, (2010).
[24] Han Qin, Dejing Dou, and Paea LePendu, ‘Discovering executable semantic
     mappings between ontologies’, in On the Move to Meaningful Internet
     Systems 2007: CoopIS, DOA, ODBASE, GADA, and IS, 832–849, Springer,
     (2007).
[25] Bastian Quilitz and Ulf Leser, ‘Querying distributed rdf data sources
     with sparql’, in The Semantic Web: Research and Applications, 524–538,
     Springer, (2008).
[26] Dominique Ritze, Christian Meilicke, O Sváb-Zamazal, and Heiner
     Stuckenschmidt, ‘A pattern-based ontology matching approach for detecting
     complex correspondences’, in ISWC Workshop on Ontology Matching,
     Chantilly (VA US), pp. 25–36. Citeseer, (2009).
[27] Pavel Shvaiko and Jérôme Euzenat, ‘Ontology matching: state of the art
     and future challenges’, Knowledge and Data Engineering, IEEE Transactions
     on, 25(1), 158–176, (2013).
[28] Timo Soininen, Juha Tiihonen, Tomi Männistö, and Reijo Sulonen, ‘Towards
     a general ontology of configuration’, Artif. Intell. Eng. Des. Anal.
     Manuf., 12(4), 357–372, (September 1998).
[29] Dong Yang, Rui Miao, Hongwei Wu, and Yiting Zhou, ‘Product configuration
     knowledge modeling using ontology web language’, Expert Syst. Appl.,
     36(3), 4399–4411, (April 2009).