<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Open eBusiness Ontology Usage:</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jamshaid Ashraf</string-name>
          <email>jamshaid.ashraf@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Richard Cyganiak</string-name>
          <email>richard@cyganiak.de</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sean O'Riain</string-name>
          <email>sean.oriain@deri.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maja Hadzic</string-name>
          <email>m.hadzic@curtin.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Digital Ecosystem and Business Intelligence Institute (DEBII), Curtin University of Technology</institution>
          ,
          <addr-line>Perth</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Digital Enterprise Research Institute (DERI), National University of Ireland</institution>
          ,
          <addr-line>Galway</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Digital Enterprise Research Institute (DERI), National University of Ireland</institution>
          ,
          <addr-line>Galway</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The GoodRelations Ontology is experiencing the first stages of mainstream adoption, appealing to a range of enterprises as the eCommerce ontology of choice for promoting their product catalogues. As adoption increases, so too does the need to review and analyse current implementations of the ontology to better inform future usage and uptake. To comprehensively understand the implementation approaches, usage patterns, instance data and model coverage, data was collected from 105 different web-based sources that have published their business and product-related information using the GoodRelations Ontology. This paper analyses the ontology usage in terms of data instantiation and conceptual coverage, using SPARQL queries to evaluate quality, usefulness and inference provisioning. Experimental results highlight that early publishers of structured eCommerce data benefit more because structured data is more readily indexable by search engines, but that the lack of available product ontologies and product master datasheets is impeding the creation of a semantically interlinked eCommerce Web.</p>
      </abstract>
      <kwd-group>
        <kwd>GoodRelations</kwd>
        <kwd>Linked analysis</kwd>
        <kwd>Business ontology</kwd>
        <kwd>Structured Ontology usage</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The Web of data and open ontologies (e.g. FOAF, SIOC, SKOS)
promote the establishment of a shared understanding between
data providers and consumers in a common format that allows the
automated processing of information by software agents. Where
accepted by the community, an ontology offers the opportunity for
enhanced dissemination and commercial use of information. The
GoodRelations Ontology (GRO) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], developed specifically for
Web-based eCommerce, is an example of such an ontology that
allows businesses to describe their product offerings, entities and
descriptions. The resulting semantically annotated structured data
is then accessible for use in different Semantic Web applications
and inclusion in search engine indexes.
      </p>
      <p>PingTheSemanticWeb.com<sup>1</sup> has ranked the GRO second only to FOAF as
the most widely used ontology. Available since 2008, the GRO's
schema is mature, but its uptake is still at the early-adoption stage. A
review and analysis of the current community implementations of
the GRO within its eCommerce environment is timely, as it will
provide insight into its applicability, conceptual coverage and
actual usage within its application domain. This paper reports on
the current implementation status of the GRO after investigating
105 publicly available data sets. In this first large-scale
investigation of its kind into the GRO, data providers are
categorized, dataset characteristics are discussed and the usefulness of
currently available data sets is analysed through different use cases.
Implicit data available through axiomatic triples is also
considered.</p>
      <p>The remainder of the paper is organized as follows. Section 2
introduces the motivation, and background is discussed in Section
3. In Section 4, we discuss the dataset collection and its
characteristics. Section 5 describes the dataset investigation and
use cases, along with results, observations and impact of
reasoning. Related work is presented in Section 6 and Section 7
concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. MOTIVATION</title>
      <p>
        The Semantic Web provides semantically annotated
structured data that enhances the user experience by more
accurately sourcing and identifying information of interest.
Enabled primarily through ontological alignment, semantic
annotation is a major factor contributing to the increasing interest
in ontology usage by the wider community, and one which has
also attracted the attention of early business adopters. Over the
last two years, the GRO has witnessed this sectoral appeal with
mainstream adoption by eRetailers such as BestBuy.com,
Overstock.com and Oreilly.com. Announcements from search
engine providers Google<fn><label>2</label><p>http://googlewebmastercentral.blogspot.com/2010/11/rich-snippets-forshopping-sites.html</p></fn> and Yahoo<fn><label>3</label><p>http://developer.yahoo.com/searchmonkey/smguide/gr.html</p></fn> that they will index the GRO
will, for its corporate users, extend consumer reach to a larger
audience through increased appearance in search results. A
measure of the popularity of any ontology is its community
acceptance, which reflects ‘some’ level of use but not the extent
of adherence to the schema or the extent of instantiation. A more
accurate examination of popularity should therefore consider the
overall ontology population. To date, however, the literature does
not present evidence of a systematic analysis of GRO usage that
could provide this insight into its adoption and usage status in the
emerging eCommerce Web of data.
<fn><label>1</label><p>Last accessed on Dec 16, 2010</p></fn>
      </p>
      <p>
        [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] defines ontology population as having occurred when an
ontological term (i.e. concept, property or individual) is used to
annotate data. An analysis of the usage of these terms within the GRO
would be beneficial for:
      </p>
      <p>eCommerce information producers and consumers: by providing
insight into structured data usage as a means to improve the
quality and quantity of data being made available to the business
consumer;</p>
      <p>
        Ontology engineering: by better incorporating stakeholders’
perspectives in ontology evolution [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and ontology maintenance,
through analysing the ontology population and model coverage to help
ontology engineers understand usage patterns;
      </p>
      <p>
        Ontology mapping: interaction between different ontology
concepts would benefit from understanding the models used and
instance data generated. An analysis of the eCommerce Web of
data landscape and use of ontologies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] would also be useful.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. GoodRelations ONTOLOGY OVERVIEW</title>
      <p>In this section, we describe our high-level categorization
of data providers, and present a brief overview of the GR
conceptual schema and the use of the GRO in search indexes.</p>
    </sec>
    <sec id="sec-4">
      <title>3.1 Data Providers</title>
      <p>Looking at the structured eCommerce data landscape, we can
categorize users into three groups based on their publishing
approach, usage pattern and data volume.</p>
      <sec id="sec-4-1">
        <title>3.1.1 Large Size Retailers</title>
        <p>This group includes large online e-retailers and retailers who are
traditionally premises-based and have only recently entered the
eRetailing business. Such data sources provide more detailed
(rich) product descriptions, which are useful for entity consolidation
and interlinking with other datasets. Such companies include
BestBuy.com, Overstock.com, Oreilly.com, and Suitcase.com.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.1.2 Web shops</title>
        <p>A large number of the data sources included in our dataset
are small to medium web shops, offering their products and
services mainly through web channels. Most of these web shops
use web content management packages<sup>4</sup> such as Magento,
osCommerce and Joomla to add RDFa data to HTML pages. This
approach works well since no special infrastructure arrangement
is required in most cases.</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.1.3 Data Service providers</title>
        <p>To leverage the benefits offered by semantic eCommerce data,
businesses are offering data services that build on consolidated
semantic repositories. Moreover, these providers use APIs to access
and transform proprietary data into RDF before making it
available through their repositories. For example, Linked Open
Commerce (LOC)<sup>5</sup> contains Amazon.com data, although
Amazon.com has not yet published RDF/RDFa.
<fn><label>4</label><p>A complete list of their references is available at
http://www.ebusinessunibw.org/wiki/GoodRelations#Shop_Software</p></fn></p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>3.2 Conceptual Schema</title>
      <p>
        The latest version<sup>6</sup> of the GRO comprises 27 concepts (classes),
49 object properties, 43 data properties and 43 named individuals.
To cater for backward compatibility, the ontology model was
recently updated with the addition of new object and data
properties based on implementation feedback. Note that gr is the
GRO prefix used in general practice and throughout this paper.
For the full specification of the GRO, the reader is referred to
http://purl.org/goodrelations/v1.
      </p>
      <sec id="sec-5-0">
        <title>3.2.1 Axioms</title>
        <p>
        The GRO comprises classes, properties, individuals and axioms.
Axioms allow information to be inferred from a knowledge base
through the use of a reasoning engine, known as a reasoner. The
expressivity of the GRO is based on an OWL DLP fragment and
contains subclass and subproperty axioms to express the
subsumption behaviour in the model. Axiomatic triples in the
GRO are given in Table 1 to shed light on the inferences that are possible
over eCommerce data annotated using the GRO and the
applicable rule sets. RDFS and OWL elements such as
rdfs:domain and rdfs:range, which are available in the ontology,
are omitted from the table as they were not included in the
reasoning experiment.
Elements such as rdfs:subClassOf, rdfs:subPropertyOf,
owl:inverseOf, owl:TransitiveProperty and
owl:SymmetricProperty were considered as they give rise to
new knowledge. They can be used both in forward-chaining,
to materialize the implied statements and thereby make them
explicit, and in backward-chaining, where queries are rewritten to
expand the query scope and include inferred knowledge.
owl.DisjointClasses differs because it is used primarily for data
quality and for checking inconsistencies. The constructs mentioned in
Table 1 are covered by almost all of the rule sets, including RDFS,
pD* [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and OWL2RL<sup>7</sup>. In our investigation, we employed an
RDFS-based reasoning engine with RDFS rules because it is
generally available in most semantic repositories.
        </p>
      </sec>
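      <p>As a minimal illustration of forward-chaining, and not the reasoner configuration used in the experiment, the following Python sketch materializes the statements implied by two RDFS rules (rdfs9 and rdfs11) over rdfs:subClassOf. The ex: class and instance names are hypothetical placeholders and are not taken from the GRO.</p>
      <preformat><![CDATA[
```python
# Minimal sketch of forward-chaining with two RDFS rules (rdfs9, rdfs11).
# All ex: names are illustrative placeholders, not GRO terms.
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize(triples):
    """Return the input triples plus everything implied by rdfs9/rdfs11."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # Only subclass axioms whose subject matches the object of
                # the first triple can fire a rule.
                if p2 != SUBCLASS or o != s2:
                    continue
                if p == RDF_TYPE:
                    # rdfs9: x type C, C subClassOf D  =>  x type D
                    new.add((s, RDF_TYPE, o2))
                elif p == SUBCLASS:
                    # rdfs11: C subClassOf D, D subClassOf E  =>  C subClassOf E
                    new.add((s, SUBCLASS, o2))
        if not new.issubset(closure):
            closure |= new
            changed = True
    return closure


schema = {
    ("ex:ActualProduct", SUBCLASS, "ex:Product"),
    ("ex:Product", SUBCLASS, "ex:Thing"),
}
data = {("ex:item1", RDF_TYPE, "ex:ActualProduct")}

# Triples that were implicit before materialization.
inferred = materialize(schema | data) - (schema | data)
```
]]></preformat>
      <p>Here the set inferred holds only the statements that were not explicit in the input, which is the kind of implicit data a forward-chaining reasoner adds to a repository. A backward-chaining approach would instead leave the data untouched and expand the query.</p>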
      <fn-group>
        <fn><label>5</label><p>http://www.linkedopencommerce.com</p></fn>
        <fn><label>6</label><p>Latest version and model used in our experiment, last updated on Nov 26, 2010.</p></fn>
        <fn><label>7</label><p>http://www.w3.org/TR/owl2-profiles/</p></fn>
      </fn-group>
    </sec>
    <sec id="sec-6">
      <title>3.3 Use of the GRO by Search Engines</title>
      <p>
        The adoption of the GRO is driven by the level of enhanced
visibility that a company’s products and general profile can
receive as a result of its GRO marked-up data being included in
the search engine indexes of large providers such as Google and
Yahoo [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Yahoo and Google currently include price, availability
(Google only), description and product pictures drawn from
GRO-annotated structured data as part of their enhanced search
results. BestBuy.com, a major implementer of the GRO, has
announced<sup>8</sup> a 30% increase in traffic to those store pages
that contain GRO-annotated structured information.
However, the literature does not contain any further study in which
the BestBuy findings are benchmarked and compared with others.
      </p>
    </sec>
    <sec id="sec-7">
      <title>4. DATA SET</title>
      <p>The eCommerce data sets were collected primarily
from sources annotated with the GRO and represent the maximum
number of accessible data sets. Throughout this paper, we use
GoodRelations Dataset or GRDS to refer to the RDF graph
collected from the various websites and stored in a triple store for
querying and reasoning, and Data Source to refer to a website,
identified by its unique domain name, that is included in GRDS and
contains eCommerce data in RDF (any serialization format) or
RDFa based on the GRO model.</p>
    </sec>
    <sec id="sec-8">
      <title>4.1 Data Set Collection</title>
      <p>To analyse the adoption, usage patterns and uptake of the GRO in
general, and by the eCommerce community in particular, data sets
were collected from multiple sources and consolidated to create
the GoodRelations Dataset<sup>9</sup> (GRDS). Potential data sources were
identified primarily by their use of the GRO to describe offerings or the company (Business
Entity). Semantic search engines that index RDF
documents, such as Sindice<sup>10</sup> and Watson<sup>11</sup>, were used to obtain a list of potential data sources.
Traditional search engines such as Google were also used to
retrieve RDF documents from the web by using the filetype:rdf attribute of
advanced search.
Additionally, we considered the list of data publishers
maintained at the GoodRelations developer wiki site<sup>12</sup>. For our
empirical investigation, data was collected from 105 different data
sources complying with the stated criteria. The complete list of
these data sources is provided in the Appendix.</p>
      <p>During the collection process, we noticed that 90% of the
websites (data sources) were using the RDFa<fn><label>13</label><p>http://www.w3.org/TR/rdfa-syntax/</p></fn> standard to add
structured information to existing HTML documents. The
majority of sources had sitemap.xml files that allowed search
engines to crawl the web pages and build indexes. However, the
links (URLs) provided in the sitemap files often pointed to
pages listing products and not to the actual product pages
themselves. Being interested in accessing the web pages that had
embedded RDFa code, we relied upon crawlers to build the list of
such URLs before manually verifying the constructed list. With a
valid list of URLs, the REST-based web services Any23<sup>14</sup> and RDFa
Distiller<sup>15</sup> were used to parse RDFa snippets from online HTML
documents and generate RDF graphs (in RDF/XML syntax).
The graphs were then loaded into the OpenLink Virtuoso<sup>16</sup> triple
store to create the GRDS experimental data set. From an RDF
data management perspective, named graphs are used to group
triples from one data source under a uniquely named graph URI,
allowing the dataset to be queried vertically and horizontally.
<fn><label>8</label><p>http://www.nytimes.com/external/readwriteweb/2010/07/01/01readwriteweb-howbest-buy-is-using-the-semantic-web-23031.html</p></fn>
<fn><label>9</label><p>http://debii.curtin.edu.au/~jamshaid/GRDS-dump-v0.1.rar</p></fn>
<fn><label>10</label><p>http://www.sindice.com</p></fn>
<fn><label>11</label><p>http://watson.kmi.open.ac.uk/WatsonWUI</p></fn>
<fn><label>12</label><p>http://www.ebusiness-unibw.org/wiki/GoodRelations</p></fn></p>
      <p>Linked Open Commerce<sup>17</sup> (LOC) represents an emerging data
space which collates eCommerce data from the Web and makes
information available for retrieval and viewing through a
SPARQL endpoint. Despite its presence, collection of data sets
within this environment proved difficult owing to: i) the
unavailability of several data sources in the LOC; and ii) the
presence of several triples using non-authentic URIs, resulting in
an inability to de-reference the URI, use it in a query or obtain
provenance details. The LOC environment has approximately 34
data sources which publish and make their data available in
RDF/RDFa format. Moreover, LOC contains a nominal number of
data sources, such as Amazon.com, whose data is made available in RDF/RDFa only through
the use of middleware APIs. Invalid URIs
such as those starting with “localhost.localdomain….” were also
found<sup>18</sup> to be problematic, but overall represented a minor data
quality issue that was easily overcome.</p>
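      <p>The named-graph organisation of the GRDS described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the Virtuoso setup used in the experiment: the graph URIs and resources are hypothetical, and reading "vertical" as a query within one data source and "horizontal" as a query across all data sources is our interpretation of the text.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Hypothetical quads: (graph URI, subject, predicate, object).
# Each graph URI stands for one data source; all values are made up.
quads = [
    ("http://example.org/graph/shop-a", "ex:offer1", "rdf:type", "gr:Offering"),
    ("http://example.org/graph/shop-a", "ex:shopA", "rdf:type", "gr:BusinessEntity"),
    ("http://example.org/graph/shop-b", "ex:offer2", "rdf:type", "gr:Offering"),
]


def group_by_graph(quads):
    """Group triples under the named graph (data source) they came from."""
    graphs = defaultdict(set)
    for graph_uri, s, p, o in quads:
        graphs[graph_uri].add((s, p, o))
    return dict(graphs)


def instances_in_source(graphs, graph_uri, rdf_class):
    """Query within one named graph: instances of a class in one data source."""
    triples = graphs.get(graph_uri, set())
    return sorted(s for s, p, o in triples if p == "rdf:type" and o == rdf_class)


def sources_using(graphs, rdf_class):
    """Query across named graphs: data sources that use a class at all."""
    return sorted(
        graph_uri
        for graph_uri, triples in graphs.items()
        if any(p == "rdf:type" and o == rdf_class for _, p, o in triples)
    )


graphs = group_by_graph(quads)
```
]]></preformat>
      <p>Keeping the graph URI alongside each triple is what makes per-source statistics possible later on, since every result can be traced back to the data source that published it.</p>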
      <p>The inclusion and use of these two datasets (i.e. LOC and GRDS)
in our experiment provides the best opportunity for an optimal
search space covering the maximum possible width (GRDS) and
depth (LOC) of the structured eCommerce web-of-data. In
essence, GRDS covers a greater number of data sources while
LOC has greater coverage of data from data sources. Hence, both
datasets complement one another with LOC providing an ability
to cross check or find additional information which is useful for
the analysis of results.</p>
    </sec>
    <sec id="sec-9">
      <title>4.2 Dataset Composition Characteristics</title>
      <p>During GRDS data set composition, the different characteristics
of the datasets such as use of different namespaces, GR
vocabulary and annotation properties were considered. The results
from each investigation are described below.</p>
      <sec id="sec-9-1">
        <title>4.2.1 Namespace Usage</title>
        <p>Table 2 lists all vocabularies and their prefixes found in GRDS.
Apart from gr, the three most used vocabularies are dc
(Dublin Core), foaf and vCard. Some vocabularies, such as vCard
and dc, were found to be used with multiple prefixes due to the
availability of a new version with new namespace URIs. A larger
percentage of the vocabularies are focused ones, used by data sources to
annotate the data relevant to their businesses; for example, frbr is
used by O’Reilly to annotate bibliographic data.
<fn><label>14</label><p>http://any23.org</p></fn>
<fn><label>15</label><p>http://www.w3.org/2007/08/pyRdfa/</p></fn>
<fn><label>16</label><p>Open Source Edition, http://sourceforge.net/projects/virtuoso/</p></fn>
<fn><label>17</label><p>http://linkedopencommerce.com</p></fn>
<fn><label>18</label><p>Observations were made as of 16 OCT 2010</p></fn></p>
      </sec>
      <sec id="sec-9-2">
        <title>4.2.2 GR Vocabulary Usage</title>
        <p>Here, we analyse the GR vocabulary usage by different
implementers. A straightforward approach is
to count the number of instances each concept has in the
dataset, along with the properties used in the implementation.
While this approach helps to identify both the most and least
populated terms, it does not provide sufficient understanding
of usage patterns across different data sources. For example, if
one particular class is used by a large implementer, e.g.
BestBuy.com for their two hundred thousand plus products, then
the count of instances of that class will be high. It is equally
possible that this particular implementer is the only one to have used this concept
in GRDS. Therefore, we consider the usage of ontology terms
based on the percentage of data sources in which they are in fact used,
rather than on the total number of triples in the dataset.</p>
        <p>Our study also analysed the usage of the GR schema from the
perspective of concept usage in relation to the instances and data
sources found within the GRDS.
This allowed a basic understanding of the nature of the available
data and the frequency of concept and/or property use by different
data providers. However, a statistical representation based on
simple instance and data source counts does not provide
insight into the relationships that exist between entities or into the
implementation of the ontological model in practice. To achieve
this level of visibility, we investigated the level of GR model
usage by examining the conceptual coverage of the three pivotal
concepts (Business Entity, Offering, Product or Service) and the
richness of description available by exploring (traversing) the
relationships available with other concepts through GR properties
(see Section 5).
<fn><label>19</label><p>The W3C now has an RDF-based vCard; however, 25.71% of data sources are
still using the deprecated namespace.</p></fn>
<fn><label>20</label><p>The Yahoo SearchMonkey project defined these namespaces to provide
vocabulary to assist developers.</p></fn>
<fn><label>21</label><p>This is the Google vocabulary published for use with structured data in
RDFa and microformats.</p></fn>
<fn><label>22</label><p>Facebook Open Graph protocol. Only used by www.lovejoys-ltd.co.uk</p></fn>
<fn><label>23</label><p>Vocabulary for expressing reviews and ratings. Only used by
www.overstock.com</p></fn>
<fn><label>24</label><p>Vocabulary for Functional Requirements for Bibliographic Records
(FRBR). Only used by www.oreilly.com</p></fn>
<fn><label>25</label><p>Vocabulary for latitude, longitude and altitude in the WGS84 geodetic
reference datum. Only used by www.bestbuy.com</p></fn></p>
        <p>Figure 1 positions concepts on the diagram based on the
percentage of their use across different data sources. Concepts are
shown in two groups, :Offering<fn><label>26</label><p>Throughout this paper, we assume that http://purl.org/goodrelations/v1#
is the default namespace and use the prefixes mentioned in Table 2 for
other namespaces.</p></fn> and :BusinessEntity, with
the groups intended to assist with visualizing the specific content
and use of a particular data fragment. Several concepts appear on
the edge of the outermost circle, indicating that several data
providers have not made available any fine-grained information
about their offerings, although they have provided basic data
on eligible customer types or business functions for activities
such as selling, leasing or renting. The lower half of Figure 1
details concepts linked directly or indirectly to the business entity
(:BusinessEntity). Overall, 60% of the data sources provided
information regarding their stores (offices/branches) and 40% of
the data sources have made further details available, such as the
opening and closing times of stores. One of the pivotal concepts,
i.e. :ProductOrService, is not shown in Figure 1 as no formal
product ontology is currently being used in GRDS. It was found
that the :ProductOrService subconcepts were used throughout the
offering data to describe whether the product referred to in the
offering is the actual instance or existentially quantified.</p>
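        <p>The difference between counting instances and measuring the share of data sources that use a term, as discussed in this subsection, can be made concrete with a small Python sketch. The data source names and counts below are invented for illustration and do not reproduce GRDS figures.</p>
        <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical observations: one (data source, class) pair per typed instance.
# One large source dominates the raw counts for a class nobody else uses.
observations = (
    [("bigshop.example", "gr:ProductOrServiceModel")] * 1000
    + [("bigshop.example", "gr:Offering")] * 5
    + [("shop-a.example", "gr:Offering")] * 3
    + [("shop-b.example", "gr:Offering")] * 2
)


def instance_counts(observations):
    """Raw number of instances per class, regardless of who published them."""
    return Counter(cls for _, cls in observations)


def source_share(observations):
    """Percentage of data sources in which each class is used at least once."""
    sources = {src for src, _ in observations}
    users = {}
    for src, cls in observations:
        users.setdefault(cls, set()).add(src)
    return {cls: 100.0 * len(srcs) / len(sources) for cls, srcs in users.items()}


counts = instance_counts(observations)
share = source_share(observations)
```
]]></preformat>
        <p>In this toy data the class with by far the most instances is used by only one of three sources, while the class with few instances is used by all of them, which is why the per-source percentage is the more informative usage measure.</p>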
      </sec>
      <sec id="sec-9-3">
        <title>4.2.3 Use of annotation properties</title>
        <p>The GRO recommends the use of annotation properties to provide
additional information about resources. Almost all the entities in
GRDS are annotated with rdfs:label and rdfs:comment properties.
Because these properties are highly usable, eCommerce deployments frequently use them in queries to
retrieve resources of interest. For example, one instance of type :Offering has
rdfs:label set to “13 pieces of product "Cash Bases Cost Plus Flip
Lid 460, weiss" are on stock”. One possible solution is to use “Lid
460” in the FILTER clause of the SPARQL query to limit the
result set to potential candidate offers.
As indicated, en<fn><label>27</label><p>http://www.w3.org/International/articles/language-tags/</p></fn> (the IETF BCP 47 code for the English language) is
the natural language most commonly used for providing textual
descriptions of the resources.</p>
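        <p>The label-based filtering described above can be imitated outside a triple store. The Python sketch below performs a case-insensitive substring match over rdfs:label values, comparable in spirit to a SPARQL FILTER using regex with the "i" flag. The first label is the one quoted in the text; the resource identifiers and the second label are hypothetical.</p>
        <preformat><![CDATA[
```python
# (resource, rdfs:label) pairs. The first label is quoted from the text;
# the identifiers and the second label are invented for illustration.
labels = [
    ("ex:offer1",
     '13 pieces of product "Cash Bases Cost Plus Flip Lid 460, weiss" are on stock'),
    ("ex:offer2", "Hypothetical offer for an unrelated product"),
]


def filter_by_label(labels, needle):
    """Keep resources whose label contains the search text, ignoring case."""
    needle = needle.lower()
    return [resource for resource, label in labels if needle in label.lower()]


candidates = filter_by_label(labels, "Lid 460")
```
]]></preformat>
        <p>String matching of this kind only narrows the result set to candidate offers; it cannot tell apart two different products whose labels happen to share the same text.</p>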
      </sec>
    </sec>
    <sec id="sec-10">
      <title>5. ANALYSIS</title>
      <p>One of the main purposes of making structured data available on
the Semantic Web is to allow users to access accurate (exact)
information [9]. The key to accessing exact information is the
availability of a conceptual description based on the ontological
model. The GRO contains concepts and descriptions that help in
the publishing and consuming of eCommerce data on the web. We
investigated the GRDS by considering common eCommerce use cases
and observing how the data responds to their requirements. Following
the GR conceptual model and focusing on pivotal concepts, we
issued targeted queries against the dataset and analysed the
results. We first analysed the overall
conceptual coverage of the model in order to understand the data
landscape. We then performed a focused analysis to
understand the richness of data in GRDS. As part of the focused
analysis, for each use case scenario we first discuss the
common understanding of the concepts and the set of basic questions
one can ask of the dataset. Queries are then constructed
from these questions to retrieve information and provide a better
understanding of the data. Finally, each
use case is analysed.</p>
    </sec>
    <sec id="sec-11">
      <title>5.1 Analysis of Concept Coverage</title>
      <p>To understand the overall distribution of data and the conceptual
coverage of the GR model in GRDS, different queries in different
combinations were used. Figure 2 depicts the results in chart
format. The y-axis represents the number of data sources, and the
x-axis shows the number of the query used (the queries are listed in
Table 5). The shaded area reflects the information space available
in GRDS. For example, point 6 on the x-axis shows the number of
data sources which provided data for the concepts listed in the 6th
row of Table 5. The query used for the 6th row is given in Listing
1.</p>
      <p>The highlighted area in the chart details the type of structured
information currently available in eCommerce. Broadly speaking,
we observe that, on average, every data publisher has provided
business entity, offering and price details. However, almost no
data source has provided any formal specification of the products
being offered.</p>
      <p>Listing 1: Query (representing the concept involved in point 6
of chart’s x-axis)</p>
    </sec>
    <sec id="sec-12">
      <title>5.2 Use Case-Based Analysis</title>
      <p>As previously mentioned, we use generic use case scenarios to
illustrate the extensive use of semantic eCommerce data.</p>
      <sec id="sec-12-1">
        <title>5.2.1 Finding a Company (Business Entity)</title>
        <p>Finding a company is a very common and useful requirement in
multiple situations, particularly when seeking a company in a
specific vertical industry, a company offering a specific product, a
company with a specific business role (buyer/seller), or even
competitors. Intuitively, one could ask many questions to obtain
the required information from the eCommerce information space.
We have intentionally limited our search to the following
questions as they are very basic and cover most user requirements.</p>
        <p>- Find a company with a specific name</p>
        <p>- Find a company in a particular location</p>
        <p>- Find a company in a particular line of business (or service)</p>
        <p>These questions also contain basic parameters that, if used in
different combinations, can address more advanced requirements.
To obtain a view of the structured information published by
different data providers, we accessed the GRDS using the SPARQL
query shown in Listing 2.</p>
        <p>Listing 2: Query (retrieving company description)</p>
        <p>Result and Observations</p>
        <p>With reference to Table 6, within GRDS, 93.34% of the data
sources provided a business name using the :legalName property.
This property is very helpful when searching for a company with a
specific name using the SPARQL FILTER option. A few data
sources<fn><label>28</label><p>www.sachse-stollen.de, www.golfhq.com, ww.hagemann24.de,
www.globalautoimports.com.br, www.xtremeimpulse.com,
www.cardgameshop.com, ww.discountofficehomefurniture.com</p></fn> did not supply a value for the legal
name property. Further investigation of these providers’ datasets
found rdfs:label and vcard:fn properties, but these also had
no value.</p>
        <p>The unique identification of a company (:BusinessEntity) on the
Semantic Web using a string value is complicated, as multiple
companies often have the same name. Entity disambiguation [10]
is required to distinguish identically named companies from each
other. Despite the fact that the GRO has useful attributes that
assist in identifying a company easily and accurately, we found
only one data source<fn><label>29</label><p>www.jarltech.com</p></fn> in the GRDS that provided both the
:ISICv4 code value (i.e. 4652) and the company name. We did not
find any value for the other predicates mentioned in the
OPTIONAL clause of the SPARQL query (see Listing 2).</p>
        <p>In the GRDS, the second most widely used schema (after the GRO) is
vCard<sup>30</sup>, which provides the location and specific address of a
company or shop. 99.5% of the data sources provided information
about the country and locality, and 85.3% also provided a street
address with postcode (postal address).</p>
        <p>The location of a store<sup>31</sup>, indicating where the service/product is
provided, is annotated using the
:LocationOfSalesOrServiceProvisioning concept. It has
relationships with both :BusinessEntity (through :hasPOS) and
:Offering (through :availableAtOrFrom). This allows information
about the shop to be accessed by referring to the Business Entity
or Offering. Shop location-related information is very helpful in
many situations, such as when visiting the shop, requesting online
delivery or searching for a particular item in a particular location.
In GRDS, 71.42% of data sources have provided shop
information using :availableAtOrFrom and 44.76% have provided
shop information using the :hasPOS predicate; 39.04% of data sources
have provided information using both predicates. 34.28% of the
data sources do not provide opening hour details or the number of
operating days per week. All of the 65.72% that have provided
opening hour details also provided :opens and :closes times.
However, no data source provided a :validFrom or :validThrough
value for its opening hour specification. We also observed that 96.6% of the
data sources have provided opening and closing times in UTC
format, adding ‘Z’ after the time (e.g. 10:10:10Z).</p>
      </sec>
      <sec id="sec-12-2">
        <title>5.2.2 Finding an Offer (Offering)</title>
        <p>Making offering-related information available on the web in a
structured format is one of the core objectives of the GRO. The
:Offering concept has 13 data properties that describe offering
attributes and 16 object properties allowing several relationships
to be established with other related concepts such as price
specification, delivery options, payment or delivery charges,
payment options, quantity and quality of products (included in
offer and warranty). As previously stated, :Offering is the most
widely used concept after :BusinessEntity and is found in almost
all eCommerce use case scenarios, such as:</p>
        <sec id="sec-12-2-1">
          <title>Find offering of a specific price range</title>
          <p>Find offering of a specific product and the available
quantity</p>
          <p>Find delivery, warranty and payment charges of a
particular offering</p>
          <p>Responding to these questions depends on the offering data
landscape, as illustrated in Tables 7 and 8 with the different query
and data patterns found.
<fn><label>31</label><p>It is important to note that herein, when we mention ‘Store’, it refers to
the store, shop, branch office, office or any physical location where the
service or product is being provided on behalf of the store’s Company
(:BusinessEntity)</p></fn></p>
          <p>From the perspective of data retrieval, useful information is
available through relationships between offerings and
other related concepts. However, to filter or restrict
a search by string matching, the textual description of the
offer instance has to be relied upon. We also found that
terms overlap across vocabularies (such as
:name and v:name), which had to be taken into account when
querying or generating customized rules.</p>
          <p>[Offering relationships listed: :availableAtOrFrom, :acceptedPaymentMethods, :includesObject, :availableDeliveryMethods, :hasPriceSpecification, :includes]</p>
          <p>We examined the relationship patterns between offerings and other model
concepts. The GR model provides two means of linking an
offering to products: :includes is used when an offer has a single product,
and :includesObject is used for complex product bundling.
In GRDS, 59.99% of data sources linked offers with
products, and the remaining 40.01% use the offering concept to
attach supplementary information such as eligible customer type,
shop location information and supported payment methods.
Table 8 outlines the availability of relationships across the data
sources; some relationships are not used at all. 30.48% of companies
provided price specification details, while 80.95% of the data
sources identified the eligible customer type of the offer using GR
predefined individuals such as :BusinessUser, :Endusers and
:PublicInstitution. Within the GRDS, no data source provided
information on the inventory level, advance booking requirement,
delivery lead time, eligible duration of offer, eligible quantity to
buy or eligible transaction volume. These omissions are not
uncommon, as this kind of information is required only for specific
(unique) products not offered by web shops.</p>
          <p>Consumers normally look for offers containing a specific
product and, if they find one, then look for the product price, delivery
method, payment details, etc. The GR Primer<sup>36</sup> mentions that, at
minimum, “the basic structure of an offering is always a graph
that links (1) a business entity to (2) an offering. The offering
itself is linked to one or multiple type and quantity nodes and one
or more price specification nodes. Each type and quantity node
holds the quantity, the unit of measurement for the quantity, and
the product or service that is included in the offering”. Retrieving
offers with a specific product in mind would therefore require
accessing concepts that relate product with offering and provide
details on quantity and unit of measurement. As no formal product
ontology is currently being exploited in the GRDS, we could only
query offerings and filter records based on the textual description
attached to the offering. Tables 7 and 8 show the ‘offering’ data
landscape. The following observations can be made: i) offerings
can be retrieved with their price, quantity and offer start and end
date; and ii) a filter clause can be applied to properties with literal
values (such as :name, :description, rdfs:label and rdfs:comment)
to narrow the search for a specific offering.</p>
        </sec>
      </sec>
      <sec id="sec-12-3">
        <title>5.2.3 Finding a specific product (Product Findability)</title>
        <p>GRO provides three means of describing products. Each approach
has different structural requirements, allowing users to select the
most appropriate one. The first recommends using an appropriate
product/service ontology to describe the products referred to in an
offering. The second, less structured approach allows a
lightweight product ontology to be tailored to individual needs
using the GR top-level Product or Service (:ProductOrService)
concept and related vocabulary. The third, unstructured
approach allows product information to be described lexically, so that
users can restrict their search to products with
specific terms in their textual description. As in the previous use
cases, we evaluated how GRDS responded to product-related
requests such as:</p>
        <sec id="sec-12-3-1">
          <title>Find a particular product (e.g. TV or Shoes)</title>
          <p>Find a product with a specific requirement (e.g. TV set of 24 inches, HD resolution)</p>
          <p><bold>Result and Observations</bold></p>
          <p>The query in Listing 3 was used to ‘ask’ this question, and the results
highlighted the lack of any formal product ontology to annotate
products and their properties. We did, however, find that 2.86% of
data sources used the second approach, a proprietary
product ontology, to describe quantitative properties. The remaining 97.14% of
the data sources follow the third approach, publishing a textual
description of the product rather than an ontology. Two
properties are in use for lexical information: rdfs:comment
and rdfs:label. In general, however, we found no evidence within
the GRDS of data sources using an appropriate product ontology.
Since GR has at its core the description of offers, products can be
searched either by exploring their offer data or through products
included in the offers. In the absence of a proper product ontology,
we queried particular products by matching a keyword against
the lexical information available in offer or product data.</p>
          <p>Listing 3: Query (retrieving product description)</p>
          <p>The query in Listing 3 finds products containing “Cup” in their
description and displays their price and associated currency. It
returned 58 products from three data sources<sup>37</sup>, each with price and
currency value.</p>
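          <p>As a minimal sketch of the keyword-matching approach described above (not the query of Listing 3 itself), the following Python fragment filters products by a keyword in their lexical properties and joins each match to its price and currency. The triples, the ex: identifiers and the helper names are hypothetical and serve only as illustration; the property names follow the GR vocabulary as used in this section.</p>
          <preformat>
```python
# Hypothetical in-memory triples; the ex: resources are invented for illustration.
TEXT_PROPS = (":name", ":description", "rdfs:label", "rdfs:comment")

triples = [
    ("ex:offer1", ":includes", "ex:prod1"),
    ("ex:prod1", "rdfs:label", "Espresso Cup, white"),
    ("ex:offer1", ":hasPriceSpecification", "ex:price1"),
    ("ex:price1", ":hasCurrencyValue", "4.50"),
    ("ex:price1", ":hasCurrency", "EUR"),
    ("ex:offer2", ":includes", "ex:prod2"),
    ("ex:prod2", "rdfs:comment", "Leather running shoes"),
    ("ex:offer2", ":hasPriceSpecification", "ex:price2"),
    ("ex:price2", ":hasCurrencyValue", "59.00"),
    ("ex:price2", ":hasCurrency", "AUD"),
]


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?o)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def find_products(keyword):
    """Return (product, text, price, currency) rows whose text contains keyword."""
    needle = keyword.lower()
    rows = []
    for offer, predicate, product in triples:
        if predicate != ":includes":
            continue
        # Collect every literal description attached to the product.
        texts = [t for prop in TEXT_PROPS for t in objects(product, prop)]
        matching = [t for t in texts if needle in t.lower()]
        if not matching:
            continue
        # Join the matching product to the price data of its offer.
        for spec in objects(offer, ":hasPriceSpecification"):
            for value in objects(spec, ":hasCurrencyValue"):
                for currency in objects(spec, ":hasCurrency"):
                    rows.append((product, matching[0], value, currency))
    return rows


results = find_products("cup")
```
          </preformat>
          <p>With the sample data, only ex:prod1 matches, so results holds a single row with the value 4.50 and the currency EUR. Because matching is purely lexical, a keyword such as “cup” would also match unrelated descriptions containing the same characters, which is the limitation of the third approach noted above.</p>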
        </sec>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>5.3 Analysis of Axioms for Reasoning</title>
      <p>RDFS and OWL define a set of forward chaining rules [11] which
can be used to infer implicit knowledge and provide valid query
results. Implicit knowledge is included in the query result
by using a reasoner with the axiomatic triples available in
an ontology. Although customized rules for
deductive reasoning are available, we focused only on the axioms (listed in Table 3)
available in GRO. Based on ontological restrictions, the
reasoning process helps to identify inconsistencies in instance
data. We first examined the instance data by applying the
axiomatic triples using the RDFS rule set, and then used the class
disjointness axioms to perform disjointness checking in GRDS.</p>
      <p>36 http://www.heppnetz.de/projects/goodrelations/primer/</p>
      <p>37 www.jarltech.de, www.overstock.com, www.corsetsandcurves.com.au</p>
      <sec id="sec-13-1">
        <title>5.3.1 Inferencing</title>
        <p>
          Implied information in the GRDS was investigated by applying an
axiomatic triple using the RDFS rule set to analyze the availability
of implied information at the data instance level. Using the RDFS
entailment rules [
          <xref ref-type="bibr" rid="ref13">12</xref>
          ] rdfs9<sup>38</sup> and rdfs7<sup>39</sup>, we were able to retrieve
additional information using more generic concepts. This was not
possible for queries evaluated without reasoning.
The concepts listed in the first column of Table 9 are the
more generic concepts (superclasses) of their specialised concepts
(subclasses). With reasoning, we are able to use generalized
concepts to access subclass membership.
        </p>
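        <p>The effect of rule rdfs9 can be illustrated with a small forward-chaining sketch in Python. It is a simplified illustration rather than the reasoner used in this study: the ex: instances are hypothetical, and the single subclass axiom (:UnitPriceSpecification as a subclass of :PriceSpecification) is chosen only as an example.</p>
        <preformat>
```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def apply_rdfs9(triples):
    """Forward-chain rdfs9 until no new rdf:type triple can be derived.

    rdfs9: IF (uuu rdfs:subClassOf xxx AND vvv rdf:type uuu)
           THEN (vvv rdf:type xxx)
    """
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        typed = [(s, o) for s, p, o in closure if p == TYPE]
        for sub, sup in subclass_pairs:
            for individual, cls in typed:
                if cls == sub and (individual, TYPE, sup) not in closure:
                    closure.add((individual, TYPE, sup))
                    changed = True
    return closure


def instances_of(cls, triples):
    """Return the sorted individuals explicitly typed as cls in triples."""
    return sorted(s for s, p, o in triples if p == TYPE and o == cls)


# Hypothetical data: one subclass axiom and two typed instances.
data = {
    (":UnitPriceSpecification", SUBCLASS, ":PriceSpecification"),
    ("ex:price1", TYPE, ":UnitPriceSpecification"),
    ("ex:price2", TYPE, ":PriceSpecification"),
}

without_reasoning = instances_of(":PriceSpecification", data)
with_reasoning = instances_of(":PriceSpecification", apply_rdfs9(data))
```
        </preformat>
        <p>Without reasoning, a request for :PriceSpecification instances finds only ex:price2; after applying rdfs9, ex:price1 is returned as well, which mirrors the use of generalized concepts to access subclass membership.</p>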
        <p>In addition to subclass axioms, the GRO contains subproperty
axioms, which allow two resources related through a
subproperty to be implicitly related through its superproperty. Figure 3
represents the subPropertyOf subsumption and transitive
behaviour of datatype properties. No instance data was found for the 4 object
properties recently added (2010-09-16). As
RDFS-style reasoning based upon backward chaining was used, it
was necessary to rewrite queries so as to include the implicit
knowledge generated through RDFS entailment rules. In Figure
3(a), :hasCurrencyValue has two superproperties, demonstrating
that with RDFS-style reasoners each asserted
:hasCurrencyValue triple yields three triples: the original
triple with the :hasCurrencyValue predicate and two
additional triples entailed by applying rule rdfs7. Figure 3(b)
shows the results of applying reasoning over GRDS using
the data properties of the quantitative value concept.</p>
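        <p>The query rewriting needed for backward-chaining, RDFS-style reasoning over subproperties can be sketched as follows. This is a hedged illustration, not the rewriting used in this study: the function names and the ex: resource are hypothetical, and we assume that the two superproperties of :hasCurrencyValue are :hasMinCurrencyValue and :hasMaxCurrencyValue.</p>
        <preformat>
```python
def subproperties_of(prop, schema):
    """Return prop plus every direct or indirect subproperty of it.

    schema is a list of (subproperty, superproperty) pairs.
    """
    found = {prop}
    frontier = [prop]
    while frontier:
        current = frontier.pop()
        for sub, sup in schema:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def query(prop, data, schema, reasoning=True):
    """Answer the pattern (?s prop ?o), optionally rewritten per rule rdfs7."""
    predicates = subproperties_of(prop, schema) if reasoning else {prop}
    return sorted((s, o) for s, p, o in data if p in predicates)


# Assumed subproperty axioms and one hypothetical asserted price triple.
schema = [
    (":hasCurrencyValue", ":hasMinCurrencyValue"),
    (":hasCurrencyValue", ":hasMaxCurrencyValue"),
]
data = [("ex:price1", ":hasCurrencyValue", "19.99")]

plain = query(":hasMaxCurrencyValue", data, schema, reasoning=False)
rewritten = query(":hasMaxCurrencyValue", data, schema)
```
        </preformat>
        <p>Here plain is empty, whereas rewritten contains the pair (ex:price1, 19.99): the single asserted value becomes reachable through :hasCurrencyValue and both of its superproperties, without materialising the entailed triples.</p>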
        <p>In web eCommerce, price data is often presented as a fixed price
value, and the user focuses on the :hasCurrencyValue property.
Figure 4(a) draws attention to the fact that, apart from one
instance<sup>40</sup>, all data sources have provided only a fixed price value
for their offerings.
Data consumers will most likely use this property to access the price
value and, with an RDFS-style reasoner, its superproperties
return the same data. However, in the specific case where a price
range (i.e. :hasMinCurrencyValue and :hasMaxCurrencyValue) is
provided without :hasCurrencyValue, a query on
:hasCurrencyValue will not return any
value, with or without reasoning. Here, custom rules can be applied to return the maximum price value
when no :hasCurrencyValue property value is available. To
handle similar situations, the GR website provides a set
of GoodRelations Optional Axioms<sup>41</sup> that allow users to obtain
additional information from the dataset with minimal side-effects.</p>
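        <p>A custom rule of the kind mentioned above could look like the following sketch, which falls back to the maximum price when no fixed price is asserted. The function name, the ex: resources and the price values are hypothetical, and the fallback order is an assumption made for illustration; it is not one of the GoodRelations Optional Axioms.</p>
        <preformat>
```python
def effective_price(spec, data):
    """Return (value, source_property) for a price specification, or None.

    Prefers the fixed price; falls back to the maximum of a price range.
    """
    values = {p: o for s, p, o in data if s == spec}
    for prop in (":hasCurrencyValue", ":hasMaxCurrencyValue"):
        if prop in values:
            return values[prop], prop
    return None


# Hypothetical data: one fixed price and one price range.
data = [
    ("ex:price1", ":hasCurrencyValue", "25.00"),
    ("ex:price2", ":hasMinCurrencyValue", "10.00"),
    ("ex:price2", ":hasMaxCurrencyValue", "15.00"),
]

fixed = effective_price("ex:price1", data)
ranged = effective_price("ex:price2", data)
```
        </preformat>
        <p>Returning the source property alongside the value lets a data consumer distinguish a true fixed price from an upper bound substituted by the rule.</p>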
      </sec>
      <sec id="sec-13-2">
        <title>5.3.2 Disjointness checking</title>
        <p>In Table 1, we saw that the disjoint class axioms in the GRO provide
model consistency at the instance level. When two classes are declared
disjoint, the same individual cannot be an instance of both
classes simultaneously. For example, an individual
declared to be an instance of class :Offering cannot also be declared
an instance of :BusinessEntity since, in the GR model, the two
classes are defined as disjoint. The SPARQL query in
Listing 4 finds such individuals, and within GRDS we identified
one data source<sup>42</sup> violating the GR model.</p>
        <p>38 IF (uuu rdfs:subClassOf xxx AND vvv rdf:type uuu) THEN (vvv rdf:type xxx)</p>
        <p>39 IF (aaa rdfs:subPropertyOf bbb AND uuu aaa yyy) THEN (uuu bbb yyy)</p>
        <p>40 http://plushbeautybar.com/services.html#PriceSpec_10</p>
        <p>41 http://www.ebusiness-unibw.org/wiki/GoodRelationsOptionalAxiomsAndLinks</p>
        <p>42 http://www.overstock.com/#company</p>
        <p>Listing 4: SPARQL query</p>
        <p>In Figure 4, the same URI is used as an instance of both
:BusinessEntity and :BusinessEntityType, whereas in the model
the two classes are declared disjoint.</p>
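        <p>The disjointness check can be sketched in Python as follows. This is an illustrative approximation and not the SPARQL query of Listing 4: the function name and the ex: resources are hypothetical, and only the two disjoint class pairs mentioned in this section are included.</p>
        <preformat>
```python
def disjointness_violations(data, disjoint_pairs):
    """Return (individual, class_a, class_b) for every violated disjointness axiom."""
    types = {}
    for s, p, o in data:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    violations = []
    for individual in sorted(types):
        for class_a, class_b in disjoint_pairs:
            if class_a in types[individual] and class_b in types[individual]:
                violations.append((individual, class_a, class_b))
    return violations


# Disjoint class pairs discussed in this section.
disjoint_pairs = [
    (":Offering", ":BusinessEntity"),
    (":BusinessEntity", ":BusinessEntityType"),
]

# Hypothetical instance data: ex:company carries two disjoint types.
data = [
    ("ex:company", "rdf:type", ":BusinessEntity"),
    ("ex:company", "rdf:type", ":BusinessEntityType"),
    ("ex:offer1", "rdf:type", ":Offering"),
]

violations = disjointness_violations(data, disjoint_pairs)
```
        </preformat>
        <p>For the sample data, the only violation reported is ex:company, typed as both :BusinessEntity and :BusinessEntityType; ex:offer1 has a single type and is therefore consistent.</p>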
      </sec>
    </sec>
    <sec id="sec-14">
      <title>6. RELATED WORK</title>
      <p>
        A large amount of research work has been done on ontology
evaluation and a survey of different approaches is covered in [
        <xref ref-type="bibr" rid="ref14">13</xref>
        ].
In earlier papers, the focus was on conceptual model analysis
and coverage of the ontology using test data only. There is little
evidence in the literature of work that focuses on cases where
instance data based upon actual field implementation has been
used and analyzed from an ontological model perspective. The Generic
instance data Evaluation Process (GEP) [
        <xref ref-type="bibr" rid="ref15">14</xref>
        ] evaluates instance
data in knowledge management systems. The Wine ontology is
used with test instance data to discuss the different symptoms,
their causes and ways to generate potential issues. Findings are
categorized into logical inconsistencies, syntax issues and a detailed
discussion of hypothetical issues. The study is
generic in nature, and the instance data is evaluated using an
ontology developed primarily for learning purposes, which
does not reflect the actual usage or state of the instance data on
the semantic web.
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] analyzed the social and structural relationships available on the
semantic web by considering the FOAF vocabulary. The study was
performed on approximately 1.5 million FOAF documents to
analyze the instance data available on the web and its usefulness in
understanding social structures and networks. Additionally, the
use of different namespaces, concepts and properties is discussed
in order to provide a perspective on different FOAF
implementations. This research provides only a limited analysis,
since the primary focus was on social network-related instance
data.
[
        <xref ref-type="bibr" rid="ref16">15</xref>
        ] provided a detailed study of the quality and state of
published RDF data on the semantic web. Linked data principles
were used to measure the noise and inconsistency present in a
dataset, and reasoning was performed. While highlighting the
issues and findings, the researchers provided guidelines for
both data publishers and data consumers to assist in generating
and consuming high-quality semantic data. Although the
experiment was performed on instance data collected from the
web and provided details on inconsistency and ontology
hijacking in general, no particular ontology was considered for
data analysis. In summary, these studies either examine instance
data from a quality perspective or use test data for ontology
evaluation. Our study, performed on data sets from early adopters
of open eBusiness ontologies, represents a timely contribution and
insight into community usage of the GRO.
      </p>
    </sec>
    <sec id="sec-15">
      <title>7. CONCLUSIONS AND FUTURE WORK</title>
      <p>
        In this paper, we analyzed the implementation of the GRO by
consolidating 105 GR data sources into a single data set. We
analyzed the use of other ontologies with the GRO and
categorized data providers. Different use cases were used to
better understand and illustrate schema usage and coverage
through ontological instantiation. Data sources provide structured
data aimed only at improving search ranking, with no interlinking
currently available between eCommerce datasets or with LOD
[
        <xref ref-type="bibr" rid="ref17">16</xref>
        ]. The availability of links between disparate entities and the
use of open eBusiness ontologies (such as the GRO) could well
help to integrate disparate information sources.
      </p>
      <p>Overall, the analysis points to early adoption and usage of an
ontology that is beginning to achieve mainstream adoption, with
implementers using the GRO in an à la carte fashion rather than
semantics à la mode.</p>
      <p>
        In our future work, we intend to progress in two directions: i)
a more comprehensive analysis of an expanded dataset.
For this, we plan to collect datasets at intervals over a period of
six months to determine whether the status quo remains
unchanged and, if not, how implementation develops with
increased maturity; ii) an evaluation of the usefulness of structured data
on the web. Here, we plan to investigate the impact of eCommerce
structured data (annotated using GRO and other eBusiness
ontologies) on search engine indexes (such as Google and Yahoo!)
and to measure the increase in business activity, as is already
evident in the traffic increase for BestBuy [
        <xref ref-type="bibr" rid="ref18">17</xref>
        ].
      </p>
    </sec>
    <sec id="sec-16">
      <title>8. ACKNOWLEDGMENTS</title>
      <p>The authors would like to thank Michael Hausenblas and Aidan
Hogan for useful discussion and advice. The work presented in
this paper has been funded in part by the Science Foundation
Ireland under Grant No. SFI/08/CE/I1380 (Lion-2) and the EU
FP7 Activity ICT-5-4.3 under Grant Agreement No. 256975,
LOD Around-The-Clock (LATC) Support Action and Activity
ICT-4-2.2 under Grant Agreement No. 248458, Multilingual
Ontologies for Networked Knowledge (MONNET).</p>
    </sec>
    <sec id="sec-17">
      <title>Appendix: List of data sources in GRDS</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Hepp</surname>
          </string-name>
          , Martin, “
          <article-title>GoodRelations: An Ontology for Describing Products</article-title>
          and
          <article-title>Services Offers on the Web,” in Proceedings of the 16th International Conference on Knowledge Engineering and Knowledge Management (EKAW2008), Acitrezza</article-title>
          , Italy,
          <year>2008</year>
          , vol.
          <volume>5268</volume>
          , pp.
          <fpage>332</fpage>
          -
          <lpage>347</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Li</given-names>
            <surname>Ding</surname>
          </string-name>
          , “
          <article-title>How the Semantic Web is Being Used: An Analysis of FOAF Documents,”</article-title>
          <source>in Proceedings of the 38th Annual Hawaii International Conference on</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>J</given-names>
          </string-name>
          , “
          <article-title>E-Business Interoperability through Ontology Semantic Mapping,” in Processes and Foundations for Virtual Organizations</article-title>
          ,
          <year>2003</year>
          , vol.
          <volume>262</volume>
          , pp.
          <fpage>315</fpage>
          -
          <lpage>322</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>De</given-names>
            <surname>Leenheer</surname>
          </string-name>
          , P. “
          <source>PhD Dissertation</source>
          ,
          <article-title>On community-based Ontology Evolution</article-title>
          ,” Vrije Universiteit Brussel,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>H. J. ter Horst</surname>
          </string-name>
          , “
          <article-title>Completeness, decidability and complexity of entailment for rdf schema and a semantic extension involving the owl vocabulary</article-title>
          ,
          <source>” Journal of semantic web</source>
          , vol.
          <volume>3</volume>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>115</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Steiner</surname>
          </string-name>
          , “
          <article-title>How Google is using Linked Data Today and Vision for Tomorrow,” Future Internet Assembly</article-title>
          , Ghent, Belgium,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Gruber</surname>
          </string-name>
          , Tom, “
          <article-title>Where the Social Web meets the Semantic Web</article-title>
          , Web Semantics,” Web Semantics: Science,
          <source>Services and Agents on the World Wide Web</source>
          , vol.
          <volume>6</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>4</fpage>
          -
          <lpage>13</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Klyne</surname>
          </string-name>
          ,
          <string-name>
            <surname>B. McBride,</surname>
          </string-name>
          “
          <article-title>Resource description framework (RDF): Concepts and abstract syntax</article-title>
          .”
          <source>World Wide Web Consortium</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          , “
          <article-title>Structured data meets the web: A few observations,”</article-title>
          IEEE Data Eng. Bull.,
          <year>2006</year>
          , vol.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          29, pp.
          <fpage>19</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Tummarello</surname>
          </string-name>
          , Stefan, “Sig.ma:
          <article-title>Live views on the Web of Data,”</article-title>
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          , vol.
          <volume>8</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>355</fpage>
          -
          <lpage>364</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Edward</given-names>
            <surname>Thomas</surname>
          </string-name>
          , “
          <article-title>Lightweight Reasoning and the Web of Data for Web Science</article-title>
          ,” in International Conference on Web Science (WebSci
          <year>2010</year>
          ),
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Hayes</surname>
          </string-name>
          , “RDF Semantics, W3C Working Draft.”
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Janez</given-names>
            <surname>Brank</surname>
          </string-name>
          , “
          <article-title>A survey of ontology evaluation techniques,” presented at the SIKDD 2005 at multiconference IS 2005</article-title>
          , Ljubljana, Slovenia.
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Jiao</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>“Instance Data Evaluation for Semantic Web-Based Knowledge Management Systems</article-title>
          ,” in System Sciences,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , “
          <article-title>Weaving the pedantic web,” presented at the In 3rd International Workshop on Linked Data on the Web (LDOW2010) at WWW2010</article-title>
          , Raleigh, USA,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          , “
          <article-title>Exploiting linked data to build applications,” IEEE Internet Computing</article-title>
          , vol.
          <volume>13</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>68</fpage>
          -
          <lpage>73</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[17] “http://www.wilshireconferences.com/semtech2010/RWW_070110.pdf”.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>