<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How Google is using Linked Data Today and Vision For Tomorrow</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Steiner</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Raphael Troncy</string-name>
          <email>raphael.troncy@eurecom.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Hausenblas</string-name>
          <email>michael.hausenblas@deri.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DERI, NUI Galway</institution>
          ,
          <addr-line>IDA Business Park, Lower Dangan, Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>EURECOM</institution>
          ,
          <addr-line>Sophia Antipolis</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Google Germany GmbH</institution>
          ,
          <addr-line>ABC-Str. 19, 20354 Hamburg</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this position paper, we first discuss how modern search engines, such as Google, make use of Linked Data spread in Web pages for displaying Rich Snippets. We present an example of the technology and we analyze its current uptake. We then sketch some ideas on how Rich Snippets could be extended in the future, in particular for multimedia documents. We outline bottlenecks in the current Internet architecture that require fixing in order to enable our vision to work at Web scale.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Author's note: While the first author is affiliated with Google, this paper is in no way to be seen as an
official product strategy, nor as a future roadmap. Opinions are solely the respective
author's.</p>
      <p>
        [...] the presentation layer and that aims to make sense of the data in such a way
that it establishes interoperability between different applications [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>In this paper, we first describe the Google Rich Snippet technology as a way
of adding more semantics to Web applications, the variety of formats supported,
and a rough analysis of its usage (Section 2). We then present some ideas
for future extensions of this technology, in particular for multimedia content
(Section 3). Finally, we take the initial steps towards a complete overhaul of parts
of the current Internet architecture by introducing the thought-experiment of
Triple-centric Networking, inspired by Jacobson's Content-Centric Networking
(Section 4).</p>
    </sec>
    <sec id="sec-2">
      <title>Google Rich Snippet Formats and Support</title>
      <p>What are the costs and benefits of publishing data and semantic markup on the
Web<sup>4</sup>? One needs to carefully examine these issues from different perspectives:
consumer and publisher. We are starting to see a number of domains, such as
the Public Sector Information/eGov area, Life Sciences, eCommerce, and others,
not only become aware of the benefits of publishing Linked Data, but also start to
exploit it on a large scale. In this section, we present XHTML code samples
with embedded RDFa mark-up and we show how Google uses this information
in its Rich Snippets technology.</p>
      <sec id="sec-2-1">
        <title>Which Formats are Supported in Rich Snippets?</title>
        <p>
          Google introduced Rich Snippets on May 12, 2009 as a means of displaying
structured data in search result pages with the objective of highlighting the
searched-for properties to the user in a visually prominent way [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The Rich Snippet
feature was built from the beginning on open standards or community-agreed-on
approaches such as RDFa [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], Microformats<sup>5</sup>, and more recently Microdata [6].
While there is no guarantee that Rich Snippets will be displayed by Google if
a Web page contains semantic mark-up, there is an expression-of-interest form
available<sup>6</sup> where webmasters can indicate their consent and interest for Rich
Snippets to be shown for their pages.
        </p>
        <p>
          Google supports multiple semantic mark-up formats (RDFa,
Microformats, Microdata) and takes a very pragmatic approach: "A
lot of previous work on structured data has focused on debates around encoding.
Even within Google, we have advocates for microformat encoding, advocates for
various RDF encodings, and advocates for our own encodings. But after working
on this Rich Snippets project for a while, we realized that structured data on
the web can and should accommodate multiple encodings: we hope to emphasize
this by accepting both microformat encoding and RDFa encoding. Each encoding
has its pluses and minuses, and the debate is a fine intellectual exercise, but
it detracts from the real issues" [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Together with the May 12 announcement
of public Rich Snippets, an additional feature was launched: Rich Snippets in
Custom Search<sup>7</sup>, which allows Custom Search users to create even richer snippets
based on custom mark-up and snippet creation rules, somewhat similar
to Yahoo!'s BOSS<sup>8</sup> (Build your Own Search Service) initiative.
        </p>
        <p><sup>4</sup> http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/</p>
        <p><sup>5</sup> http://microformats.org/wiki/Main_Page#Specifications</p>
        <p><sup>6</sup> http://www.google.com/support/webmasters/bin/request.py?contact_type=rich_snippets_feedback</p>
      </sec>
      <sec id="sec-2-2">
        <title>Rich Snippets by Example</title>
        <p>The Rich Snippets technology supports various vocabularies. For example, details
about an offering such as the ticket booking of an upcoming event can be marked
up in the body of a Web page in order to make the location,
schedule, price or reviews of the event easier to understand. The code example in Figure 1 illustrates
the mark-up for an event at a certain business location.</p>
        <p>The example begins with a namespace declaration using xmlns. In the first
line, typeof="v:Event" indicates that the marked-up content describes an Event.
The dimensions that compose the event (description, type, starting time) are
described with properties. Each property name is prefixed with v: (&lt;span
property="v:description"&gt;). Google does not display information that isn't visible
to the user, with a few exceptions. For instance, geo information (latitude and longitude of
the location) can be included in the HTML markup. Typically, this information
is not visible on a Web page about an event, but providing it can help ensure
that the location is accurately mapped.</p>
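        <p>To make the role of the property and content attributes more concrete, the following minimal Python sketch collects property/value pairs from a shortened variant of the event mark-up. It is only an illustration under stated assumptions and not a description of how Google processes RDFa: the class RDFaPropertyCollector, the helper collect_properties, and the SAMPLE string are hypothetical names introduced here, and the sketch ignores typeof, rel, namespace resolution, and void elements.</p>
        <preformat><![CDATA[
```python
from html.parser import HTMLParser


class RDFaPropertyCollector(HTMLParser):
    """Collect (property, value) pairs from RDFa-style mark-up.

    Assumption of this sketch: a 'content' attribute, if present,
    takes precedence over the visible text of the element.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []   # collected (property, value) tuples
        self._open = []   # one [property-or-None, text-parts] entry per open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is not None and "content" in attrs:
            # Machine-readable value given explicitly (e.g. a latitude).
            self.pairs.append((prop, attrs["content"]))
            prop = None
        self._open.append([prop, []])

    def handle_data(self, data):
        # Visible text belongs to every enclosing element that carries a property.
        for entry in self._open:
            if entry[0] is not None:
                entry[1].append(data)

    def handle_endtag(self, tag):
        if not self._open:
            return
        prop, parts = self._open.pop()
        if prop is not None:
            self.pairs.append((prop, "".join(parts).strip()))


def collect_properties(markup):
    """Parse a mark-up string and return the collected pairs."""
    parser = RDFaPropertyCollector()
    parser.feed(markup)
    parser.close()
    return parser.pairs


# Hypothetical, shortened variant of the event mark-up discussed in the text.
SAMPLE = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event">
  <span property="v:description">See Philipp Poisel in Offenbach</span>
  <span property="v:latitude" content="50.10945"></span>
  <span property="v:eventType">Concert</span>
</div>
"""

pairs = collect_properties(SAMPLE)
for name, value in pairs:
    print(name, "=", value)
```
]]></preformat>
        <p>In this sketch, the invisible latitude is taken from the content attribute, whereas the description and the event type are taken from the text that is visible to the user.</p>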
        <p>Figure 2 shows how a search result using this semantic markup is then
displayed on a Google search result page.</p>
      </sec>
      <sec id="sec-2-3">
        <title>How Much Semantic Markup is out There?</title>
        <p>
          Goel and Gupta have compiled some statistics with regard to semantic mark-up on
the Web in June 2010 [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. A random sample of one million Web pages was
harvested in order to compare the use of Microformats and RDFa markup.
Then, they examined how much of this mark-up data was actually used for Rich
Snippets. It is remarkable and surprising how little semantic mark-up was live on
the Web overall, and even more so that only a tiny fraction of all this semantic
mark-up was then used for Rich Snippets at the time of this experiment (Table 1).
        </p>
        <p>Further analysis of the crawled dataset has revealed a number of pitfalls:
incorrect labeling (e.g. marking up the date of an event as part of the event
description), or incorrect inclusion of unrelated words in the structured mark-up
(e.g. marking up "written by John Doe" rather than just "John Doe" as the value
of the property v:reviewer). Furthermore, the authors observe a general confusion
about which parts of a document should be marked up at all. Although some Web
pages include RDFa event markup, none of them are used by the Rich Snippet
technology as of today.</p>
        <p><sup>7</sup> http://googlecustomsearch.blogspot.com/2009/05/enabling-rich-snippets-in-custom-search.html</p>
        <p><sup>8</sup> http://developer.yahoo.com/search/boss/boss_guide/</p>
        <preformat>&lt;div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Event"&gt;
  &lt;a href="http://www.example.com/events/poisel_offenbach.html"
     rel="v:url"
     property="v:summary"&gt;Philipp Poisel in Offenbach&lt;/a&gt;
  &lt;span property="v:description"&gt;See Philipp Poisel in Offenbach&lt;/span&gt;
  When:
  &lt;span property="v:startDate" content="2011-01-16T19:00+01:00"&gt;Jan 16, 7:00PM&lt;/span&gt;
  &lt;span property="v:endDate" content="2011-01-16T21:00+01:00"&gt;9:00PM&lt;/span&gt;
  Where:
  &lt;span rel="v:location"&gt;
    &lt;span typeof="v:Organization"&gt;
      &lt;span property="v:name"&gt;Capitol&lt;/span&gt;,
      &lt;span rel="v:address"&gt;
        &lt;span typeof="v:Address"&gt;
          &lt;span property="v:street-address"&gt;Kaiserstraße 106&lt;/span&gt;,
          &lt;span property="v:locality"&gt;Offenbach am Main&lt;/span&gt;,
        &lt;/span&gt;
      &lt;/span&gt;
      &lt;span rel="v:geo"&gt;
        &lt;span typeof="v:Geo"&gt;
          &lt;span property="v:latitude" content="50.10945"&gt;&lt;/span&gt;
          &lt;span property="v:longitude" content="8.76579"&gt;&lt;/span&gt;
        &lt;/span&gt;
      &lt;/span&gt;
    &lt;/span&gt;
  &lt;/span&gt;
  Category: &lt;span property="v:eventType"&gt;Concert&lt;/span&gt;
&lt;/div&gt;</preformat>
        <p>
          Improving how search results are displayed in snippet previews affects a huge
market. Ticket aggregator Web sites such as Giga-Music.de place so-called
affiliate links to the final ticket vendors. These sites very often make their living
from accumulating useful metadata around a certain event and then providing
only links to vendors in order to finally get paid, e.g. on a pay-per-click model.
In [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], Goel and Gupta gave some insights into the click and impression
behavior for Rich Snippets. The overall tendency is an increasing number of
impressions for Rich Snippets-enabled pages, and a higher click-through rate for
pages with Rich Snippets. Taking into account the prior remark, it is thus clear
that adding or removing Rich Snippets as a whole, or particular Rich Snippets
features, has a huge impact on the fragile search ecosystem. Web sites can
suffer significant sales collapses by going down a position in their natural search
ranking.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>A Vision for Tomorrow's Rich Snippets in Search Engines</title>
      <p>Search engines serve mainly as entry points to the Web. Let us imagine a user
wants to see a concert of the German singer and songwriter Philipp Poisel.
A straightforward query would be "philipp poisel konzerte"<sup>9</sup>. At the time of
writing, this search results in: the concerts section of the artist's official Web site
as the first result<sup>10</sup>, a couple of fan club sites<sup>11</sup>, some concert review sites<sup>12</sup>, and
some ticket aggregator sites<sup>13</sup>.</p>
      <p>In the following, we focus purely on how Linked Data could enrich the search
experience, for example, for a Web search for Philipp Poisel concerts. We
anticipate a huge uptake of semantic mark-up by Web site operators over the coming
months. In particular, new business-related vocabularies such as the Tickets
Ontology [5] are expected to see broader and broader usage and implementation.
In the following, we assume that ticket vendors have implemented the Tickets
Ontology on their Web pages. Therefore, we consider a couple of pages with
the following triples (for the sake of clarity we omit the necessary prefixes and
simplify the xsd:dateTime format):</p>
      <preformat>foo:ticket a tio:TicketPlaceholder ;
    rdfs:label "Tickets for Philipp Poisel"@en ;
    tio:accessTo &lt;http://data.events.example.org/123&gt; .

foo:ExampleTicketVendor gr:offers foo:offer .

foo:offer a gr:Offering ;
    gr:name "Tickets for Philipp Poisel in Bremen"@en ;
    gr:description "Philipp Poisel in Bremen"@en ;
    gr:includes foo:ticket ;
    gr:hasBusinessFunction gr:Sell ;
    gr:hasPriceSpecification
        [ a gr:UnitPriceSpecification ;
          gr:hasCurrency "EUR"@en ;
          gr:hasCurrencyValue "24.10"^^xsd:float ;
          gr:validThrough "2010-11-11T23:59"^^xsd:dateTime ] .</preformat>
      <p><sup>9</sup> "konzerte" is the German word for "concerts".</p>
      <p><sup>10</sup> http://www.philipp-poisel.de/termine/</p>
      <p><sup>11</sup> E.g. http://www.philipppoiselfanclub.de/?tag=konzert</p>
      <p><sup>12</sup> E.g. http://www.ciao.de/Philipp_Poisel_Konzert_Tickets__8025011</p>
      <p><sup>13</sup> E.g. http://www.giga-music.de/philipp-poisel-konzerte-live-tour/</p>
      <p>Each ticket instance has access to a tio:Event, which subclasses a lode:Event [8],
an event:Event, and a dul:Event, and where the particular concert event dates
are described. We assume that the Web site also implements v:Event. This would
allow for comparative Rich Snippets. The obvious next step would be to bring
the social experience into play, provided the user has given access to her social
graph. This would mean carrying part of the "Facebook experience" right into
the search experience. Prior research by Troncy et al. has shown that users
generally start an event search on a general search engine, and that the decision
whether or not to attend an event often depends on which of the
user's friends plan to attend [9, 10].</p>
      <p>An entirely different class of extended Rich Snippets could be based on
multimedia semantics in order to provide richer video search results. We believe that
there is high potential for semantically annotated multimedia content to improve
content search. In Figure 4, we show a mock-up of a person highlighted, which
could be based on media fragment URIs. Such a media fragment URI could look
like:
The components of this URI are first a temporal dimension (t=428,434), which
selects seconds 428 to 434 of the whole video, and then a spatial dimension
(xywh=150,60,50,70 and xywh=240,50,50,70), which creates two bounding boxes
at the given x, y coordinates with a width w and a height h.</p>
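      <p>The decomposition into a temporal and a spatial dimension can be sketched in a few lines of Python. This is a minimal illustration and not a complete media fragment processor: the function parse_media_fragment and the address in EXAMPLE_URI are hypothetical, only plain seconds and pixel values are handled, and repeated xywh entries are simply collected as described in the text, although a standards-conformant processor may treat repeated dimensions differently.</p>
      <preformat><![CDATA[
```python
from urllib.parse import urlsplit, parse_qsl


def parse_media_fragment(uri):
    """Return (start, end, boxes) read from the fragment part of a URI.

    start and end are seconds; boxes is a list of dicts with x, y, w, h in pixels.
    """
    fragment = urlsplit(uri).fragment
    start = end = None
    boxes = []
    for name, value in parse_qsl(fragment):
        if name == "t":
            # Temporal dimension: "start,end"; either side may be empty.
            start_text, _, end_text = value.partition(",")
            start = float(start_text) if start_text else 0.0
            end = float(end_text) if end_text else None
        elif name == "xywh":
            # Spatial dimension: one bounding box per occurrence.
            x, y, w, h = (int(part) for part in value.split(","))
            boxes.append({"x": x, "y": y, "w": w, "h": h})
    return start, end, boxes


# Hypothetical address; only the fragment values are taken from the text.
EXAMPLE_URI = (
    "http://www.example.org/video.ogv"
    "#t=428,434&xywh=150,60,50,70&xywh=240,50,50,70"
)

start, end, boxes = parse_media_fragment(EXAMPLE_URI)
print(start, end)
for box in boxes:
    print(box)
```
]]></preformat>
      <p>A search engine could use the time range to select the preview frames and the boxes to draw the highlighted regions on top of them.</p>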
      <p>Fig. 4. Sketch of an extended Rich Snippet featuring semantically highlighted video
preview (still frame or moving images), actor information, and in-video links.</p>
    </sec>
    <sec id="sec-5">
      <title>Triple-centric Networking</title>
      <p>The vision of extended Rich Snippets outlined above features information from
more than just one data source, which is different from today's Rich Snippets,
where the content is exclusively determined by the information in one
particular Web page. In order to combine information coming from
various data sources, an information sharing mechanism must be established.
In the following, we sketch a thought-experiment derived from Content-centric
Networking introduced by Jacobson [7]. In Content-centric Networking, there
are two kinds of packets involved: Interest and Data packets. Interests
get broadcast by consumers, and as soon as a node can satisfy an interest, it
responds with the data. Otherwise, it rebroadcasts the interest. The main
advantage over common host-based networking is that data packets are not
intended exclusively for the initially interested node, but can be shared between
nodes with common interests. A certain piece of information satisfies an interest
if the content name in the interest packet is a prefix of the content name in
the data packet. Applied to RDF triples, this could mean that the content name
would correspond to the subject. Figure 5 is adapted from Figure 2 in [7] and
illustrates what the interest and data packets could look like if we applied the
principle of Content-centric Networking to Triple-centric Networking.</p>
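      <p>The prefix rule and the rebroadcast behaviour can be illustrated with a small in-memory Python sketch. It is a toy model under our own assumptions and not the protocol of [7]: the Node class, the node names, the example triple, and the decision to cache triples on the way back are all hypothetical, and real packets, timeouts, and routing are left out.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    triples: list = field(default_factory=list)     # (subject, predicate, object) tuples
    neighbours: list = field(default_factory=list)  # nodes an interest is rebroadcast to

    def matching(self, interest):
        """Triples whose subject (the content name) has the interest name as a prefix."""
        return [t for t in self.triples if t[0].startswith(interest)]

    def handle_interest(self, interest, seen=None):
        """Answer from the local store if possible, otherwise rebroadcast."""
        seen = set() if seen is None else seen
        if self.name in seen:
            return []                      # already asked, avoids endless loops
        seen.add(self.name)
        data = self.matching(interest)
        if data:
            return data
        for neighbour in self.neighbours:
            data = neighbour.handle_interest(interest, seen)
            if data:
                # Assumption: keep a copy so that later interests are served locally.
                self.triples.extend(data)
                return data
        return []


# Hypothetical topology: search engine -> relay -> ticket vendor.
vendor = Node("vendor", triples=[
    ("http://data.events.example.org/123", "rdf:type", "tio:Event"),
])
relay = Node("relay", neighbours=[vendor])
search = Node("search", neighbours=[relay])

result = search.handle_interest("http://data.events.example.org/")
print(result)
print(relay.matching("http://data.events.example.org/123"))
```
]]></preformat>
      <p>After the first interest has been satisfied by the vendor node, the relay node holds a copy of the triple and can answer further interests with the same prefix itself, which is the kind of sharing between nodes with common interests described above.</p>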
      <p>Experiments by Jacobson et al. have shown that Content-centric Networking
is useful when many parties are interested in the same content. This is
definitely the case if we think of Web-scale triple propagation for popular Web pages.
One of the potential bottlenecks of rolling out multi-source Rich Snippets
could thus be avoided by moving triple propagation to this new Triple-centric
Networking layer. We are at the early stages of this thought-experiment and
have not carried out any experiments to justify our assumption. However,
we believe that there is a potential improvement over the host-to-host networking
infrastructure that dominates today's Internet traffic.</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We have shown why and how Rich Snippets formats are currently supported
by Google and we have provided code samples in RDFa. We have then briefly
discussed the potential business impacts of Rich Snippets on Web site operators.
It has become apparent that Rich Snippets are a very sensitive element in the
Linked Data value chain due to their high visibility and the confirmed change
in user click-through behavior. We have outlined potential extensions to Rich
Snippets, driven by a concrete use case of an event-based Web search. We have
interlinked the social graph of a user with common event-related data in the
Linked Data cloud. In our mock-up, we neglected the business impact of Rich
Snippets entirely. However, it is obvious that for the online ticket search example,
the decision about which ticket vendors to include in, and which to exclude from, the
vendors shown in the Rich Snippets is a crucial one.</p>
      <p>
        The suggested addition of a Linked Data layer [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as layer 7a between the
current application and presentation layers of the ISO/OSI 7-layer architecture
could help establish the links between the data providers and facilitate the work
of, e.g., search engines, to make sense of these data and present them in an
efficient way. In addition, we have introduced the thought-experiment
of Triple-centric Networking inspired by Jacobson's Content-Centric Networking.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>The research leading to this paper was partially supported by the project
AAL2009-2-049 "Adaptable Ambient Living Assistant" (ALIAS) co-funded by the
European Commission and the French Research Agency (ANR) in the Ambient
Assisted Living (AAL) programme, and by the projects FP7-216444
"Peer-to-peer Tagged Media" (Petamedia), FP7-248296 "I-SEARCH" and FP7-256975
"LOD Around-The-Clock" (LATC) Support Action.</p>
      <p>5. M. Hepp. Tickets Ontology v1.0, November 17, 2010. http://purl.org/tio/ns.</p>
      <p>6. I. Hickson. HTML Microdata. W3C Working Draft, October 19, 2010. http://www.w3.org/TR/microdata/.</p>
      <p>7. V. Jacobson, D. Smetters, J. Thornton, M. Plass, N. Briggs, and R. Braynard.
Networking Named Content. In 5th ACM International Conference on Emerging
Networking Experiments and Technologies (CoNEXT'09), Rome, Italy, 2009.</p>
      <p>8. R. Shaw, R. Troncy, and L. Hardman. LODE: Linking Open Descriptions Of
Events. In 4th Asian Semantic Web Conference (ASWC'09), pages 153–167,
Shanghai, China, 2009.</p>
      <p>9. R. Troncy, A. Fialho, L. Hardman, and C. Saathoff. Experiencing Events through
User-Generated Media. In 1st International Workshop on Consuming Linked Data
(COLD'10), Shanghai, China, 2010.</p>
      <p>10. R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th
International Conference on Semantic Systems (I-SEMANTICS'10), Graz, Austria,
2010.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>B.</given-names>
            <surname>Adida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Birbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McCarron</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Pemberton</surname>
          </string-name>
          .
          <article-title>RDFa in XHTML: Syntax and Processing</article-title>
          .
          <source>W3C Recommendation</source>
          , October 14,
          <year>2008</year>
          . http://www.w3.org/TR/rdfa-syntax/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hauswirth</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>The Future Internet Assembly. Linked Data in the Future Internet</article-title>
          , December,
          <year>2010</year>
          . http://www.future-internet.eu/home/future-internet-assembly/ghent-dec-2010/session-i-linked-open-data-i.html.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>K.</given-names>
            <surname>Goel</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Gupta</surname>
          </string-name>
          . Google Rich Snippets.
          <source>Semantic Technology Conference</source>
          ,
          <year>2010</year>
          . http://semtech2010.semanticuniverse.com/sessionPop.cfm?confid=42&amp;proposalid=2745.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>K.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Guha</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Hansson</surname>
          </string-name>
          . Introducing Rich Snippets.
          <source>Google Webmaster Central Blog</source>
          , May 12,
          <year>2009</year>
          . http://googlewebmastercentral.blogspot.com/2009/05/introducing-rich-snippets.html.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>