<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploiting Tag Clouds for Database Browsing and Querying</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefania Leone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Geel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moira C. Norrie</string-name>
          <email>norrie@inf.ethz.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Information Systems, ETH Zurich CH-8092 Zurich</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We show how tag clouds can be used alongside more traditional query languages and data visualisation techniques as a means for browsing and querying databases. Our approach is based on a general, extensible framework that supports different modes of visualisation as well as different database systems. A number of demonstrator databases and interfaces will be used to show how tag clouds can be used to visualise and browse data or metadata and even a mix of both in object databases and relational databases. Further, we will demonstrate synchronised browsing based on tag clouds as well as ways in which tag clouds can be combined with other forms of querying and data visualisation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Tag clouds are widely used in Web 2.0 applications for visualising user-generated
tags and folksonomies of specific web sites such as Flickr<sup>1</sup>. The presentation and
layout of tags can be controlled so that features such as size, font and colour
give some measure of the importance of a given tag, while the
positioning of tags may be based on pure aesthetics or on some form of relationship
between tags.</p>
      <p>
        Given the flexibility of tag clouds in terms of information representation
together with the simplicity of the associated style of navigation, it is natural
that database researchers should consider exploiting the concept of tag clouds to
address the longstanding problems of database usability [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The use of a query
language requires the user to master not only the query language but also the
database schema. To allow users to view the data in a natural way, a higher-level
presentation of the database content such as a visual schema browser and query
interface is needed. Another approach is to focus on the data rather than the
schema as supported in keyword search interfaces to databases. Tag clouds have
been proposed as a means of summarising and refining the results of keyword
searches as presented in [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. In this case, the term data cloud is used to refer
to their particular adaptation of tag clouds for this purpose. An interesting
      </p>
      <sec id="sec-1-1">
        <title>1 http://www.flickr.com</title>
        <p>
          feature of their approach is that since it was developed for relational databases,
the developer of a data cloud application specifies how application entities can
be composed from the relations in the database in order that keyword search
can be applied to entities rather than simple attributes or tuples. The keyword
search is based on a traditional information retrieval approach where entities
are considered as documents and attribute values as weighted terms. Another
project that uses tag clouds for summarising query results is PubCloud [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for
searching the PubMed biomedical literature database. In this case, the tag clouds
are generated from words extracted from the abstracts returned by the query.
        </p>
        <p>Our goal was to investigate the extent to which tag clouds could be exploited
to support more traditional forms of database browsing and querying, either
replacing existing query languages and other modes of data visualisation or
being used alongside them. Our tag clouds therefore mainly represent data and
metadata values rather than terms occurring within them. To support our
investigations, we have developed a general, extensible framework that supports
different modes of data visualisation, including customisable tag clouds. We have
also designed it so that different types of databases can be accessed and currently
have implementations for both object databases and relational databases.</p>
        <p>A key advantage of the tag cloud approach is that it is data-driven rather
than schema-driven, which is particularly beneficial to users with no experience
of databases and query languages. Our initial user studies have shown that even
users with low computer literacy and no previous experience of tag clouds were
able to find the results of non-trivial queries using our system. At the same time,
expert users also gave favourable feedback about the system and particularly
liked the fact that it could be combined with query expressions.</p>
        <p>Our contributions include:
- A data browser that allows any data source to be browsed and queried using
tag clouds.
- Experimentation with text and position features of tags in a tag cloud to
make clouds more informative.
- A tool that serves different purposes: novice users are able to access
structured data sources without knowing the query language and schema, while
expert users can browse a data source in order to get to know the schema
and thus be able to express complex queries over the data source.
- An extensible and flexible platform for experimentation where new data
sources and new visualisation techniques can be added.</p>
        <p>In the following sections, we provide an overview of the data browser, the
architecture and also the demonstration.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Data Browser</title>
      <p>
        As highlighted in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], tag clouds serve multiple purposes. They can be used for
searching for specific information, browsing a data collection without a specific
target, as a tool for impression formation and gisting, and to recognise what a
data collection is about. On the Web, the tags of a tag cloud are usually hyperlinks
that lead to a collection of items that are associated with a tag. Tag clouds are
graphically appealing due to different visualisation features. Tag cloud features
include text features, such as the tag content, the size, font style and colour,
as well as the positioning and order of tags in a cloud. Several studies, such
as [<xref ref-type="bibr" rid="ref5 ref6 ref7">5-7</xref>], have experimented with tag cloud features and positioning and their
impact on users. According to [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ], font size, font weight and intensity are the
most important features. While topic-based layouts of tags can improve search
performance for speci c search tasks compared to random arrangements, they
still perform worse than alphabetic layouts according to [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        We adapted these concepts to browse structured data where tags represent
attribute values. Clicking on a tag initiates a selection for data items with the
corresponding attribute value. In the case of object databases, the result would
be a collection of objects, while in the case of a relational database it would
be a collection of tuples, i.e. a relation. We note that concepts similar to those
proposed for data clouds in [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] could be adopted to return entities rather than
tuples for specific applications. Similarly, it is possible to mix different attribute
values in a single tag cloud or to form tag clouds from combined attribute
values. In addition, we use these concepts to also browse metadata and have even
experimented with a mix of metadata and data within tag clouds.
      </p>
      <p>We now explain these concepts further by means of an example based on
a database with information about contacts and their locations. Generally, we
define a data source to be a set of data collections, where each collection contains
data items of a specific type. These collections are either class extents or sets of
objects of a specific type in object databases, while they are relations in relational
databases.</p>
      <p>The metadata that defines the schema of a database can itself be represented
by a tag cloud as shown in Figure 1. On the left-hand side of this figure, the tag
cloud gives the names of the various collections of data items within the database.
The default is to have the size of the tags represent the relative cardinality of
the collection.</p>
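      <p>The default sizing rule can be illustrated with a minimal sketch. It assumes a linear scale relative to the largest collection; the function name, the point-size range and the sample collections are hypothetical and not taken from the system described here.</p>
      <preformat>
```python
def schema_tag_sizes(collections, min_pt=10, max_pt=32):
    """Map each collection name to a font size based on its relative cardinality.

    Assumption: sizes scale linearly with cardinality relative to the largest
    collection. The real browser may use a different scale.
    """
    counts = {name: len(items) for name, items in collections.items()}
    largest = max(counts.values(), default=0)
    sizes = {}
    for name, count in counts.items():
        share = count / largest if largest else 0.0
        sizes[name] = round(min_pt + share * (max_pt - min_pt))
    return sizes


# Hypothetical data source: collection name -> data items.
data_source = {
    "contacts": ["c1", "c2", "c3", "c4"],
    "locations": ["l1", "l2"],
    "organisations": [],
}
sizes = schema_tag_sizes(data_source)
```
      </preformat>
      <p>With this sample, the largest collection receives the maximum size and an empty collection the minimum, so the schema cloud stays readable even when cardinalities differ widely.</p>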
      <p>A user can start browsing a database either by entering a query expression
in the window below the tag clouds or by selecting one or more of the tags
in the schema tag cloud. Each collection can have a default attribute or set of
attributes specified for its visualisation as a tag cloud. However, the user can
also specify this by means of a simple selection of attributes through checkboxes.
Alternatively, one can display the attributes themselves as a tag cloud in the
left-hand window and allow the users to select one or more attributes as tags.
In this way, we support synchronised browsing across the metadata and data
through the adjacent tag clouds.</p>
      <p>In the example of Figure 1, the attribute lastname is displayed in the tag
cloud on the right, as indicated by the navigation path shown on top of the cloud
window. The size of the tags in this tag cloud represents how many data items
have that attribute value. In this way, the tag cloud can be considered as a
visualisation of the attribute value frequency. The user can now click on a tag
and further refine their selection. When hovering over a lastname tag, a user
gets detailed information about the number of objects that have this attribute
value or, in the case of only a single object, the set of its attribute values.</p>
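      <p>The following sketch shows one way to compute the attribute value frequencies behind such a cloud and to model clicking and hovering. It operates on in-memory dictionaries rather than a real database, and the function names and the first names in the sample data are hypothetical.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical contact records; attribute names follow the running example.
contacts = [
    {"firstname": "Anna", "lastname": "Froidvaux", "city": "Uster"},
    {"firstname": "Marc", "lastname": "Froidvaux", "city": "Uster"},
    {"firstname": "Lea", "lastname": "Froidvaux", "city": "Zurich"},
    {"firstname": "Tom", "lastname": "Meier", "city": "Zurich"},
]


def value_frequencies(items, attribute):
    """Count how many data items carry each value of the given attribute."""
    return Counter(item[attribute] for item in items)


def select(items, attribute, value):
    """Model a click on a tag: keep the items with the matching attribute value."""
    return [item for item in items if item[attribute] == value]


def hover_info(items, attribute, value):
    """Return the item count, or the attribute values if exactly one item matches."""
    matches = select(items, attribute, value)
    if len(matches) == 1:
        return matches[0]
    return len(matches)


lastname_cloud = value_frequencies(contacts, "lastname")
froidvaux = select(contacts, "lastname", "Froidvaux")
city_cloud = value_frequencies(froidvaux, "city")  # refined selection
```
      </preformat>
      <p>Each click narrows the current selection, and the next cloud is computed over the narrowed set, which is how successive refinement could be modelled.</p>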
      <p>We offer different modes for visualisation, as depicted in Figure 2. In the
tag cloud in the upper-left corner, the contacts are displayed by lastname. In
the upper-right corner, two attributes are bound to the tag content feature,
namely the attribute lastname of contacts and the attribute city of
the associated location objects. The tags thus represent the number of contacts
with a given name that live in the same city. In this example data set, the
tag Froidvaux-Zurich represents the set of contacts with lastname 'Froidvaux'
who live in 'Zurich'. As one can see in this example, more people with the name
'Froidvaux' live in 'Uster' than in 'Zurich'. In the lower-left corner of Figure 2, we
added colour as an additional visualisation dimension: the attribute lastname is
bound to the tag content, while the attribute city from the associated location
is bound to the colour feature. As one can see from the index on the right-hand
side of the figure, each distinct value of the city attribute is assigned
a specific colour. We have experimented with these different tag features in a
user study. Care has to be taken in choosing the right attributes to bind to the
colour feature: this only makes sense if the set of distinct values is not too large,
since otherwise the index becomes very large and the tag colours are not very
informative.</p>
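      <p>As a rough illustration of these two bindings, the sketch below builds combined lastname-city tags and a colour index for the city attribute. The palette, the function names and the rule of rejecting attributes with more distinct values than colours are assumptions made for this example only.</p>
      <preformat>
```python
from collections import Counter

# Hypothetical contacts with the city of the associated location joined in.
contacts = [
    {"lastname": "Froidvaux", "city": "Uster"},
    {"lastname": "Froidvaux", "city": "Uster"},
    {"lastname": "Froidvaux", "city": "Zurich"},
    {"lastname": "Meier", "city": "Zurich"},
]

PALETTE = ["red", "blue", "green", "orange"]  # assumed colour set


def combined_tags(items, attributes, separator="-"):
    """Bind several attributes to the tag content and count each combination."""
    return Counter(separator.join(item[a] for a in attributes) for item in items)


def colour_index(items, attribute, palette=PALETTE):
    """Assign one colour per distinct value; reject attributes with too many values."""
    values = sorted({item[attribute] for item in items})
    if len(values) > len(palette):
        raise ValueError(f"{attribute!r} has too many distinct values for the palette")
    return dict(zip(values, palette))


tags = combined_tags(contacts, ["lastname", "city"])
index = colour_index(contacts, "city")
```
      </preformat>
      <p>Rejecting attributes with too many distinct values is one simple way to enforce the caveat above; a real interface might instead warn the user or group rare values.</p>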
    </sec>
    <sec id="sec-3">
      <title>Architecture</title>
      <sec id="sec-3-1">
        <title>Data Browser User Interface</title>
      </sec>
      <sec id="sec-3-2">
        <title>Visualisation Library</title>
        <p>[Architecture figure: Visualisation Library containing a Visualisation Manager.]</p>
      </sec>
      <sec id="sec-3-3">
        <title>Database Adapter</title>
        <p>[Architecture figure: Database Adapter containing a Manager, with adapters for Relational Database, OO Database, ...]</p>
        <sec id="sec-3-3-1">
          <title>2 http://www.db4o.com/</title>
          <p>3 http://maven.globis.ethz.ch/projects/avon/
4 http://www.globis.ethz.ch/research/oms/platforms/omspro</p>
          <p>interface which has to be implemented to add a new technique to the
visualisation library. We currently provide a tag cloud visualisation and are working on
a bubble chart visualisation. Our data browser application is flexible and
configurable and is currently used as a platform for experimentation in our research
group.</p>
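          <p>The text does not give the actual interface, so the following is only a hypothetical sketch of how a pluggable visualisation technique might be declared and registered. All class and method names are invented for illustration.</p>
          <preformat>
```python
from abc import ABC, abstractmethod


class Visualisation(ABC):
    """Hypothetical contract that each visualisation technique would implement."""

    name: str

    @abstractmethod
    def render(self, weights):
        """Turn a mapping of label -> weight into some displayable form."""


class TagCloud(Visualisation):
    name = "tag cloud"

    def render(self, weights):
        # Alphabetical order, one "label(weight)" token per tag.
        return " ".join(
            f"{label}({weight})" for label, weight in sorted(weights.items())
        )


class VisualisationLibrary:
    """Registry that a browser could consult to find available techniques."""

    def __init__(self):
        self._techniques = {}

    def register(self, technique):
        self._techniques[technique.name] = technique

    def render(self, name, weights):
        return self._techniques[name].render(weights)


library = VisualisationLibrary()
library.register(TagCloud())
output = library.render("tag cloud", {"Meier": 1, "Froidvaux": 3})
```
          </preformat>
          <p>Under this design, a further technique such as a bubble chart would be added by writing one more subclass and registering it, without touching the browser code.</p>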
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Demonstration</title>
      <p>In our demonstration, we will show how users can browse both relational and
object databases using our data browser. The demonstration will include showing
tag clouds over data, metadata and a mix of data and metadata. We will provide
a set of demonstrator databases including a contacts database and a publications
database implemented using both relational and object databases. Visitors will
be able to freely browse these databases, pose queries and exploratively get an
impression of the schema and the data. We will also provide a list of query
tasks from our user study so that users can experience how query results can
be obtained using only the browser, using only query expressions and using the
browser in conjunction with query expressions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Jagadish</surname>
            ,
            <given-names>H.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elkiss</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jayapandian</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nandi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Making Database Systems Usable</article-title>
          .
          <source>In: Proc. ACM SIGMOD'07</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Koutrika</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zadeh</surname>
            ,
            <given-names>Z.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Data Clouds: Summarizing Keyword Search Results over Structured Data</article-title>
          .
          <source>In: Proc. EDBT'09</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Koutrika</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zadeh</surname>
            ,
            <given-names>Z.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>CourseCloud: Summarizing and Refining Keyword Searches over Structured Data</article-title>
          .
          <source>In: Demo Proc. EDBT'09</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kuo</surname>
            ,
            <given-names>B.Y.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hentrich</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Good</surname>
            ,
            <given-names>B.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilkinson</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>Tag Clouds for Summarizing Web Search Results</article-title>
          .
          <source>In: Proc. WWW'07</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rivadeneira</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gruen</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muller</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Millen</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          :
          <article-title>Getting our Head in the Clouds: Toward Evaluation Studies of Tag Clouds</article-title>
          .
          <source>In: Proc. CHI '07</source>
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bateman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutwin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nacenta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Seeing Things in the Clouds: The Effect of Visual Features on Tag Cloud Selections</article-title>
          .
          <source>In: Proc. 19th ACM Conf. on Hypertext and Hypermedia</source>
          . (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Schrammel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leitner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tscheligi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Semantically Structured Tag Clouds: An Empirical Evaluation of Clustered Presentation Approaches</article-title>
          .
          <source>In: Proc. CHI'09</source>
          . (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>