<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Tag clouds and retrieved results: The CloudCredo mobile clustering engine and its evaluation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefano Mizzaro</string-name>
          <email>mizzaro@uniud.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Sartori</string-name>
          <email>sartori.uni@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giacomo Strangolino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, University of Udine, Udine</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We discuss the use of tag clouds as a way of visualizing the results of a clustering search engine. We briefly present a specific tag cloud approach and its implementation in the CloudCredo prototype. Then we describe an experimental user study aimed at demonstrating that tag cloud visualization is: (i) as effective as the classical tree-like visualization; and (ii) particularly effective on small-screen devices. Towards aim (i), we compare CloudCredo with a similar system, Credino; towards aim (ii), the two systems are compared on iPhone and iPad, two similar devices differing mainly in their size. The results, although preliminary, support the hypotheses.</p>
      </abstract>
      <kwd-group>
        <kwd>Clustering</kwd>
        <kwd>Mobile devices</kwd>
        <kwd>Tagcloud</kwd>
        <kwd>Evaluation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>On the Web, there is a growing number of clustering search engines, namely
search (or, more often, meta-search) engines that present the retrieved
documents organized in clusters: similar documents are grouped together under a
meaningful label; clusters are organized hierarchically (i.e., clusters are divided
into sub-clusters, and so on) and usually shown in a tree-like manner; and the
end user can browse the retrieved results by focusing on specific clusters. Some
examples of these systems are Yippy (formerly known as Clusty and Vivísimo,
www.yippy.com) and CREDO (credo.fub.it).<fn id="fn1"><label>1</label><p>At the time of writing CREDO is not available.</p></fn> Even classical search engines like
Google show some signs of a clustering approach, although they are still much
more oriented towards the classical ranked list.</p>
      <p>
        The cluster approach seems particularly adequate and effective for mobile
devices, since it makes better use of the limited screen space.
This approach has been proposed and evaluated for the CREDO system and its
mobile versions, Credino and SmartCREDO [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. Indeed, mobile search engines
are an important and active research topic: as is well known, several statistics
show that Internet traffic in general, and queries to search engines in particular,
generated by means of mobile devices are quickly increasing. It is foreseen that
by 2015 there will be more mobile users than desktop Internet users.
      </p>
      <p>However, the classical tree-based visualization of document clusters is not the
only possibility. In this paper we propose a tag cloud based visualization that, in
our opinion, has the potential to be particularly effective on small-screen mobile
devices. Our aim is twofold:
<list list-type="bullet">
<list-item><p>to understand if the tag cloud visualization is effective;</p></list-item>
<list-item><p>to understand if it is particularly effective on the small screens of mobile devices.</p></list-item>
</list></p>
      <p>The paper is organized as follows: Sect. 2 defines tag clouds and motivates our
approach; Sect. 3 presents CloudCredo, a mobile clustering engine implementing
the tag cloud approach, and recalls Credino, a companion system used in the
evaluation; Sect. 4 describes the user study that we performed to experimentally
evaluate the effectiveness of tag clouds.</p>
    </sec>
    <sec id="sec-2">
      <title>Tag clouds</title>
      <p>A tag cloud (or word cloud) is a set of terms organized spatially and graphically
(in terms of fonts and colors) to visually highlight the most important terms.
Tag clouds are very common: they are often used on the Web to
show the tags used to annotate web resources, to summarize the main topics of
a Web site, and so on. There are several kinds of tag clouds, which can differ in
the selection of terms, the graphical aspect, and the auxiliary information shown
(like a count of the frequency of each term in the original text); a description can be
found at en.wikipedia.org/wiki/Tag_cloud.</p>
      <p>As mentioned above, we propose to use a tag cloud to show the labels of
the clusters. The rationale for this approach is that a tag cloud can show the
same labels and use less space than the classical tree-like visualization, although
admittedly in a less organized way. Moreover, the cluster labels are not only
shown as a tag cloud: they are also clickable, and can be expanded into
sub-clusters (as in the tree-like visualization). Also, we specifically target mobile
devices, and we are interested in studying the effectiveness of the tag cloud
clustering approach when screen space is limited.</p>
    </sec>
    <sec id="sec-3">
      <title>Credino and CloudCredo</title>
      <p>
        We build on the Credino system [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], implemented with the aim of porting the
CREDO clustering engine to a mobile device (a PDA was used in the original
paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], but we slightly adapted it to more recent devices like the iPhone).
Figure 1(a) shows a screenshot of Credino on an iPhone. On the basis of Credino,
we implemented CloudCredo, which visualizes the same clusters as Credino by
means of a tag cloud. Figures 1(b) and (c) show screenshots of CloudCredo
on an iPhone and an iPad. Credino and CloudCredo are both meta-search
engines on CREDO; therefore they both show exactly the same cluster hierarchy,
just visually different.
      </p>
      <p>Fig. 1. (a) Credino; (b) CloudCredo on iPhone; (c) CloudCredo on iPad.</p>
      <p>As can be seen in Figures 1(b) and (c), our tag cloud
implementation exploits both colors and size, and each cluster also shows the
number of documents in it. The tag cloud implementation, similarly to the
classical hierarchical tree one, allows a category to be expanded into subcategories by
clicking on the "[+]" sign close to the tag (and to be collapsed by clicking on
"[-]"). Both Credino and CloudCredo are Web applications that can be used
with any standard Web browser; on iPhone and iPad they adapt smoothly to the
portrait/landscape orientation of the device.</p>
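      <p>As an illustration of how the number of documents in a cluster can drive the visual weight of its tag, the following Python sketch maps document counts to font sizes by linear interpolation. The function name, the point-size range, and the sample labels and counts are assumptions made for this example; the mapping actually used by CloudCredo is not specified here.</p>
      <preformat>
```python
def font_sizes(cluster_counts, min_pt=10, max_pt=28):
    """Map each cluster label to a font size that grows with its document count.

    Hypothetical helper: linear scaling between min_pt and max_pt is an
    assumption, not the mapping used by CloudCredo.
    """
    if not cluster_counts:
        return {}
    lo = min(cluster_counts.values())
    hi = max(cluster_counts.values())
    span = hi - lo
    sizes = {}
    for label, count in cluster_counts.items():
        # When all clusters have the same size, use the midpoint of the range.
        fraction = (count - lo) / span if span else 0.5
        sizes[label] = round(min_pt + fraction * (max_pt - min_pt))
    return sizes


# Made-up clusters for the query [rome events]; the counts are illustrative only.
example = {"concerts": 40, "exhibitions": 25, "hotels": 10}
sizes = font_sizes(example)
```
      </preformat>
      <p>A linear scale is only one option; a logarithmic scale may be preferable when a few clusters are much larger than the others.</p>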
      <p>We are not alone in proposing to use tag clouds to show the retrieved results;
the Quintura search engine (www.quintura.com/) does exactly that. Our approach
is slightly different, though, since: (i) our tags/clusters can be expanded into
subtags/sub-clusters; and (ii) we specifically target mobile devices in this work.</p>
      <p>CloudCredo is available at smdc.uniud.it/CloudCredo; the version of
Credino used in the experimental evaluation described below is at
credino.dimi.uniud.it/. The two systems, being based on CREDO (see Footnote 1),
are not available at the time of writing.</p>
    </sec>
    <sec id="sec-4">
      <title>Experimental evaluation</title>
      <p>We performed a user study addressing the two aims stated at the end of Section 1.
These can be translated into the following experimental hypotheses: (i)
CloudCredo is as effective as Credino; and (ii) the effectiveness of CloudCredo is
particularly high on small screens.</p>
      <sec id="sec-4-1">
        <title>Experimental design</title>
        <p>We used the two systems Credino and CloudCredo in our evaluation. We also
used an iPhone and an iPad: since the two devices are very similar, the main (if
not only) difference being their size, we try in this way to single out the effect
of size. Thus, our experiment has two independent variables:
<list list-type="bullet">
<list-item><p>device, or size (iPhone and iPad);</p></list-item>
<list-item><p>system (Credino and CloudCredo).</p></list-item>
</list></p>
        <p>
          Forty-eight participants, recruited at our university, took part in our
study. Each participant was asked to perform 4 tasks. The tasks were
built starting from the most frequent queries on Google Mobile
(www.google.com/intl/en/press/zeitgeist2008/): we selected 4 of them and
built 4 simulated work task situations [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] around them. Figure 2 shows task 1,
translated from Italian into English, as given to the user. To have a more
controlled environment, we specified the initial query. To limit learning effects, we
relied on a Graeco-Latin square design: each subject performed her 4 tasks on
the four system/device combinations, in a different order.
        </p>
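        <p>To make the idea of a Graeco-Latin square design concrete, the following Python sketch builds one possible schedule from two mutually orthogonal Latin squares of order 4: one square selects the task and the other the device/system combination. The specific squares, the mapping of rows to participant groups, and all identifiers are assumptions made for illustration; the exact assignment used in the study is not reported here.</p>
        <preformat>
```python
from itertools import product

TASKS = ["task 1", "task 2", "task 3", "task 4"]
CONDITIONS = [
    ("iPhone", "Credino"),
    ("iPhone", "CloudCredo"),
    ("iPad", "Credino"),
    ("iPad", "CloudCredo"),
]

# Two mutually orthogonal Latin squares of order 4 (chosen for this example).
# Rows are participant groups, columns are positions within the session.
LATIN = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
GREEK = [[0, 1, 2, 3], [2, 3, 0, 1], [3, 2, 1, 0], [1, 0, 3, 2]]


def is_graeco_latin(latin, greek):
    """Check that both squares are Latin and that their superposition
    contains every (latin, greek) pair exactly once."""
    n = len(latin)
    symbols = set(range(n))
    for square in (latin, greek):
        rows_ok = all(set(row) == symbols for row in square)
        cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
        if not (rows_ok and cols_ok):
            return False
    pairs = {(latin[i][j], greek[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


def build_schedule():
    """Return, for each participant group, the ordered (task, device, system) triples."""
    schedule = []
    for row in range(4):
        session = []
        for position in range(4):
            task = TASKS[LATIN[row][position]]
            device, system = CONDITIONS[GREEK[row][position]]
            session.append((task, device, system))
        schedule.append(session)
    return schedule


schedule = build_schedule()
```
        </preformat>
        <p>In this sketch every group meets each task and each device/system combination exactly once, and every task is paired with every combination exactly once across the four groups. With 48 participants, each row could be assigned to 12 of them, although this allocation is again an assumption.</p>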
        <p>Task 1 (Figure 2)
<list list-type="bullet">
<list-item><p>Description: Imagine that you are going to visit a friend in Rome and therefore
you want to find some information about cultural events (e.g., exhibitions and
concerts) that will take place during your stay in town.</p></list-item>
<list-item><p>Task: Retrieve two different pages.
A page is relevant if it provides the date, time, and location of an event taking
place in Rome during the next 30 days. Pages discussing an event in a general
way, without specifying the above data, are not relevant.</p></list-item>
<list-item><p>Other instructions: Start with the query [rome events].</p></list-item>
</list></p>
        <p>As dependent variables, we measured both objective user effectiveness and
subjective user satisfaction. User effectiveness was measured as a linear
combination of: the success in finding the appropriate page (a binary value in {0, 1}),
the speed (computed on the basis of the time needed and normalized into the [0, 1]
range), and the confidence the user had of having performed her task correctly
(again normalized into [0, 1]). We defined three different combinations of these
three factors, with different weights; however, there was no difference among the
three combinations. In the following we measure effectiveness E as</p>
        <p><disp-formula><tex-math>E = \mathit{success} \cdot \left( \tfrac{2}{3}\,\mathit{speed} + \tfrac{1}{3}\,\mathit{confidence} \right)</tex-math></disp-formula>
(if success is 0, then E is 0 as well; speed is more important than confidence).</p>
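        <p>The effectiveness measure can be sketched in a few lines of Python. The weights 2/3 and 1/3 and the role of success as a multiplicative factor come from the definition above; the helper that normalizes speed (linear in the time used, relative to a maximum allowed time) and all identifiers are assumptions, since the exact normalization is not given.</p>
        <preformat>
```python
def normalized_speed(seconds_used, max_seconds):
    """Hypothetical normalization: 1.0 for an instant answer, 0.0 at the time limit."""
    clipped = min(max(seconds_used, 0.0), max_seconds)
    return 1.0 - clipped / max_seconds


def effectiveness(success, speed, confidence):
    """E = success * (2/3 * speed + 1/3 * confidence), with E in [0, 1]."""
    if success not in (0, 1):
        raise ValueError("success must be 0 or 1")
    for name, value in (("speed", speed), ("confidence", confidence)):
        if value > 1.0 or 0.0 > value:
            raise ValueError(name + " must lie in [0, 1]")
    return success * ((2.0 / 3.0) * speed + (1.0 / 3.0) * confidence)


# Illustrative values only: a successful task finished in 60 s out of 240 s,
# with a self-reported confidence of 0.6.
speed = normalized_speed(60.0, 240.0)
e_value = effectiveness(1, speed, 0.6)
```
        </preformat>
        <p>Because success multiplies the weighted sum, a failed task yields E = 0 regardless of how fast or how confident the participant was.</p>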
        <p>User satisfaction was measured by means of questionnaires: participants filled
in a questionnaire after each task completion, and one final questionnaire as well.
Questionnaires collected, by means of Likert scales, data about:
<list list-type="bullet">
<list-item><p>difficulty of the task;</p></list-item>
<list-item><p>difficulty of using the system;</p></list-item>
<list-item><p>adequacy of the system to the device.</p></list-item>
</list></p>
        <p>We combine these three values into a single satisfaction value S' by taking their
average, normalized onto [0, 1]:</p>
        <p><disp-formula><tex-math>S' = \tfrac{1}{3}\,\mathit{task}_d + \tfrac{1}{3}\,\mathit{system}_d + \tfrac{1}{3}\,\mathit{adequacy}</tex-math></disp-formula>
We also take into account two other questionnaire items that, as a control,
asked whether the participant preferred the other system or device. The final
satisfaction S was computed by slightly adjusting S' to take these into account.</p>
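        <p>A minimal Python sketch of the preliminary satisfaction score (the equal-weight average of the three questionnaire values) follows. The 5-point Likert scale and the rescaling helper are assumptions, as are all identifiers; the sketch takes the three inputs as already oriented and normalized, and it does not attempt the final adjustment leading to S, whose details are not given.</p>
        <preformat>
```python
def normalize_likert(answer, points=5):
    """Hypothetical rescaling of a 1..points Likert answer onto [0, 1]."""
    return (answer - 1) / (points - 1)


def preliminary_satisfaction(task_d, system_d, adequacy):
    """Equal-weight average of three scores already normalized onto [0, 1]."""
    scores = (task_d, system_d, adequacy)
    if any(score > 1.0 or 0.0 > score for score in scores):
        raise ValueError("all scores must lie in [0, 1]")
    return sum(scores) / 3.0


# Illustrative questionnaire answers on an assumed 5-point scale.
s_prime = preliminary_satisfaction(
    normalize_likert(4), normalize_likert(5), normalize_likert(3)
)
```
        </preformat>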
        <p>We adopted the usual procedures of laboratory testing: each subject was
briefed and trained, filled in a first questionnaire with some demographic
data, then went through the four task-questionnaire iterations, and finally filled in
the last questionnaire. We also ran a pilot test, which confirmed the choice of the
four tasks and allowed us to estimate the maximum time allowed for each task.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
        <p>The collected demographics show that participants were either university
students (45 out of 48) or recent graduates searching for a job. They had good, and
homogeneous, knowledge of computers, Web search, and mobile devices. All
of them were aware of iPhone and iPad devices. Nobody had used a clustering
engine before.</p>
        <p>Fig. 3. Overall results.</p>
        <p>Figure 3 shows the overall results. The four device/system combinations are
shown in the corners of the figure; the four charts show the differences, in both
S and E, for the single tasks and averaged over the four tasks. The bars are
oriented towards the best device/system combination; e.g., the leftmost vertical
bar shows that iPad/Credino had a higher effectiveness E than iPhone/Credino
on task 1.</p>
        <p>
          By analyzing the figure we can observe that:
<list list-type="bullet">
<list-item><p>Since all the "average" bars point to the right, on average CloudCredo
had both a higher E and a higher S than Credino.</p></list-item>
<list-item><p>Since all the average bars point upwards (with a single exception, the
rightmost E bar, which is in any case very small in absolute value), on average
the iPad had both a higher E and a higher S than the iPhone.</p></list-item>
<list-item><p>Combining the previous two points, iPad/CloudCredo was the most effective and
most preferred combination.</p></list-item>
<list-item><p>The above considerations seem stronger for S, which has longer bars.</p></list-item>
<list-item><p>The above results hold for most of the single tasks as well:
only 6 bars on specific tasks disagree with the average bar
(out of 32 possibilities).</p></list-item>
<list-item><p>Also, on the single tasks, E and S are often in agreement, although in 6 out
of 16 cases they are not.</p></list-item>
<list-item><p>Although we were interested in showing that the tag cloud visualization was
as effective as the tree-like one, these results are a first indication that it is even more
effective and preferred. However, there is almost no statistical significance
in the differences. On E, according to the Mann-Whitney-Wilcoxon test,
the only statistically significant difference (at the 0.05 level) is on task 4
between iPhone/Credino and iPhone/CloudCredo (the longest horizontal E bar
in the figure). Statistical significance is slightly higher on S: although most of
the differences are not significant, the preference for iPad/CloudCredo over
iPad/Credino is significant at the 0.05 level, according to the
Mann-Whitney-Wilcoxon test.</p></list-item>
<list-item><p>Therefore, the two visualization approaches can be considered equivalent,
with a slight preference for the tag cloud one. This confirms the first
hypothesis.</p></list-item>
<list-item><p>Turning to the second hypothesis, the figure shows that the average difference
is slightly higher at the iPhone level than at the iPad one. There is no
statistical significance for this result, however, also because there are quite
high variations over the single tasks (i.e., bars on the top chart are often
very different from the corresponding bars on the bottom chart; see, for
example, the striking difference in the E value on task 2). Thus we can only
say that there is a slight indication of the particular effectiveness of the tag
cloud approach on small screens, also on the basis of the results in [
          <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
          ] that
showed how the clustering approach of Credino is more effective on small
screens than on large ones.</p></list-item>
</list>
        </p>
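        <p>For readers unfamiliar with the Mann-Whitney-Wilcoxon test, the following Python sketch computes its U statistic for two independent samples, using midranks for ties. The sample values are made up and are not the data of this study; the function name is hypothetical, and the computation of the p-value is deliberately left out.</p>
        <preformat>
```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples, with midranks for ties."""
    combined = sorted(
        [(value, 0) for value in sample_a] + [(value, 1) for value in sample_b]
    )
    ranks = [0.0] * len(combined)
    start = 0
    while start != len(combined):
        # Find the end of the run of tied values starting at `start`.
        end = start
        while end + 1 != len(combined) and combined[end + 1][0] == combined[start][0]:
            end += 1
        # Tied observations all receive the average of the ranks they span.
        midrank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[k] = midrank
        start = end + 1
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2.0
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Made-up effectiveness scores for two conditions (illustration only).
scores_cloud = [0.9, 0.8, 0.7]
scores_tree = [0.1, 0.2, 0.3]
u_cloud, u_tree = mann_whitney_u(scores_cloud, scores_tree)
```
        </preformat>
        <p>In practice a statistics library routine, such as scipy.stats.mannwhitneyu, would typically be used, since it also provides the p-value needed to judge significance at the 0.05 level.</p>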
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>
        We have proposed a tag cloud based approach to the visualization of the
results retrieved by a clustering search engine. Our experimental study on two
prototypes supports the hypotheses that tag clouds are an effective visualization
alternative, especially on small-screen mobile devices. The second point is
particularly critical, since we do not have statistically significant proof of it. We
do not have any contrary evidence, though; this, combined with the results of
previous studies [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ], makes the option of using a tag cloud
based approach on mobile devices interesting, although further evidence should be found.
      </p>
      <p>The experimental design needs some further remarks. The usual user study
performed in information retrieval aims at demonstrating that a new version
of some system reaches higher effectiveness and/or user satisfaction than some
baseline. Our experimental study was somewhat different from this classical
setting, since we were interested in showing that an alternative system (actually,
an alternative visualization approach) is as effective as a classical one.</p>
      <p>Although the results of our user study are positive, they are preliminary: we
used four tasks only, and the user population is quite homogeneous. Therefore,
a first and obvious direction for future work is to repeat the experiments with a
higher number of tasks and with a different, and perhaps more heterogeneous,
user population. Also, a more sophisticated experimental design could help to prove
the second hypothesis. A last direction is to implement native applications of
CloudCredo and Credino for iPhone/iPad (and Android as well): this would allow
a more effective interaction and a better user experience.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>P.</given-names>
            <surname>Borlund</surname>
          </string-name>
          .
          <article-title>The IIR evaluation model: a framework for evaluation of interactive information retrieval systems</article-title>
          .
          <source>Information Research</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ): paper no. 152
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C.</given-names>
            <surname>Carpineto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Della Pietra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Romano</surname>
          </string-name>
          .
          <article-title>Mobile Clustering Engine</article-title>
          .
          <source>In Proceedings of the 28th European Conference on Information Retrieval</source>
          , London, UK, volume
          <volume>3936</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>155</fpage>
          –
          <lpage>166</lpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Carpineto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mizzaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Romano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Snidero</surname>
          </string-name>
          .
          <article-title>Mobile information retrieval with search results clustering: Prototypes and evaluations</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>60</volume>
          (
          <issue>5</issue>
          ):
          <fpage>877</fpage>
          –
          <lpage>895</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>