=Paper=
{{Paper
|id=Vol-201/paper-27
|storemode=property
|title=FaceTag: Integrating Bottom-up and Top-down Classification
|pdfUrl=https://ceur-ws.org/Vol-201/05.pdf
|volume=Vol-201
|dblpUrl=https://dblp.org/rec/conf/swap/RosatiRQ06
}}
==FaceTag: Integrating Bottom-up and Top-down Classification==
<pdf width="1500px">https://ceur-ws.org/Vol-201/05.pdf</pdf>
<pre>
FACETAG: INTEGRATING BOTTOM-UP AND TOP-DOWN CLASSIFICATION IN A SOCIAL TAGGING SYSTEM                                                     1


        FaceTag: Integrating Bottom-up and Top-down
          Classification in a Social Tagging System
                                                 Quintarelli, E. - Resmini, A. - Rosati, L.


   Abstract – Facetag is a working prototype of a semantic collaborative
tagging tool conceived for bookmarking information architecture resources.
It aims to show how the widespread homogeneous and flat keyword space of
tags can be effectively mixed with a richer faceted classification scheme
to improve the “information scent” and “berrypicking” capabilities of the
system. The additional semantic structure is aggregated both implicitly, by
observing user behaviour, and explicitly, by introducing a compelling user
experience that lets end-users create relationships between tags directly.
The current Facetag implementation is written in PHP / SQL and includes an
open API which allows querying and integration from other applications.
   Index Terms – Social classification, folksonomy, tagging, faceted
classification, information architecture.

                           I. INTRODUCTION ∗

∗ This paper is the result of a collaborative effort. Nonetheless, Emanuele
Quintarelli specifically wrote paragraphs I-II, Andrea Resmini wrote
paragraphs V-VI and Luca Rosati paragraphs III-IV.

   Collaborative tagging systems have been largely adopted by end-users as
useful and powerful tools to organize, browse and publicly share personal
collections of resources on the World Wide Web through the introduction of
simple metadata.
   The aggregation of user metadata is often referred to as a folksonomy, a
user-generated classification, emerging through bottom-up consensus while
users assign free-form keywords to online resources for personal or social
benefit. Del.icio.us <http://del.icio.us/>, Flickr <http://www.flickr.com/>,
43things <http://www.43things.com/>, Furl <http://www.furl.net/> and
Technorati <http://www.technorati.com/> are web-based collaborative systems
for building shared databases of items, enriched by a flat metadata
vocabulary that can be used to perform metadata-driven queries, to monitor
change in areas of interest or to discover emergences or trends, such as
the hottest / most popular topics in the system [Quintarelli 2005].
   In the past, folksonomies have often been seen as orthogonal to
taxonomies and controlled vocabularies: the latter rigid, hierarchical and
organically hand-crafted by professionals a priori; the former flat,
inclusive and emerging from bottom-up users' consensus [Quintarelli 2005].
In a flat tagging system each document can be retrieved through a simple
set of keywords, collaboratively introduced by users to describe and
categorize the document, very much like in a keyword-based search process
in which descriptive terms can be used to get a set of applicable items.
   Despite their low cognitive cost, their capability of matching users’
real needs and language and their great value in serendipitous search
tasks, folksonomies nevertheless imply a lack of precision, a very low
findability quotient (especially in a known-item approach) and limited
scalability because of the intrinsic variability of language
[Quintarelli 2005].
   As a result of the inherently inconsistent, evolving and highly variable
process of associating words and meanings, tagging systems are also
implicitly plagued by a number of issues which include polysemy, homonymy,
plurals, synonymy, problems of ego-oriented nature and basic level
variation, which do not appear easy to solve [Golder & Huberman 2005]. Any
of these problems can dramatically reduce the effectiveness of the
application, undermining the benefits brought on by the use of tagging
systems.
   In addition, tags have recently started to be used by bloggers as
reading aids to help users identify articles and posts of interest, thus
providing a complementary structure over a purely chronological list of
text pieces. This approach marks a major shift, in that tagging also
becomes a tool to maximize findability and browsability without limiting
the reader to only access the most popular or recent tags as in common tag
clouds [Feinstein & Smadja 2006].
   Tag clouds are widely used visual interfaces for information retrieval
that provide a global contextual view of tags assigned to resources in the
system. In such a structure, the most popular tags are usually displayed
through an alphabetically ordered list with the font size increasing with
the tag's relevance. Users browse the cloud, scanning hyperlinks to
recognize information of interest [Hassan-Montero & Herrero-Solana 2006].
   Flat tag clouds are nevertheless not sufficient to provide a semantic,
rich and multidimensional browsing experience over large tagging spaces:
     •    Choosing tags by frequency of use inevitably causes a high
          semantic density with very few well-known and stable topics
          dominating the scene (as seen on RawSugar,
          <http://www.rawsugar.com/>);
     •    Providing only an alphabetical criterion to sort tags heavily
          limits the ability to quickly navigate, scan and extract, and
          hence build a coherent mental model out of tags;
     •    A flat tag cloud cannot visually support semantic relationships
          between tags. We suggest that these
          relationships are needed to improve the user experience and
          general usefulness of the system;
     •    Current tag clouds often fail to provide complex logical
          operations over tags. Simply clicking on a tag is not enough to
          enable a smooth and powerful exploration or refinement.
   Even if Facetag doesn’t promise to address all of these issues, we
believe our approach can limit the impact of polysemy, homonymy and basic
level variation while introducing an innovative, multidimensional and more
semantic paradigm for organizing, navigating and searching large
information spaces through tags.
   To reach this goal, FaceTag mixes three contributions to social tagging
systems:
     •    The use of (optional) tag hierarchies. Users have the possibility
          to organize their resources by means of parent-child
          relationships;
     •    Tag hierarchies are semantically assigned to editorially
          established facets that can later be leveraged to flexibly
          navigate the resource domain;
     •    Tagging and searching can be mixed to maximize findability,
          browsability and user-discovery.

                        II. OVERVIEW OF FACETAG

   Until today, one of the main limitations of hierarchical faceted
categories was the lack of a good automated process for both creating the
categories and associating items to the hierarchy of labels under each
facet [Hearst 2006a].
   We decided to avoid the issue entirely and use no algorithmic round-ups:
Facetag is built around the notion that users provide the structure. In
particular, it aims to investigate how a hierarchical and faceted metadata
structure can be added to user-generated content by making use of tags
provided by end users in collaborative systems, while limiting the amount
of effort and toil required through careful user interface design.

        III. FACETED ANALYSIS: THE FACETED SCHEME CONSTRUCTION

   Although facet and faceted have become very common terms in the
information architecture field, their application often falls far from
their original meaning. The attribute faceted, indeed, is used with a large
variety of meanings, and often refers loosely to the availability of means
to search by different keys [La Barre 2004]. The full theory of faceted
classification, as developed by Ranganathan and the Classification Research
Group (CRG), which includes rules for citation order and notation, is less
widespread as a backend for website organization; remarkable exceptions are
offered by projects staffed by librarians, such as FATKS [Slavic 2002].
   So, we decided to apply faceted classification to the IA field itself,
respecting the original library theory in full, in order to leverage its
potential and obtain maximum benefits. From this perspective, our design
was inspired by these projects: Flamenco project
<http://flamenco.berkeley.edu/>; Facetious
<http://demo.siderean.com/facetious/facetious.jsp>; Etsy
<http://www.etsy.com> 1.

1 Both Facetious and Etsy mix proper facets and metadata (formal properties
of an item).

   The choice of facets is based on the CRG theory [Vickery 1960]. Indeed,
an aspect often underestimated on the World Wide Web is that both
Ranganathan and the CRG described a generic schema for faceted
classification, which every actual schema can refer to. Thus, in a faceted
classification project one does not have to rebuild the schema from scratch
every time, but may follow a constant guideline while building one's main
categories (i.e. facets). CRG postulates 11-13 general categories. In the
table below we show the matching between CRG standard categories and
IA-related categories that were used to define our facets.

     TABLE 1: FACETAG FACETS DEFINITION BY CRG STANDARD CATEGORIES.

   CRG          FaceTag
   Thing        [Documents, resources]
   Type         Resource Types (e.g. online report, case study...)
   Part         --
   Property     Language
   Material     [Format]
   Process      --
   Operation    Activities/Subjects (e.g. competitive analysis,
                faceted classification ...)
   Product      [Deliverables]
   Byproduct    --
   Patient      Usage (e.g. Industry, Health ...)
   Agent        People
   Space        [Country]
   Time         Date

   A preliminary analysis of a corpus of IA resources from the Information
Architecture Institute Library <http://iainstitute.org/library/> allowed us
to define six facets which appeared to be suitable for the classification
of IA resources.
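
   The correspondence in Table 1 can be read as a small lookup table. The
snippet below is a minimal Python sketch of that reading, not the authors'
code (Facetag itself is written in PHP / SQL). The dictionary name, the use
of None for "--" and the helper function are hypothetical. Treating the
bracketed entries as categories that were not adopted as facets is an
assumption of this sketch, made because the six remaining entries coincide
with the facets listed in Table 2.

```python
# Hypothetical encoding of Table 1: CRG category -> FaceTag counterpart.
# None stands for "--"; square brackets are kept exactly as printed.
CRG_TO_FACETAG = {
    "Thing": "[Documents, resources]",
    "Type": "Resource Types",
    "Part": None,
    "Property": "Language",
    "Material": "[Format]",
    "Process": None,
    "Operation": "Activities/Subjects",
    "Product": "[Deliverables]",
    "Byproduct": None,
    "Patient": "Usage",
    "Agent": "People",
    "Space": "[Country]",
    "Time": "Date",
}


def adopted_facets(mapping):
    """Return the unbracketed, non-empty entries in table order.

    Assumption: bracketed entries were not adopted as facets.
    """
    return [name for name in mapping.values()
            if name is not None and not name.startswith("[")]


print(adopted_facets(CRG_TO_FACETAG))
```

   Under that assumption the helper returns six names, matching the count
given in the text.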

              TABLE 2: FACETAG FACETS AND EXAMPLES OF FOCI

   Facet                 Examples
   Resource Types        white paper, case study etc.
   Language              predefined values (based on the ISO 639-2 standard)
   Activities/Subjects   discovery>competitive analysis,
                         classification>facets
   Usage                 industry, public administration, health etc.
   People                dion hinchcliffe, morville
   Date                  automatically added by the software

   The foci listed next to some of the facets serve the sole purpose of
making the facets self-explanatory. In the actual implementation, since
tags are our foci, foci will be user-generated, with the only exceptions of
the language facet, which will use a predefined list of languages in the
ISO 639-2 notation, and the date facet, which will receive a
software-generated timestamp upon resource creation.

            IV. BERRYPICKING, INFORMATION SCENT AND THE
               TWO AXES OF INFORMATION ARCHITECTURE

   As a matter of fact, facets constitute an adaptive classification system
capable, by its very nature, of representing:
     •    knowledge in motion, like that observable in a social
          collaborative context;
     •    several mental models at the same time, such as those playing
          their role in this context.
   Furthermore, facets are particularly suitable to classify a homogeneous
collection of items – i.e. a set of resources belonging to a specific
disciplinary area.
   Besides enforcing order on the flat space of keywords, the blend of tags
and facets is able to empower the “information scent” [Chi et al. 2001] and
the “berrypicking” [Bates 1989] capabilities of the system. Every
information architecture project refers to two different information axes:
     •    a vertical (or paradigmatic) axis, i.e. the hierarchical
          relationship that each item of a system engages with the others;
     •    a horizontal (or syntagmatic) axis, i.e. the semantic, contiguity
          relationship that each item engages with the others.
   In our case, the combination of tags and facets allows for better
management of both these axes:
     •    from the vertical or paradigmatic point of view, when a user is
          going to associate a keyword to a facet (in order to tag a
          resource), the system suggests similar tags or hierarchies of
          tags pertaining to the same facet;
     •    from the horizontal or syntagmatic point of view, at the same
          time, the system will allow the user to see all the other tags
          belonging to the same facet(s).

                   V. FACETED HIERARCHICAL TAGGING

   Facetag deals with users, resources, tags and facets in two quite
distinct ways: since it's a social tagging application, it offers both a
browsing/searching mode and an administrative/editing mode. These are two
different activities, to which the user interface adapts by providing
different aiding tools (navigation, resource management) and different
behaviours (zooming, tag suggestions) respectively.
   When a user first accesses the application, Facetag responds in browsing
mode and presents a page which lists the most recent additions to the
system in the main body. Other relevant parts of the user interface are a
search box and a sidebar. The sidebar lists facets and pertaining
first-level tags with query previews, i.e. the number of resources
associated with each tag, automatically generated from the schema and data
stored in the database.
   Inside Facetag, a user can decide to look for content a) by entering
keywords or b) by choosing first-level tags from a specific facet list.
   If the user enters a keyword, Facetag returns the paginated result set
of all the resources which contain that keyword in their tags, title,
description or notes. The sidebar facet display is adjusted to show only
those facets and pertaining first-level tags which are related to the
result set.
   In case the keyword happens to be a tag at level n, the corresponding
facet will show all tags at level n+1 and add any broader tag in the
hierarchy, up to level n-1, to the facet title as clickable items which
allow zooming out. If there is no tag at level n+1, the facet is not
displayed.
   If the user clicks on a tag from the facet sidebar, Facetag returns the
paginated result set of all the resources which have been tagged with that
tag. A breadcrumb path is displayed which lists the active facet (the one
the tag is a focus for) and the position of the tag in any tag hierarchy it
may belong to.
   The sidebar facet display is adjusted accordingly. The active facet
shows all broader tags from the hierarchy the selected tag may be part of
alongside the facet title, and all pertaining narrower tags. Inactive
facets show first-level tags which relate to the resources pertaining to
the result set.
   Upon subsequent zooming in and refining the query, when there are no
narrower tags, the breadcrumb display is maintained to allow zooming out or
what we call disengaging, i.e. resetting the search, while the active facet
display is effectively removed from the sidebar.
   Obviously, a user may start searching for a keyword and then adjust her
result set using facets, combining the two approaches in any way she
prefers until she reaches a satisfactory answer, or proceed vice versa and
zoom in and out by using tags. Similarly, tags pertaining to different
facets can be used together during a single search to narrow down a result
set quickly and efficiently. If there is no disengagement, all subsequent
operations are performed on the intermediate result set.
   If a user logs in, access to the administrative interface is granted and
adding, editing and deleting resources and tags becomes possible.
   Upon entering new resources, a user is provided with a
simple form with entry fields for every facet. These tag fields are
optional, and can be left empty at will: there is no mandatory facet. But
if a user starts to enter a tag, the completion tool suggests similar tags
from the pertaining facet only. Moreover, since users can optionally
identify two or more tags as a hierarchy through a simple syntax (using the
‘>’ character), the completion tool can suggest, again facet by facet, not
just similar tags, but similar tags as parts of a hierarchy 2 of tags,
hence effectively suggesting an entire hierarchy.

2 Note that hierarchies are not taxonomies but simply forests of shallow
trees.

   Gradually, with use, these hierarchies acquire complexity and become
globally significant in the system.
   Editing or modifying can be done seamlessly from the browsing interface,
by clicking icons which appear next to one's own resources. Notably, the
same happens if a user tries to add a resource she already added (based on
URI identification): Facetag simply supplies the editing interface,
preloading the original data.

                           VI. CONCLUSION

   By providing the user with facets to which hierarchical sets of tags
relate and pertain and a usable interface which adapts to the ongoing
query, Facetag may solve, through contextualization and user-added semantic
value, most of the basic issues connected with polysemy, homonymy and basic
level variation.
   While further testing and usability studies are needed to verify to what
extent users are motivated to use our prototype and to introduce structure
in addition to flat tags, preliminary user evaluations show how the
addition of hierarchies and facets can improve and disambiguate the meaning
of tags, giving them a stronger context and a more coherent organization.
For example, by navigating a hierarchy users can make better sense of the
meaning of a tag, discover related tags at different levels of specificity
and exclude homonyms or find out a large number of other tags that can be
of interest. This approach also tends to augment the scalability of the
system when addressing the enormous domains presented today by the most
appreciated social applications.
   Improving on current features, Facetag aims to provide an advanced
tagging experience through other innovative tools or widgets, like a
Firefox plugin to seamlessly add new bookmarks while browsing, a WYSIWYG
editor to offer drag and drop inclusion of texts and pictures from the web
page the user is bookmarking, and a history of all the times a bookmark has
been tagged.
   Future work includes testing the application on a real user base and
verifying the outcomes, both in terms of internal logic and usability
tests, to widely prove the benefits of a semantic tagging application.

                           VII. REFERENCES

   Bar-Ilan J., Shoham S., Idan A., Miller Y., Shachak A., (2006)
Structured vs. unstructured tagging – A case study, WWW2006, Edinburgh
<http://www.rawsugar.com/www2006/12.pdf>.
   Broughton, V. (2001) Klasifikacija za 21. stoljece: nacela i struktura
Blissove bibliografske klasifikacije [= A classification for the 21st
century: principles and structure of the Bliss bibliographic
classification], Vjesnik bibliotekara Hrvatske, 44, 1-4, p. 38-51; Italian
translation: Una classificazione per il 21’ secolo: principî e struttura
della Classificazione bibliografica Bliss, AIB-WEB. Contributi,
<http://www.aib.it/aib/contr/broughton1.htm>.
   Campbell, G.D., Fast, K.V., (2006) From Pace Layering to Resilience
Theory: The Complex Implications of Tagging from Information Architecture,
Proceedings of IA Summit 2006 (Vancouver, March 23-27, 2006), ASIS&T
<http://www.iasummit.org/2006/files/164_Presentation_Desc.pdf>.
   Chi, E.H., Pirolli, P., Chen, K., Pitkow, J., (2001) Using Information
Scent to Model User Information Needs and Actions on the Web, Proceedings
of the SIGCHI conference on Human factors in computing systems (Seattle,
Washington, 2001), ACM Press
<http://www2.parc.com/istl/projects/uir/publications/items/UIR-2001-07-Chi-CHI2001-InfoScentModel.pdf>.
   English, J., Hearst, M., Sinha, R., Swearingen K., and Yee, P., (2002a)
Hierarchical Faceted Metadata in Site Search Interfaces, CHI 2002
Conference Companion
<http://flamenco.berkeley.edu/papers/chi02_short_paper.pdf>.
   -- (2002b) Flexible search and browsing using faceted metadata,
Unpublished Manuscript
<http://flamenco.berkeley.edu/papers/flamenco02.pdf>.
   Feinstein, D., Smadja F., (2006) Hierarchical Tags and Faceted Search.
The RawSugar Approach, Proceedings of SIGIR 2006 (August 6-11, 2006,
Seattle, Washington).
   Flamenco Group (2002) How to Build a Flamenco instance
<http://bailando.sims.berkeley.edu/flamenco/howtobuild/howtobuild.html>.
   Gnoli, C., Marino, V., Rosati, L., (2006) Organizzare la conoscenza.
Dalle biblioteche all'architettura dell'informazione per il Web [=
Organizing Knowledge. From Libraries to Information Architecture for the
Web], Tecniche Nuove.
   Golder, S.A., Huberman, B.A., (2005) The Structure of Collaborative
Tagging Systems, Information Dynamics Lab
<http://arxiv.org/pdf/cs.DL/0508082>.
   Hassan-Montero, Y., and Herrero-Solana, V., (2006) Improving Tag-Clouds
as Visual Information Retrieval Interfaces, International Conference on
Multidisciplinary Information Sciences and Technologies, InSciT2006
<http://www.nosolousabilidad.com/hassan/improving_tagclouds.pdf>.
   Hearst, M.A. (2006a) Clustering versus faceted categories for
information exploration, Communications of the ACM, April, Vol. 49, No. 4
<http://flamenco.berkeley.edu/papers/cacm06.pdf>.
   -- (2006b) Design Recommendations for Hierarchical Faceted Search
Interfaces, ACM SIGIR Workshop on Faceted Search
<http://flamenco.berkeley.edu/papers/faceted-workshop06.pdf>.
   -- The Flamenco Search Interface Project
<http://flamenco.berkeley.edu/pubs.html>.
   Heymann, P., Garcia-Molina, H., (2006) Collaborative Creation of
Communal Hierarchical Taxonomies in Social Tagging Systems, Technical
Report InfoLab <http://dbpubs.stanford.edu/pub/2006-10>.
   Kome, S H., (2006) Hierarchical Subject Relationships in Folksonomies
   La Barre, K. (2006) The Use of Faceted Analytico-Synthetic Theory as
Revealed in the Practice of Website Construction and Design,
<http://leep.lis.uiuc.edu/publish/klabarre/facetstudy.html>.
   Morville, P., (2005) Ambient Findability, O’Reilly.
   Quintarelli, E., (2005) Folksonomies: Power to the People, Proceedings
of the 1st ISKO Italy-UniMIB meeting (Milan, 24 June 2005)
<http://www.iskoi.org/doc/folksonomies.htm>.
   Slavic, A., (2002) FATKS: Facet Analytical Theory in managing Knowledge
Structures for humanities, <http://www.ucl.ac.uk/fatks>.
   Travis, W., (2006) The strict faceted classification model
<http://facetmap.com/pub/strict_faceted_classification.pdf>.
   Yee, K.P., Swearingen, K., Li, K., and Hearst, M., (2003) Faceted
Metadata for image searching and browsing, Proceedings of CHI 2003
<http://flamenco.berkeley.edu/papers/flamenco-chi03.pdf>.

                   VIII.     SCREENSHOT


Figure 1: The system interface.

Figure 2: A zooming sample, choosing Resource type > blog + Subjects > Information architecture.
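
   The zooming sample in Figure 2 combines one tag from each of two facets.
The snippet below is a minimal Python sketch of how such a selection and
the sidebar query previews could be computed; it is not Facetag's PHP / SQL
implementation. The sample resources, the function names and the in-memory
data layout are hypothetical, and reading the "+" in the caption as a
logical AND between facets is an assumption. Tag hierarchies use the '>'
syntax described in Section V.

```python
from collections import Counter


def parse_path(text):
    """Split a '>'-separated tag hierarchy into a tuple of lowercase tags."""
    return tuple(p.strip().lower() for p in text.split(">") if p.strip())


# Hypothetical resources: each maps a facet name to a list of tag paths.
RESOURCES = [
    {"title": "IA weblog",
     "tags": {"Resource Types": ["blog"],
              "Activities/Subjects": ["information architecture"]}},
    {"title": "Facets case study",
     "tags": {"Resource Types": ["case study"],
              "Activities/Subjects": ["classification>facets"]}},
    {"title": "Competitor review blog",
     "tags": {"Resource Types": ["blog"],
              "Activities/Subjects": ["discovery>competitive analysis"]}},
]


def matches(resource, facet, selected):
    """True if a tag path of the resource under `facet` starts with `selected`."""
    sel = parse_path(selected)
    return any(parse_path(path)[:len(sel)] == sel
               for path in resource["tags"].get(facet, []))


def zoom(resources, selections):
    """Keep the resources that satisfy every (facet, tag path) selection."""
    return [r for r in resources
            if all(matches(r, facet, tag) for facet, tag in selections)]


def query_previews(resources, facet):
    """Count resources per first-level tag of `facet` in a result set."""
    counts = Counter()
    for r in resources:
        paths = [parse_path(p) for p in r["tags"].get(facet, [])]
        counts.update({path[0] for path in paths if path})
    return dict(counts)


blogs = zoom(RESOURCES, [("Resource Types", "blog")])
print(query_previews(blogs, "Activities/Subjects"))
print([r["title"] for r in zoom(blogs, [("Activities/Subjects",
                                         "information architecture")])])
```

   Each call to zoom works on the result set it is given, which mirrors the
statement that, without disengagement, subsequent operations are performed
on the intermediate result set.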

</pre>