Social approach to context-aware retrieval

                                                                  Luca Vassena
                                                                University of Udine
                                                              via delle Scienze, 206
                                                                   Udine, Italy
                                                           vassena@dimi.uniud.it


ABSTRACT
In this paper we present a general purpose solution to Web content perusal by means of mobile devices, named Social Context-Aware Browser. This is a novel approach to information access based on the users' context, whose aim is to retrieve what the user needs, even if she did not issue any query. Our solution is built upon a social model that exploits the collaborative efforts of the whole community of users to control and manage contextual knowledge, related both to situations and to resources. This paper presents a general survey of our solution, describing the idea and presenting an implementation approach.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

Keywords
Context-aware retrieval, mobile search, social, folksonomy, Web 2.0

Appears in the Proceedings of the 1st Italian Information Retrieval Workshop (IIR'10), January 27–28, 2010, Padova, Italy.
http://ims.dei.unipd.it/websites/iir10/index.html
Copyright owned by the authors.

1. INTRODUCTION
   Context-aware computing is a computational paradigm that has grown rapidly in the last few years, especially in the field of mobile devices. A key role in this paradigm is played by the notion of context, which is roughly described as the situation the user is in. This concept encloses important information that could be used to affect the capabilities of mobile devices, adapting them to the user's needs. In particular, contextual data can be used to predict the user's needs and to seek and retrieve information, thereby reducing the complexity of the user-device interaction and providing the right information in the right place at the right time. From this point of view, because of the huge amount of contextual information and its heterogeneity and uncertainty, mobile and context-aware computing environments represent a new challenge for Information Retrieval (IR). The combination of IR and context-aware computing has been named context-aware retrieval [4].
   These considerations guided us towards a new approach to Web content production and consumption, where contextual data are exploited to capture the dynamic nature of the user's needs, of the information available, and of the relevance of this information, typical of a mobile user in the real world. This approach is named Social Context-Aware Browser and its novelty is threefold. First, it is a radically new approach that aims at discovering "the query behind the context": retrieving what the user needs, even if she did not issue any query [7]. Second, it is not a domain-dependent application, but a new generic way of interaction and information access, able to adapt to every domain. Third, as current models for context-awareness are too limited for very general applications, this approach brings new models built upon the social dynamics at the basis of Web 2.0.
   This paper is structured as follows. We first briefly survey related work (Section 2), presenting the Context-Aware Retrieval field and introducing the main ideas behind Web 2.0. We then describe our solution (Section 3), presenting a general survey, the main ideas, and an implementation approach. In Section 4 we present a brief discussion, and finally we draw some conclusions and present future work (Section 5).

2. RELATED WORK

2.1 Context-Aware Retrieval
   Context-Aware Retrieval (CAR) is an extension of classical Information Retrieval (IR) that incorporates contextual information into the retrieval process, with the aim of delivering to the users information that is relevant within their current context [4]. CAR systems are concerned with the acquisition of context, its understanding, and the application of behaviour based on the recognized context [11].
   Typical CAR applications present the following characteristics [4]: a mobile user, i.e., a user whose context is changing; interactive or automatic actions, the latter when there is no need to consult the user; time dependency, since the context may change; and attention to whether it is appropriate and safe to disturb the user. Although CAR applications can be both interactive and proactive in their communication with the user, we concentrate on the proactive aspects, since they are more relevant to our proposal. Moreover, we concentrate on the association between CAR and mobile applications, as the latter can be considered the prime field for CAR [4].
   An example of a CAR system is the Ubiquitous Web [5], a solution based on the spontaneous annotation, by a community of users, of objects, places, and other people with Web-accessible content and services. A more general system is represented by the MoBe framework [7]. In this application, a general inferential framework (based on ontologies and Bayesian networks) combines the information coming from sensors to infer new and more abstract contexts (user activities, needs, etc.), which are used to retrieve and execute the most relevant applications.

2.2 Web 2.0, the social web
   With Web 2.0 [9] and social software we denote all Web-based services with "an architecture of participation", that is, an architecture featuring a high level of interaction among users and allowing users to generate, share, and take care of the content. Among the many tools provided by Web 2.0, we mainly focus on social bookmarking and folksonomies.
   Social bookmarking is a method for organizing, searching, and managing documents of interest among users. In a social bookmarking system, users save links to documents of interest in order to remember them or share them with the community. Social bookmarking is strictly related to the concept of folksonomy, that is, the practice of annotating and categorizing content in a collaborative way by means of informal tags. Folksonomies (a portmanteau of folk and taxonomy) allow users to easily and informally describe documents and content. This combination has gained popularity because it allows a more natural and simpler management of knowledge: the use of freely chosen categorizations and the collaborative aspect allow even non-expert users to classify and find information. Folksonomies and social bookmarking are used, for example, in well-known Web 2.0 systems like Flickr¹, YouTube², Del.icio.us³, etc.
   Folksonomies, however, are criticized because the lack of terminological control could lead to unreliable and inconsistent results [3].

¹ www.flickr.com
² www.youtube.com
³ del.icio.us

3. SOCIAL CONTEXT-AWARE BROWSER

3.1 Description
   The Social Context-Aware Browser (sCAB for short) [12] is a general purpose solution to Web content navigation by means of context-aware mobile devices. It allows "physical browsing": browsing the digital world based on situations in the real world. The main idea behind the sCAB is to empower a generic mobile device with a browser able to automatically and dynamically retrieve and load Web pages, services, and applications according to the user's current context.
   The sCAB acquires information related to the user and the surrounding environment by means of sensors installed on the device or through external servers. This information, combined with the user's personal history and the community behaviour, is exploited to infer the user's current context (and its likelihood). In the subsequent retrieval process, a query is automatically built and sent to an external search engine, in order to find the most suitable Web pages for the sensed context and present them to the user.
   As current models for context-awareness are too limited for very general applications like the sCAB, this approach brings new social models for CAR that exploit the collaborative efforts of the community of users. The community, in fact, is encouraged to define the contexts of interest; to share, use, and discuss them; and to associate contexts with content (Web pages, applications, etc.), so as to obtain a dynamic and more user-tailored context representation and to enhance the retrieval process based on the users' actual situation.
   In particular, users can freely interact with resources: they can state that a resource is useful (or not suited) to their current context, associate resources with particular contexts, explicitly define the context they are in, and finally browse the resources relevant to their current context.

3.2 Model

3.2.1 Context representation
   We represent the context as a folksonomy. Each tag is simply a keyword or string of text and represents a single contextual value [8]. We divide the contextual tags into two categories:
  • Concrete tags: represent the information obtained by a set of sensors. This information can be read from the surrounding environment through physical sensors (e.g., a temperature sensor), or can be obtained from other software (e.g., a calendar) through logical sensors. Concrete tags that directly refer to sensor values are represented using the triple tag notation, i.e., tags that use a particular syntax (namespace:predicate=value) to define extra information. For example, geo:longitude=12.456 is the tag for the geographical longitude coordinate whose value is 12.456. Other concrete tags can be automatically obtained from the sensed values (e.g., afternoon, summer, ...).
  • Abstract tags: represent the high-level contextual information that users freely associate with the concrete contexts, in order to detail their context description. Some examples are: home, shopping, etc.
   The boundary between the two categories is blurred, since contexts cannot be unambiguously assigned to one or the other category. However, this partition is helpful to distinguish the low-level information coming from sensors from the high-level contextual information introduced by users.
   The user context is a "cloud" composed of an arbitrary number of concrete and abstract tags (Figure 1).

Figure 1: User's current context.
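To make the triple tag notation and the context "cloud" concrete, the following Python sketch parses tags of the form namespace:predicate=value and keeps a user context as two sets of concrete and abstract tags. It is only an illustration under our own assumptions. The regular expression (which characters a namespace or predicate may contain), the class and function names, and the rule deriving afternoon from a time:hour tag are all hypothetical and are not part of the sCAB model.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumed grammar for triple tags: namespace:predicate=value.
TRIPLE_TAG = re.compile(r"^([A-Za-z_][\w-]*):([A-Za-z_][\w-]*)=(.+)$")


@dataclass(frozen=True)
class TripleTag:
    namespace: str
    predicate: str
    value: str


def parse_tag(tag: str) -> Optional[TripleTag]:
    """Return the parsed triple tag, or None for a plain keyword tag."""
    m = TRIPLE_TAG.match(tag.strip())
    return TripleTag(*m.groups()) if m else None


def derived_tags(triple: TripleTag) -> set[str]:
    """Hypothetical rule deriving plain concrete tags from a sensed value."""
    if (triple.namespace, triple.predicate) == ("time", "hour"):
        try:
            hour = int(triple.value)
        except ValueError:
            return set()
        if 12 <= hour < 18:
            return {"afternoon"}
    return set()


@dataclass
class ContextCloud:
    """A user context: an unordered cloud of concrete and abstract tags."""
    concrete: set[str] = field(default_factory=set)
    abstract: set[str] = field(default_factory=set)

    def add_sensed(self, tag: str) -> None:
        # Sensor output is a concrete tag; triple tags may imply further ones.
        self.concrete.add(tag)
        triple = parse_tag(tag)
        if triple is not None:
            self.concrete |= derived_tags(triple)

    def add_user_tag(self, tag: str) -> None:
        # Tags freely supplied by the user are abstract tags.
        self.abstract.add(tag)

    def all_tags(self) -> set[str]:
        return self.concrete | self.abstract


cloud = ContextCloud()
for sensed in ("geo:longitude=12.456", "time:hour=15"):
    cloud.add_sensed(sensed)
cloud.add_user_tag("home")
print(parse_tag("geo:longitude=12.456"))
print(sorted(cloud.all_tags()))
```

Keeping both kinds of tags in plain sets mirrors the unordered cloud of Figure 1. The concrete/abstract split is kept only because, as noted above, it separates sensor output from user-provided information.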
3.2.2 Operations
   In the sCAB conceptual model [12] there are six main operations. The first two are performed automatically and continuously by the system. With the inference operation (Figure 2), starting from the concrete tags sensed by sensors, the most relevant abstract tags are retrieved and become part of the user's context representation. Then, with the retrieval operation (Figure 2), starting from the set of all the tags in the user's current context, the most relevant resources are retrieved. For example, starting from the GPS coordinates, the system enhances the user's context with the abstract tags "walk out park dog"; then, starting from all the tags, the system retrieves resources relevant to the given context, such as Web pages that teach how to train dogs.
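The two automatic operations can be sketched in Python as follows. This is a minimal sketch, not the sCAB implementation. It assumes that both indexes are in-memory nested dictionaries holding the (U_ij, S_ij, σ_ij) entries described in Section 3.3.1. It ranks candidates with the tf.idf-like rank value and the mean-rank cut-off of Section 3.3.4, with α = β = 1 as an arbitrary default. All tag names, URLs, and function names are hypothetical.

```python
from statistics import mean

# Assumed layout for both indexes: index[row][column] = (user, score, steadiness),
# i.e. (U_ij, S_ij, sigma_ij). Only non-empty cells are stored (sparse matrix).
# Contexts index: rows = abstract tags, columns = concrete tags.
# Resources index: rows = resources, columns = concrete and abstract tags.
Index = dict[str, dict[str, tuple[str, float, float]]]


def rank_values(index: Index, tags: set[str],
                alpha: float = 1.0, beta: float = 1.0) -> dict[str, float]:
    """Rank every row associated with at least one of the input tags."""
    ranks: dict[str, float] = {}
    for row, cells in index.items():
        hits = [cells[t] for t in tags if t in cells]
        if not hits:  # step 1: ignore rows with no association to the input
            continue
        a = sum(score * steadiness for _, score, steadiness in hits)  # A
        b = len(tags) / len(cells)  # B = |C| / |{c : a_i in c}|
        ranks[row] = a ** alpha * b ** beta  # step 2: A^alpha * B^beta
    return ranks


def select(index: Index, tags: set[str],
           alpha: float = 1.0, beta: float = 1.0) -> set[str]:
    """Step 3: keep the rows whose rank is strictly above the mean rank.

    With a strict cut-off nothing is returned when all ranks tie
    (e.g. a single candidate).
    """
    ranks = rank_values(index, tags, alpha, beta)
    if not ranks:
        return set()
    cutoff = mean(ranks.values())
    return {row for row, r in ranks.items() if r > cutoff}


def browse(contexts_index: Index, resources_index: Index,
           concrete: set[str]) -> tuple[set[str], set[str]]:
    # Inference: sensed concrete tags -> most relevant abstract tags.
    abstract = select(contexts_index, concrete)
    # Retrieval: all tags of the current context -> most relevant resources.
    resources = select(resources_index, concrete | abstract)
    return abstract, resources


# Toy data (hypothetical tags and URLs) for the park-and-dog example.
contexts_index: Index = {
    "park": {"gps:cell=A7": ("u1", 0.9, 5.0), "sunny": ("u2", 0.6, 2.0)},
    "dog": {"gps:cell=A7": ("u1", 0.8, 3.0)},
    "work": {"gps:cell=B2": ("u3", 0.9, 8.0), "sunny": ("u4", 0.1, 1.0)},
}
resources_index: Index = {
    "http://example.org/dog-training": {"dog": ("u1", 0.9, 4.0),
                                        "park": ("u1", 0.7, 2.0)},
    "http://example.org/office-tips": {"work": ("u3", 0.8, 6.0)},
    "http://example.org/weather": {"sunny": ("u2", 0.5, 1.0)},
}

abstract, resources = browse(contexts_index, resources_index,
                             {"gps:cell=A7", "sunny"})
print(sorted(abstract), sorted(resources))
```

With these toy values, inference keeps park and dog, with ranks 5.7 and 4.8, above the mean rank of about 3.53, and discards work, whose rank is 0.1. Retrieval then keeps only the dog-training page. The same `select` function serves both operations, reflecting the observation that inference and retrieval differ only in the index they work on.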


Figure 2: Inference and retrieval operations.

   The other four operations are strictly related to user interaction, and the main two are definition and annotation (Figure 3). Definition is used to manage the contextual information. It is performed when a user directly defines her context, or when she provides contextual tags during the annotation of a resource. In particular, this operation manages the associations between concrete and abstract tags, and the strength of their relationships. Annotation, on the contrary, is used to manage the association between contextual tags and resources. It is performed when users link resources to particular contexts. We can imagine a user at a park with her dog who wants to associate a particular Web page teaching dog training with her context. For this reason she bookmarks that resource with the contextual tags "out dog park sunny train". By doing so, first, the added abstract tags are related to the sensed concrete tags, and for all the users with a similar concrete tag cloud these abstract tags (or part of them) can become part of their context representation. Second, that particular Web page is enhanced with all the tags, and it will be automatically proposed to users every time they are in a similar context.

Figure 3: Definition and annotation operations.

   As the users are the main actors in the process of context definition and resource annotation, problems related to the quality of contexts and resources are likely to appear. To cope with this problem we propose the adoption of a social evaluation/reputation mechanism. We exploit the ideas presented in [6]: every element in the model (users, contexts, resources) has a score that increases or decreases based on the community behaviour. The score of each user is used to weight the operations she performs, while the scores of contextual tags and resources define their quality and relevance. If a resource annotated with contextual information is never used in that context, the related score decreases and more relevant resources will stand out.

3.3 Implementation approach
   Concrete tags, abstract tags, and resources are the main elements in our implementation model. Concrete tags, as the output of sensors, are exploited to retrieve the most relevant abstract tags, and in the same way all the tags are exploited to retrieve the most relevant resources.
   In the following sections we present an implementation proposal and show how the different operations in the model affect the system, from a low-level point of view.

3.3.1 Indexes
   We exploit two indexes. In the first one, called the contexts index, abstract tags are indexed over concrete tags. In the second one, called the resources index, resources are indexed over the set of all tags (both concrete and abstract). The proposed approach is community-based, thus the indexes and the inferential system are managed by remote servers and not stored on the mobile device. Since the approach is similar for both indexes, we describe just the first one.
   The contexts index is a matrix that describes the frequency of abstract tags over the concrete ones. Each column corresponds to a concrete tag, and each row corresponds to an abstract tag. Each entry in the matrix has three values (Figure 4):
  • $U_{ij}$: the user who first associated the abstract tag $i$ with the concrete tag $j$;
  • $S_{ij}$: a score that defines how relevant the abstract tag $i$ is for the concrete tag $j$. This value is in the interval $[0, 1]$;
  • $\sigma_{ij}$: a steadiness value that defines how steady the association is between the abstract tag $i$ and the concrete
tag $j$.

Figure 4: Contexts index example. Columns $c_1, c_2, \ldots$ are concrete tags and rows $a_1, a_2, \ldots$ are abstract tags. For example, the entry in row $a_2$, column $c_2$ is $(U_{22}, S_{22}, \sigma_{22})$.

   Intuitively, since not all abstract tags can be related to all concrete tags, the proposed index will be a very sparse matrix. At the same time, because of the very high number of both concrete and abstract tags, the index can become very large. However, a lot of research is being performed on index design and analysis, also in the CAR field [2]. The related discussion is out of the scope of this work.

3.3.2 Users' score
   In our approach two values are associated with each user, and they define how good the user is at working with contextual information:
  • $S_{U_c}$: a score that defines how good the user is at associating concrete tags with abstract tags;
  • $S_{U_r}$: a score that defines how good the user is at associating resources with contexts.
As before, we concentrate only on the management of values related to concrete and abstract tags, since the approach is exactly the same at the higher level of tags and resources.
   Every time a new relation between abstract and concrete tags is created with a definition ("filling a hole" in the index), the user who performed the operation is associated with that relation. Then, on the basis of how the community interacts with that contextual information, the user's score is updated. It is calculated as follows. $S_{U_c}$ is the mean, over all the associations $ij$ between tags created by the user $U$, of the products
$$\frac{\sigma_{ij}}{\sigma_{\max}} \times S_{ij},$$
where $\sigma_{\max}$ is the maximum steadiness value in the index.
   New associations have a low steadiness value, so their scores, which have not steadied yet, have little influence on the user's score. Good associations will have high score and steadiness values, and these will be reflected in high user scores. In the same way, low user scores are due to bad associations between contextual tags. Since $S_{ij} \in [0, 1]$ and $\sigma_{ij} \le \sigma_{\max}$, $S_{U_c} \in [0, 1]$ as well.
   In this approach, for simplicity, only new associations between tags are considered for the computation of the users' score. An extension could consider all the existing associations. In this way a user is "good" both because she defines good new associations and because she exploits existing good associations.

3.3.3 Values update
   The proposed indexes are not static: the values related to the associations between concrete and abstract tags and resources are continuously updated, based on the interaction of users with resources in context.
   With every definition operation, the values in the contexts index are updated according to the following system (the approach is similar for the values in the resources index with the annotation operation):
$$\sigma_{ij}(t_{i+1}) = \sigma_{ij}(t_i) + S_{U_c}(t_i) \times \beta$$
$$v = \frac{\sigma_{ij}(t_i) \times S_{ij}(t_i) \pm S_{U_c}(t_i) \times \beta}{\sigma_{ij}(t_{i+1})}$$
$$S_{ij}(t_{i+1}) = \begin{cases} v & \text{if } v > 0 \\ 0 & \text{otherwise} \end{cases}$$
where $t_i$ represents a discrete time instant and $t_{i+1}$ the subsequent time instant.
   While the score is a value in the interval $[0, 1]$, the steadiness is an always increasing value. Indeed, $v$ is a weighted average of the old score $S_{ij}(t_i)$ (with weight $\sigma_{ij}(t_i)$) and of a vote of $+1$ or $-1$ (with weight $S_{U_c}(t_i) \times \beta$). The sign is $+$ when the operation reinforces the association and $-$ when it weakens it. Hence $v$ never exceeds 1, and negative values of $v$ are clamped to 0. The higher the steadiness of an association, the more stable the association, and thus the smaller the effect of each update operation. The user's score is exploited for updating the values in the index. An update can either strengthen an association or weaken it (e.g., when a user removes a tag from her context). The higher the user's score, the more effective the update operation will be. This means that good users have more influence on the system than bad users. Finally, $\beta$ is a parameter greater than 0 used to weight the user score. Operations performed explicitly by users (inclusion or removal of abstract tags) have more effect than implicit updates, which are performed automatically based on the interaction of the community with the resources.

3.3.4 Inference and retrieval
   The inference and retrieval operations work respectively on the first and on the second index. Since they are similar, in the following we explain just the inference one.
   The approach is the following:
   1. starting from the concrete tags in input, we consider only the set of abstract tags that have been associated with at least one of the concrete tags;
   2. for each abstract tag we compute a rank value, to define an order of relevance for the abstract tags;
   3. in order to limit the number of retrieved tags, we retrieve the abstract tags whose rank value is higher than the mean of all rank values.
   The rank value is computed following an adapted version of the tf.idf weighting scheme. In particular, for each considered abstract tag $a_i$ we have:
  • $A = \sum_{c_j} \sigma_{ij} \times S_{ij}$, summed over the sensed concrete tags $c_j$;
  • $B = \frac{|C|}{|\{c : a_i \in c\}|}$, where $|C|$ is the total number of sensed concrete tags, and $|\{c : a_i \in c\}|$ is the number of concrete tags with which the abstract tag $a_i$ has been associated;
  • rank value $= A^{\alpha} \times B^{\beta}$, where $\alpha$ and $\beta$ are parameters exploited to weight the different values.
   Some considerations can be drawn. First, the more concrete tags in the current context an abstract tag is associated with, the higher its rank value will be. Second, abstract tags with high score and steadiness will have a higher rank value. Third, abstract tags related to particular sets of concrete tags will have a higher rank value than
very general ones that are associated with a high number of concrete tags (high frequency).
   In addition, starting from this basic approach, we can enhance the rank value computation by exploiting other information. For example, a reasonable idea is to weight the tags based on their age in the user's context representation, giving more importance to the newest tags. In this way we enhance the importance of new contexts.

4. DISCUSSION
   Although the conceptual ideas are clear, the implementation approach we propose is at an initial stage of definition. We suggested a possible solution, but there are several ways to refine it and several algorithms that could be exploited. For this reason evaluation holds an important role in our work: since different alternative solutions exist, it is important to evaluate them and compare their effectiveness.
   Even if the knowledge related to the whole community is exploited to infer and refine the current context of single users, the proposed model differentiates the personal level from the community level, giving more importance to the former. For example, if a user annotates a situation as "play", she is considered to be in the "play" context, even if most people annotate the same situation as "work". On the contrary, if a user is in a situation for the first time (e.g., a location never visited), her context is refined only with the information from the community. In the previous example, as most people annotate the situation with "work", the user would be considered to be in the "work" context.
   In the latter case, the assumption made by the system in order to provide the user with relevant resources could be wrong. However, this is not a problem. Since we are working with people, it will hardly be possible to provide results that totally satisfy each user, due to the intrinsic differences of views and needs in a community. Rather, our solution aims at good behaviour on average.
   Regarding the indexes, we have seen how the related information changes dynamically based on community interaction. However, this is not the only possible approach. We can imagine complementary approaches that support the statistical, community-based one. For example, we could use a geographic gazetteer to associate geonames with the geographic coordinates provided by the concrete tags. This would reinforce the rank of associated abstract tags that contain the same geographic names or the names of nearby localities. The geonames could also be useful for retrieving more relevant resources, namely those containing the same or nearby geonames.

5. CONCLUSIONS
   In this paper we have presented the Social Context-Aware Browser, a general purpose solution to Web content perusal by means of mobile devices. The sCAB is a novel approach to information access based on context, where the community of users is called to manage the contextual knowledge, related both to situations and to resources, through collaboration and participation. In particular, we presented a general survey, the main ideas, and an implementation approach.
   As future work we aim at implementing a prototype of the proposed system. In particular, we suggest a multistage approach, where implementation and evaluation processes proceed hand in hand. As a first step we want to exploit benchmarks to evaluate detailed implementation solutions, for example different algorithms to assess the relevance of tags for situations and resources. After that, we plan to apply an IIR evaluation methodology, involving users in a controlled environment, following the ideas presented in [1, 10]. Finally, a broader user-centred evaluation will help us to understand whether the sCAB is effective in the real world.

Acknowledgements
   The authors acknowledge the financial support of the Italian Ministry of Education, University and Research (MIUR) within the FIRB project number RBIN04M8S8, and of the region Friuli Venezia Giulia. This research has been partially supported by MoBe Ltd. (www.mobe.it), an academic spin-off company specializing in software for mobile devices.

6. REFERENCES
 [1] P. Borlund. The IIR evaluation model: a framework for evaluation of interactive information retrieval systems. Information Research, 8(3):8–3, 2003.
 [2] A. Göker, S. Watt, H. I. Myrhaug, N. Whitehead, M. Yakici, R. Bierig, S. K. Nuti, and H. Cumming. An ambient, personalised, and context-sensitive information system for mobile users. In EUSAI '04: Proceedings of the 2nd European Union Symposium on Ambient Intelligence, pages 19–24. ACM, 2004.
 [3] S. A. Golder and B. A. Huberman. The structure of collaborative tagging systems. arXiv preprint cs.DL/0508082, 2005.
 [4] G. J. F. Jones and P. J. Brown. Context-aware retrieval for ubiquitous computing environments. In Mobile HCI Workshop on Mobile and Ubiquitous Information Access, volume 2954, pages 227–243. Springer LNCS, 2004.
 [5] D. Lopez de Ipiña, J. I. Vazquez, and J. Abaitua. A context-aware mobile mash-up platform for ubiquitous web. In Proc. of 3rd IET Intl. Conf. on Intelligent Environments, pages 116–123, 2007.
 [6] S. Mizzaro. Quality control in scholarly publishing: A new proposal. J. of the Am. Soc. for Information Science and Technology, 54(11):989–1005, 2003.
 [7] S. Mizzaro, E. Nazzi, and L. Vassena. Retrieval of context-aware applications on mobile devices: how to evaluate? In Proc. of Information Interaction in Context (IIiX '08), pages 65–71, 2008.
 [8] S. Mizzaro, E. Nazzi, and L. Vassena. Collaborative annotation for context-aware retrieval. In ESAIR '09: Proceedings of the WSDM '09 Workshop on Exploiting Semantic Annotations in Information Retrieval, pages 42–45. ACM, 2009.
 [9] T. O'Reilly. What is Web 2.0: design patterns and business models for the next generation of software, 2005.
[10] D. Petrelli. On the role of user-centred evaluation in the advancement of interactive information retrieval. Inf. Process. Manage., 44(1):22–38, 2008.
[11] A. Schmidt. Ubiquitous Computing - Computing in Context. PhD thesis, Lancaster University, 2003.
[12] L. Vassena. Context-aware retrieval going social. In 3rd Symposium on Future Directions in Information Access (FDIA), 2009.