=Paper=
{{Paper
|id=Vol-512/paper-9
|storemode=property
|title=Watching Through the Web: Building Personal Activity and Context-Aware Interfaces using Web Activity Streams
|pdfUrl=https://ceur-ws.org/Vol-512/paper09.pdf
|volume=Vol-512
|dblpUrl=https://dblp.org/rec/conf/sigir/KleekKs09
}}
==Watching Through the Web: Building Personal Activity and Context-Aware Interfaces using Web Activity Streams==
Max Van Kleek, MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139, USA, +1-617-669-3864, emax@csail.mit.edu

David R. Karger, MIT CSAIL, 32 Vassar Street, Cambridge, MA 02139, USA, +1 (617) 258-6167, karger@csail.mit.edu

mc schraefel, School of Electronics and Computer Science, University of Southampton, SO17 1BJ, United Kingdom, mc+sigir@ecs.soton.ac.uk

Copyright is held by the author/owner(s). SIGIR'09, July 19-23, 2009, Boston, USA.

ABSTRACT

This paper proposes the use of the increasing numbers of Web-based user activity and personal information sources to enable the creation of more personal, adaptive, and activity-sensitive information tools. We describe our initial steps at investigating this idea, including challenges surrounding integrating information from heterogeneous web data sources. This paper contributes an implementation of an in-browser framework called PRUNE that derives an internal world model consisting of an entity database and event chronology based on heterogeneous RSS/ATOM feeds, Web APIs and other web-based data sources. Finally, we apply this model in an application called Notes that Float, which automatically learns associations between notes and a user's other activities to enable context-aware implicit reminding.

Keywords

User modeling, life-logging, personalization, and personal information management

1. Introduction

The wealth of instantaneous information brought to us by the Web, e-mail, mobile phones, social networking web sites and ubiquitous network access has begun to dramatically change how we manage our everyday work and leisure activities. In particular, the sheer volume of information has exceeded our ability to consume it, while at the same time our new responsibilities demand that we stay on top of it, keeping abreast of the status of our family, friends, colleagues, field, economic conditions, financial market, and so on. These heightened demands on our ability to process, find, and filter information prescribe the need for better personal information tools that expand our ability to pay attention to, and act upon, the vast quantity of information arriving for us and that we have collected in our personal information repositories.

Our goal, in our research, has been to apply personal information to the management of personal information itself; specifically, to design personal information management tools that, when supplied with information pertaining to their user's ongoing activities, tasks, situations and preferences, can proactively take appropriate action on the user's behalf.

In the remaining sections of this paper, we describe a framework for longitudinal activity monitoring using the web, and a simple prototype personal information management tool that uses a model derived from activity logs to enable context and activity-sensitive reminding.

2. User activity monitoring using the Web

The excitement surrounding social sharing on “Web 2.0” has stimulated the growth of an immense number and variety of "life-tracking" web sites that are making the chronicling of everyday life activities into a popular pastime. Several of these sites have created applications to enable the automatic capture and publishing of activity data sensed via the user's own personal devices, such as their laptop, desktop, or mobile phone. Examples include Google Latitude¹, which senses the user's location using Wi-Fi, GPS and cell phone towers; Rescue Time², Slife³ and Wakoopa⁴, which track a user's application usage; and the audioscrobbler⁵ from Last.fm, which tracks a user's music listening activity. Other sites, such as fitbit⁶ and Nike, sell hardware devices that capture and publish user activity to their respective sites, letting users visualize and track various metrics.

¹ See http://latitude.google.com

² See http://www.rescuetime.com

³ See http://www.slifelabs.com

⁴ See http://www.wakoopa.com

⁵ See http://www.audioscrobbler.net, associated with http://last.fm

⁶ See http://www.fitbit.com

The result of the introduction of these sites and their accompanying data capture tools is that hundreds of thousands of individuals have started broadcasting minute-by-minute updates of their daily life activities to the web. While the primary intended use of these data is for letting people compare their lives with others, most of these services offer the data back to users via Web APIs and syndication feeds (RSS/ATOM), turning these services into potential sources of data for adaptive and context-aware applications.

Compared to directly sensing user activity, there are a number of drawbacks to using third party life-tracking sites. First, the fidelity and accuracy of user activity data acquired from the web is often lower, and the data is made available with substantially higher latency than if directly captured. In fact, we have witnessed a number of the sources seemingly deliberately degrading the quality of the data returned by their APIs, such as by omitting certain properties or throttling query/update rates. Last.fm, for example, omits the "end time" of a played song, thus making it impossible to know how long the individual listened to a particular track. Furthermore, the very fact that such volumes of high-fidelity
personal activity information are being automatically transmitted to random web services (where they are aggregated and kept indefinitely) should signal potential privacy concerns.

However, despite these disadvantages, we believe that the Web is a convenient source of a tremendous quantity of rich data about users that would otherwise have been difficult to obtain. For example, data aggregated from mobile phones, such as the user's call history and text messages sent and received, can easily be obtained via SkyDeck.com. Similarly, an individual's spending history, broken down by time of day and merchant, is available via Mint.com. Soon, each individual's health and medical history will be readily available via services such as Google Health. Furthermore, as these services were designed to facilitate sharing of this information with others, incorporating and obtaining information about friends' activities becomes straightforward. As the number and variety of applications that use data provided by these sites increases, we believe that these sites will be pressured to improve the quality of the data they make available via their APIs.

2.1 Modeling from heterogeneous data

While the web makes accessing the data itself convenient, building personalized applications using this data, particularly from multiple sites or sources, requires addressing several challenges. First, despite standardized serialization formats (such as RSS/ATOM feeds, REST/JSON APIs), web sites typically publish data using schemas of dissimilar structure. For example, audioscrobbler RSS feeds have song and artist fields merged into a single field called "Title", while most other music-related APIs separate these out. Thus, in order for data from heterogeneous sources on the web to be effectively compared and combined, these differences and inconsistencies need to be dealt with.

Since it is undesirable to have to deal with the complexities of individual sources at the application level, we built a lightweight integration framework (called PRUNE⁷) to specifically handle this integration process. Based on data retrieved from external sources, PRUNE derives a simple world model that applications can query and explore directly. Having an intermediate model collapses the problem of schema alignment from an O(n²) pairwise alignment problem to an O(n) alignment between each external schema and PRUNE's world model.

⁷ PRUNE: PLUM Runtime Usually Not Exponential (PLUM = Personal Lifetime User Modeling, a previous, RDF-based life-tracking project, please see http://plum.csail.mit.edu)

PRUNE's world model consists of two databases containing entities and events, respectively. Entities represent people, places, documents, events, and other "things" represented by various web data sources. Information about person entities is currently obtained from open social networking sites or web-based PIM tools such as Gmail contacts. Similarly, information about events can be acquired directly from a localizer, a gazetteer service, or event descriptions (which contain location descriptors). Relations between entities are represented by named properties on these entities. Events, on the other hand, consist of time-based observations of the dynamic states or activities of those entities. Events are 5-tuples (start and end time, event type, entity and state/value) representing the duration that the particular entity engaged in or assumed the particular value. Events are kept in an ordered chronology, which allows applications to easily examine sequences of events for building temporal models and analyzing correlations between states and activities.

New data sources can be added to enhance PRUNE's model. If the new data source uses the same schema as another site or source PRUNE uses already, it will be able to use data from the site directly. If not, the user may have to build an import filter, a short piece of Javascript that maps incoming fields to create/update operations on entities or events in the model. A tutorial on building such import filters makes it easy for novice programmers to construct such filters, and filters can be easily published and made available for use by other users.

With respect to predictive modeling, PRUNE's current modeling mechanisms are rudimentary, consisting of learning probabilities over event type states, and entity identity resolution. For the former, PRUNE supports online or batch learning of either full discrete probability distributions of events, or simple pair-wise co-occurrences (which can be used for Naive-Bayes style inference). These probabilities are learned either from event counts or from event time durations corresponding to how long a user or entity assumes a given state. With respect to entity reference resolution, PRUNE assumes that every entity (such as a person, place or resource) has at least one inverse functional property (which can be used as a unique key for merging data about entities from heterogeneous sources), and at least one familiar name. Familiar names may not necessarily be unique, and thus can only be used to retrieve entities, not modify them. This facility is used to identify mentions of people, places and things in interactions with users.

3. Notes that Float: Anticipating information needs using heterogeneous activity models

While the recent rise in popularity of personal, lightweight note-taking and scrap-booking tools has improved many individuals' note capture frequency and volume, the abundance of the resulting notes can make effectively using and accessing particular notes difficult: in order for a particular note to be useful, the user must remember they took it (to make the effort of looking for it), or they must serendipitously rediscover it in their collection. As one's note collection grows, the likelihood of forgetting increases, while the likelihood of serendipitous discovery diminishes due to decreased visibility.

To address this problem, we have designed a system called "Notes that Float" (NTF) that proactively anticipates when a note might be needed based on its contents and previous access patterns. When NTF detects that a note might be useful in a particular new situation, it actively raises its visual salience by popping the note to the top of the user's list of notes. NTF was built on top of List-it [8], our simple personal note-taking tool for Firefox, and relies on PRUNE for observations of user activity.

3.1 Note content features (Dates and times)

Although we are currently expanding NTF to analyze other content features (particularly entity references and note types), we started with extracting date and time expressions for two reasons. First, they appeared prominently in a significant number of notes of our pre-study [1]. Second, these times often indicated when a particular event occurred or a task was due, and thus served as a useful indicator of times of future relevance. We designed NTF's date-time extractor to handle a wide variety of ways of referring to time, including vague and relative descriptions, and constructed NTF's expression relevance function to represent how likely it was that a particular expression referred to a particular
moment in time. For example, the expression "tomorrow" yields a high likelihood of relevance for any calendar time that falls within the next (wall clock) day after the expression was written. Although this function was hand-constructed, we are working to replace it with one derived from a corpus such as TimeEx [7].

3.2 Correlating note use with activity/location

Our pre-study suggested that individual notes tended to be edited at particular times of day and days of week, and while the user was looking at the same web pages as when the note was edited previously. NTF is designed to identify and leverage correlations (when present) between note-edits and any user activity or state, including time of day, physical location, weather, web page views, music listening activity or ongoing calendar activities, and use this towards ranking notes by relevance.

NTF’s algorithm is simple: it listens to new events representing observations of changes to entities and their activities. These events might consist of observations of a change in what web page or document the user is viewing, the room in which they are sitting, or music to which they are listening. Then, whenever a note is accessed, NTF tallies a count, for each activity and situation dimension, of the particular activities, documents, locations, or other entities that were being performed, viewed, or experienced at that particular time. These counts are then used directly in the ranking process, described next.

Figure 1. Learning note-activity relevance. To compute the relevance of a particular note (top row) to a particular activity or context (location, web site, scheduled calendar event or music listening activity), overlap counts are computed between the event and the other ongoing events at the same time (here, time is illustrated as flowing along the x-axis). Extremely brief overlaps are discounted.

3.3 Ranking notes

The learned associations allow NTF to simply rank notes by the posterior likelihood of the note given the user’s active context and included date/time expressions. Specifically, the posterior relevance of each note is first calculated as follows:

[equation not preserved in the extracted text]

Where P(note_i) is used as shorthand to represent the prior probability that Note i is accessed, Tr(note_i, now) is the maximum time relevance (computed by NTF's time expression evaluation function) of all time expressions extracted from the note, and each P(C_c | note_i) term in the final expression represents the probability that context dimension/activity type (e.g., "web page viewed") assumed a particular value (e.g., "http://mit.edu") while a note was being accessed. This value is directly computed from the pair-wise counts previously described by taking the ratio of counts for the particular value (e.g., viewing of "http://mit.edu" while accessing a note) and the sum of the counts of all values for that activity type (e.g., viewing any web page while accessing the particular note). In the third line above, we made a conditional independence assumption of each context type given a particular note. While this is an obvious simplification, it lets the system use pair-wise affinities instead of full conditional probability tables (CPTs), which are space-inefficient and expensive to marginalize, and forces NTF to fit a simpler model corresponding to a Naive-Bayes independence assumption.

As described in the next section, the NTF UI allows users to select which event/activity types (the C_c's) are included in the calculation above, as well as whether Tr(note_i, now) is included. This lets the user have more control over the ranking process.

Figure 2. List-it interface. Sidebar on the left, with re-search bar, float by: bar, and notes with time information highlighted. The bottom right shows the user's computed location.

3.4 User interface

Figure 2 shows the List-it note-taking tool embedded in the Firefox sidebar with the NTF extension installed. NTF introduces the small “float by” bar beneath the search tabs on the main UI, which is used to select floatation modes. Multiple modes may be enabled simultaneously, resulting in these terms being included as “givens” to the ranking algorithm previously described. When any of these buttons is enabled, NTF re-ranks all notes in List-it every 30 seconds (adjustable), bringing notes that exceed a relevance threshold to the top of the list. To make these notes salient and to differentiate them from the user's other notes, it "glows" floated notes with a white perimeter. When time-expression ranking is enabled, detected date/time expressions are also made to glow in yellow when the user mouses over them. The intention is to give the user feedback about the clues the system has used to rank the particular note in question. An additional configuration page (not shown) allows the user to configure PRUNE's data sources, including specifying their site-specific account usernames and passwords. Some data sources, such as our OIL localizer, require the user's system to have a Wi-Fi card installed and require the user to "instruct" the system for training. Users can teach OIL about places (such as the rooms in their house) by
clicking on a small widget in their status bar, and either typing a new place or selecting one they previously selected. This creates a new location state and assigns the current Wi-Fi signature to it, so that it may be recognized on subsequent visits.

3.5 Initial evaluation

Ten existing List-it users volunteered to test an early alpha release of the NTF-enabled version of List-it for 5 days, in which only 3 floating modes were available: By Time, By Place (physical location) and By Site (website). Nine users successfully installed the system (one user could not due to an unforeseen compatibility issue with 64-bit Windows). Participants used By Time mode the most (26% of the time), followed by no ranking (24%), By Site alone (14%), and By Place alone (12%). Combined modes were less popular. During the study, NTF re-ranked notes a total of 73 times (across all users), recommending up to 10 notes per rank. We are planning a formal study and larger deployment after implementing a few features to enhance the usability and predictability of the system, as described next.

3.6 Ongoing NTF Work

The NTF work just described demonstrates our first steps at applying PRUNE to facilitate implicit contextual retrieval for personal note collections. Implicit contextual retrieval, we believe, will be important in the future for helping individuals manage large quantities of personal information, some of which they may have entirely forgotten about. Our initial trial, while small, ended with encouraging results; one participant said: “[Having] tried it I decided that I liked it .. This could be the answer to an older man's increasing info and fading memory problems.”

With respect to next steps, we are working to improve the NTF ranking algorithm and UI in several ways. The NTF ranking algorithm was our naive first attempt at devising a method that was simultaneously principled and could take into account heterogeneous activity, situational and content features of notes. One initial improvement will be to automate the selection of context types/dimensions used in the ranking process; this might simultaneously improve ranking performance and permit the simplification of the UI to a single button ("ranking on/off"). To do this, NTF could learn (e.g., using feature selection approaches) the dimensions of context that are most strongly correlated with use of particular notes. A note containing the username/password for a web site, for example, is likely to be correlated only with web site viewing activity but not others. Second, to measure the effectiveness of the ranking, we plan to add facilities that let users easily give feedback about floated notes in various ways. This feedback will allow users to express nuances of “I don’t want to see this now”, differentiating whether the recommendation was a bad one (so that this feedback may be used to adjust the particular note's associations) or whether the user wants to dismiss the reminder until later for other reasons, such as deliberately putting off a to-do item. Finally, we also want to allow for greater transparency of learned associations, so that users will be able to understand why particular notes were chosen and promoted by the algorithm.

4. Conclusion

In this paper we have described our initial work towards using "Web 2.0" user activity information sources to observe user activity and information access over time, and to apply this to the construction of an implicit information reminding service. Although in its early stages of development, our simple application, NTF, supports a level of flexible, implicit context and activity-sensitive predictive reminding not available in PIM applications today. Achieving this context-adaptivity would have been substantially more difficult to implement and maintain if we had written the low-level sensing and instrumentation ourselves. In the face of the obvious complexities of dealing with heterogeneous Web APIs, feeds in different formats and the like, we have found that distilling a simple, relational world model greatly facilitates model construction and provides a useful abstraction to simplify application logic. Based on our initial experiences, we believe that this approach to using diverse information sources on the Web to characterize the user's situation and activity will foster the creation of new, more personal applications and interfaces that can effectively adapt to individuals and their dynamically changing needs.⁸

⁸ PRUNE and NTF are released under the MIT License and available for download at http://plum.csail.mit.edu.

Acknowledgements

This work is funded in part by the National Science Foundation, Nokia Research, WSRI, and a Royal Academy of Engineering Senior Research Fellowship. We would like to thank our PLUM/PRUNE and List-it collaborators and student researchers, including Michael Bernstein, Jamey Hicks, Greg Vargas, Katrina Panovich, Paul André, and Brennan Moore.

5. References

[1] Bernstein, M., Van Kleek, M., Karger, D., schraefel, mc. Information Scraps: How and Why Information Eludes our Personal Information Management Tools. ACM Trans. Info. Systems, 26, 4 (Sept 2008), 1-46.

[2] Budzik, J., and Hammond, K. Watson: Anticipating and Contextualizing Information Needs. In Proc. American Society for Information Science and Technology 1999.

[3] Dumais, S., Cutrell, E., Cadiz, J., Jancke, G., Sarin, R., and Robbins, D. Stuff I’ve Seen: A System for Personal Information Retrieval and Re-Use. In Proc. SIGIR 2003, ACM Press (2003).

[4] Jones, E., Bruce, H., Klasnja, P., Jones, W. I Give Up! Five Factors that Contribute to the Abandonment of Information Management Strategies. In Proc. American Society for Information Science and Technology 2008.

[5] "Plazer" software from Plazes. http://www.plazes.com

[6] Rhodes, B.J. Margin Notes: Building a Contextually Aware Associative Memory. In Proc. IUI 2000.

[7] Time Expression Recognition and Normalization. http://timex2.mitre.org

[8] Van Kleek, M., Bernstein, M., Panovich, K., Vargas, G., Karger, D., schraefel, mc. Examining Personal Information Keeping in a Lightweight Note-Taking Tool. In Proc. CHI 2009, ACM Press (2009).

[9] Van Kleek, M., André, P., Karger, D., schraefel, mc. Mixing the reactive with the personal: Opportunities for end-user programming in Personal information Management. To appear in EUP-WWW, End User Programming for the Web, ACM Press, 2009.
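
Illustrative sketch (not part of the original paper). The counting and ranking steps described in Sections 3.2 and 3.3 can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not the NTF implementation: PRUNE and NTF are described as in-browser tools, and Python is used here only for brevity. The class and method names (NoteRanker, observe_access, likelihood, score, rank) are hypothetical, and the product form of the score (prior times time relevance times one likelihood per context type) is an assumption based on the prose description of the ranking formula, whose exact form is not available here. No smoothing is applied, since the text describes a plain ratio of counts.

```python
from collections import defaultdict


class NoteRanker:
    """Naive-Bayes style note ranking from pair-wise co-occurrence counts.

    Hypothetical helper for illustration; names are not taken from NTF.
    """

    def __init__(self):
        # counts[note][context_type][value] = number of accesses of `note`
        # during which `context_type` had `value`.
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        # accesses[note] = total number of times `note` was accessed.
        self.accesses = defaultdict(int)

    def observe_access(self, note, context):
        """Tally one access of `note` under `context` (context type -> value)."""
        self.accesses[note] += 1
        for ctype, value in context.items():
            self.counts[note][ctype][value] += 1

    def likelihood(self, note, ctype, value):
        """Ratio of counts for `value` to counts of all values of `ctype`."""
        by_value = self.counts[note][ctype]
        total = sum(by_value.values())
        return by_value.get(value, 0) / total if total else 0.0

    def score(self, note, context, time_relevance=1.0):
        """Assumed form: prior * time relevance * product of likelihoods."""
        total_accesses = sum(self.accesses.values())
        if total_accesses == 0:
            return 0.0
        result = self.accesses.get(note, 0) / total_accesses  # prior
        result *= time_relevance
        # Conditional independence of context types given the note.
        for ctype, value in context.items():
            result *= self.likelihood(note, ctype, value)
        return result

    def rank(self, context):
        """Return known notes ordered by descending score for `context`."""
        return sorted(self.accesses,
                      key=lambda note: self.score(note, context),
                      reverse=True)


if __name__ == "__main__":
    ranker = NoteRanker()
    ranker.observe_access("wifi password",
                          {"site": "http://mit.edu", "place": "office"})
    ranker.observe_access("groceries",
                          {"site": "http://example.com", "place": "home"})
    print(ranker.rank({"site": "http://mit.edu", "place": "office"}))
```

Restricting the ranking to particular context types, as the "float by" bar does, would correspond here to passing only the selected dimensions in the context dictionary. The time_relevance argument stands in for the time-expression relevance term and defaults to 1.0 when that term is excluded.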