=Paper= {{Paper |id=Vol-512/paper-9 |storemode=property |title=Watching Through the Web: Building Personal Activity and Context-Aware Interfaces using Web Activity Streams |pdfUrl=https://ceur-ws.org/Vol-512/paper09.pdf |volume=Vol-512 |dblpUrl=https://dblp.org/rec/conf/sigir/KleekKs09 }} ==Watching Through the Web: Building Personal Activity and Context-Aware Interfaces using Web Activity Streams== https://ceur-ws.org/Vol-512/paper09.pdf
 Watching Through the Web: Building Personal Activity
and Context-Aware Interfaces using Web Activity Streams
            Max Van Kleek                                     David R. Karger                              mc schraefel
              MIT CSAIL                                       MIT CSAIL                             School of Electronics and
           32 Vassar Street                                32 Vassar Street                            Computer Science
         Cambridge, MA 02139                             Cambridge, MA 02139                       University of Southampton
                USA                                             USA                                SO17 1BJ, United Kingdom
           +1-617-669-3864                                +1 (617) 258-6167
                                                                                                mc+sigir@ecs.soton.ac.uk
        emax@csail.mit.edu                             karger@csail.mit.edu


ABSTRACT                                                               model derived from activity logs to enable context and activity-
This paper proposes the use of the increasing numbers of Web-          sensitive reminding.
based user activity and personal information sources to enable the     2. User activity monitoring using the Web
creation of more personal, adaptive, and activity-sensitive            The excitement surrounding social sharing on “Web 2.0” has
information tools. We describe our initial steps at investigating      stimulated the growth of an immense number and variety of "life-
this idea, including challenges surrounding integrating                tracking" web sites that are making the chronicling of everyday life
information from heterogeneous web data sources. This paper            activities into a popular pastime. Several of these sites have
contributes an implementation of an in-browser framework called        created applications to enable the automatic capture and
PRUNE that derives an internal world model consisting of an            publishing of activity data sensed via the user's own personal
entity database and event chronology based on heterogeneous            devices, such as their laptop, desktop, or mobile phone. Examples
RSS/ATOM feeds, Web APIs and other web-based data sources.             include Google Latitude1, which senses the user's location using
Finally, we apply this model in an application called Notes that       Wi-Fi, GPS and cell phone towers, Rescue Time2, Slife3 and
Float, which automatically learns associations between notes and a     Wakoopa4, which track a user's application usage, and the
user's other activities to enable context-aware implicit reminding.    audioscrobbler5 from Last.fm, which tracks a user's music
                                                                       listening activity. Other sites, such as fitbit6 and Nike, sell
Keywords                                                               hardware devices that capture and publish user activity to their
User modeling, life-logging, personalization, and
                                                                       respective sites, letting users visualize and track various metrics.
personal information management
                                                                       The result of the introduction of these sites and their
1. Introduction                                                        accompanying data capture tools is that hundreds of thousands of
The wealth of instantaneous information brought to us by the           individuals have started broadcasting minute-by-minute updates
Web, e-mail, mobile phones, social networking web sites and            of their daily life activities to the web. While the primary
ubiquitous network access has begun to dramatically change how         intended use of these data is for letting people compare their lives
we manage our everyday work and leisure activities. In                 with others, most of these services offer the data back to users via
particular, the sheer volume of information has exceeded our           Web APIs and syndication feeds (RSS/ATOM), turning these
ability to consume it, while at the same time our new                  services into potential sources of data for adaptive and context-
responsibilities demand that we stay on top of it -- to keep abreast   aware applications.
of the status of our family, friends, colleagues, field, economic      Compared to directly sensing user activity, there are a number of
conditions, financial market, and so on. These heightened              drawbacks to using third-party life-tracking sites. First, the fidelity
demands on our ability to process, find, and filter information        and accuracy of user activity data acquired from the web are often
prescribe the need for better personal information tools that          lower, and the data are made available with substantially higher latency, than
expand our ability to pay attention to, and act upon, the vast         if directly captured. In fact, we have witnessed a number of the
quantity of information arriving for us and that we have collected     sources seemingly deliberately degrading the quality of the data
in our personal information repositories.                              returned by their APIs such as by omitting certain properties or
Our goal, in our research, has been to apply personal information      throttling query/update rates. Last.fm, for example, omits the "end
to the management of personal information itself; specifically, to     time" of a played song, thus making it impossible to know the
design personal information management tools that when supplied        duration that the individual listened to a particular track.
with information about their users' ongoing activities, tasks,         Furthermore, the very fact that such volumes of high-fidelity
situations and preferences, can proactively take appropriate action
on the user's behalf.
In the remaining sections of this paper, we describe a framework       1 See http://latitude.google.com
for longitudinal activity monitoring using the web, and a simple       2 See http://www.rescuetime.com
prototype personal information management tool that uses a             3 See http://www.slifelabs.com
                                                                       4 See http://www.wakoopa.com
Copyright is held by the author/owner(s).                              5 See http://www.audioscrobbler.net, associated with http://last.fm
SIGIR'09, July 19-23, 2009, Boston, USA.                               6 See http://www.fitbit.com
personal activity information are being automatically transmitted       sequences of events for building temporal models and analyzing
to random web services (where they are aggregated and kept              correlations between states and activities.
indefinitely) should signal potential privacy concerns.                 New data sources can be added to enhance PRUNE's model. If the
However, despite these disadvantages, we believe that the Web is        new data source uses the same schema as another site or source
a convenient source of a tremendous quantity of rich data about         PRUNE uses already, it will be able to use data from the site
users that we would not otherwise have been able to obtain. For         directly. If not, the user may have to build an import filter, a short
example, data aggregated from mobile phones, such as the user's         piece of JavaScript that maps incoming fields to create/update
call history, and a user's text messages sent and received, can         operations on entities or events in the model. A tutorial on
easily be obtained via SkyDeck.com. Similarly, an individual's          building such import filters makes it easy for novice programmers
spending history, broken down by time of day and merchant, is           to construct such filters, and filters can be easily published and
available via Mint.com. Soon, each individual's health and              made available for use by other users.
medical history will be readily available via services such as          With respect to predictive modeling, PRUNE's current modeling
Google Health. Furthermore, as these services were designed to          mechanisms are rudimentary, consisting of learning probabilities
facilitate sharing of this information with others, incorporating       over event type states, and entity identity resolution. For the
and obtaining information about friends' activities becomes             former, PRUNE supports online or batch learning of either full
straightforward. As the number and variety of applications that         discrete probability distributions of events, or simple pair-wise co-
use data provided by these sites increases, we believe that these       occurrences (which can be used for Naive-Bayes style inference).
sites will be pressured to improve the quality of the data they         These probabilities are either learned from event counts or event
make available via their APIs.                                          time durations corresponding to how long a user or entity assumes
2.1 Modeling from heterogeneous data                                    a given state. With respect to entity reference resolution, PRUNE
While the web makes accessing the data itself convenient,               assumes that every entity (such as a person, place or resource) has at
building personalized applications using this data, particularly        least one inverse functional property (which can be used as a
from multiple sites or sources, requires addressing several             unique key for merging data about entities from heterogeneous
challenges. First, despite standardized serialization formats (such     sources), and at least one familiar name. Familiar names may not
as RSS/ATOM feeds, REST/JSON APIs), web sites typically                 necessarily be unique, and thus can only be used to retrieve
publish data using schemas of dissimilar structure. For example,        entities, not modify them. This facility is used to identify
audioscrobbler RSS feeds have song and artist fields merged into        mentions of people, places and things in interactions with users.
a single field called "Title", while most other music-related APIs      3. Notes that Float: Anticipating information
separate these out. Thus, in order for data from heterogeneous
sources on the web to be effectively compared and combined,                needs using heterogeneous activity models
these differences and inconsistencies need to be dealt with.            While the recent rise in popularity of personal, lightweight note-
Since it is undesirable to have to deal with the complexities of        taking and scrap-booking tools has improved many individuals'
individual sources at the application level, we built a lightweight     note capture frequency and volume, the abundance of the resulting
integration framework (called PRUNE7) to specifically handle this       notes can make effectively using and accessing particular notes
integration process. Based on data retrieved from external              difficult: in order for a particular note to be useful, the user must
sources, PRUNE derives a simple world model that applications           remember they took it (to make the effort of looking for it), or
can query and explore directly. Having an intermediate model            they must serendipitously rediscover it in their collection. As one's
collapses the problem of schema alignment from an O(n^2)                note collection grows, the likelihood of forgetting increases, while
pairwise alignment problem to an O(n) alignment -- between              the likelihood of serendipitous discovery diminishes due to
external schemas and PRUNE's world model.                               decreased visibility.
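
To make the idea of mapping one external schema onto a shared model concrete, the following is a minimal sketch of what an import filter could look like. It is not PRUNE's actual API: the `model` object, `upsertEntity`, `addEvent`, `importListeningItem`, the key formats, and the assumed "Artist - Song" layout of the merged title field are all hypothetical and chosen only for illustration.

```javascript
// Hypothetical world model: entities keyed by an inverse functional property,
// events stored as 5-tuples in chronological order. None of these names come
// from PRUNE itself.
const model = { entities: new Map(), events: [] };

// Create or update an entity, merging new properties under its unique key.
function upsertEntity(key, props) {
  const merged = { ...(model.entities.get(key) || {}), ...props };
  model.entities.set(key, merged);
  return merged;
}

// Record an event (start, end, type, entity key, value) and keep the
// chronology ordered by start time.
function addEvent(start, end, type, entity, value) {
  model.events.push({ start, end, type, entity, value });
  model.events.sort((a, b) => a.start - b.start);
}

// Import filter for a feed item whose title merges artist and song.
// ASSUMPTION: the title looks like "Artist - Song" (hyphen or en dash).
function importListeningItem(userKey, item) {
  const parts = item.title.split(/\s+[\u2013-]\s+/);
  if (parts.length < 2) return false; // unrecognized layout: skip the item
  const artist = parts[0].trim();
  const song = parts.slice(1).join(" - ").trim();
  const trackKey = `track:${artist.toLowerCase()}/${song.toLowerCase()}`;
  upsertEntity(trackKey, { type: "track", artist, name: song });
  // The feed carries no end time, so the end of the event stays unknown.
  addEvent(Date.parse(item.pubDate), null, "listening to", userKey, trackKey);
  return true;
}

// Example item (made up for illustration).
importListeningItem("mailto:user@example.org", {
  title: "Boards of Canada - Roygbiv",
  pubDate: "2009-07-19T10:00:00Z",
});
```

Because each filter only targets the shared model, adding a source means writing one such function rather than one mapping per pair of sources.
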
PRUNE's world model consists of two databases containing                To address this problem, we have designed a system called
entities and events, respectively. Entities represent people, places,   "Notes that Float" (NTF) that proactively anticipates when a note
documents, events, and other "things" represented by various web        might be needed based on its contents and previous access
data sources. Information about person entities is currently            patterns. When NTF detects that a note might be useful in a
obtained from open social networking sites or web-based PIM             particular new situation, it actively raises its visual salience by
tools such as Gmail contacts. Similarly, information about places       popping the note to the top of the user's list of notes. NTF was
can be acquired directly from a localizer, a gazetteer service, or      built on top of List-it [8], our simple personal note-taking tool for
event descriptions (which contain location descriptors). Relations      Firefox, and relies on PRUNE for observations of user activity.
between entities are represented by named properties on these
entities. Events, on the other hand, consist of time-based
                                                                        3.1 Note content features (Dates and times)
observations of the dynamic states or activities of those entities.     Although we are currently expanding NTF to analyze other
Events are 5-tuples (start and end time, event type, entity and         content features (particularly entity references and note types), we
state/value) representing the duration that the particular entity       started with extracting date and time expressions for two reasons.
engaged in or assumed the particular value. Events are kept in an       First, they appeared prominently in a significant number of notes
ordered chronology, which allows applications to easily examine         of our pre-study [1]. Second, these times often indicated when a
                                                                        particular event occurred or task to be done was due, and thus
                                                                        served as a useful indicator of times of future relevance. We
                                                                        designed NTF's date-time extractor to recognize a wide variety of ways of
7   PRUNE: PLUM Runtime Usually Not Exponential (PLUM =                 referring to time, including vague and relative descriptions, and
    Personal Lifetime User Modeling, a previous, RDF-based life-        constructed NTF's expression relevance function to represent how
    tracking project; see http://plum.csail.mit.edu)                    likely it was that a particular expression referred to a particular
moment in time. For example, the expression "tomorrow" yields          that context dimension/activity type (e.g., "web page viewed")
a high likelihood of relevance for any calendar time that falls        assumed a particular value (e.g., "http://mit.edu") while a note was
within the next (wall clock) day after the expression was written.     being accessed. This value is directly computed from the pair-
Although this function was hand-constructed, we are working to         wise counts previously described by taking the ratio of counts for
replace it with one derived from a corpus such as TimeEx [7].          the particular value (e.g. viewing of "http://mit.edu" while
                                                                       accessing a note) and the sum of the counts of all values for that
3.2 Correlating note use with activity/location                        activity type (e.g., viewing any web page while accessing the
Our pre-study suggested that individual notes tended to be edited      particular note). In the third line above, we made a conditional
at particular times of day, and days of week, and while the user       independence assumption of each context type given a particular
was looking at the same web pages as when the note was edited          note. While this is an obvious simplification of actual fact, this is
previously. NTF is designed to identify and leverage correlations      done to let the system use pair-wise affinities instead of full
(when present) between note-edits and any user activity or state,      conditional probability tables (CPTs) which are space-inefficient
including time of day, physical location, weather, web page            and expensive to marginalize, and forces NTF to fit a simpler
views, music listening activity or ongoing calendar activities, and    model corresponding to a Naive-Bayes independence assumption.
use this towards ranking notes by relevance.
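
The ranking expression of Section 3.3 can be written out from the definitions the text gives for its terms. The following is a reconstruction under those definitions, not a transcription: the symbols $C_1, \ldots, C_n$ for the enabled context dimensions, the name $\mathrm{rel}$, the proportionality signs, and the multiplicative placement of the time-relevance term are assumptions.

```latex
\begin{align*}
\mathrm{rel}(\mathit{note}_i)
  &\propto P(\mathit{note}_i \mid C_1, \ldots, C_n)\; T_r(\mathit{note}_i, \mathit{now}) \\
  &\propto P(\mathit{note}_i)\; P(C_1, \ldots, C_n \mid \mathit{note}_i)\; T_r(\mathit{note}_i, \mathit{now}) \\
  &\approx P(\mathit{note}_i)\; T_r(\mathit{note}_i, \mathit{now}) \prod_{c=1}^{n} P(C_c \mid \mathit{note}_i)
\end{align*}

% Each factor is estimated from the pair-wise counts:
\[
P(C_c = v \mid \mathit{note}_i)
  = \frac{\mathrm{count}(C_c = v,\ \mathit{note}_i)}
         {\sum_{v'} \mathrm{count}(C_c = v',\ \mathit{note}_i)}
\]
```

The second line applies Bayes' rule and drops the normalizing constant, which is the same for every note; the third line is where the conditional independence of context types given a note is assumed.
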
                                                                       As described in the next section, the NTF UI allows users to select
NTF’s algorithm is simple: it listens to new events representing       which event/activity types (C_c’s) are included in the calculation
observations of changes to entities and their activities. These        above, as well as whether Tr(note_i, now) is included. This lets the
events might consist of observations of a change in what web           user have more control over the ranking process.
page or document the user is viewing, the room in which they are
sitting, or music to which they are listening. Then, whenever a
note is accessed, NTF tallies a count, for each activity and
situation dimension, of the particular activities, documents,
locations, or other entities that were being performed, viewed, or
experienced at that particular time. These counts are then used
directly in the ranking process, described next.
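
The tallying step, and the way its counts feed a Naive-Bayes-style score, can be sketched as follows. This is an illustration under stated assumptions rather than NTF's code: the function names, the nested-map layout, and the example note ids are hypothetical, the time-relevance term is left out for brevity, and no smoothing is applied, so a context value never seen with a note gives that note a score of zero.

```javascript
// Illustrative sketch only; all names are hypothetical.
const counts = new Map();       // noteId -> Map(dimension -> Map(value -> count))
const accessCounts = new Map(); // noteId -> number of accesses

// Called whenever a note is accessed. `context` maps each dimension
// (e.g. "web page viewed") to the value it has at that moment.
function recordAccess(noteId, context) {
  accessCounts.set(noteId, (accessCounts.get(noteId) || 0) + 1);
  if (!counts.has(noteId)) counts.set(noteId, new Map());
  const byDim = counts.get(noteId);
  for (const [dim, value] of Object.entries(context)) {
    if (!byDim.has(dim)) byDim.set(dim, new Map());
    const byValue = byDim.get(dim);
    byValue.set(value, (byValue.get(value) || 0) + 1);
  }
}

// P(C_c = value | note): the count for this value divided by the sum of the
// counts of all values of that dimension observed with the note.
function likelihood(noteId, dim, value) {
  const byDim = counts.get(noteId);
  const byValue = byDim && byDim.get(dim);
  if (!byValue) return 0;
  let total = 0;
  for (const n of byValue.values()) total += n;
  return (byValue.get(value) || 0) / total;
}

// Prior (share of all accesses) times the per-dimension likelihoods,
// restricted to the dimensions the user has enabled.
function score(noteId, context, enabledDims) {
  let totalAccesses = 0;
  for (const n of accessCounts.values()) totalAccesses += n;
  let s = (accessCounts.get(noteId) || 0) / totalAccesses;
  for (const dim of enabledDims) s *= likelihood(noteId, dim, context[dim]);
  return s;
}

// Rank all known notes for the current context, best first.
function rankNotes(context, enabledDims) {
  return [...accessCounts.keys()]
    .map((id) => ({ id, score: score(id, context, enabledDims) }))
    .sort((a, b) => b.score - a.score);
}

// Usage with made-up observations.
recordAccess("wifi-password", { "web page viewed": "http://mit.edu", place: "office" });
recordAccess("wifi-password", { "web page viewed": "http://mit.edu", place: "home" });
recordAccess("groceries", { "web page viewed": "http://example.com", place: "home" });
const ranking = rankNotes(
  { "web page viewed": "http://mit.edu", place: "home" },
  ["web page viewed", "place"]
);
```

Passing a shorter list of enabled dimensions corresponds to switching individual context types off, so only the remaining factors influence the order.
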




                                                                       Figure 2. List-it interface – Sidebar on the left, with re-search
                                                                       bar, float by: bar, and notes with time information highlighted.
Figure 1. Learning note-activity relevance – To compute the            The bottom right shows the user's computed location.
relevance of a particular note (top row) to a particular activity or
context (location, web site, scheduled calendar event or music
                                                                       3.4 User interface
listening activity), overlap counts are computed between the event     Figure 2 shows the List-it note-taking tool embedded in the
and the other ongoing events at the same time (here, time is           Firefox sidebar with the NTF extension installed. NTF introduces
illustrated as flowing along the x-axis). Extremely brief overlaps     the small “float by” bar beneath the search tabs on the main UI,
are discounted.                                                        which is used to select floatation modes. Multiple modes may be
                                                                       enabled simultaneously, resulting in these terms being included as
3.3 Ranking notes                                                      “givens” to the ranking algorithm previously described. When
The learned associations allow NTF to simply rank notes by the         any of these buttons are enabled, NTF re-ranks all notes in List-it
posterior likelihood of the note given the user’s active context and   every 30 seconds (adjustable), bringing notes that exceed a
included date/time expressions. Specifically, the posterior            relevance threshold to the top of the list. To make these notes
relevance of each note is first calculated as follows:                 salient and to differentiate them from the user's other notes, it
                                                                       "glows" floated notes with a white perimeter. When time-
                                                                       expression ranking is enabled, detected date/time expressions are
                                                                       also made to glow in yellow when the user mouses over them.
                                                                       The intention is to give the user feedback about the clues the
                                                                       system has used to rank the particular note in question. An
                                                                       additional configuration page (not shown) allows the user to
Where P(note_i) is used as shorthand to represent the prior            configure PRUNE's data sources, including specifying their site-
probability that note i is accessed, Tr(note_i, now) is the maximum    specific account usernames and passwords. Some data sources,
time relevance (computed by NTF's time expression evaluation           such as our OIL localizer, require the user's system to have a Wi-Fi
function) of all time expressions extracted from the note, and each    card installed and require the user to "instruct" the system for training. Users can
P(C_c|note_i) term in the final expression represents the probability  teach OIL about places (such as the rooms in their house) by
clicking on a small widget in their status bar, and either typing a      activity-sensitive predictive reminding not available in PIM
new place or selecting one they previously selected. This creates        applications today. Achieving this context-adaptivity would have
a new location state and assigns the current Wi-Fi signature to it,      been substantially more difficult to implement and maintain if we
so that it may be recognized on subsequent visits.                       had written the low-level sensing and instrumentation ourselves.
                                                                         In the face of the obvious complexities of dealing with
3.5 Initial evaluation                                                   heterogeneous Web APIs, feeds in different formats and the like,
Ten existing List-it users volunteered to test an early alpha release    we have found that distilling a simple, relational world model
of the NTF-enabled version of List-it for 5 days, in which only 3        greatly facilitates model construction and provides a useful
floating modes were available: By Time, By Place (physical               abstraction to simplify application logic. Based on our initial
location) and By Site (website). Nine users successfully installed       experiences, we believe that this approach to using diverse
the system (one user could not due to an unforeseen compatibility        information sources on the Web to characterize the user's situation
issue with 64-bit Windows). Participants used By Time mode the           and activity will foster the creation of new, more personal
most (26% of the time), followed by no ranking (24%), By Site            applications and interfaces that can effectively adapt to
alone (14%), and By Place alone (12%). Combined modes were               individuals and their dynamically changing needs8.
less popular. During the study duration, NTF re-ranked notes a
total of 73 times (across all users), recommending up to 10 notes        Acknowledgements
per rank. We are planning a formal study and larger deployment           This work is funded in part by the National Science Foundation,
after implementing a few features to enhance the usability and           Nokia Research, WSRI, and a Royal Academy of Engineering
predictability of the system, as described next.                         Senior Research Fellowship. We would like to thank our
                                                                         PLUM/PRUNE and List-it collaborators and student researchers,
3.6 Ongoing NTF Work                                                     including Michael Bernstein, Jamey Hicks, Greg Vargas, Katrina
The NTF work just described demonstrates our first steps at              Panovich, Paul André, and Brennan Moore.
applying PRUNE to facilitate implicit contextual retrieval for
personal note collections. Implicit contextual retrieval, we believe,    5. References
will be important in the future for helping individuals manage large     [1] Bernstein, M., Van Kleek, M., Karger, D., schraefel, mc,
quantities of personal information, some of which they may have              "Information Scraps: How and Why Information Eludes Our
entirely forgotten about. Our initial trial, while small, ended with         Personal Information Management Tools" ACM Trans. Info. Systems,
encouraging results; one participant said: “[Having] tried it I              26,4 (Sept 2008), 1-46.
decided that I liked it .. This could be the answer to an older man's    [2] Budzik, J., and Hammond, K. Watson: Anticipating and
increasing info and fading memory problems.”                                 Contextualizing Information Needs. In Proc. American
With respect to next steps, we are working to improve the NTF                Society for Information Science and Technology 1999.
ranking algorithm and UI in several ways. The NTF ranking                [3] Dumais, S., Cutrell, E., Cadiz, J., Jancke, G., Sarin, R., and
algorithm was our naive first attempt at devising a method that              Robbins, D. Stuff I’ve Seen: A System for Personal
was simultaneously principled and could take into account                    Information Retrieval and Re-Use. In Proc. SIGIR 2003,
heterogeneous activity, situational and content features of notes.           ACM Press (2003).
One initial improvement will be to automate the selection of
context types/dimensions used in the ranking process; this might         [4] Jones, E., Bruce, H., Klasnja, P, Jones, W. I Give Up! Five
simultaneously improve ranking performance and permit the                    Factors that Contribute to the Abandonment of Information
simplification of the UI to a single button ("ranking on/off"). To           Management Strategies. In Proc. American Society for
do this, NTF could learn (e.g., using feature selection approaches)          Information Science and Technology 2008
the dimensions of context that are most strongly correlated with         [5] "Plazer" software from Plazes http://www.plazes.com.
use of particular notes. A note containing the username/password
                                                                         [6] Rhodes, B.J. Margin Notes: Building a Contextually Aware
for a web site, for example, is likely to be correlated only with
                                                                             Associative Memory. In Proc. IUI 2000.
web site viewing activity but not others. Second, to measure the
effectiveness of the ranking, we plan to add facilities that let users   [7] Time Expression Recognition and Normalization.
easily give feedback about floated notes in various ways. This               http://timex2.mitre.org
feedback will allow users to express nuances of “I don’t want to         [8] Van Kleek, M., Bernstein, M., Panovich, K., Vargas, G.,
see this now” – differentiating whether the recommendation was               Karger, D., schraefel, mc. Examining Personal Information
a bad one (so that this feedback may be used to adjust the                   Keeping in a Lightweight Note-Taking Tool. In Proc. CHI
particular note's associations), or whether the user wants to dismiss        2009, ACM Press (2009).
the reminder until later for other reasons – such as in the case of
deliberately putting off a to-do item. Finally, we also want to          [9] Van Kleek, M., André, P., Karger, D., schraefel, mc. Mixing
allow for greater transparency of learned associations, so that              the reactive with the personal: Opportunities for end-user
users will be able to understand why particular notes were chosen            programming in Personal Information Management. To
and promoted by the algorithm.                                               appear in EUP-WWW, End User Programming for the Web,
                                                                             ACM Press, 2009.
4. Conclusion
In this paper we have described our initial work towards using
"Web 2.0" user activity information sources to observe user
activity and information access over time, and to apply this to the
construction of an implicit information reminding service.
Although in its early stages of development, our simple                  8 PRUNE and NTF are released under the MIT License and
application, NTF, supports a level of flexible, implicit context and     available for download at http://plum.csail.mit.edu.