<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>iMeMex: A Platform for Personal Dataspace Management∗</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcos Antonio Vaz Salles</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens-Peter Dittrich</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Information Systems, ETH Zurich</institution>
          ,
          <addr-line>8092 Zurich, Switzerland, dbis.ethz.ch</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2006</year>
      </pub-date>
      <abstract>
        <p>Desktop computers provide thousands of different applications that query and store data in hundreds of thousands of files of different formats. Those files are stored in the local filesystem and also in a number of remote data sources, such as network shares or as attachments to emails. To handle this heterogeneous and distributed mix of personal information, data processing logic is re-invented inside each application. This results in an undesirable situation: most advanced data management functionality, such as complex queries, backup and recovery, versioning, provenance tracking, among others, is (at least partially) performed by end-users in tedious, manual tasks. To solve these problems we propose a software platform that brings physical and logical data independence to the desktop, freeing users from low-level data management activities. Unlike current relational DBMSs, this platform unifies data from several independent personal data sources without imposing semantic schema integration. It manages the complex dataspace [12] of one's personal information. We attack three major research challenges in the building of that platform: (i) definition of a data model that allows the integration of information in distinct representations and locations, (ii) design of a new search&amp;query language over this data model along with algorithms for the efficient processing of complex queries, and (iii) formulation of an update model that enables soft durability guarantees, when compared to ACID properties, on data authored independently from the platform.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        In 1945, Bush [3] presented a vision of a personal information
management system named memex. That vision has deeply
influenced several advances in computing. Part of that vision led to the
development of the Personal Computer in the 1980s. It also led
to the development of hypertext and the World Wide Web in the
1990s. Since then, several projects have attempted to implement
other memex-like functionality [
        <xref ref-type="bibr" rid="ref14 ref2 ref3 ref9">13, 2, 4, 18</xref>
        ]. Further, personal
information management regained interest in the DB research
community [
        <xref ref-type="bibr" rid="ref11 ref5 ref6">15, 9, 8</xref>
        ]. Moreover, it was identified as an important topic
in the Lowell Report [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], discussed in a VLDB panel [19],
considered in an NSF-sponsored workshop [
        <xref ref-type="bibr" rid="ref13">17</xref>
        ], debated in the SIGIR
2006 PIM workshop [22], and became the topic of both SIGMOD 2005
keynotes [
        <xref ref-type="bibr" rid="ref2">2, 21</xref>
        ].
      </p>
      <p>∗This work is partially supported by the Swiss National Science
Foundation (SNF) under contract 200021-112115.
© 2006 for the individual paper by the paper’s authors. Copying permitted
for private and scientific purposes. Re-publication of material on this page
requires permission by the copyright owners.
Proceedings of the VLDB 2006 Ph.D. Workshop, Seoul, Rep. of Korea, 2006.</p>
      <sec id="sec-1-2">
        <p>
          In spite of these previous efforts, we argue that a satisfactory
solution has not yet been brought forward to the issues of physical and
logical data independence on the desktop. Physical data
independence relates to abstraction from the devices and formats in which
data is represented. This is clearly not achieved by the simple data
model of the current generation of file systems. Applications
develop specific solutions that directly handle the protocols used to access the
data (email, RSS/ATOM, network file system, etc.) and also the formats
in which data is stored (XML, LaTeX, image and audio formats,
etc.). This creates application-specific data silos in which data
management functionality, e.g., querying, updating, performing backup
and recovery operations, is absent or re-invented. Logical data
independence relates to the capability of defining views over the
logical data model in which data is represented. It is also only
partially achieved with current desktop systems, e.g. smart folders.
        </p>
        <p>
          Personal Dataspaces. DBMS technology successfully resolved
the physical and logical data independence problem for highly
structured data, but not for the highly heterogeneous data mix present in
personal information. Indeed, Franklin et al. [
          <xref ref-type="bibr" rid="ref8">12</xref>
          ] argue that today
we rarely have a situation in which all the data that needs to be
managed can fit nicely into a relational DBMS. Rather, most of the
data will be authored independently from the DBMS and will not
be under its full control. Franklin et al. introduce the term dataspace
to describe this world of disparate, distributed and independently
authored unstructured, semi-structured and structured data.
        </p>
        <p>
          In this project we focus on personal dataspaces, that is, the
total of all personal information pertaining to a given individual. In
contrast to the vision of [
          <xref ref-type="bibr" rid="ref8">12</xref>
          ], we propose one concrete Personal
Dataspace Management System (PDSMS) implementation, named
iMeMex (integrated memex). Unlike traditional information
integration approaches, a PDSMS does not require semantic data
integration before any data services are provided. Rather, a PDSMS
is a data co-existence approach in which tighter integration is
performed in a “pay-as-you-go” fashion [
          <xref ref-type="bibr" rid="ref8">12</xref>
          ].
        </p>
        <p>Current Status. The ultimate goal of the dissertation is to build the
first publicly available PDSMS. The dissertation work has been under way
for one year, and a research plan has been drawn up for the next
three years. In the first year of work, the Ph.D. student has helped
to set the vision and context for the iMeMex project. As a result of
this work, we have written a research proposal detailing the goals
and work breakdown for the whole project. This proposal has been
accepted by the Swiss National Science Foundation (SNF) [6] and
supports two Ph.D. positions for a period of three years.</p>
        <p>
          To evaluate our ideas, we have developed a first prototype of
iMeMex. It was demonstrated in [
          <xref ref-type="bibr" rid="ref5">8</xref>
          ] and provided a traditional file
system interface to explore arbitrary views over one’s personal
information. In parallel to the development of the prototype, we have
defined our data model for representing personal information: the
iMeMex Data Model (iDM). That model is presented in [7].
        </p>
        <p>The current, second, version of the iMeMex platform extends the
first prototype and incorporates the work on iDM. It offers a
unified view on a set of personal data sources and allows basic query
processing on that view. Our current implementation (Java 1.4)
contains about 215 classes and 22,000 lines of code.</p>
        <p>Outline. We present related personal information management
approaches in Section 2. We proceed by discussing the research
challenges involved in building a PDSMS in Section 3. We then
describe in Section 4 some of our solutions to these challenges, which
consist of: (1) a unified data model for personal information
(Section 4.1); (2) a flexible query language that operates on this model,
along with techniques for efficient query processing (Section 4.2);
(3) an update model for a PDSMS which includes mechanisms for
recovery and versioning of all data present in a personal dataspace
(Section 4.3); and (4) the architecture of a PDSMS, which
integrates all of the previous contributions into a unified framework
(Section 4.4). Finally, we conclude in Section 5.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>STATE OF THE ART</title>
      <p>
        As we approach an age in which each computer user will face
the challenge of managing her own personal terabyte, PIM research
has obtained renewed interest in a variety of areas, such as HCI, IR
and data management [
        <xref ref-type="bibr" rid="ref13">17</xref>
        ]. Due to space limitations, we only
comment on a few solutions in this section. Current operating systems
have been extended in recent years to include full-text search
appliances, such as Google Desktop, Apple Spotlight, and Phlat [
        <xref ref-type="bibr" rid="ref3">4</xref>
        ].
These systems offer an intuitive keyword search interface,
sometimes augmented by generic metadata (modification date, author,
etc.). Their data models, however, are unable to represent structural
information inside documents. A PDSMS, in contrast, enriches
keyword and property search with advanced structural querying.
      </p>
      <p>
        Systems such as SEMEX [
        <xref ref-type="bibr" rid="ref6">9</xref>
        ] and Haystack [
        <xref ref-type="bibr" rid="ref14">18</xref>
        ] allow users to
browse by association. They employ an ETL cycle to extract
information from desktop data sources into a repository and
represent that information in a domain model (ontology). The domain
model is a high-level mediated schema over the personal
information sources. These systems focus on creating a queryable
but non-updatable view of the user’s personal information. In
contrast, a PDSMS offers support for not only advanced querying
and browsing but also for updating information in the underlying
personal dataspace whenever possible. In fact, all of the systems
above may be thought of as applications on top of a PDSMS.
      </p>
      <p>
        Other systems offer tools to ease the management of personal
data. Lifestreams [
        <xref ref-type="bibr" rid="ref9">13</xref>
        ] organizes all personal documents in a
timeline. In Placeless Documents [11], users may tag their documents
with active properties, such as “backup” or “replicate”, and the
appropriate actions will be carried out by the system. MyLifeBits [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
models each piece of information as a resource and allows these
resources to be annotated and organized in collections. Microsoft
WinFS [23], now discontinued<fn><p>The downloadable beta as well as all other preliminary
information about WinFS were recently removed from its web site [23].</p></fn>, represented information in an item
data model which is a subset of the object-oriented data model
and offered a basic class library to represent data items commonly
found in user desktops. Like MyLifeBits, WinFS based the storage of
items on a relational DBMS. All of these systems need full control
of the data to offer features such as backup&amp;recovery. In contrast,
a PDSMS enables data to be authored and updated independently
through the interfaces offered by the underlying data sources. Further, in
these systems, advanced PDSMS queries that bridge structural
information across the inside-outside file boundary are not available
(see Example 1).
      </p>
    </sec>
    <sec id="sec-3">
      <title>RESEARCH CHALLENGES</title>
      <p>In the following, we discuss major research challenges that are
targeted by the Ph.D. on the iMeMex PDSMS.</p>
      <sec id="sec-3-1">
        <title>Challenge 1 (Representing Personal Information).</title>
        <p>
          A major research challenge of managing personal information is dealing with
its heterogeneity. Heterogeneity relates to the data models and formats
used to represent the information. It also relates to the data sources
in which that information is available and to the mechanisms
available for data delivery (push/pull). Let us consider an example:
        </p>
        <p>
          EXAMPLE 1 (INSIDE AND OUTSIDE FILES). Users organize their
workspaces in folder hierarchies and use applications to store
information inside files. Each file is an independent data cage in which
complex structural representations may be stored. Consider the
following query: “Show me all LaTeX ‘Introduction’ sections
pertaining to project PIM that contain the phrase ‘Personal Information’”.
With current technology, this query cannot be issued in one single
request by the user as it has to bridge the inside-outside file
boundary. The user may only search the file system using simple system
tools like grep, find, or a keyword search engine. However, these
tools may return a large number of results which would have to
be examined manually to determine the final result. Even when a
matching file is found, for structured file formats like
Microsoft PowerPoint the user typically has to conduct a second
search inside the file to find the desired information [
          <xref ref-type="bibr" rid="ref3">4</xref>
          ]. Moreover,
state-of-the-art operating systems do not support the exploitation
of structured information inside the user’s documents at all.
        </p>
        <p>Ideally, we would like to have a common representation for all
personal information in different data models and sources. This
common representation (or view) would enable queries that ignore
how the data is stored or where it is located. In addition, we should
be able to construct that view without performing labor-intensive
semantic schema integration. Rather, we would like to perform
lightweight data model integration and leave expensive semantic
integration to be carried out in a “pay-as-you-go” fashion.</p>
        <p>Challenge 2 (Querying Personal Information). Once we have an
integrated view on one’s personal dataspace, the next natural
challenge is how to query this view. Users have traditionally employed
browsing (i.e., neighborhood expansion) and keyword queries to
explore their data. Ideally, we would like to provide one single
search&amp;query language to analyze and modify all data in a
personal dataspace. This query language should allow impreciseness
in query formulation and also integrate ranking of query results.
Further, advanced functionality, such as branching expressions and
joins, should be available. Note that expressions written in this
language should be processed with interactive response times.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Challenge 3 (Updating Personal Information).</title>
        <p>Given a unified view of all personal information, another important challenge is
to provide means to update that personal information through that
unified view. Current desktop search engines (DSEs) are
read-optimized systems. DSEs are able to detect updates made to the data
sources and to incorporate those updates into their index structures.
They have, however, two important drawbacks: (1) DSEs do not
allow applications to perform updates on the data sources through the
DSEs’ interfaces, and (2) DSEs do not offer any update guarantees
on the underlying data, such as durability (e.g., to allow recovery
of past images of the data).</p>
        <p>[Figure 1. Recoverable labels: Projects, PIM, OLAP.]</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>OUR APPROACH</title>
      <p>In this section, we provide more details on our previous and
ongoing work on the iMeMex PDSMS. First, we discuss our
solutions for each of the challenges described in the previous section.
We then conclude by presenting the iMeMex PDSMS architecture,
which serves as a framework to deploy the presented techniques.</p>
    </sec>
    <sec id="sec-5">
      <title>Representing Personal Information</title>
      <p>
        2006.tex”, for example, inside the subsection “The Problem”, there
is a reference to the section “Preliminaries” and vice versa. An
extended example of such occurrences is provided in [7].
      </p>
      <p>
        Why XML is not Enough. Ideally, files&amp;folders as well as the
structure inside files should be represented in the same logical
data model. One could try to employ XML technology to address
this challenge of representation heterogeneity. In fact, we followed
that approach in [
        <xref ref-type="bibr" rid="ref5">8</xref>
        ]. Unfortunately, XML is associated with both a
logical data model and a physical markup to represent this
logical model. This means that the manipulation of XML views is
coupled with serialization concerns.
      </p>
      <p>
        Recent work has identified
this gap, e.g. [
        <xref ref-type="bibr" rid="ref10 ref12">20, 16, 14</xref>
        ], and argues in favor of clearly separated
logical data models supporting more advanced features, e.g.
multiple hierarchies [
        <xref ref-type="bibr" rid="ref12">16</xref>
        ]. However, none of the existing approaches
is sufficient to naturally represent the complex, possibly infinite,
distributed and lazily computed information graph encountered in
a personal dataspace. Therefore, we have decided to represent all
personal information based on a novel, more powerful, logical data
model: the iMeMex Data Model (iDM).
      </p>
      <p>Resource View Graph. We briefly sketch a few characteristics of
iDM in this section; full details are provided elsewhere [7]. iDM
enables a logical representation of a personal dataspace, as shown
in Figure 1(b). The main features of iDM are:</p>
      <list list-type="bullet">
        <list-item><p>Resource Views: in iDM, all personal information is
represented by fine-grained resource views. A resource view is made
of components that express structured, semi-structured and
unstructured pieces of the underlying data. For instance, resource
views may represent nodes in a files&amp;folders hierarchy as well
as elements in an XML, LaTeX or other office document. In addition,
we use resource views to uniformly represent email
messages, email attachments, infinite data streams, relational
data, RSS/ATOM messages, bookmarks, query results, calls to
web services and many others [7]. The granularity at which
resource views are represented is determined by a set of plugin
components in our system architecture (see Section 4.4).</p></list-item>
        <list-item><p>Graphs: resource views in iDM are linked to each other,
forming directed graph structures. In Figure 1(b), we show the
resource view graph that corresponds to the personal data in
Figure 1(a). In that graph, there is no inside-outside file boundary.
All structural elements (folders, sections, subsections, etc.) are
represented in the same model and queries may address them
uniformly. Note that cycles may naturally arise in that graph (in
this example as a consequence of section cross-referencing).</p></list-item>
        <list-item><p>Intensional Data: any given resource view or parts of a
resource view graph may be either materialized (i.e., extensional
data) or computed on demand as the result of a query or of a
remote web service invocation (i.e., intensional data [20]). This
is in sharp contrast to static data models such as XML.</p></list-item>
        <list-item><p>Stream Support: another important feature of our model is
the ability of resource views to contain finite as well as infinite
components. Infinite resource view components are used to
represent data streams (e.g., RSS, publish/subscribe) and
content streams (e.g., audio and video) in our model.</p></list-item>
      </list>
      <p>In our approach, the notion of impreciseness is included in our
query language, briefly discussed in Section 4.2.</p>
      <p>Data Model Instantiations. A resource view is given by the
following four formal components:</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Component</th><th>Description</th></tr>
          </thead>
          <tbody>
            <tr><td>name η</td><td>Name of the resource view.</td></tr>
            <tr><td>tuple τ</td><td>List of attribute-value pairs ((name0, value0), (name1, value1), ...).</td></tr>
            <tr><td>content χ</td><td>(In)finite input/output of content (e.g. text).</td></tr>
            <tr><td>group γ</td><td>References to other resource views. S: (in)finite set {...}. Q: (in)finite ordered sequence ⟨...⟩.</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-3">
        <p>We use resource view classes to constrain resource view
components. Resource view classes allow integration of data from diverse
data models into iDM without requiring time-consuming semantic
schema integration. A resource view V<sub>i</sub> of class C is denoted by
V<sub>i</sub><sup>C</sup>. Similarly, its components are denoted by η<sub>i</sub><sup>C</sup>, τ<sub>i</sub><sup>C</sup>, χ<sub>i</sub><sup>C</sup>, and γ<sub>i</sub><sup>C</sup>.</p>
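        <p>To make the four components more concrete, the following is a minimal Python sketch of a resource view, not the iMeMex implementation. All names (ResourceView, is_leaf, the sample file and folder names) are hypothetical, and the constraint shown is an assumed illustration of what a resource view class could check; it is not taken from Table 1. A callable group stands in for intensional data, and a generator stands in for infinite content.</p>
        <preformat>
```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, Union

# All names here are hypothetical; this is not the iMeMex API.
Group = Union[list, Callable[[], list]]


@dataclass
class ResourceView:
    name: str                                    # name component (eta)
    tuple_: dict = field(default_factory=dict)   # tuple component (tau)
    content: Iterable[str] = ()                  # content component (chi), finite or infinite
    group: Group = field(default_factory=list)   # group component (gamma), linked views

    def related(self) -> Iterator["ResourceView"]:
        """Return an iterator over linked views; a callable group is computed on demand."""
        members = self.group() if callable(self.group) else self.group
        return iter(members)


def is_leaf(view: ResourceView) -> bool:
    """Illustrative class constraint (assumed, not taken from Table 1):
    a materialized view that links to no other views."""
    return not callable(view.group) and len(view.group) == 0


# Extensional data: a folder that links to one file.
paper = ResourceView("paper.tex", {"size": 12000}, ["\\section{Introduction} ..."])
pim = ResourceView("PIM", group=[paper])


# Intensional data: the sections are produced only when the group is requested.
def extract_sections():
    return [ResourceView("Introduction"), ResourceView("Preliminaries")]


paper_with_sections = ResourceView(paper.name, paper.tuple_, paper.content, extract_sections)

# Infinite content: a stream component that never ends.
feed = ResourceView("news feed", content=(f"item {i}" for i in itertools.count()))
```
        </preformat>
        <p>In this sketch, folders, files and sections are all instances of the same type, so a traversal that starts at the folder can continue into the sections of a file without crossing any special boundary.</p>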
        <p>We show in Table 1 how our model may be constrained to
represent files, folders and the core subset of XML. We denote the
name of an underlying data item i by N<sub>i</sub>, attribute-value pairs
associated to it by a schema W and a tuple T<sub>i</sub>, and its content by C<sub>i</sub>.</p>
        <p>The instantiations shown in Table 1 allow the creation of resource
view graphs such as the one shown in Figure 1(b). Our data model is,
however, much more powerful: instantiations for relations and data
streams, as well as a more rigorous discussion of the intensional
aspects [20] of iDM, are presented in [7].</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Querying Personal Information</title>
      <p>The iMeMex PDSMS should offer querying services on the
resource view graph representing all of one’s personal dataspace. In
the following, we discuss our ongoing work and open issues on
query specification and processing.</p>
      <p>
        Personal Dataspace Query Language. We propose a new search&amp;query
language for schema-agnostic querying of a resource view
graph: the iMeMex Query Language (iQL). The definition of the
iQL syntax and associated semantics is work in progress. In our
current implementation, the syntax of iQL is a mix between typical
search engine keyword expressions and XPath navigational
restrictions. The semantics of our language are, however, quite
different from those of XPath and XQuery. Our language’s goal is to
enable querying of a resource view graph that has not necessarily
been submitted to expensive schema integration. Therefore, as in
search technology, we account for impreciseness in query
formulation. For example, by default, when an attribute name is specified
(e.g. size &gt; 10K), we do not require exact matches on the (implicit
or explicit) schema for that attribute, but rather return fuzzy, ranked
results for the resource views that best match the specified
conditions (e.g. size, fileSize, docSize). This allows us to define
malleable schemas as in [
        <xref ref-type="bibr" rid="ref7">10</xref>
        ]. Other important features of iQL are the
ability to reflect structural constraints, e.g. to explore the context or
neighborhood of items, the definition of extensible algebraic
operations like joins and grouping, and the specification of updates to
the resource view graph.
      </p>
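      <p>The fuzzy treatment of attribute names described above could be approximated as in the following Python sketch. It is only an illustration under stated assumptions: the paper does not specify how iQL scores attribute names, so string similarity from the standard difflib module is swapped in here, and the function names, the 0.5 cutoff, the reading of 10K as 10,000 and the sample views are all hypothetical.</p>
      <preformat>
```python
from difflib import SequenceMatcher


def attribute_similarity(wanted: str, actual: str) -> float:
    """Similarity in [0, 1] between two attribute names, ignoring case."""
    return SequenceMatcher(None, wanted.lower(), actual.lower()).ratio()


def fuzzy_select(views, wanted, predicate, min_score=0.5):
    """Return (score, view name) pairs, best first, for views that have an
    attribute whose name resembles `wanted` and whose value passes `predicate`."""
    hits = []
    for name, attributes in views.items():
        scores = [attribute_similarity(wanted, attr)
                  for attr, value in attributes.items() if predicate(value)]
        best = max(scores, default=0.0)
        if best >= min_score:
            hits.append((best, name))
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


# Hypothetical resource views, reduced to their tuple components.
views = {
    "report.tex": {"size": 20_000},
    "slides.ppt": {"fileSize": 150_000},
    "notes.txt": {"docSize": 4_000},
    "song.mp3": {"author": "unknown"},
}

# Rough analogue of the condition "size > 10K" (10K taken as 10,000 here).
result = fuzzy_select(views, "size", lambda v: isinstance(v, int) and v > 10_000)
```
      </preformat>
      <p>Here the exact attribute match ranks first and the view whose attribute is named fileSize follows with a lower score, while views that fail the value condition or have no similar attribute are left out.</p>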
      <p>Indexing Techniques. In our current implementation of iMeMex,
we index all components of every resource view created in the
system. This full indexing strategy follows the intuition that the PIM
environment shares with data warehousing the characteristic of low
update rates, allowing us to trade space and indexing time for query
performance. The information from each component of a resource
view (e.g., name or group of related resource views) goes to a
different index and we perform intersections to process conditions on
several components. We plan to investigate whether it pays off to have
integrated index structures for various resource view components.
In contrast to traditional XML indexing, our index structures must
operate in a general graph data model on possibly infinite data.</p>
      <p>Cost-based Optimization. Cost-based optimization (CBO) is one
key technique to provide interactive response times in read-mostly
environments. We are planning to build a CBO for iMeMex to
account for trade-offs in the usage of alternative query plans, e.g., to
consider join orders and different access methods.</p>
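      <p>The per-component indexing strategy, with one index per component and an intersection step for multi-component conditions, can be sketched as follows. This is a simplified in-memory illustration; the class and method names are hypothetical and the structures do not reflect the index implementations used in iMeMex.</p>
      <preformat>
```python
from collections import defaultdict


class ComponentIndexes:
    """One inverted index per resource view component (hypothetical sketch)."""

    def __init__(self):
        # component name -> key -> set of resource view identifiers
        self.indexes = defaultdict(lambda: defaultdict(set))

    def add(self, view_id, name, attributes, text):
        """Index the name, tuple and content components of one resource view."""
        self.indexes["name"][name.lower()].add(view_id)
        for pair in attributes.items():
            self.indexes["tuple"][pair].add(view_id)
        for word in text.lower().split():
            self.indexes["content"][word].add(view_id)

    def lookup(self, component, key):
        """Identifiers of all views whose component contains the key."""
        return set(self.indexes[component].get(key, set()))

    def query(self, conditions):
        """Intersect the identifier sets of all (component, key) conditions."""
        id_sets = [self.lookup(component, key) for component, key in conditions]
        return set.intersection(*id_sets) if id_sets else set()


idx = ComponentIndexes()
idx.add(1, "Introduction", {"class": "section"}, "personal information management")
idx.add(2, "Introduction", {"class": "section"}, "query processing")
idx.add(3, "Preliminaries", {"class": "section"}, "personal information")

# Condition on two components: name and content.
hits = idx.query([("name", "introduction"), ("content", "personal")])
```
      </preformat>
      <p>An integrated index structure, as considered above, would answer such a query with a single lookup instead of one lookup per component followed by an intersection.</p>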
      <p>
        Neighborhood Queries. Providing context is key to enable
exploration of query results [
        <xref ref-type="bibr" rid="ref4">5</xref>
        ]. Thus, it is a common pattern to query the
neighborhood of objects returned from a previous query. One
alternative to speed up such queries is to keep their results materialized
in a special index structure. This index may cover only the
immediate neighborhood of each resource view or it may be extended to
include other reachable resource views. We plan to evaluate how
much context should be kept in the index structure to account for
trade-offs in querying speed, indexing time and update processing.
      </p>
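      <p>A materialized neighborhood index of the kind discussed above could look like the following sketch, in which the amount of context kept is controlled by a hop limit. The function name, the max_hops parameter and the sample graph are assumptions made for illustration; the cross reference between two sections mirrors the cycle mentioned in Section 4.1.</p>
      <preformat>
```python
from collections import deque


def build_neighborhood_index(graph, max_hops):
    """Map every node to the set of nodes reachable within `max_hops` edges."""
    index = {}
    for start in graph:
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if hops == max_hops:
                continue  # do not expand beyond the hop limit
            for neighbor in graph.get(node, ()):
                if neighbor not in seen:  # guards against cycles
                    seen.add(neighbor)
                    frontier.append((neighbor, hops + 1))
        index[start] = seen - {start}
    return index


# Hypothetical resource view graph given as adjacency lists.
graph = {
    "PIM": ["paper.tex"],
    "paper.tex": ["Introduction", "The Problem"],
    "Introduction": [],
    "The Problem": ["Preliminaries"],
    "Preliminaries": ["The Problem"],  # cross reference creates a cycle
}

one_hop = build_neighborhood_index(graph, 1)
two_hops = build_neighborhood_index(graph, 2)
```
      </preformat>
      <p>A larger hop limit answers more neighborhood queries directly from the index, at the price of more index entries to build and to maintain when the graph changes.</p>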
    </sec>
    <sec id="sec-7">
      <title>Updating Personal Information</title>
      <p>The iMeMex PDSMS should offer soft durability guarantees on
updates made through its interface or via the APIs of the underlying
data sources bypassing iMeMex. In the following, we discuss our
ideas for tackling that challenge.</p>
      <p>Dataspace Update Model. We plan to design an update model for
the iMeMex PDSMS that accounts for the fact that data may be
independently updated via the APIs of the underlying data sources.
In this scenario, ACID guarantees are too strict, since the iMeMex
PDSMS may be notified of updates “after the fact”.
Nevertheless, we believe that classical database logging techniques may be
adapted to this setting to provide softer recovery guarantees (e.g.,
all items updated more than 5 min ago may be recovered).</p>
      <p>Versioning. In relational systems, previous versions of a given
tuple may be reconstructed from the database log (see e.g. the “time
travel” feature of Oracle). However, personal items are typically
more heavyweight than relational tuples, as they may have medium
to large content. An alternative to logging would be to keep an
independent versioning subsystem (e.g. Subversion) to track content
evolution. We plan to investigate how to integrate versioning into
our update model for personal information and also whether there
are profitable interactions with the techniques chosen for recovery.</p>
      <p>Write back. Updates to personal information may be performed
via the API of a given data source or via iMeMex’s API. In the latter
case, one must write the data back to the affected data sources. If
the data is not already present in any data source, iMeMex must
decide in which subsystem(s) it is most suitably represented.</p>
      <p>Distribution. When a user has several devices, it is natural to ask
how to manage several iMeMex instances and coordinate distributed
query and update processing among these instances. We believe
that the many challenges of this scenario exceed the scope of the
current Ph.D. work. Those challenges will be tackled by a separate
Ph.D. thesis as part of the iMeMex project.</p>
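      <p>One possible reading of a soft recovery guarantee is sketched below. The update model itself is still to be designed, so this is purely an assumed illustration: update notifications are buffered in memory and flushed to a durable log at a fixed interval, so that, provided tick is called regularly, updates older than roughly one flush interval survive a crash. The class name, its methods and the 300 second interval are hypothetical.</p>
      <preformat>
```python
class SoftDurableLog:
    """Buffers after-the-fact update notifications and flushes them periodically."""

    def __init__(self, flush_interval_s=300):
        self.flush_interval_s = flush_interval_s
        self.volatile = []      # lost on a crash
        self.durable = []       # survives a crash (stands in for a log on disk)
        self.last_flush = 0.0

    def notify(self, item, content, timestamp):
        """Record that `item` now has `content`, as reported by a data source."""
        self.volatile.append((timestamp, item, content))

    def tick(self, now):
        """Flush the buffer if at least one flush interval has elapsed."""
        if now - self.last_flush >= self.flush_interval_s:
            self.durable.extend(self.volatile)
            self.volatile.clear()
            self.last_flush = now

    def recover(self):
        """Replay the durable log after a crash; the latest image of each item wins."""
        state = {}
        for _, item, content in sorted(self.durable, key=lambda entry: entry[0]):
            state[item] = content
        return state


log = SoftDurableLog(flush_interval_s=300)
log.notify("a.txt", "v1", timestamp=10)
log.tick(now=300)                        # flushes the update made at t=10
log.notify("a.txt", "v2", timestamp=400)
log.notify("b.txt", "v1", timestamp=410)
log.volatile.clear()                     # simulated crash at t=450
recovered = log.recover()
```
      </preformat>
      <p>After the simulated crash only the image of a.txt written at t=10 is recovered; the two updates made less than one flush interval before the crash are lost, which is the kind of relaxation relative to ACID durability that the text describes.</p>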
      <p>iMeMex PDSMS Architecture</p>
      <p>We present in this section the current architecture of the iMeMex
PDSMS, which serves as a framework for all of the previously
discussed technical contributions. We also indicate points of ongoing
work in which the architecture will be extended.</p>
      <p>The core idea of iMeMex is to implement a logical layer that
abstracts from the underlying subsystems and data sources, such as
file systems, email servers, network shares, music streams, RSS
feeds, etc. That logical layer does not take full control of the data,
so it may be bypassed by applications. Figure 2 depicts that layer
and its current implementation in iMeMex.</p>
      <p>iMeMex contains two important sublayers: iQL Query
Processor and Resource View Manager. The main task of the iQL Query
Processor is to translate incoming iQL queries and to create query
plans for those queries. Our current implementation is based on
rule-based query optimization. We plan to invest in cost-based
optimization techniques as part of future work.</p>
      <p>The Resource View Manager (RVM) is the central instance for
managing resource views. Its major components are: Data Source
Proxy, ContentToiDMConverters, Replica&amp;Indexes Module, and
Synchronization Manager. We describe them in the following.</p>
      <p>[Figure 2. Recoverable labels: ContentToiDM Converters, FS Plugins, 3rd party.]</p>
      <p>Data Source Proxy. Provides connectivity to the distinct types of
subsystems. It contains a set of Data Source Plugins that
represent the data from the different subsystems (e.g., file systems, RSS,
IMAP, databases, etc.) as an initial iDM graph.</p>
      <sec id="sec-7-1">
        <sec id="sec-7-1-1">
          <title>ContentToiDMConverters Module.</title>
          <p>Enriches the iDM graph provided by the data source proxy. This is achieved by converting
resource view content to iDM subgraphs that then reflect structural
information (e.g., in LaTeX, XML, etc.). The result is an iDM graph
such as the one presented in Section 4.1.</p>
        </sec>
        <sec id="sec-7-1-2">
          <title>Replica&amp;Indexes Module.</title>
          <p>Materializes mappings between
resource view identifiers and resource view components (e.g., name
or group of related resource views) to accelerate query processing.
A mapping from resource view identifiers to copies of component
instances is termed a replica. The inverse mapping is termed an
index. Currently, our implementations of replicas and indexes are
based on a DBMS (Apache Derby) for structured information, such
as attribute-value pairs and resource view connections, and on
inverted keyword lists (Apache Lucene) for textual information, such
as names and text content. We plan to extend this module to
provide specialized index structures as discussed in Section 4.2.</p>
          <p>Synchronization Manager. Monitors registered data sources for
changes. When a data source is registered at the RVM, the
Synchronization Manager analyzes the data found on the data source
and sends each resource view definition to the Replica&amp;Indexes
Module. The Synchronization Manager also subscribes to update
notifications from the data source. As a consequence, updates
performed on the data source bypassing the RVM layer are then
immediately considered by the Synchronization Manager and the
Replica&amp;Indexes Module. If the data source does not offer update
notifications, the Synchronization Manager generates them based on a
generic polling facility. We will extend this module to incorporate
recovery and versioning techniques, as described in Section 4.3.</p>
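          <p>The interplay between replicas, indexes and polling-based change detection might be sketched as follows. This is a toy in-memory version with hypothetical names (ReplicaAndIndexes, poll); it covers only the name component and does not use Apache Derby or Apache Lucene, which back the actual implementation.</p>
          <preformat>
```python
from collections import defaultdict


class ReplicaAndIndexes:
    """Replica: view id -> component copy. Index: component value -> view ids."""

    def __init__(self):
        self.replica = {}
        self.index = defaultdict(set)

    def upsert(self, view_id, name):
        """Store or replace the name component of a view in both mappings."""
        self.remove(view_id)
        self.replica[view_id] = name
        self.index[name].add(view_id)

    def remove(self, view_id):
        """Drop a view from the replica and from the inverse mapping."""
        old_name = self.replica.pop(view_id, None)
        if old_name is not None:
            self.index[old_name].discard(view_id)


def poll(snapshot, store):
    """Generate update notifications by diffing a source snapshot against the
    replica, then apply them. Returns the notifications as (kind, view id)."""
    events = []
    for view_id, name in snapshot.items():
        if view_id not in store.replica:
            events.append(("added", view_id))
            store.upsert(view_id, name)
        elif store.replica[view_id] != name:
            events.append(("changed", view_id))
            store.upsert(view_id, name)
    for view_id in set(store.replica) - set(snapshot):
        events.append(("removed", view_id))
        store.remove(view_id)
    return events


store = ReplicaAndIndexes()
first = poll({"/a": "intro.tex", "/b": "notes.txt"}, store)
second = poll({"/a": "introduction.tex"}, store)
```
          </preformat>
          <p>The first poll registers both items; the second detects a rename and a deletion that happened at the data source without going through the RVM, and brings replica and index back in line with the source.</p>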
        </sec>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>CONCLUSION</title>
      <p>Personal Information Management has become a key necessity
for almost everybody. Reflecting this prominence, considerable
attention has been given to PIM research in the recent past. At the
same time, it has become clear that what is missing is a unified
approach to bring physical and logical data independence to the
management of one’s personal dataspace. We address three
major research challenges in the pursuit of this goal. First, we define
a new data model capable of representing the heterogeneous data
mix found in personal dataspaces. As one application of our model,
we bridge the artificial boundary that separates the inside from the
outside of files. Second, we are working on a new search&amp;query language
that operates on our data model. The processing of expressions in
this language calls for the design of efficient techniques, e.g., for
indexing and neighborhood querying. Third, we are working on
a dataspace update model. That model will provide soft
durability guarantees, write-back to data sources, and detection of
changes made directly on data sources, bypassing iMeMex. We plan to
design integrated recovery and versioning techniques to support our
update model. By building the first publicly available PDSMS, we
believe that we make a significant contribution to the development
of advanced PIM applications.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abiteboul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Agrawal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          and others.
          <article-title>The Lowell Database Research Self Assessment</article-title>
          .
          <source>The Computing Research Repository (CoRR), cs.DB/0310006</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bell</surname>
          </string-name>
          .
          <article-title>MyLifeBits: a Memex-Inspired Personal Store; Another TP Database (Keynote)</article-title>
          .
          <source>In ACM SIGMOD</source>
          ,
          <year>2005</year>
          .
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bush</surname>
          </string-name>
          .
          <article-title>As we may think</article-title>
          .
          <source>Atlantic Monthly</source>
          ,
          <year>1945</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cutrell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Robbins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumais</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Sarin</surname>
          </string-name>
          .
          <article-title>Fast, flexible filtering with Phlat - Personal search and organization made easy</article-title>
          .
          <source>In CHI</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Dittrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Fischer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          .
          <article-title>AGILE: Adaptive Indexing for Context-Aware Information Filters</article-title>
          .
          <source>In ACM SIGMOD</source>
          ,
          <year>2005</year>
          .
          [6]
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Dittrich</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          .
          <article-title>iMeMex: A Unified Approach to Personal Information Management</article-title>
          .
          <source>In SNF project under contract</source>
          [7]
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Dittrich</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. A. V.</given-names>
            <surname>Salles</surname>
          </string-name>
          .
          <article-title>iDM: A Unified and Versatile Data Model for Personal Dataspace Management</article-title>
          .
          <source>In VLDB</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.-P.</given-names>
            <surname>Dittrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A. V.</given-names>
            <surname>Salles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kossmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Blunschi</surname>
          </string-name>
          .
          <article-title>iMeMex: Escapes from the Personal Information Jungle (Demo Paper)</article-title>
          .
          <source>In VLDB</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          .
          <article-title>A Platform for Personal Information Management and Integration</article-title>
          .
          <source>In CIDR</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Dong</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          .
          <article-title>Malleable Schemas: A Preliminary Report</article-title>
          .
          <source>In WebDB</source>
          , pages
          <fpage>139</fpage>
          -
          <lpage>144</lpage>
          ,
          <year>2005</year>
          .
          [11]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dourish</surname>
          </string-name>
          et al.
          <article-title>Extending Document Management Systems with User-Specific Active Properties</article-title>
          .
          <source>ACM Transactions on Information Systems (TOIS)</source>
          ,
          <volume>18</volume>
          (
          <issue>2</issue>
          ):
          <fpage>140</fpage>
          -
          <lpage>170</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Maier</surname>
          </string-name>
          .
          <article-title>From Databases to Dataspaces: A New Abstraction for Information Management</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>34</volume>
          (
          <issue>4</issue>
          ):
          <fpage>27</fpage>
          -
          <lpage>33</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Freeman</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Gelernter</surname>
          </string-name>
          .
          <article-title>Lifestreams: A Storage Model for Personal Data</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ):
          <fpage>80</fpage>
          -
          <lpage>86</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Graupmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schenkel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>The SphereSearch Engine for Unified Ranked Retrieval of Heterogeneous XML and Web Documents</article-title>
          .
          <source>In VLDB</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          et al.
          <article-title>Crossing the Structure Chasm</article-title>
          .
          <source>In CIDR</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Jagadish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. V. S.</given-names>
            <surname>Lakshmanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scannapieco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Wiwatwattana</surname>
          </string-name>
          .
          <article-title>Colorful XML: One Hierarchy Isn't Enough</article-title>
          .
          <source>In ACM SIGMOD</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>W.</given-names>
            <surname>Jones</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Bruce</surname>
          </string-name>
          .
          <source>A Report on the NSF-Sponsored Workshop on Personal Information Management</source>
          , Seattle, Washington,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Karger</surname>
          </string-name>
          et al.
          <article-title>Haystack: A Customizable General-Purpose</article-title>
          [23] http://msdn.microsoft.com/data/WinFS/ WinFS.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>