<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>I remember I have seen this: how can I re-find it?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samur Araújo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Delft University of Technology</institution>
          ,
          <addr-line>PO Box 5031, 2600 GA Delft</addr-line>
          ,
          <country country="NL">the Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Everybody experiences it: our personal information from e-mails, documents, chats and browsing is scattered over many applications, formats, and devices, which makes this information difficult to access again some time after it was first seen. The challenge of the Memorizer project is to allow users to re-find a piece of data that they have seen before but whose location they cannot remember. We aim to achieve this by building a Personal Web of Data, a semantic layer that describes, aggregates, interlinks and stores metadata about information that users have accessed before. This web of data can later be exploited through smart semantic browsing. In this PhD symposium paper we present Memorizer's grand challenge in re-finding and the approach we will take to address it in the period ahead.</p>
      </abstract>
      <kwd-group>
        <kwd>semantic browsing</kwd>
        <kwd>semantic web in use</kwd>
        <kwd>information extraction</kwd>
        <kwd>memory</kwd>
        <kwd>user history</kwd>
        <kwd>linked data</kwd>
        <kwd>RDF</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        “I remember I have seen this: where is it?” This question is asked every day by
millions of people who work in a digital information environment [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. The core
challenge in solving this problem is to allow users to re-find information that they
have seen before, using any mental clue (represented through meta-information or
metadata [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]) they can remember, independent of the application, format or device
in which the information was originally seen. Our hypothesis is that by bringing the
representation of users’ personal data closer to their mental model of that
data, that is, to their memories, we can better support re-finding tasks. Semantic Web
technology can play a crucial role in this challenge, since it provides a
framework for weaving users’ personal data into a web of data whose level of
granularity is closer to that of the users’ memory.
      </p>
      <p>
        Many related issues have been tackled in the Personal Information
Management (PIM) research area [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ], e.g. finding and re-finding information,
representing and unifying personal information, keeping and organizing personal
information, privacy, and security. Nevertheless, the role of semantics, and in
particular the elicitation of metadata in the context of re-finding, has received only
modest attention in that area. One reason is that semantic technologies are still reaching
maturity, and there is no reference architecture describing how to apply natural language
processing, information extraction, the Resource Description Framework (RDF) model
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], linked data, ontology matching, etc. to the elicitation of metadata for
re-finding. Another reason is that the PIM research area is still learning how
people collect, store, manage, and especially re-find information.
      </p>
      <p>
        Although several projects have demonstrated the benefits of bringing together
semantic technology and PIM, such as Nepomuk [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Semex [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], iMeMex [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and
Haystack [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], these works did not focus specifically on re-finding, and many
aspects remain to be addressed, such as information fragmentation [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
addressability, and mobility of data, as well as privacy, provenance and trust [
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13, 14, 15</xref>
        ],
which, for reasons of space, we do not address in this paper.
      </p>
      <p>In this PhD work, we are concerned with the use of Semantic Web technology in the context
of re-finding digital information. In particular, we focus on the problem of data access
and aim to show that the elicitation, combination and interlinking of metadata using a
semantic approach can enrich and optimize the re-finding of information.</p>
      <p>For instance, suppose that Ana is designing a web site together with Daniel.
Daniel finds a reference that may be interesting to Ana: the Scriptaculous API, a
Javascript API for building animations. So he sends her an e-mail recommending that she
have a look at the Scriptaculous web site. As Ana has already decided which
technology to use in the project, she does not visit the web site. One year later, Ana is
looking for a Javascript API for building pop-up menus on web pages. She has
visited many pages, but she cannot find any solution that suits her needs.
Fortunately, she remembers that Daniel had mentioned an API (Scriptaculous) that
could be useful; however, she remembers neither the name of the API, nor its URI, nor
when Daniel gave her this information. In this scenario, the system can use the user's
mental clues during the re-finding task by eliciting and connecting information
from her browsing history and e-mail messages. For instance, Ana can search for
messages sent by Daniel; since the system knows that she has just visited a few pages
about Javascript APIs, it can restrict the results to messages from Daniel that are also
related to the Javascript concept. As a result, Ana will be able to find the Scriptaculous
URI by providing only the mental clues she is able to remember, instead of
wasting her time browsing the entire mailbox or issuing many queries to
search engines.</p>
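      <p>The filtering step in Ana's scenario can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Memorizer implementation: the classes <monospace>Email</monospace> and <monospace>Visit</monospace>, the functions <monospace>current_context</monospace> and <monospace>refind</monospace>, and the sample data are hypothetical, and we assume that concepts such as "Javascript" have already been elicited from each message and visited page. The user's current context is approximated, again by assumption, as the concepts of the last few visited pages.</p>
      <preformat>
```python
# Hypothetical sketch of context-aware re-finding; not the Memorizer implementation.
from dataclasses import dataclass, field


@dataclass
class Email:
    sender: str
    subject: str
    concepts: set[str] = field(default_factory=set)  # assumed already elicited from the body


@dataclass
class Visit:
    url: str
    concepts: set[str] = field(default_factory=set)  # assumed already elicited from the page


def current_context(history: list[Visit], last_n: int = 5) -> set[str]:
    """Approximate the user's context as the concepts of the last `last_n` visited pages."""
    context: set[str] = set()
    for visit in history[-last_n:]:
        context.update(visit.concepts)
    return context


def refind(emails: list[Email], sender: str, history: list[Visit]) -> list[Email]:
    """Return messages from `sender` that share at least one concept with the context."""
    context = current_context(history)
    return [m for m in emails if m.sender == sender and m.concepts.intersection(context)]


# Illustrative data for Ana's scenario (invented for this example).
emails = [
    Email("daniel@example.org", "Have a look at Scriptaculous", {"Javascript", "API", "animation"}),
    Email("daniel@example.org", "Lunch on Friday?", {"lunch"}),
    Email("carol@example.org", "jQuery tips", {"Javascript"}),
]
history = [Visit("http://example.org/menus", {"Javascript", "pop-up menu"})]

for message in refind(emails, "daniel@example.org", history):
    print(message.subject)  # prints "Have a look at Scriptaculous"
```
      </preformat>
      <p>Filtering by sender alone would return both of Daniel's messages; intersecting with the browsing context narrows the result to the one about Javascript.</p>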
    </sec>
    <sec id="sec-2">
      <title>2. State of the Art</title>
      <p>Due to the nature of this PhD symposium paper, an exhaustive overview of related
work cannot be given here. Instead, we focus on identifying some specific
problems that help clarify the need for a semantics-based approach.</p>
      <p>Information Fragmentation</p>
      <p>
        The process of re-finding personal data is closely related to how
the personal data was collected and stored. Unfortunately, the tools that users have
today lead them to spread their data among many applications and formats, thereby
weakening or breaking the connections between data items. For instance, once a user saves an
e-mail attachment to her file system, the connection between the file and the message is
lost. Consequently, remembering who sent that document demands a high mental load. This lack of data connectivity is known in the
literature as information fragmentation [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
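      <p>To illustrate how a triple-based layer can preserve the connection that is otherwise lost, the following Python sketch stores metadata as subject-predicate-object triples and answers "who sent this file?" by following links from the saved file back to its message. It is a toy in-memory store written for this illustration; the <monospace>ex:</monospace> predicates and identifiers are invented and do not belong to any published vocabulary.</p>
      <preformat>
```python
# Toy in-memory triple store; the "ex:" names are hypothetical, not a real vocabulary.
Triple = tuple[str, str, str]

triples: set[Triple] = {
    ("ex:msg42", "ex:sender", "ex:Daniel"),
    ("ex:msg42", "ex:hasAttachment", "ex:attachment7"),
    # Recorded when the attachment is saved, so the file keeps its provenance.
    ("file:///home/ana/report.pdf", "ex:savedFrom", "ex:attachment7"),
}


def objects(subject: str, predicate: str) -> list[str]:
    """All o such that (subject, predicate, o) is in the store."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def subjects(predicate: str, obj: str) -> list[str]:
    """All s such that (s, predicate, obj) is in the store."""
    return [s for (s, p, o) in triples if p == predicate and o == obj]


def who_sent(file_uri: str) -> list[str]:
    """Follow file -> attachment -> message -> sender."""
    senders: list[str] = []
    for attachment in objects(file_uri, "ex:savedFrom"):
        for message in subjects("ex:hasAttachment", attachment):
            senders.extend(objects(message, "ex:sender"))
    return senders


print(who_sent("file:///home/ana/report.pdf"))  # ['ex:Daniel']
```
      </preformat>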
      <p>
        To solve this problem, some authors [
        <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
        ] have proposed building an integration
layer that describes the information in a uniform language. The Semantic Web (SW)
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] can play an important role in this endeavor, since the users’ mental model of
their digital artifacts can be expressed using the Resource Description Framework
(RDF) model, which is flexible enough to let the representation change, grow and expand as the users’ mental
model evolves. The benefits of using these semantic technologies
for building a uniform representation layer were shown in Nepomuk [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Semex [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
iMeMex [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and Haystack [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Although it has not been demonstrated how well these
systems perform the re-finding task, there is clearly room for improvement. They do
not detect the user's context, which we argue is crucial for a better re-finding
system. If the user is looking for something for the second time, we can assume that
what she is looking for is related to her current context, since that context was encountered at
least once before, when the information was first accessed. These semantic desktops
do not exploit this correlation, which could at least be used to reduce the number of objects
presented to users during the re-finding task, thereby reducing the user's mental
load.
      </p>
      <p>Information Extraction</p>
      <p>
        An important part of the problem concerns techniques for eliciting relevant
metadata about information that is already available, e.g. in the file system, e-mail, or web
history. Information extraction in connection with the Semantic Web can help
structure users’ data, which is likely sub-optimally structured for
the purpose of re-finding, and thereby form an integrated and universal
medium of information that is expected to improve the ability to re-find data [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
Nepomuk, Semex, Memex, iMeMex and Haystack use information extraction to build
a semantic layer of metadata; however, they do not focus on combining and
integrating this metadata to better support users in re-finding information.
      </p>
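      <p>As a small illustration of metadata elicitation from e-mail, the sketch below uses Python's standard <monospace>email</monospace> package (<monospace>message_from_string</monospace> and <monospace>email.utils.parseaddr</monospace>) to parse one raw message and emit its sender, subject and mentioned URLs as triples. It is a minimal sketch under stated assumptions: the URL regular expression is deliberately naive, the message is a single-part plain-text message, and the predicate names and the <monospace>mail:1</monospace> identifier are hypothetical. A practical extractor would also need to handle multipart messages, encodings and many more entity types.</p>
      <preformat>
```python
# Minimal metadata elicitation from one raw e-mail, using only the standard library.
# The "ex:" predicates and the "mail:1" identifier are hypothetical.
import re
from email import message_from_string
from email.utils import parseaddr

URL_PATTERN = re.compile(r"https?://[^\s,]+")  # deliberately simple URL matcher

RAW = """From: daniel@example.org
To: ana@example.org
Subject: Javascript API for animations

Have a look at http://example.org/scriptaculous , it may be useful.
"""


def extract(message_id: str, raw: str) -> set[tuple[str, str, str]]:
    """Turn the sender, subject and mentioned URLs of a plain-text message into triples."""
    msg = message_from_string(raw)
    _, sender = parseaddr(msg["From"])
    triples = {
        (message_id, "ex:sender", "mailto:" + sender),
        (message_id, "ex:subject", msg["Subject"]),
    }
    # Assumes a single-part message, so get_payload() returns the body as a string.
    for url in URL_PATTERN.findall(msg.get_payload()):
        triples.add((message_id, "ex:mentionsURL", url))
    return triples


for triple in sorted(extract("mail:1", RAW)):
    print(triple)
```
      </preformat>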
    </sec>
    <sec id="sec-3">
      <title>3. Approach and Methodology</title>
      <p>Little attention has been given until now to the process of re-finding information,
and the previous section indicates that there is still much work to do regarding this
challenge. In this PhD research we have started working on an approach centered on
eliciting metadata about the information that a user accesses, organizing
it and exploiting it in order to support re-finding tasks. In particular, we are
concerned with the following research questions:</p>
      <sec id="sec-3-1">
        <list list-type="order">
          <list-item>
            <p>How do we integrate metadata about personal data for re-finding?</p>
          </list-item>
          <list-item>
            <p>How do we query the integrated metadata to support re-finding tasks?</p>
          </list-item>
        </list>
        <p>To keep this PhD proposal realistic, we will focus on re-finding in e-mail and web
history. Our first step is to elicit metadata about the user’s web
history and e-mail archive. For that purpose, we will use tools available in the
literature for extracting information from the user’s e-mail and web
history. We will then express and consolidate this metadata in a uniform
semantic layer based on the RDF model, which serves as a first step towards
organizing the entities for re-finding. We will investigate the process of
integrating data and how this metadata can be automatically interconnected. We will
also investigate how to detect the user's context based on her web navigation, and how
to integrate such information with other elicited metadata during the re-finding
process.</p>
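        <p>In its simplest form, automatic interconnection of metadata could look like the following Python sketch: triples from a hypothetical e-mail extractor and a hypothetical web-history extractor are merged into one graph, and a message is linked to a page visit whenever the message mentions the URL that was visited. The predicates, identifiers and the single linking rule are assumptions made for illustration only; deciding which links are actually useful for re-finding is part of the research described above.</p>
        <preformat>
```python
# Hypothetical sketch: merge e-mail and web-history metadata and add links between them.
Triple = tuple[str, str, str]

email_triples: set[Triple] = {
    ("mail:1", "ex:sender", "mailto:daniel@example.org"),
    ("mail:1", "ex:mentionsURL", "http://example.org/scriptaculous"),
}
history_triples: set[Triple] = {
    ("visit:7", "ex:visitedURL", "http://example.org/scriptaculous"),
    ("visit:8", "ex:visitedURL", "http://example.org/menus"),
}


def interlink(graph: set[Triple]) -> set[Triple]:
    """Link each message to each visit whose URL the message mentions."""
    mentioned = {(s, o) for (s, p, o) in graph if p == "ex:mentionsURL"}
    visited = {(s, o) for (s, p, o) in graph if p == "ex:visitedURL"}
    links = {
        (message, "ex:relatedTo", visit)
        for (message, url) in mentioned
        for (visit, visited_url) in visited
        if url == visited_url
    }
    return graph.union(links)


graph = interlink(email_triples.union(history_triples))
print(sorted(t for t in graph if t[1] == "ex:relatedTo"))
# [('mail:1', 'ex:relatedTo', 'visit:7')]
```
        </preformat>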
        <p>All this information will be expressed using a novel, ontology-independent approach.
Our intention is to evaluate an approach in which different terminologies can co-exist
but the system exposes to the user only a joint view of similar concepts, properties and
classes that were expressed in different terminologies. With this
approach, the system can learn the user’s mental model and adapt its terminology to
it, instead of forcing the user to adapt to the system's terminology.</p>
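        <p>The idea of a joint view can be illustrated with a small Python sketch in which properties from different vocabularies are shown to the user under a single label. The property names are written in the style of FOAF, vCard and Dublin Core terms purely as examples, and the equivalences are hand-written here; in the proposed approach they would have to be learned or matched automatically, which this sketch does not attempt. The functions <monospace>joint_view</monospace> and <monospace>source_terms</monospace> are hypothetical.</p>
        <preformat>
```python
# Hypothetical sketch of a joint view over equivalent properties from different vocabularies.
# The equivalences below are hand-written assumptions, not learned mappings.
JOINT_LABEL = {
    "foaf:name": "name",
    "vcard:fn": "name",
    "dc:title": "title",
    "dcterms:title": "title",
}


def joint_view(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Show each triple under its joint property label; unmapped properties pass through."""
    return {(s, JOINT_LABEL.get(p, p), o) for (s, p, o) in triples}


def source_terms(label: str) -> list[str]:
    """The original properties behind a joint label, so the view stays traceable."""
    return sorted(p for p, joint in JOINT_LABEL.items() if joint == label)


triples = {
    ("person:daniel", "foaf:name", "Daniel"),
    ("contact:17", "vcard:fn", "Daniel"),
    ("contact:17", "ex:phone", "555-0100"),
}
for triple in sorted(joint_view(triples)):
    print(triple)
print(source_terms("name"))  # ['foaf:name', 'vcard:fn']
```
        </preformat>
        <p>Keeping the reverse lookup in <monospace>source_terms</monospace> means the user sees one vocabulary while the underlying data keeps its original terminology.</p>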
        <p>Since some information that could be a crucial mental clue for re-finding
cannot be obtained directly from the user’s personal data, we
will use external data sources such as WordNet<sup>1</sup>, DBpedia<sup>2</sup> and other datasets from the
Linked Data cloud to enrich the semantic layer.</p>
        <p>Once we have organized the semantic layer, we will expose it to users through
different exploration mechanisms, such as browsing, faceted navigation and
keyword search, in order to identify which retrieval mechanism best supports
re-finding tasks. We will evaluate the precision and recall of each retrieval mechanism.</p>
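        <p>For reference, we assume the standard information-retrieval definitions of these measures: for a given re-finding task, let R be the set of items the user is actually looking for and A the set of items returned by a retrieval mechanism. Then</p>
        <preformat>
```latex
\mathrm{Precision} = \frac{\lvert R \cap A \rvert}{\lvert A \rvert},
\qquad
\mathrm{Recall} = \frac{\lvert R \cap A \rvert}{\lvert R \rvert}.
```
        </preformat>
        <p>For example, if a search for Daniel's messages returns four messages and only the one about Scriptaculous is the item being sought, precision is 1/4 = 0.25 and recall is 1/1 = 1.</p>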
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>We motivated our work with general considerations about the use of the Semantic Web
in personal information management. While there are many open
problems, we focused on a specific family of interrelated problems centered
on re-finding information using metadata about information that
users have seen before. As a result, we expect to deliver a novel architecture for
representing the user’s personal data that is closer to the user's mental model
of her environment, thereby improving how information can be re-found.
We are, however, still at an early stage of development, and we expect soon to deliver
the first tools of the complete re-finding system, called Memorizer.</p>
      <sec id="sec-4-1">
        <p>1 http://wordnet.princeton.edu/wordnet/</p>
        <p>2 http://dbpedia.org/About</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Tauscher</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greenberg</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <article-title>How people revisit Web pages: empirical findings and implications for the design of history systems</article-title>
          ,
          <source>International Journal of Human-Computer Studies</source>
          , v.
          <volume>47</volume>
          , n.
          <issue>1</issue>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>137</lpage>
          ,
          <month>July</month>
          <year>1997</year>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Teevan</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adar</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potts</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>History repeats itself: repeat queries in Yahoo's logs</article-title>
          ,
          <source>Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval, August 06-11</source>
          ,
          <year>2006</year>
          , Seattle, Washington, USA
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Candan</surname>
            <given-names>K. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suvarna</surname>
            <given-names>H.</given-names>
          </string-name>
          ,
          <article-title>Resource description framework: metadata and its applications</article-title>
          ,
          <source>ACM SIGKDD Explorations Newsletter</source>
          , v.
          <volume>3</volume>
          , n.
          <issue>1</issue>
          ,
          <month>July</month>
          <year>2001</year>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Al-Fedaghi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Ahmad</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Personal information modeling in semantic Web</article-title>
          ,
          <source>The Asian Semantic Web Conference (ASWC)</source>
          , Beijing, China, 3-7 September, pp.
          <fpage>668</fpage>
          -
          <lpage>681</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Sauermann</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Elst</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dengel</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>PIMO - a Framework for Representing Personal Information Models</article-title>
          .
          <source>9th International Conference on Knowledge Management and Knowledge Technologies 2-4 September</source>
          <year>2009</year>
          , Graz, Austria
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Norrie</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          ,
          <article-title>PIM Meets Web 2.0</article-title>
          .
          <source>Proceedings of the 27th International Conference on Conceptual Modeling (ER 2008)</source>
          , Barcelona, Spain, October
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. RDF Primer - http://www.w3.org/TR/rdf-primer/</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Groza</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Handschuh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moeller</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grimnes</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sauermann</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minack</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mesnage</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jazayeri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reif</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Gudjonsdottir</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <article-title>The NEPOMUK Project - On the way to the social semantic desktop</article-title>
          .
          <source>Proceedings of the International Conference on Semantic Technologies (I-Semantics)</source>
          , pp.
          <fpage>201</fpage>
          -
          <lpage>211</lpage>
          .
          <year>2007</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Cai</surname>
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            <given-names>X. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            <given-names>J. M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madhavan</surname>
            <given-names>J.</given-names>
          </string-name>
          ,
          <article-title>Personal information management with SEMEX</article-title>
          ,
          <source>Proceedings of the 2005 ACM SIGMOD international conference on Management of data</source>
          , June 14-16,
          <year>2005</year>
          , Baltimore, Maryland
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dittrich</surname>
            <given-names>J.P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salles</surname>
            <given-names>M.A.V</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kossmann</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blunschi</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <article-title>iMeMex: escapes from the personal information jungle</article-title>
          ,
          <source>Proceedings of the 31st international conference on Very large data bases, August 30-September 02</source>
          ,
          <year>2005</year>
          , Trondheim, Norway
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Karger</surname>
            <given-names>D. R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakshi</surname>
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huynh</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quan</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sinha</surname>
            <given-names>V.</given-names>
          </string-name>
          ,
          <article-title>Haystack: A Customizable General-Purpose Information Management Tool for End Users of Semistructured Data</article-title>
          .
          <source>Proceedings of the 2003 CIDR Conference</source>
          .
          <year>2005</year>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Tungara</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pyla</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sampat</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez-Quinones</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <article-title>Defragmenting Information using the Syncables Framework</article-title>
          ,
          <source>In: SIGIR Workshop on Personal Information Management</source>
          ,
          <year>2006</year>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>P.</given-names>
            <surname>Nasirifard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hausenblas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <article-title>Privacy Concerns of FOAF-Based Linked Data</article-title>
          ,
          <source>Trust and Privacy on the Social and Semantic Web Workshop (SPOT 09) at ESWC09</source>
          , Heraklion, Greece,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>H.</given-names>
            <surname>Halpin</surname>
          </string-name>
          ,
          <article-title>Provenance: The missing component of the semantic Web for privacy and trust</article-title>
          ,
          <source>Trust and Privacy on the Social and Semantic Web (SPOT2009), workshop of ESWC 2009</source>
          , Heraklion, Crete, Greece, June
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
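          <string-name>
            <surname>Carroll</surname>
            <given-names>J. J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayes</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stickler</surname>
            <given-names>P.</given-names>
          </string-name>
          ,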
          <article-title>Named graphs, provenance and trust</article-title>
          ,
          <source>Proceedings of the 14th international conference on World Wide Web, May 10-14</source>
          ,
          <year>2005</year>
          , Chiba, Japan
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Nivas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <article-title>Semantic Web Architecture</article-title>
          ,
          <source>6th International and 2nd Asian Semantic Web Conference (ISWC2007+ASWC2007)</source>
          , pp.
          <fpage>77</fpage>
          -
          <lpage>78</lpage>
          , November
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Elsweiler</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baillie</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ruthven</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <article-title>Exploring Memory in Email Re-finding</article-title>
          .
          <source>ACM Transactions on Information Systems, special issue on Keeping, Re-finding, and Sharing Personal Information</source>
          , vol.
          <volume>26</volume>
          (
          <issue>4</issue>
          ), pp.
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Bernstein</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kleek</surname>
            <given-names>M. V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karger</surname>
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schraefel</surname>
            <given-names>M. C.</given-names>
          </string-name>
          ,
          <article-title>Information scraps: How and why information eludes our personal information management tools</article-title>
          ,
          <source>ACM Transact September</source>
          ,
          <year>2008</year>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>