Sharing Contextualized Attention Metadata to Support Personalized Information Retrieval

Martin Memmel, Andreas Dengel
DFKI, Knowledge Management Department & University of Kaiserslautern, Computer Science Department
{memmel,dengel}@dfki.de

ABSTRACT
The ability to provide the right resources in a given context is a key factor for the support of knowledge workers. The information provided about the resources is crucial for any information retrieval approach, and it should allow multi-perspective descriptions of the resources. Enhancing these descriptions with information about the attention that users spend on such resources in a specific context provides valuable additional information. The architecture proposed in this paper allows contextualized attention metadata gathered with different user observation components to be shared and distributed, enabling the integration of context-aware, personalized information retrieval services in arbitrary contexts and applications.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: General, Information Storage, Information Search and Retrieval, Systems and Software, Online Information Services; H.4 [Information Systems Applications]: Miscellaneous; H.5 [Information Interfaces and Presentations]: General, Multimedia Information Systems, User Interfaces

Keywords
Contextualized Attention Metadata, Information Retrieval, Knowledge Management, Resource Profiles, User Observation

1. INTRODUCTION
In a world where the amount of available information and knowledge is growing faster than ever before, and where knowledge workers have to keep learning throughout their lifespan, the role of information retrieval techniques becomes more and more important. They should support users in efficiently accessing digital resources, i.e., in getting just the right content at just the right time, ideally without having to leave the current task and workspace context.

At their workspace, knowledge workers are involved in various processes in which they have to solve tasks by employing available expertise or by making use of their experience from earlier, similar situations. In these considerations, as shown in figure 1, they have access to local or shared repositories capturing best practices and other information objects, which may be clustered into categories or even structured into hierarchical schemes that help to make decisions or drive workflows. Furthermore, they have tacit knowledge based on terminological background, individual competences, know-how, and also subjective interests, all of which can be combined with the accessible sources for being creative as soon as new documents are received or generated.

Figure 1: Creativity depends on the ability to combine tacit knowledge with explicit information

The quality of solving a given task strongly depends on the operating experience with the available explicit sources (where to find what, for which purpose, in which form, ...) and on how these pieces of information are related to the context of the given task.

By interacting with documents, users form mental models based on their experience and the contents with which they are interacting. These mental models provide both predictive and explanatory power for understanding and categorizing the contained messages, questions, commands, notices, or orders. This is because bits of information are never stored in memory as individual units, but are integrated into known clusters which correspond to the very individual world view of a human being. Although the information object does not change, such clusters may differ over time because the perspective has changed, i.e., because new insights have been gained or the information is applied to another problem.

Mental models evolve naturally through our interaction with particular environments. They play an important role for orientation and problem solving because they are used to simplify understanding and learning by representing and organizing general knowledge. They are formed to explain complex phenomena of our world and to filter our environment, making it easier for us to interpret and predict the things which may happen as well as to take action in response. Because we are part of different cultural and social systems, belong to different peer groups, have different attitudes or beliefs, and play different roles, there are also differences in these models.

If we were able to understand and to capture how people evolve their mental models, we might provide cognitively adequate interaction platforms for communication and collaboration. These interfaces should take into account attention, memory, perception and learning, but should also consider the way users perceive, categorize and remember in the context of specific tasks.

The famous statement of the Austrian philosopher Ludwig Wittgenstein (1889 - 1951)

    'Die Bedeutung eines Wortes ist sein Gebrauch in der Sprache'
    'The meaning of a word is its use in the language'

can be transferred into the world of (digital) resources:

    'The meaning of a resource is its use in the community'

Capturing and sharing information about the attention that users spend on resources in specific contexts will provide valuable information, enabling significantly improved, personalized information retrieval services based on the mental models of the users.

In this paper, we first introduce the concept of multi-perspective personal document management and its implications for the description of resources with metadata. Several existing approaches to realize context-aware support, and general requirements concerning the distribution, matching and sharing of Contextualized Attention Metadata (CAM), provide the basis for our proposed architecture. This architecture allows resource descriptions and CAM captured in different scenarios to be aggregated, shared, and distributed, and it provides context-aware, personalized information retrieval services. An interactive context cockpit will allow users to intuitively control the matching processes between contexts and resource descriptions.

2. MULTI-PERSPECTIVE PERSONAL DOCUMENT MANAGEMENT
Establishing contextual information is a difficult task. Manually defined formal ontologies and process models typically address only a high-level fraction of a domain and require continuing, cost-intensive maintenance. Automatic methods, mainly driven by statistical machine learning approaches, in most cases leave too much ambiguity and disorientation when they are used in shared contexts, because users have different roles, tasks and interests, and thus consider the contents subjectively. This becomes obvious if we take a contract document about a technical innovation and ask a group of persons, say a lawyer, a sales person, and a technician, how they would file the document into their repository.

Each of them would categorize the document in a different way. The categorization strongly depends on the role of the reader, the time at which the document is considered, the terminology and language in which it is written, the tasks on which he/she is currently working, and the expertise and experience that is available. Thus, a document may be seen as valuable information, as useless, or even as an annoyance.

Even a single user may have difficulties because documents usually allow for several perspectives depending on the given circumstances. The 'who', 'what', 'where', and 'when' aspects inherent to documents usually give a choice of filing a document into different folders. Taking all of these issues into consideration, we have proposed an adaptive personal memory system that allows native structures to be imported, such as file folder hierarchies, bookmark collections or email repositories. It is built on the following principles:

1. Use statistical machine learning techniques for generating terminological conceptualizations to explain the subjective understanding of the folder names.

2. Provide multi-dimensional views on a document space, e.g., document type, topic, project, event, contact.

3. Install an integrated view that combines the documents of the file system with those in the email system and the bookmarks.

In this way, a user may reorganize his own workspace into a personal memory offering him different organizational views for filing and seeking information. For more details, we refer to [1].

Note that all initial categorizations result from the imported structures or from an initial training phase. As soon as enough documents have been categorized into a folder to provide a representative conceptualization, the system supports the user based on earlier categorization decisions. For example, new emails arriving in the inbox of a user are, after conceptualization, compared to the concepts of the folders, and the system proposes folders the email may belong to, such as document type 'A', event 'B' and topic 'C'. This is visualized by question marks. In figure 2, we show some exemplary views from a real personal workspace.

Figure 2: Snapshot from a multi-perspective personal memory providing information views such as document type, partner and customer, department, and topic.

The four views 'document type' (German: 'Dokumente'), 'partner and customers' (German: 'Partner und Kunden'), 'departments' (German: 'Organisation'), and 'topics' (German: 'Themen') allow for filing one and the same document into multiple folders at the same time.

Besides the organizational aspects and the support for categorizing new documents, we have implemented a set of retrieval techniques. It includes:

• Classical full text search.

• Contextual search (as a query expansion using those terms which are conceptually close).

• Combining term queries and folders, e.g., a search for 'eLearning' restricted to the folder 'Reports'.

• User feedback by indicating good documents (click on '+') or bad ones (click on '-'), etc.

Moreover, we allow the additional use of metadata, such as author, generation date, document size or type (see also [1]).

3. DESCRIBING RESOURCES
The quality of the information provided about digital resources is crucial for any information retrieval approach. There are different ways and standards (e.g., Dublin Core) to describe digital resources. However, these approaches usually suffer from several problems (see [2]) that can only partly be solved with technology. The main problem is the implicit assumption in the structure of most metadata formats that there is a one-to-one relationship between a resource and the metadata that describes it [3]. But as we have already argued in section 2, there is no 'single and correct' way to describe a resource. A lot of the information depends on the context in which a resource was created, and on who will use it for what reasons. Wiley et al. therefore distinguish between objective (e.g., the size of a file) and subjective (e.g., the degree of interactivity of a resource) metadata [11].

Despite the existence of methods that allow for the automatic generation of metadata, meaningful data can often only be created by humans, but often 'People lie', 'People are lazy', 'People are stupid', and 'People are lousy observers of their own behaviors', as Doctorow states in [2].

Due to these facts, centralized approaches are a poor choice if we want to provide resource descriptions according to our needs. Instead, any attempt to describe resources should embrace diversity. Thus, we propose the use of 'resource profiles' instead of single metadata sets [3]. A resource profile is defined as 'a multi-faceted, wide ranging description of a resource'. It does not conform to a particular XML schema; instead, it is a patchwork of metadata formats (potentially created by different authors) which are assembled as needed in order to form the description that is most appropriate for the given resource.

When designing a system to share digital resources and the corresponding CAM, and to realize context-aware, personalized information retrieval, this means we should offer the possibility to attach various descriptions to each resource (see figure 3). This includes multi-perspective descriptions of documents, context information gathered with various components, and information created in a lightweight approach using social software (e.g., tagging of resources).

Figure 3: Resources are described with different information captured in various ways

Nevertheless, some mandatory metadata must exist: metadata to enable basic functionalities such as search and display (containing, e.g., the name and location of a resource), metadata about the technical format of a resource and the technical requirements to use it, and metadata for intellectual property rights with information about the way in which a resource may be used. An example of such a format will be given in section 6.3.2.
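To make the idea of a resource profile from section 3 more concrete, the following minimal sketch shows one possible data structure: a small mandatory core plus a growing patchwork of descriptions from different authors. All class, field and method names are hypothetical and chosen for illustration only; the file location and rights string are made-up values, and nothing here is taken from an actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataSet:
    """One description of a resource, in whatever format its author chose."""
    author: str    # who contributed this description
    fmt: str       # e.g. "DC", "tags", "CAM" (labels are assumptions)
    content: dict  # the description itself, format-specific


@dataclass
class ResourceProfile:
    """Mandatory core metadata plus a patchwork of optional descriptions."""
    name: str
    location: str          # URI of the resource
    technical_format: str
    rights: str
    descriptions: list = field(default_factory=list)

    def annotate(self, metadata_set):
        """Attach a further description; existing ones are never overwritten."""
        self.descriptions.append(metadata_set)

    def view(self, fmt=None, author=None):
        """Assemble the descriptions that match the requested perspective."""
        return [d for d in self.descriptions
                if (fmt is None or d.fmt == fmt)
                and (author is None or d.author == author)]


# The contract document from section 2, described from two perspectives.
profile = ResourceProfile(
    name="Contract on a technical innovation",
    location="file:///contracts/innovation.pdf",   # hypothetical location
    technical_format="application/pdf",
    rights="internal use only",                     # hypothetical value
)
profile.annotate(MetadataSet("lawyer", "tags", {"tags": ["liability", "contract"]}))
profile.annotate(MetadataSet("technician", "tags", {"tags": ["sensor", "prototype"]}))

lawyer_view = profile.view(author="lawyer")
```

Keeping every description as a separate entry, rather than merging them into one record, is what lets the same resource be presented differently to the lawyer and to the technician.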
Figure 4: Excerpt of context information captured within PEEK and EPOS
[Figure content not recoverable; surviving values: "underline", "discovery feedback", "2006-11-24T06:43:57", "2006-11-24T06:43:57", ..., http://jena.sourceforge.net/, Jena Semantic Web Framework, text/html, 2005-04-13T14:45:42, 1.0]

4. APPROACHES TO REALIZE CONTEXT-AWARE SUPPORT
There are various approaches to capturing CAM in a knowledge worker's environment. In the DFKI Knowledge Management Department, several efforts have been undertaken to realize context-sensitive support:

• In the research project EPOS¹ (Evolving Personal to Organizational Knowledge Spaces), a context-sensitive system to support knowledge workers was developed [4, 10]. The objective of EPOS is to leverage a user's efforts for his personal knowledge management for his own benefit as well as to evolve this within the organization.

• The research project MyMory² (Personal Memories with Attentive Documents for Knowledge Workers) aims at employing technologies for unobtrusive user observation in order to create relations between information items that are meaningful to the user in his specific context, using attention evidence for more precise information delivery, and providing mechanisms of meaning coordination to facilitate reusability of knowledge among different contexts. MyMory results shall be demonstrated within the C3DW (Connected, Context-aware, Creative Document Workspace) application.

• The TaskNavigator developed in the competence center 'virtual office of the future'³ is a novel prototype to support weakly-structured processes by integrating a standard task list application with a state-of-the-art document classification system. The resulting system allows for a task-oriented view on office workers' personal knowledge spaces in order to realize proactive and context-sensitive information support [5].

• The project PEEK (Personal and Episodic Knowledge Retrieval in Desktop Search) aims to enhance Semantic Desktop applications by capturing relevant information about documents with a Digital Pen.

¹ http://www.dfki.uni-kl.de/epos
² http://www.dfki.uni-kl.de/mymory
³ http://www.ricoh.rlp-labs.de/

The methods used in these projects to capture context information will be introduced later in section 6.1.

5. DISTRIBUTING, MATCHING AND SHARING CAM
When we want to use CAM captured by different user observation components, we first of all need a common format to represent context, together with corresponding mapping mechanisms. The data provided in this format can then be used to realize context-aware information retrieval. But once these technical problems are solved, we only have a basis for context-aware, personalized information retrieval. The main task will be to encourage users to participate in the system, to create and publish context information, and to provide additional information about resources. This especially includes issues such as privacy and security.

5.1 Representing CAM
To avoid ambiguities, and to ensure a common understanding of terms, we propose to use a flat and rather simple format as the basis for context-aware information retrieval. The matching algorithms used to find similar contexts will use context information provided in this way. A basic context format can, e.g., consist of the following concepts:

People: Persons that are involved in the current context, e.g., contacted via mail or instant messaging.

Resources: Resources used in the current context (e.g., documents modified with a text processor, sent via email or created in a file explorer application).

Topics: Topics that a user dealt with, e.g., extracted by analyzing used resources, highlighted passages, etc.

Tasks: The user's current tasks, e.g., extracted from a workflow or task management system.

Projects: Projects that are related to the current context.

Organizations: Organizations that are related to the current context.

Events: Events that are related to the current context.

Locations: Locations that occurred in the user's context.

Time: The time the context information was captured.

5.2 Integrating CAM
The information gathered by the user observation components is often represented in heterogeneous formats. Figure 4 provides an example of information gathered in EPOS and PEEK. This information can be enhanced using entity extraction algorithms (provided, e.g., by tagthe.net⁴), and related concepts can be added using existing classifiers or ontologies.

⁴ http://www.tagthe.net/

Once the captured context data has been enriched in this way, it has to be mapped to the concepts introduced in section 5.1. For this purpose, mapping rules can, e.g., be defined using the Jena semantic web framework (see figure 5 for an example used in EPOS).

    (?X rdf:type rdfs:Class),
    noValue(?X rdfs:subClassOf ?X)
    -> (?X rdfs:subClassOf ?X).

    (?X rdf:type ?D),
    (?D rdfs:subClassOf ?C)
    -> (?X rdf:type ?C).

    #################################
    # PIMO                          #
    #################################

    (?X pimo:hasOtherRepresentation ?Y)
    -> (?Y pimo:hasOtherRepresentation ?X).

    (?X pimo:hasOtherRepresentation ?Y),
    (?Y pimo:hasOtherRepresentation ?Z),
    notEqual(?X, ?Z)
    -> (?X pimo:hasOtherRepresentation ?Z).

Figure 5: Excerpt of mapping rules created with the Jena framework in the EPOS project

5.3 Encouraging users to participate
As already explained in section 3, centralized approaches have many weaknesses. Thus, the aim must be to attract enough stakeholders who can contribute valuable (context) information about resources, at best working as a self-sustained community. Apart from dissemination efforts, it is very important to encourage users to participate:

• The interaction with the system should be as easy and intuitive as possible. Therefore, user interfaces are required that follow the principles of simplicity [8] and joy-of-use [9].

• Reward mechanisms can be used to promote contributors and the quality of contributions.

• Users should be offered the possibility to use functionalities in their usual contexts and applications, so that they can contribute without needing to use new tools. E.g., widgets and a service oriented architecture can be used to realize such an integration.

Especially in the field of user observation, ensuring privacy, security and transparency is crucial for the success of our approach. It is important that the user is always in control of what happens with the information he provides. This includes the right to delete information at any given time, and a transparent system where every user can see exactly how his information is used [6].

6. OVERALL ARCHITECTURE
The overall architecture of the proposed CAM sharing system is depicted in figure 6. It consists of the following components:

• user observation components developed in various projects to gather CAM data in different formats,

• CAM preprocessing components that enhance gathered CAM data and map it to a common format as described in section 5.1,

• a resource and metadata hub to store resources and information about them, e.g., provided by the user observation components, and

• context-aware information retrieval services based on the information provided by the user and by the resource and metadata hub.

Figure 6: Overall architecture of the proposed CAM sharing system

6.1 User observation components
As shown in figure 6, there are numerous possibilities to capture CAM. In the projects mentioned in section 4, different methods have been developed to capture information about a user's context:

• In EPOS, context information is gathered through the use of installable user observation plugins for standard office software such as email clients (thunderbird), browsers (firefox) and text processors (jedit). These plugins can analyse, e.g., which content was in the user's focus (also taking into account scrolling behavior), and which searches have been carried out. In addition, file explorers were used as a source for CAM.

• MyMory enhances the components developed in EPOS by using an Eye Tracker to deliver more precise information about the part of a document on which a user spends attention.

• In PEEK, a Digital Pen is used that can capture and store handwritten annotations on printed documents (printed on paper with a dot pattern to allow the recognition of the document); annotations are stored with the original document as pdf.

• The TaskNavigator prototype captures information about resources used in certain tasks. It is also possible to assign a document to a task when copying or scanning it with a multifunctional product (MFP). In this case, OCR techniques are used to extract information from a document.

6.2 CAM Preprocessing
For each user observation component, a method has to be defined that creates CAM data according to the format expected by the context matching components of the context-aware information retrieval services. As described in section 5.2, such a method can also be used to enhance the captured information. It is also important to note that existing descriptions of resources used in the given context can serve as a source of information for this task.
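A minimal sketch of such a preprocessing method is given below, together with the kind of per-concept comparison that matching components could apply to its output (section 6.4 describes adding up per-concept similarities). Several things here are assumptions made for illustration: the record field names (`url`, `keywords`, `recipient`, ...), the per-component field maps, and the use of Jaccard set overlap as the per-concept similarity measure. The paper only states that existing similarity approaches can be reused, so Jaccard is a stand-in, not the authors' choice.

```python
from dataclasses import dataclass, field

# Concepts of the flat context format from section 5.1.
# "Time" is kept as a separate field of Context.
CONCEPTS = ("people", "resources", "topics", "tasks", "projects",
            "organizations", "events", "locations")


@dataclass
class Context:
    time: str                                      # capture time as a string
    concepts: dict = field(default_factory=dict)   # concept name -> set of values


def preprocess(raw_event, field_map):
    """Map one observer-specific record to the common flat format.

    raw_event: dict as delivered by a (hypothetical) observation component.
    field_map: observer field name -> concept name, defined once per component.
    """
    ctx = Context(time=raw_event.get("timestamp", ""),
                  concepts={c: set() for c in CONCEPTS})
    for source_field, concept in field_map.items():
        value = raw_event.get(source_field)
        if value is None:
            continue
        # Accept both single values and collections of values.
        values = value if isinstance(value, (list, tuple, set)) else [value]
        ctx.concepts[concept].update(values)
    return ctx


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets give 0.0 (no evidence)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_similarity(c1, c2, weights=None):
    """Add up the per-concept similarities; a missing weight defaults to 1.0."""
    weights = weights or {}
    return sum(weights.get(c, 1.0) * jaccard(c1.concepts[c], c2.concepts[c])
               for c in CONCEPTS)


# Two hypothetical observers with different native field names.
browser_map = {"url": "resources", "keywords": "topics"}
mail_map = {"recipient": "people", "attachment": "resources",
            "subject_terms": "topics"}

a = preprocess({"timestamp": "2006-11-24T06:43:57",
                "url": "http://jena.sourceforge.net/",
                "keywords": ["jena", "rdf"]}, browser_map)
b = preprocess({"timestamp": "2006-11-24T07:10:00",
                "recipient": "colleague@example.org",
                "attachment": "http://jena.sourceforge.net/",
                "subject_terms": ["rdf", "rules"]}, mail_map)

score = context_similarity(a, b)
```

Because both observers are reduced to the same flat structure, the comparison code needs no knowledge of where a context came from. The optional `weights` argument is one way per-concept weights could enter the computation; leaving it out gives the plain sum.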
6.3 Resource and Metadata Hub
To provide information retrieval based on information about digital resources and shared CAM, a component is required that allows resources and the corresponding CAM to be shared, and this information to be collected and retrieved. In the project CoMet⁵ (Collaborative Sharing of Metadata), such a component is currently being developed at DFKI.

⁵ http://www.dfki.uni-kl.de/comet

6.3.1 Functionalities
CoMet provides functionalities to insert, search and display resources.

Insert: A resource can be inserted by uploading it as a file into CoMet's WebDAV repository, or by just using a reference to the resource, i.e., its URI.

Search: CoMet provides different search filters, e.g., a user can search for resources which contain certain keywords in their title, description or tags. In addition, an advanced search allows searching for keywords in defined metadata terms.

Display: Supported formats are those which can be displayed directly by a browser (e.g., JPEG, MP3, SWF, etc.).

For registered users, CoMet also offers mechanisms to rate, tag and comment on resources, and to manage their own tags and lists of friends and favorite resources. The information provided by the users makes it possible to browse content via tags (social browsing), and resources can be ranked according to different criteria, e.g., alphabetically, most viewed, best rated, etc.

Most of the functionalities can be accessed via a Web interface (see figure 7) or a Web service API. This allows for an easy integration in different contexts and applications.

Figure 7: Screenshot of the CoMet user interface

6.3.2 Metadata
CoMet stores a derivative of the Dublin Core Metadata Element Set for every resource which is registered in the system.

Mandatory Resource Metadata
  dc:contributor   Person who inserted the resource into CoMet.
  dc:creator       Author of the resource.
  dc:date          Date of insertion.
  dc:description   Description of the resource.
  dc:format        Either MIME type or a proprietary format.
  dc:identifier    URI which identifies the resource uniquely.
  dc:rights        CC license which is associated with the resource.
  dc:title         Title of the resource.

Metadata of a User-Defined Metadata Set
  dc:contributor   Person who inserted the metadata set into CoMet.
  dc:creator       Author of the metadata set.
  dc:date          Date of insertion.
  dc:description   Description of the metadata set.
  dc:format        Metadata format (e.g., DC).
  dc:identifier    Identifier of the metadata set.
  dc:relation      URI of the described resource.

Table 1: An excerpt of the metadata used in the CoMet system

Additionally, users can associate resources with metadata sets in arbitrary formats (e.g., about the context in which they used a resource). Together with these metadata sets, information about the metadata (i.e., the meta-metadata) has to be provided. An excerpt of the metadata used to store information about resources and user-defined metadata sets in CoMet is presented in table 1. CoMet also allows the definition of different variants of resources. These variants are intended to convey similar information in different dimensions (e.g., 'language' or 'difficulty'), and they constitute a basis for personalized content provision (see [7]).

6.4 Context-aware, personalized information retrieval services
Based on the information we have captured and preprocessed in the presented way, information retrieval processes can be significantly improved. Instead of just performing a search based on single terms and the classification of a resource, we can now use the following information:

1. multi-perspective descriptions of resources

2. the user's current context

3. descriptions of the contexts in which resources have been used before

Any context-aware information retrieval service must be able to determine the degree of similarity between two different contexts. In the case of the context representation introduced in section 5.1, this can be realized by adding up the calculated similarities between each of the concepts (e.g., between topics, between persons, etc.). For this purpose, various existing approaches can simply be reused.

To allow users to adapt the information retrieval process to their current needs, we propose an interface following the metaphor of a mixing desk (see figure 8). This 'context cockpit' allows users to intuitively control the matching process by assigning different weights to the different concepts. Thus, the results as well as the order in which they are presented can be adapted to the specific needs of the user.

Figure 8: Mockup of a Context Cockpit

7. SUMMARY AND FUTURE WORK
Capturing and sharing information about the attention that users spend on resources in a specific context provides valuable information about the mental models of users. The possibility to use this information can significantly improve existing information retrieval approaches. To realize such a system, we propose a service oriented architecture that allows various user observation components to be integrated, and in which arbitrary types of multimedia resources can be integrated and described using resource profiles. Thus, we can provide multi-perspective descriptions of the resources and of the contexts in which they were used. To ease context sharing and matching, we propose the use of a simple, common format to represent context information. The data gathered with the different user observation components therefore has to be transformed by preprocessing components that enhance the gathered data and map it to the common format. Context-aware information retrieval services can use the information provided in this way to realize advanced and innovative functionalities, among others an interactive context cockpit that allows the user to intuitively control the matching process between contexts and resource descriptions. We are currently developing such a system to integrate existing approaches developed in several projects, and we expect the first prototype to be available in autumn 2008.

8. ACKNOWLEDGMENTS
The project EPOS has been supported by a grant from the German Federal Ministry of Education, Science, Research, and Technology (FKZ ITW-01 IW C01). MyMory is currently funded by the German Federal Ministry of Education and Research (FKZ 01 IW F01). CoMet is currently funded by the Stiftung Rheinland-Pfalz für Innovation. PEEK is funded by Hitachi Central Research Laboratory, Tokyo. TaskNavigator is a joint project of DFKI and Ricoh Co. Ltd.

9. REFERENCES
[1] A. Dengel. Six thousand words about multi-perspective personal document management. In Proceedings EDM, IEEE Int. Workshop on the Electronic Document Management in an Enterprise Computing Environment, Hong Kong, China. IEEE Computer Society, 2006.
[2] C. Doctorow. Metacrap: Putting the torch to seven straw-men of the meta-utopia, 2001. Electronic document. Date of publication: August 26, 2001. Retrieved March 14, 2007, from http://www.well.com/~doctorow/metacrap.htm.
[3] S. Downes. Resource profiles. Journal of Interactive Media in Education, 5, 2004. ISSN: 1365-893X.
[4] H. Holz, H. Maus, A. Bernardi, and O. Rostanin. From Lightweight, Proactive Information Delivery to Business Process-Oriented Knowledge Management. Journal of Universal Knowledge Management. Special Issue on Knowledge Infrastructures for the Support of Knowledge Intensive Business Processes, 0(2):101–127, 2005.
[5] H. Holz, O. Rostanin, A. Dengel, T. Suzuki, K. Maeda, and K. Kanasaki. Task-based process know-how reuse and proactive information delivery in TaskNavigator. In Proceedings: CIKM '06. ACM Conference on Information and Knowledge Management, November 6-11, 2006, Arlington, USA, pages 522–531, November 2006.
[6] A. Iskold. The Attention Economy: An Overview, 2007. Read/WriteWeb. Electronic document.
Date of publication: March 01, 2007. Retrieved April 05, 2007, from http://www.readwriteweb.com/archives/attention_economy_overview.php.
[7] M. Memmel. Adaptivity with multidimensional learning objects. In G. Richards, editor, Proceedings of the World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education 2005, Vancouver, pages 2221–2230. AACE, 2005.
[8] J. Nielsen. Designing Web Usability: The Practice of Simplicity. New Riders Publishing, 2000.
[9] I. E. Reeps. Joy-of-Use: eine neue Qualität für interaktive Produkte. Master's thesis, University of Konstanz, 2004.
[10] S. Schwarz. A context model for personal knowledge management. In Proceedings of the 2nd International Workshop on Modelling and Retrieval of Context (MRC 2005) in conjunction with IJCAI 2005, pages 39–50, Edinburgh, July 2005.
[11] D. A. Wiley, M. Recker, and A. S. Gibbons. Getting axiomatic about learning objects. In D. A. Wiley, editor, The Instructional Use of Learning Objects: Online Version. 2000. Retrieved February 01, 2006, from http://reusability.org/axiomatic.pdf.