DYNAMIC TERM WEIGHTING FOR PERSONAL PHOTO RETRIEVAL

Yi Chen

Centre for Digital Video Processing, Dublin City University, Dublin 9, Ireland

ABSTRACT

Personal photo retrieval differs from many search tasks in that all the targets are either known to the user or are about something the owner has seen. For this reason, generating queries for searching personal photos is more likely to rely on an individual's memory. In a pilot study of personal file re-finding, the results suggested that retrieval efficiency changes over time for different types of queries, due to recall reliability. While auto-annotating the photos with contextual information, we are seeking to develop a query weighting strategy which takes recall reliability into account and gives feedback to the user, based on the search queries, to improve search efficiency.

This work is supervised by Dr. Gareth J.F. Jones and funded by grant CMS023 under the Science Foundation Ireland Research Frontiers Programme 2006.

1. INTRODUCTION

The prevalence of digital cameras (including camera mobile phones) and advances in storage devices, while enabling the recording of every memorable moment in our lives, is leading to huge amounts of personal digital image data, which can be difficult to search. Although techniques of content-based retrieval for static images are becoming increasingly sophisticated, they may not meet the needs of efficient retrieval in such large collections, due to the varied image quality, the severe redundancy of content and, most importantly, the fact that people do not always remember what the exact contents are, even of photos taken by themselves [1].

One possible solution is to base information retrieval (IR) on the photos' metadata (e.g. annotations) which the user can remember. The reason is that searching for personal photos can be viewed as information re-finding, as opposed to general information seeking in an unknown collection such as the World Wide Web. It largely depends on the individual's memory of the photos. For example, when we are looking for certain photos, we usually have some recollection of the occasion on which they were taken. In cases where no assistive retrieval tool is provided, we must rely on browsing the corresponding folders to find the required photos, if the folders are labelled with time, location or event name, etc. This suggests that people may often have better memory of contextual information related to events than of the details of the photos. Thus, we assume that if the photos are also annotated with contextual information that people remember well, IR may be more efficient. In fact, some standard forms of context data have already been integrated into personal retrieval systems such as MediAssist [2]. Naaman et al. [3] suggested that well remembered features include location (indoor/outdoor), season, year, people, weather, etc., but not the textual information in the photos.

2. PILOT STUDIES

In our previous pilot studies [4,5], the participant recorded all her computer activities, as well as corresponding contextual information such as her personal location, for a two-month period, and indexed them with the Lucene search engine [6]. She generated 30 scenarios of information re-finding from her past two months' experiences. Queries of content only, and of content with all possible combinations of context, were tested right after her data collection period and six months later, using only her correctly recalled information. Both sets of results (right after data collection and six months later) suggested that the combination of correctly recalled context information improves search efficiency, and the advantage was greater six months later. The recall results showed that she had better memory of perceptual information such as location, period of the day (e.g. morning, evening), weather, etc., but not of textual (conceptual) information such as the hour or day. Also, her recalled contents were significantly less effective in searching, as evaluated by Lucene [5]. The drop in the search efficiency of content-based queries implies that the key content the user recalls may differ over time.

We assume that if a document can be annotated with what the user remembers at the time of searching, retrieval effectiveness can be maintained. We aim to develop an algorithm which can continuously update the status of the metadata, so that when the recalled metadata is entered as a query, it can be weighted dynamically according to the estimated recall reliability as well as other traditional IR methods. Above all, annotations are required for all the photos in order to apply the algorithm.

3. DATA COLLECTION

Manual annotation of large volumes of photos is unrealistic. Thus, we need to do this automatically by capturing context information which can be synchronized with the photos. For example:

1) Time and date can be embedded automatically into the photos when they are created.
2) Location can be recorded by GPS devices.
3) Weather and light status can be determined by combining time and location information at the time of creation [2].
4) Emotional status can be roughly interpreted from wearable biometric sensors such as a heart rate monitor and the BodyMedia SenseWear armband.
5) Bluetooth tracking devices allow for the detection of other nearby Bluetooth devices. This enables the recording of objects or people with Bluetooth devices (such as mobile phones) at the time the photo is taken.
6) Content tagging may rely mainly on third-party content analysis technologies such as face detection.

4. DYNAMIC WEIGHTING SYSTEM

We propose to develop a dynamic weighting system which structures the data partly based on a memory model, so that recall reliability can be evaluated and fed back to the user to help generate more efficient queries.

4.1. Structuring and Weighting

Information processing theories have argued that human memory exists in associative networks, in which nodes of remembered information are linked to each other so that they can be retrieved by tracing the links [7]. In our model, we propose to create links from attributes to items, and links between attributes or between items at the same level. We assume that same-level links are usually created based on time proximity (belonging to the same event), but which types of attributes tend to link with each other is yet to be explored.

Instead of weighting independent nodes as the PageRank algorithm does, this model weights the strength of the links, which estimates the likelihood of the information at one end being retrieved when cued with the other end. Based on memory and learning theories, we propose to integrate several factors into an algorithm, such as the time lapse since the two nodes were last associated, the frequency of occurrence of the link, and the encoding quality calculated from various factors [7]. The link weights update automatically (e.g. when triggered by the encoding of new items).

4.3. Feedback Mechanism for Photo Searching System

The retrieval system is based on the above structure, inheriting traditional IR strategies, and is implemented with a query evaluation and feedback mechanism.

Figure 1. IR system with query feedback

Simple semantic processing will be applied to the entered queries, mainly expanding query words to arrays of synonyms from the database. The query evaluation step will estimate the reliability of recalled query features based on the memory model. In the first stage of system development, the search interface will allow users to judge the reliability of their recall for each query (very sure, guessed, etc.). This will also be combined with traditional IR methods to give the user feedback about the queries' efficiency, as well as potential queries or combinations of queries based on the links, leaving the final decision on how to refine their queries to the user.

The proposed system still needs a series of user studies, which will be based on our ongoing data collection.

6. REFERENCES

[1] A. J. Sellen, A. Fogg, "Do life-logging technologies support memory for the past?: an experimental study using SenseCam," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, ACM, New York, 2007, pp. 81-90.

[2] N. O'Hare, H. Lee et al., "MediAssist: Using Content-Based Analysis and Context to Manage Personal Photo Collections," in CIVR2006, 2006, pp. 529-532.

[3] M. Naaman et al., "Context data in geo-referenced digital photo collections," in Proceedings of the 12th Annual ACM International Conference on Multimedia, New York, NY, 2004, pp. 196-203.

[4] M. Fuller, L. Kelly, and G. Jones, "Applying Contextual Memory Cues for Retrieval from Personal Information Archives," in PIM 2008 - Proceedings of Personal Information Management, Workshop at CHI 2008, 2008.

[5] L. Kelly, Y. Chen, M. Fuller, and G. Jones, "A Study of Remembered Context for Information Access from Personal Digital Archives," in Proceedings of IIiX2008, London, 2008.

[6] O. Gospodnetic, "Lucene in Action," Manning Publications, 2004.

[7] Jesse E. Purdy, M. r. M., Bennett L. Schwartz, and William C. Gordon, "Learning and Memory," Wadsworth, Belmont, California, 2001.
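
As a rough illustration of the link weighting described in Section 4.1, the sketch below combines the three factors named there (time lapse since the last association, frequency of occurrence, and encoding quality) into a single link strength. The paper does not specify a formula, so everything here is an assumption made for illustration: the names Link, link_strength, reinforce and half_life_days are hypothetical, and the functional form (exponential forgetting multiplied by a saturating repetition term and an encoding quality in [0, 1]) is only one plausible choice.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Association between two nodes, e.g. an attribute and a photo (hypothetical)."""
    last_associated: float   # time of the most recent co-occurrence, in days
    frequency: int           # number of times the link has occurred
    encoding_quality: float  # assumed to lie in [0, 1]


def link_strength(link: Link, now: float, half_life_days: float = 180.0) -> float:
    """Estimate how likely one end is recalled when cued with the other.

    Assumed form, not taken from the paper: the strength halves every
    `half_life_days`, grows with repetition but saturates below 1,
    and is scaled by the encoding quality.
    """
    lapse = max(0.0, now - link.last_associated)
    recency = 0.5 ** (lapse / half_life_days)
    repetition = 1.0 - 1.0 / (1.0 + link.frequency)
    return link.encoding_quality * recency * repetition


def reinforce(link: Link, now: float) -> None:
    """Update a link when a newly encoded item re-associates its two ends."""
    link.frequency += 1
    link.last_associated = now
```

Under these assumptions, a link that is never reinforced loses half of its strength every 180 days, while calling reinforce resets the time lapse and raises the repetition term. A query term could then be weighted by the strength of the links it cues, alongside the traditional IR score.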