=Paper= {{Paper |id=Vol-379/paper-14 |storemode=property |title=Dynamic Term Weighting for Personal Photo Retrieval |pdfUrl=https://ceur-ws.org/Vol-379/paper6.pdf |volume=Vol-379 }} ==Dynamic Term Weighting for Personal Photo Retrieval== https://ceur-ws.org/Vol-379/paper6.pdf
              DYNAMIC TERM WEIGHTING FOR PERSONAL PHOTO RETRIEVAL

                                                              Yi Chen

                 Centre for Digital Video Processing, Dublin City University, Dublin 9, Ireland

                              ABSTRACT*

Personal photo retrieval differs from many search tasks in that all
the targets are either known to the user or concern something the
owner has seen. For this reason, generating queries for searching
personal photos is more likely to rely on an individual’s memory. In
a pilot study of personal file re-finding, the results suggested that
retrieval efficiency changes over time for different types of queries
due to recall reliability. While automatically annotating photos with
contextual information, we are seeking to develop a query weighting
strategy that takes recall reliability into account and gives
feedback to the user based on the search queries, in order to improve
search efficiency.

* This work is supervised by Dr. Gareth J.F. Jones and funded by
grant CMS023 under the Science Foundation Ireland Research Frontiers
Programme 2006.

                          1. INTRODUCTION

The prevalence of digital cameras (including camera mobile phones)
and advances in storage devices, while enabling the recording of
every memorable moment in our lives, is leading to a huge amount of
personal digital image data, which can be difficult to search.
Although content-based retrieval techniques for static images are
becoming increasingly sophisticated, they may not meet the needs of
efficient retrieval in such large collections, due to the varied
image quality and severe redundancy of content and, most importantly,
because people do not always remember what the exact contents are,
even of photos they took themselves [1]. One possible solution is to
base information retrieval (IR) on the photos’ metadata (e.g.
annotations) that the user can remember. The reason is that searching
for personal photos can be viewed as information re-finding, as
opposed to general information seeking in an unknown collection such
as the World Wide Web. It depends largely on the individual’s memory
of the photos. For example, when we look for certain photos, we
usually have some recollection of the occasion on which they were
taken. When no assistive retrieval tool is provided, we must rely on
browsing the corresponding folders to find the required photos,
provided the folders are labelled with time, location, event name,
etc. This suggests that people may often have better memory for
contextual information related to events than for the details of the
photos. Thus, we assume that if the photos are also annotated with
contextual information that people remember well, retrieval may be
more efficient. In fact, some standard forms of context data have
already been integrated into personal retrieval systems such as
MediAssist [2]. Naaman et al. [3] suggested that well-remembered
features include location (indoor/outdoor), season, year, people,
weather, etc., but not the textual information in the photos.

                          2. PILOT STUDIES

In our previous pilot studies [4,5], the participant recorded all her
computer activities, together with corresponding contextual
information such as her personal location, for a two-month period,
and indexed them with the Lucene search engine [6]. She generated 30
scenarios of information re-finding from her experiences over those
two months. Queries using content only, and content with all possible
combinations of context, were tested right after the data collection
period and again six months later, using the information she recalled
correctly. Both sets of results (right after data collection and six
months later) suggested that combining correctly recalled context
information improves search efficiency, and the advantage was greater
six months later. The recall results showed that she had better
memory for perceptual information such as location, period of the day
(e.g. morning, evening), weather, etc., than for textual (conceptual)
information such as the hour or day. Also, her recalled content was
significantly less effective in searching, as evaluated with Lucene
[5]. This drop in the search efficiency of content-based queries
implies that the key content the user recalls may differ over time.

We assume that if a document can be annotated with what the user
remembers at the time of searching, retrieval effectiveness can be
maintained. We aim to develop an algorithm that continuously updates
the status of the metadata, so that when recalled metadata is entered
as a query, it can be weighted dynamically according to its estimated
recall reliability as well as by traditional IR methods. Above all,
annotations are required for all the photos in order to apply the
algorithm.
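
The idea of weighting recalled metadata by its estimated recall
reliability can be sketched as follows. This is only a minimal
illustration, not the algorithm under development: the field names,
the reliability scores and the clamping rule are all assumptions made
for the example. It relies only on the Lucene query syntax, in which
`field:"value"^boost` raises or lowers the contribution of a clause.

```python
from dataclasses import dataclass


@dataclass
class RecalledTerm:
    field: str          # hypothetical index field, e.g. "location"
    value: str          # what the user remembers
    reliability: float  # assumed recall-reliability estimate in [0, 1]


def build_weighted_query(terms, min_boost=0.1):
    """Return a Lucene-style query string with one boost per recalled term."""
    clauses = []
    for term in terms:
        # Clamp into [min_boost, 1.0] so that unreliable terms are
        # down-weighted rather than dropped (an assumption of this sketch).
        boost = max(min_boost, min(1.0, term.reliability))
        escaped = term.value.replace('"', r'\"')
        clauses.append(f'{term.field}:"{escaped}"^{boost:.2f}')
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Made-up reliability values, for illustration only.
    recalled = [
        RecalledTerm("location", "Dublin", 0.9),
        RecalledTerm("weather", "rainy", 0.7),
        RecalledTerm("text", "birthday cake", 0.2),
    ]
    print(build_weighted_query(recalled))
```

In such a scheme the reliability values would come from the memory
model or from the user's own judgement, and would be updated over
time rather than fixed as they are here.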
                         3. DATA COLLECTION

Manual annotation of large volumes of photos is unrealistic. Thus, we
need to do this automatically by capturing context information that
can be synchronized with the photos. For example:
1) Time and date can be embedded automatically into the photos when
they are created.
2) Location can be recorded by GPS devices.
3) Weather and light status can be determined by combining the time
and location information at the time of creation [2].
4) Emotional status can be roughly interpreted from wearable
biometric sensors such as heart rate monitors and the BodyMedia
SenseWear armband.
5) Bluetooth tracking devices allow the detection of other nearby
Bluetooth devices, and thus enable the recording of objects or people
carrying Bluetooth devices (such as mobile phones) at the time a
photo is taken.
6) Content tagging may rely mainly on third-party content analysis
technologies such as face detection.

                    4. DYNAMIC WEIGHTING SYSTEM

We propose to develop a dynamic weighting system that structures the
data partly on the basis of a memory model, so that recall
reliability can be evaluated, with feedback to the user to help
generate more efficient queries.

4.1. Structuring and Weighting

Information processing theories have argued that human memory exists
in associative networks, in which nodes of remembered information are
linked to each other so that they can be retrieved by tracing the
links [7]. In our model, we propose to create links from attributes
to items, and links between attributes or between items at the same
level. We assume that same-level links are usually created based on
time proximity (belonging to the same event), but which types of
attributes tend to link with each other is yet to be explored.

Instead of weighting independent nodes as the PageRank algorithm
does, this model weights the strength of the links, which estimates
the likelihood of the information at one end being retrieved when
cued with the other end. Based on memory and learning theories, we
propose to integrate several factors into an algorithm, such as the
time lapse since the two nodes were last associated, the frequency of
occurrence of the link, and the encoding quality calculated from
various factors [7]. The weights of the links update automatically
(e.g. when triggered by the encoding of new items).

4.3. Feedback Mechanism for Photo Searching System

The retrieval system is based on the above structure, inherits
traditional IR strategies, and is implemented with a query evaluation
and feedback mechanism.

               Figure 1. IR system with query feedback

Simple semantic processing will be applied to the entered queries,
mainly expanding query words to arrays of synonyms from the database.
The query evaluation step will estimate the reliability of recalled
query features based on the memory model. In the first stage of
system development, the search interface will allow users to judge
the reliability of their recall for each query (very sure, guessed,
etc.). This will also be combined with traditional IR methods to
provide feedback to the user about the efficiency of the queries, as
well as potential queries or combinations of queries based on the
links, leaving the final decision on refining the queries to the
user.

The proposed system still needs a series of user studies, which will
be based on our ongoing data collection.

                           6. REFERENCES

[1] Sellen, A. J., Fogg, A. “Do life-logging technologies support
memory for the past?: an experimental study using sensecam.” In
Proceedings of the SIGCHI Conference on Human Factors in Computing
Systems, CHI '07, ACM, New York, 2007, pp. 81-90.

[2] O’Hare, N., Lee, H., et al. “MediAssist: Using Content-Based
Analysis and Context to Manage Personal Photo Collections.” In
CIVR2006, 2006, pp. 529–532.

[3] Naaman, M., et al. “Context data in geo-referenced digital photo
collections.” In Proceedings of the 12th Annual ACM International
Conference on Multimedia, New York, NY, 2004, pp. 196-203.

[4] Fuller, M., Kelly, L., and Jones, G. “Applying Contextual Memory
Cues for Retrieval from Personal Information Archives.” In PIM 2008 -
Proceedings of Personal Information Management, Workshop at CHI 2008,
2008.

[5] Kelly, L., Chen, Y., Fuller, M., and Jones, G. “A Study of
Remembered Context for Information Access from Personal Digital
Archives.” In Proceedings of IIiX2008, London, 2008.

[6] Gospodnetic, Otis. “Lucene in Action.” Manning Publications,
2004.

[7] Jesse E. Purdy, M. R. M., Bennett L. Schwartz, William C. Gordon.
Learning and Memory. Belmont, California: Wadsworth, 2001.
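
The link weighting proposed in Section 4.1 names its factors (time
lapse since the last association, frequency of occurrence, encoding
quality) but not a formula. The sketch below shows one way these
factors might be combined. The functional form (exponential decay
with a half-life, logarithmic frequency gain), the 90-day half-life
and the [0, 1] quality score are assumptions chosen for illustration;
they are not taken from the memory model in [7].

```python
import math
from dataclasses import dataclass


@dataclass
class Link:
    """An association between two nodes, e.g. an attribute and a photo."""
    frequency: int           # how many times the two nodes co-occurred
    days_since_last: float   # time lapse since they were last associated
    encoding_quality: float  # assumed score in [0, 1]


def link_strength(link, half_life_days=90.0):
    """Illustrative strength: quality x log-frequency gain x recency decay."""
    if link.frequency < 1:
        return 0.0
    # Assumed forgetting curve: strength halves every half_life_days.
    recency = 0.5 ** (link.days_since_last / half_life_days)
    # Assumed diminishing returns for repeated co-occurrence.
    rehearsal = 1.0 + math.log(link.frequency)
    return link.encoding_quality * rehearsal * recency


def reinforce(link):
    """Update a link when a newly encoded item repeats the association."""
    return Link(link.frequency + 1, 0.0, link.encoding_quality)


if __name__ == "__main__":
    old = Link(frequency=3, days_since_last=200.0, encoding_quality=0.8)
    print(link_strength(old), link_strength(reinforce(old)))
```

Calling `reinforce` whenever a new item is encoded corresponds to the
automatic update of link weights described in the text; a real system
would also need to recompute the encoding quality.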