        Automatic Text Searching For Personal Photos
                       Neil O’Hare, Hyowon Lee, Saman Cooray, Cathal Gurrin, Gareth J.F. Jones,
                  Jovanka Malobabic, Noel E. O’Connor, Alan F. Smeaton, and Bartlomiej Uscilowski

   All authors are members of the Centre for Digital Video Processing, Dublin City University.
Alan F. Smeaton and Noel E. O’Connor are members of the Adaptive Information Cluster, Dublin
City University. email: nohare@computing.dcu.ie



   Abstract— This demonstration presents the MediAssist prototype system
for organisation of personal digital photo collections based on contextual
information, such as time and location of image capture, and content-based
analysis, such as face detection and recognition. This metadata is used
directly for identification of photos which match specified attributes, and
also to create text surrogates for photos, allowing for text-based queries
of photo collections without relying on manual annotation. MediAssist
illustrates our research into digital photo management, showing how a
combination of automatically extracted context and content-based
information, together with user annotation and traditional text indexing
techniques, facilitates efficient searching of personal photo collections.
   Index Terms— Personal Photo Management, Text Search, Context


                          I. INTRODUCTION
   In recent years digital photography has become increasingly popular,
resulting in the accumulation of large numbers of personal digital photos.
The MediAssist project [6] at the Centre for Digital Video Processing
(CDVP) addresses this situation by developing tools for the efficient
searching of photo archives. The system uses both automatically generated
contextual metadata (e.g. time, location) and content-based analysis tools
(e.g. face detection and recognition). Semi-automatic annotation allows the
user to interactively improve the automatically generated annotations.
Retrieval tools allow for complex query formulation based on these
features, in addition to the facility to create simple text queries. In
previous work using context for photo management, Davis et al. [1] utilised
context to recommend recipients for sharing photos taken with a
context-aware phone, although their system does not support retrieval.
Naaman et al. [4] use context-based features for photo management, but they
do not use content-based analysis tools, or facilitate semi-automatic
annotation or text-based searches. There is also a huge body of work on
content-based image retrieval [10], but it has been shown that users do not
find this facility useful for personal photo management [9].

Fig. 1. The MediAssist Photo Management System
         II. CONTENT AND CONTEXT-AWARE PHOTO ORGANISATION

   The MediAssist photo archive contains over 17,000 location-stamped
photos taken with a number of different camera models, including camera
phones. Over 11,000 of these have been manually annotated for a number of
concepts, including buildings, indoor/outdoor and the presence and identity
of faces. This manually annotated dataset serves as a ground truth for the
evaluation of content-based analysis tools, and can also be used to
bootstrap semi-automatic tools (which depend on a certain level of user
annotation). All photos are indexed using both context and content-based
analysis. Time and location of photo capture are used to derive additional
contextual information such as daylight status, weather and indoor/outdoor
classification [7]. A face detection system is used to detect the presence
of frontal-view faces [2]. Other content-based tools used include body
patch feature extraction (the body patch is the area under the face,
modelling the clothes worn by the individual) [2], face recognition using
ICA (Independent Component Analysis), and building detection based on the
distribution of edges in the image [7]. All of this information can prove
very useful for searching photo collections.
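   For illustration, the daylight status attribute might be derived from
the capture time together with sunrise/sunset times computed for the
photo's location. The following is a minimal sketch, not the exact
MediAssist implementation: the thirty-minute twilight buffer is our
assumption, and the sunrise/sunset values are assumed to come from a
standard astronomical model (that upstream step is not shown).

from datetime import datetime, timedelta

def daylight_status(capture_time: datetime,
                    sunrise: datetime,
                    sunset: datetime) -> str:
    """Classify a photo's capture time relative to local sunrise/sunset.

    `sunrise` and `sunset` are assumed to be precomputed for the photo's
    GPS location and date (hypothetical upstream step, not shown).
    """
    twilight = timedelta(minutes=30)  # assumed buffer around sunrise/sunset
    if sunrise + twilight <= capture_time <= sunset - twilight:
        return "daylight"
    if capture_time < sunrise - twilight or capture_time > sunset + twilight:
        return "darkness"
    return "twilight"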
          III. THE MEDIASSIST WEB DEMONSTRATOR SYSTEM

   The MediAssist Web-based desktop interface allows users to search
through their personal photo collections using the contextual and
content-based features described above. The MediAssist system interface is
shown in Fig. 1. Our earlier version of the MediAssist prototype supported
filter-based searching using the photo metadata features [6]. The new
version presented here has been extended to include free-text ranked
information retrieval functionality.

A. Filter-Based Search

   The system presents search options enabling a user to enter details of
desired locations, times, and advanced options such as people present,
weather, light status, indoor/outdoor and building/non-building.
Semi-automatic person identification relies on a combination of automatic
methods and manual annotation, as described below. Time filters enable
powerful time-based queries, for example all photos taken in the evening,
at the weekend, during the summer or within certain date ranges.
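   Such time filters reduce to simple predicates over the capture
timestamp, as in the following sketch; the definitions of 'evening' and
'summer' here are assumptions for illustration, not the system's exact
rules.

from datetime import datetime

def is_evening(t: datetime) -> bool:
    # Assumed definition: from 18:00 local time onwards.
    return t.hour >= 18

def is_weekend(t: datetime) -> bool:
    return t.weekday() >= 5  # Saturday=5, Sunday=6

def is_summer(t: datetime) -> bool:
    # Assumed northern-hemisphere definition: June to August.
    return t.month in (6, 7, 8)

def in_date_range(t: datetime, start: datetime, end: datetime) -> bool:
    return start <= t <= end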
B. Text Search Interface

   For text-based search, the automatic context and content-based features
are mined to construct text surrogates for all photos, creating a textual
equivalent of each feature (e.g. if the date is October 21st 2006, the text
‘october autumn saturday weekend 21 twenty-first 2006’ would form a
surrogate textual description). So an image might have the text ‘dublin
ireland september weekend afternoon person alan’ associated with it,
representing the features location, time, face detection and person
annotation. We index the text document associated with an image using a
conventional text search engine based on the standard BM25 information
retrieval model [8].
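   For reference, the standard Okapi BM25 model of [8] scores a photo with
text surrogate $D$ against a query $Q$ as

\[
\mathrm{score}(D,Q) = \sum_{t \in Q} \mathrm{IDF}(t) \cdot
\frac{f(t,D)\,(k_1+1)}{f(t,D) + k_1\left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}
\]

where $f(t,D)$ is the frequency of term $t$ in the surrogate, $|D|$ is the
surrogate length in terms, $\mathrm{avgdl}$ is the average surrogate length
over the collection, and $k_1$ and $b$ are the usual tuning constants
(commonly $k_1 \approx 1.2$ and $b \approx 0.75$).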
We also create text surrogates for ‘events’ to allow for text-based
searching of events in the ‘Event List’ view described below. The system
presents a text search box to allow for the quick and easy formulation of
text queries based on the content and context features described above. We
will conduct an evaluation of this search interface in future work.
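   The construction of such a surrogate can be sketched as follows. This is
a minimal illustration: the season table, the chosen term set and the
annotation fields are our assumptions rather than the system's exact
vocabulary (for instance, the real surrogates also include ordinal words
such as ‘twenty-first’, omitted here for brevity).

from datetime import datetime

SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}

def text_surrogate(t: datetime, location_terms, person_terms):
    """Build a flat bag-of-words description of a photo for text indexing."""
    terms = [t.strftime("%B").lower(),              # e.g. 'october'
             SEASONS[t.month],                      # e.g. 'autumn'
             t.strftime("%A").lower(),              # e.g. 'saturday'
             "weekend" if t.weekday() >= 5 else "weekday",
             str(t.day), str(t.year)]
    terms += list(location_terms)                   # e.g. ['dublin', 'ireland']
    terms += list(person_terms)                     # e.g. ['person', 'alan']
    return " ".join(terms)

# text_surrogate(datetime(2006, 10, 21), ['dublin', 'ireland'], ['person', 'alan'])
# -> 'october autumn saturday weekend 21 2006 dublin ireland person alan'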
C. Collection Browsing

   Four different views are available to present the results of searches.
The default view, Event List, organises the filtered photos into ‘events’
in which the photos are grouped together based on time proximity, by
detecting large temporal gaps between consecutively captured photos,
similar to [3]. Each event is summarized by a label (location and
date/time) and five representative thumbnail photos selected based on the
query. Event Detail is composed of the full set of photos in an event,
automatically organized into sub-events. Individual Photo List is an
optional view where the thumbnail-size photos are presented without any
particular event grouping, but sorted by date/time. Photo Detail is an
enlarged single-photo view presented when the user selects one of the
thumbnail-size photos in any of the above views. In all of the above
presentation options, each photo is presented with its associated automatic
annotation information.
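   Gap-based event grouping of this kind can be sketched as follows; the
one-hour threshold is an assumed value for illustration, where a real
deployment would tune it.

from datetime import datetime, timedelta

def group_into_events(timestamps, gap=timedelta(hours=1)):
    """Split capture times into events, starting a new event whenever the
    gap between consecutive photos exceeds the threshold."""
    events, current = [], []
    for t in sorted(timestamps):
        if current and t - current[-1] > gap:
            events.append(current)
            current = []
        current.append(t)
    if current:
        events.append(current)
    return events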
D. Semi-Automatic Annotation

   MediAssist allows users to manually change or update any of the
automatically tagged information for a single photo or for a group of
photos. In Photo Detail view, the user can highlight all detected faces in
the photo and tidy up the results of the automatic detection by removing
false detections or adding missed faces. The system uses a body patch
feature (i.e. a feature modelling the clothes worn by a person) combined
with a face recognition feature to suggest names for detected faces: the
suggested name for an unknown face is the known face with the most similar
body patch and face [2]. The user can confirm the system choice or choose
from a shortlist of suggested names, again based on similarity. Other work
has shown effective methods of suggesting identities within photos using
context-based data [5]; in our ongoing research we are exploring the
combination of this type of approach with both face recognition and
body-patch matching.
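   The suggestion step amounts to a nearest-neighbour search under a
combined similarity, sketched below. The cosine similarity, the equal
weighting and the tuple representation of faces are our assumptions for
illustration, not the fusion scheme of [2].

import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def suggest_names(unknown, known, weight=0.5, shortlist=5):
    """Rank known identities by a weighted mix of face similarity and
    body-patch similarity. `unknown` and each entry of `known` are
    (name, face_vector, patch_vector) tuples (hypothetical format)."""
    _, uface, upatch = unknown
    scored = [(weight * cosine(uface, f) + (1 - weight) * cosine(upatch, p),
               name)
              for name, f, p in known]
    scored.sort(reverse=True)
    return [name for _, name in scored[:shortlist]]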
                         IV. CONCLUSIONS

   We have presented the MediAssist demonstrator system for context-aware
management of personal digital photo collections. Automatically extracted
features are supplemented with semi-automatic annotation which allows the
user to correct or add to the automatically generated annotations. The
system allows the user to formulate precise queries using content and
context-based features; alternatively, the user can formulate simple text
queries, which are enabled without the need for manual annotation. We plan
to leverage context metadata to improve the performance of content analysis
tools [7], and we will use combined context and content-based approaches to
identity annotation, based on face recognition, body-patch matching and
contextual information. We will also extend the integration of person
recognition to enable the user to query for a given individual: in addition
to returning the photos with confirmed annotations, the system will return
a ranked list of candidate photos which should contain this person.

                        ACKNOWLEDGMENTS

   The MediAssist project is supported by Enterprise Ireland under Grant
No. CFTD-03-216. This work is partly supported by Science Foundation
Ireland under Grant No. 03/IN.3/I361.

                          REFERENCES

 [1] S. Ahern, S. King, and M. Davis. MMM2: mobile media metadata for photo
     sharing. In ACM Multimedia, pages 267–268, Singapore, November 2005.
 [2] S. Cooray, N. O’Connor, C. Gurrin, G. Jones, N. O’Hare, and A. F.
     Smeaton. Identifying person re-occurrences for personal photo
     management applications. In VIE 2006, pages 144–149, Bangalore, India,
     September 2006.
 [3] A. Graham, H. Garcia-Molina, A. Paepcke, and T. Winograd. Time as
     essence for photo browsing through personal digital libraries. In ACM
     Joint Conference on Digital Libraries, pages 326–335, Portland, USA,
     July 2002.
 [4] M. Naaman, S. Harada, Q. Wang, H. Garcia-Molina, and A. Paepcke.
     Context data in geo-referenced digital photo collections. In ACM
     Multimedia, pages 196–203, New York, USA, October 2004.
 [5] M. Naaman, R. B. Yeh, H. Garcia-Molina, and A. Paepcke. Leveraging
     context to resolve identity in photo albums. In ACM Joint Conference
     on Digital Libraries, pages 178–187, Denver, CO, USA, June 2005.
 [6] N. O’Hare, C. Gurrin, H. Lee, N. Murphy, A. F. Smeaton, and G. Jones.
     Digital photos: Where and when? In ACM Multimedia 2005, pages 261–262,
     Singapore, November 2005.
 [7] N. O’Hare, H. Lee, S. Cooray, C. Gurrin, G. Jones, J. Malobabic,
     N. O’Connor, A. F. Smeaton, and B. Uscilowski. MediAssist: Using
     content-based analysis and context to manage personal photo
     collections. In CIVR 2006, pages 529–532, Tempe, AZ, USA, July 2006.
 [8] S. E. Robertson, S. Walker, S. Jones, M. M. Hancock-Beaulieu, and
     M. Gatford. Okapi at TREC-3. In Proceedings of the Third Text
     REtrieval Conference (TREC-3), pages 109–126, NIST, November 1995.
 [9] K. Rodden and K. R. Wood. How do people manage their digital
     photographs? In CHI 2003, pages 409–416, Florida, USA, April 2003.
[10] A. Smeulders, M. Worring, S. Santini, A. Gupta, and A. Jain.
     Content-based image retrieval at the end of the early years. IEEE
     Transactions on Pattern Analysis and Machine Intelligence,
     22(12):1349–1380, 2000.