Enhancing the Presentation of Multimedia using Extracted Semantics
Hyowon Lee
Guest Speech at 1st SEMPS Workshop (6 Dec 2006)
Centre for Digital Video Processing
Dublin City University

Overview
• Centre & my role
• Selection of multimedia applications and their presentation design issues
• Some observations
  – Different applications, different design decisions
  – Applying general design principles

Centre for Digital Video Processing at Dublin City University
• Developing automatic indexing/retrieval tools for managing large amount of image/video information
  – Object/Face Detection & Tracking in Video
  – Audio & Video Event Detection
  – Video Delivery on Mobile Devices
  – Large-scale Distributed Web Image Search
  – Search Engine Design for Collaborative Video Retrieval
  – Hardware Accelerator Design for MPEG-4 Mobile Platform
  – Personalisation & Recommendation for Video
  – Synergy between automatic & manual indexing
  – Fusion of multi-modal query results

My Role: Usability & User Issues
• Understand the research & development of Image/Video indexing/retrieval tools within the Centre
• Think how these could be exploited
  – Envision the use: scenarios & future system use
  – Prototyping user-interfaces
  – Deploy (if possible)
  – User testing: monitor usage & guide future development

[Diagram: map of the Centre's work in two layers.
System development (interaction design + software engineering): MediAssist (Personal Photo Manager), Mobile Applications, Físchlár-News, Físchlár-TV, Físchlár-Nursing, Movie Browser, CCTV Search System, BBC Rushes Search System, SenseCam Browser, TRECVid02 / TRECVid03 / TRECVid04 Interactive Search Systems, TableTop Video Search System (TRECVid05), Object-based RF system v1 and v2.
Technology development for automatic extraction of syntactic & semantic features in image/video: Shot Boundary Detection, Keyframe Extraction, News Story Segmentation, Face Detection, Object Detection & Tracking, Building Detection, Indoor/Outdoor, Cityscape/Landscape, Image-Image Similarity RF, Object-Object Similarity RF, Automatic Personal Photo Organisation (Event Detection, Unique Event Determination, Landmark Image Selection), Passive Photo Capture, Advert Detection, Scene Detection in Movies, Video Recommendation, Pedestrian Detection, Sports Summarisation, Hardware acceleration for video processing.]

[Figure: an original video, from start of video to end of video, divided into camera shots by shot boundary detection, followed by keyframe extraction.]

Físchlár-News Archive
• Online archive of daily RTE1 9pm TV news
• Automatic video indexing: News Story Segmentation, based on:
  – Anchorperson detection (by shot clustering)
  – Face detection
  – Advertisement detection
  – Shot length
  – Activity measure

[Figure: Físchlár-News architecture.
Broadcast TV news => MPEG-1 encoding => an MPEG-1 encoded daily 9 o'clock news program (30 min) => Shot Boundary Detection => shot segmented program => Advertisement Detection => shot segmented, advert detected program => Story Segmentation => story segmented program => News story database.
Story Segmentation uses an SVM (Support Vector Machine) with:
• Speech vs. music discrimination
• Anchorperson shot clustering
• Face detection
• Shot length cue
• Activity measure
Other components shown: News story linkage analysis, User profile, Oracle, Web application, Video Server. Delivered to the user: story-based news browsing, searching, streamed playback and recommendation.]

User Evaluation of Físchlár-News: An Automatic Broadcast News Delivery System. Lee H, Smeaton A.F, O'Connor N and Smyth B. TOIS - ACM Transactions on Information Systems, 24(2), 2006.

Automatic news story segmentation as main back-end => story-based browsing, searching, recommendation
Deployment effort...
User studies to refine the UI

Some Factors in its UI Design
• Application specific...
  – Daily update, up-to-dateness of news => Calendar
  – Anchorperson’s 2-line summary statement as story summary text
  – Average #stories per day (10-20 only) => Linear list most effective (no drop-down box or pagination necessary)

Some Factors in its UI Design
• General design principles, guidelines, graphic design, web design, etc.
  – knowledge & experience I have in general
  – E.g. Overview first, details on demand: day list of the month (calendar), then story list of the day, then shot list of the story, then playback (full detail)
  – E.g. Visual consistency: whenever a list of stories appears... to make obvious what a piece of presentation on the screen represents, so that it doesn’t require interpretation effort

Físchlár-TRECVid2004: Combined Text- and Image-Based Searching of Video Archives. O'Connor N, Lee H, Smeaton A.F, Jones G, Cooke E, Le Borgne H and Gurrin C. ISCAS 2006 - IEEE International Symposium on Circuits and Systems, Kos, Greece, 21-24 May 2006.

Keyframe as main visual cue in interaction (browse search result, copy to query panel, save, etc.)
From left to right... natural progression
Potential screen complexity – use of main plane vs. background plane, round edges, and corresponding buttons

[Figure: original video, composited video, video object planes.]

A unit representation shows:
  – the unit’s video content summary
  – all the detected Objects & Events, with link possibilities indicated

[Figure: a unit representation with OBJECT 1, OBJECT 2 and BACKGRD. markers beside the unit's text: "… I can’t imagine even such an amiable ladies as my great grandmother could have been so gracious as to overlook one’s house guest, shooting one through the face…"]
[Figure: linking from a unit. Each unit carries its ASR or Closed Caption text ("… and here is the ASR or Closed Caption text that…") and offers "Link to…" for:
  – [unit]s with similar text/image content
  – [unit]s with linked OBJECT 1
  – [unit]s with linked OBJECT 2
  – [unit]s with linked OBJECT 3]

[Figure: querying with a Composite Unit. Select Object 1 from this Unit + Select Object 2 from this Unit + Select Event 1 from this Unit. Result is a set of Units that contain Object 1, Object 2 and Event 1 together in the Unit.]
[Screenshots: querying and its search result; selecting, composition querying (+) and its search result.]

Start with a micro-interaction scheme (using buttons that represent objects), then develop it further in an integrated interface
Detected objects have no labelling (meaning); they exist only as a spatial region or blob on the keyframe
Applying this interaction scheme to object-based Relevance Feedback applications...

User-Interface to a CCTV Video Search System. Lee H, Smeaton A.F, O'Connor N and Murphy N. ICDP 2005 - IEE International Symposium on Imaging for Crime Detection and Prevention, London, U.K., 7-8 June 2005.

Using the Unit Representation (enabling object-based interaction) at micro-level interaction
(Conventional) Relevance Feedback idea
Search result showing not only the matched list of objects but also a geographical map summarising and highlighting the matched object’s route (note the application’s main purpose: chase a suspect)

Observations
• Considerable amount of time & effort in manually hard-designing UIs
  – For example: design taking 4 months and 4 iterative refinements, starting with pen-and-paper sketches, then Photoshop sketches, discussing with the technical team, then me re-sketching accordingly
• Different situations require different specific design decisions
  – For example, presentation of keyframes: linear list of images and text; images overlaid with text; images in different sizes. Which layout in which situation?
• Dream of automatic multimedia presentation generation: designer’s application knowledge & capability in design decisions in specific situations

Thank you
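
A note on the composite-querying slide, where the result is described as the set of Units that contain Object 1, Object 2 and Event 1 together: that selection can be read as a set-containment filter over the units. The following is a minimal sketch under that reading, not the system's implementation. The `Unit` class, the string identifiers and `composite_query` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for a video unit and its detected features."""
    name: str
    objects: set = field(default_factory=set)
    events: set = field(default_factory=set)


def composite_query(units, objects, events):
    """Return the units that contain every selected object and event together."""
    wanted_objects, wanted_events = set(objects), set(events)
    return [
        u for u in units
        if wanted_objects <= u.objects and wanted_events <= u.events
    ]


# Illustrative data only; the identifiers are placeholders, not real detections.
archive = [
    Unit("unit-a", {"object-1", "object-2"}, {"event-1"}),
    Unit("unit-b", {"object-1"}, {"event-1"}),
    Unit("unit-c", {"object-1", "object-2", "object-3"}, {"event-1", "event-2"}),
]

matches = composite_query(archive, ["object-1", "object-2"], ["event-1"])
print([u.name for u in matches])  # ['unit-a', 'unit-c']
```

Because the detected objects carry no labels, the identifiers here stand for detected regions rather than named things; selecting a button on a unit would add that region's identifier to the query.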