<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MIDB: A Web-Based Film Annotation Tool</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Azzam Althaga</string-name>
          <email>aazzam@seas.upenn.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hui-Yin Wu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnav Jhala</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>North Carolina State Univ.</institution>,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>North Carolina State Univ.</institution>,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Univ. of Pennsylvania</institution>,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present the Movie Insights Database (MIDB) annotation tool, which allows for web-based upload and annotation of films for efficient film data collection. MIDB is lightweight and can be installed on a web server for easy access. The tool provides automated scene boundary detection, an interface for defining the annotation language, upload and download of new or existing annotations, shot boundary adjustment, and tools for quick and efficient shot-by-shot annotation. Our primary motivation for developing this tool is to create an easily accessible platform for video analysis that allows AI-assisted annotation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Background</title>
      <p>For almost 15 years, the Cinemetrics annotation tool and database (www.cinemetrics.lv) has collected over 19
thousand individual annotations on the cutting tempo and cinematographic features of films of all eras. The
tool is particularly popular for two reasons: (1) it is easy to use directly in parallel with a film playing on the
side, allowing the user to quickly tag shots and transitions using the keyboard, and (2) it is widely accessible,
with both desktop and online applications that offer direct uploading and recording to its database. The database
and tool are widely popular among film analysts seeking statistics on film pacing.</p>
      <p>The recent appearance of film annotation tools such as Advene [AP05], Insight [MWS+15], and Anvil [Kip12]
shows a growing trend of drawing observations on film data from a data scientist's point of view. These tools provide
much more advanced functionality, such as accurate timelines, automatic face detection, 3D head models, color
analysis, etc., with the goal of increasing both the quality and the dimensions of the collected data. However,
these tools have not been broadly adopted by the film community and, compared to Cinemetrics, have far
fewer users and thus fewer annotations. This can be attributed to the fact that some of these tools require time,
and possibly technical knowledge, to install and to become familiar with.</p>
      <p>To this end, we propose MIDB, which was inspired by the Insight annotation tool. It has integrated shot
boundary detection using FFmpeg, intuitive navigation tools (by frame, by shot, and video player controls), and
customizable side panels for specifying annotation tags. Moreover, for accessibility, MIDB is written completely
in JavaScript, which makes it easy to expand its functionality with plugins, and it can be set up on a web server or
installed independently on a personal computer.</p>
    </sec>
    <sec id="sec-2">
      <title>MIDB Tool</title>
      <p>In this section we introduce the architecture, workflow, and interface design of the MIDB tool.
The tool uses Node.js for the backend and the React framework for the front-end interface.</p>
      <p>Our chosen tool for shot boundary detection is FFprobe, a multimedia analysis toolkit that is integrated
into MIDB and makes use of the FFmpeg libraries.</p>
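      <p>As an illustration, a minimal Node.js sketch of how scene-change detection could be invoked from the backend
(shown here via the FFmpeg CLI and its showinfo filter rather than MIDB's actual FFprobe integration; the 0.4
threshold and the function name are assumptions):</p>
      <preformat>
// detect_shots.js: a sketch of shot boundary detection with FFmpeg's scene filter.
// Assumes the ffmpeg binary is on the PATH; the 0.4 threshold is illustrative.
const { execFile } = require('child_process');

function detectShotBoundaries(videoPath, threshold = 0.4) {
  return new Promise((resolve, reject) => {
    const args = [
      '-i', videoPath,
      '-vf', `select='gt(scene,${threshold})',showinfo`,
      '-f', 'null', '-',
    ];
    execFile('ffmpeg', args, { maxBuffer: 64 * 1024 * 1024 }, (err, stdout, stderr) => {
      if (err) return reject(err);
      // showinfo logs one line per selected frame on stderr;
      // pts_time is the boundary timestamp in seconds.
      const times = [...stderr.matchAll(/pts_time:([\d.]+)/g)].map(m => parseFloat(m[1]));
      resolve(times);
    });
  });
}

module.exports = { detectShotBoundaries };
      </preformat>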
      <p>Other plugins currently under development for MIDB include a trained model for automatic shot
size assignment, and automatic actor detection. In the future, we also expect to experiment with plugins for more
advanced object and camera movement detection to further automate the annotation process.</p>
      <p>The user annotations are saved and immediately backed up in JSON format, which is supported by most modern
programming languages.</p>
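      <p>A hypothetical sketch of what the server-side backup step could look like (the route, port, and storage
layout are assumptions, not MIDB's actual API):</p>
      <preformat>
// save_annotations.js: a sketch of backing up annotation JSON on the server.
const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();
app.use(express.json());

const backupDir = path.join(__dirname, 'backups');
fs.mkdirSync(backupDir, { recursive: true });

// Every save from the client replaces the backup, so the latest
// annotation state is always on disk.
app.post('/annotations/:project', (req, res) => {
  const file = path.join(backupDir, `${req.params.project}.json`);
  fs.writeFile(file, JSON.stringify(req.body, null, 2), err => {
    if (err) return res.status(500).send(err.message);
    res.sendStatus(204);
  });
});

app.listen(3000);
      </preformat>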
      <p>An overview of the framework can be seen in Figure 2.
Our user interface resembles the Insight annotation tool in its design. Using the React framework, the video
player is smoothly integrated with frame-level and shot-level player controls, a timeline with shot segmentation,
and a panel displaying the categories for annotation labels. A navigation bar on top allows the uploading of a
video or an existing JSON annotation, and the download of annotations, as sketched below. Moreover, popup menus are designed
to allow customization of the labels for annotation, and the interface scales and adjusts to different
screen sizes, for example by stowing the menu bar.</p>
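      <p>For illustration, the download action could be implemented client-side as follows (a sketch; the function
name and default file name are ours):</p>
      <preformat>
// Browser-side sketch of the "download annotations" action: serialize the
// current annotation object and trigger a file download.
function downloadAnnotations(annotations, filename = 'annotations.json') {
  const blob = new Blob([JSON.stringify(annotations, null, 2)],
                        { type: 'application/json' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = filename;
  a.click();
  URL.revokeObjectURL(url); // release the temporary object URL
}
      </preformat>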
      <p>The tool is shown in Figure 1, with an example annotation of shot size (i.e., how close the camera is to
the actor) in the labels panel. The side-by-side design of the video, synchronized with the display of annotation
labels and timeline, makes it possible to be highly precise in attributing labels to a specific shot. It
also makes changing and removing annotations easy, and all user operations are immediately recorded.</p>
      <p>The workflow of the tool is all seamlessly connected in one window, and is depicted in Figure 3. First, the
user can choose either to upload a new video, upload an existing JSON annotation, or select a previously saved
annotation project. In the case of a new video, the tool will process the video to detect shot boundaries, as
sketched below. Once finished, the interface is fully loaded with the video, the shot timeline, and annotation labels.</p>
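      <p>A hypothetical sketch of this upload-then-process step, reusing the detectShotBoundaries sketch from above
(the route, the multer middleware configuration, and the response shape are assumptions):</p>
      <preformat>
// upload_video.js: a sketch of the new-video workflow. The upload is stored,
// shot detection runs on it, and the boundaries seed the shot timeline.
const express = require('express');
const multer = require('multer');
const { detectShotBoundaries } = require('./detect_shots');

const app = express();
const upload = multer({ dest: 'uploads/' }); // temporary storage for uploads

app.post('/upload', upload.single('video'), async (req, res) => {
  try {
    const times = await detectShotBoundaries(req.file.path);
    res.json({ shot_boundaries: times });
  } catch (err) {
    res.status(500).send(err.message);
  }
});

app.listen(3000);
      </preformat>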
      <p>After the video has been processed, the user can then perform a number of operations, such as modifying
the annotation labels, navigating the video through the video player controls or directly on the timeline, and
applying or removing labels to each shot. Finally, the annotation file can be downloaded in JSON format, and it
is also automatically backed up onto the server. A sample annotation of a shot is formatted as such:</p>
1 f " a n n o t a t i o n s " : [
2 f " s t a r t _ t i m e " : 0 , " e n d _ t i m e " : 9 . 0 8 g ,
3 ...
4 ...
5 f " s t a r t _ t i m e " : 2 4 . 9 2 , " e n d _ t i m e " : 2 6 ,
6 " l a b e l s " : f
7 " Shot size " : [ " MCU " ] ,
8 " S c e n e Type " : [ " a c t i o n " ]
9 g g ,
10 f " s t a r t _ t i m e " : 2 6 , " e n d _ t i m e " : 2 6 . 8 4 ,
11 " l a b e l s " : f
12 " shot - size " : [ " medium - c l o s e u p " ]
13
14
15
16
17
18
19 g ] g
g g ,
...
" l a b e l s " : f
" s p e c i a l " : [ " over - the - s h o u l d e r " , " c o w b o y " , " s i l h o u e t t e " ] ,
" Shot size " : [ " MS " , " MLS " , " LS " , " MCU " , " CU " ] ,
" S c e n e Type " : [ " a c t i o n " , " d i a l o g u e " ]</p>
      <p>As an improvement over existing annotation tools, MIDB has integrated shot boundary detection and no
limitation on video length. The annotation labels can be customized directly in the tool, requiring no
programming experience whatsoever, and for advanced users, the JavaScript framework makes it easy to design
and add additional functionality through plugins.</p>
    </sec>
    <sec id="sec-3">
      <title>Future Work</title>
      <p>Our goal is to publish MIDB and the collected data on a server for public or limited academic use, and to eventually
establish a high-quality film dataset that can be used for virtual cinematography, film analysis, and data science.
To this end, the next steps for this project involve creating an interface for user-designed plugins, such as automatic
face detection or shot-type assignment, integrating a selection of public domain films from the Internet Archive
(https://archive.org), and running pilot studies on the tool's ease of use.</p>
      <p>Another goal is to establish a set of vocabulary validated by film practice and theory, and to allow the selection of
these vocabulary sets as the default annotation labels. This also follows the recent development of
film analysis and directing languages for virtual cinematography [WC15][RGB13].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [MWS+15]
          <string-name>
            <surname>Billal</surname>
            <given-names>Merabti</given-names>
          </string-name>
          , hui-yin
          <string-name>
            <surname>Wu</surname>
            , Cunka Bassirou Sanokho, Quentin Galvane, Christophe Lino, and
            <given-names>Marc</given-names>
          </string-name>
          <string-name>
            <surname>Christie</surname>
          </string-name>
          .
          <article-title>Insight: An annotation tool and format for lm analysis</article-title>
          .
          <source>In Eurographics Workshop on Intelligent Cinematography and Editing</source>
          , page
          <volume>57</volume>
          ,
          <string-name>
            <surname>Zurich</surname>
          </string-name>
          , Switzerland, May
          <year>2015</year>
          .
          <article-title>The Eurographics Association</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Hui-Yin Wu</surname>
            and
            <given-names>Marc</given-names>
          </string-name>
          <string-name>
            <surname>Christie</surname>
          </string-name>
          .
          <article-title>Stylistic Patterns for Generating Cinematographic Sequences</article-title>
          .
          <source>In 4th Workshop on Intelligent Cinematography and Editing Co-Located w/ Eurographics</source>
          <year>2015</year>
          , pages
          <fpage>47</fpage>
          {
          <fpage>53</fpage>
          ,
          <string-name>
            <surname>Zurich</surname>
          </string-name>
          , Switzerland, May
          <year>2015</year>
          . Eurographics Association. The de nitive version is available at http://diglib.eg.org/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>