<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MIDB: A Web-Based Film Annotation Tool</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Azzam Althaga</string-name>
          <email>aazzam@seas.upenn.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hui-Yin Wu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arnav Jhala</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>North Carolina State Univ.</institution>,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>North Carolina State Univ.</institution>,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Univ. of Pennsylvania</institution>,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present the Movie Insights Database (MIDB) annotation tool, which allows for web-based upload and annotation of films for efficient film data collection. MIDB is lightweight and can be installed on a web server for easy access. The tool provides automated scene boundary detection, an interface for defining the annotation language, upload and download of new or existing annotations, shot boundary adjustment, and tools for quick and efficient shot-by-shot annotation. Our primary motivation for developing this tool is to create an easily accessible platform for video analysis that allows AI-assisted annotation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Background</title>
      <p>For almost 15 years, the Cinemetrics annotation tool and database (www.cinemetrics.lv) has collected over 19
thousand individual annotations on the cutting tempo and cinematographic features of films of all eras. The
tool is particularly popular for two reasons: (1) it is easy to use directly in parallel with a film playing on the
side, allowing the user to quickly tag shots and transitions using the keyboard, and (2) it is widely accessible,
with both desktop and online applications that offer direct uploading and recording to its database. The database
and tool are widely popular among film analysts seeking statistics on film pacing.</p>
      <p>The recent appearance of film annotation tools such as Advene [AP05], Insight [MWS+15], and Anvil [Kip12]
shows a growing trend of drawing observations on film data from a data scientist's point of view. These tools provide
much more advanced functionality, such as accurate timelines, automatic face detection, 3D head models, color
analysis, etc., with the goal of increasing both the quality and the dimensions of the collected data. However,
these tools have not been broadly adopted by the film community and, compared to Cinemetrics, have far
fewer users and thus fewer annotations. This can be attributed to the fact that some of these tools require time,
and possibly technical knowledge, to install and to become familiar with.</p>
      <p>To this end, we propose MIDB, which was inspired by the Insight annotation tool. It has integrated shot
boundary detection using FFmpeg, intuitive navigation tools (by frame, by shot, and video player controls), and
customizable side panels for specifying annotation tags. Moreover, for accessibility, MIDB is written completely
in JavaScript, which makes it easy to expand its functionality with plugins, and it can be set up on a web server or
installed independently on a personal computer.</p>
    </sec>
    <sec id="sec-2">
      <title>MIDB Tool</title>
      <p>In this section we introduce the architecture, workflow, and interface design of the MIDB tool.
The tool uses Node.js for the backend and the React framework for the front-end interface.</p>
      <p>Our chosen tool for shot boundary detection is FFprobe, a multimedia analysis toolkit that is integrated
into MIDB and makes use of the FFmpeg libraries.</p>
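      <p>As an illustration, a minimal Node.js sketch of how scene-change detection could be invoked from the backend
(shown here via the FFmpeg CLI and its showinfo filter rather than MIDB's actual FFprobe integration; the 0.4
threshold and the function name are assumptions):</p>
      <preformat>
// detect_shots.js: a sketch of shot boundary detection with FFmpeg's scene filter.
// Assumes the ffmpeg binary is on the PATH; the 0.4 threshold is illustrative.
const { execFile } = require('child_process');

function detectShotBoundaries(videoPath, threshold = 0.4) {
  return new Promise((resolve, reject) => {
    const args = [
      '-i', videoPath,
      '-vf', `select='gt(scene,${threshold})',showinfo`,
      '-f', 'null', '-',
    ];
    execFile('ffmpeg', args, { maxBuffer: 64 * 1024 * 1024 }, (err, stdout, stderr) => {
      if (err) return reject(err);
      // showinfo logs one line per selected frame on stderr;
      // pts_time is the boundary timestamp in seconds.
      const times = [...stderr.matchAll(/pts_time:([\d.]+)/g)].map(m => parseFloat(m[1]));
      resolve(times);
    });
  });
}

module.exports = { detectShotBoundaries };
      </preformat>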
      <p>Other plugins currently under development for MIDB include a trained model for automatic shot
size assignment, and automatic actor detection. In the future, we also expect to experiment with plugins for more
advanced object and camera movement detection to further automate the annotation process.</p>
      <p>The user annotations are saved and immediately backed up in JSON format, which is supported by most modern
programming languages.</p>
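      <p>A hypothetical sketch of what the server-side backup step could look like (the route, port, and storage
layout are assumptions, not MIDB's actual API):</p>
      <preformat>
// save_annotations.js: a sketch of backing up annotation JSON on the server.
const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();
app.use(express.json());

const backupDir = path.join(__dirname, 'backups');
fs.mkdirSync(backupDir, { recursive: true });

// Every save from the client replaces the backup, so the latest
// annotation state is always on disk.
app.post('/annotations/:project', (req, res) => {
  const file = path.join(backupDir, `${req.params.project}.json`);
  fs.writeFile(file, JSON.stringify(req.body, null, 2), err => {
    if (err) return res.status(500).send(err.message);
    res.sendStatus(204);
  });
});

app.listen(3000);
      </preformat>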
      <p>An overview of the framework can be seen in Figure 2.
Our user interface resembles the Insight annotation tool in its design. Using the React framework, the video
player is smoothly integrated with frame-level and shot-level player controls, a timeline with shot segmentation,
and a panel displaying the categories for annotation labels. A navigation bar on top allows the uploading of a
video or an existing JSON annotation, and the download of annotations, as sketched below. Moreover, popup menus are designed
to allow customization of the labels for annotation, and the interface scales and adjusts to different
screen sizes, for example by stowing the menu bar.</p>
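      <p>For illustration, the download action could be implemented client-side as follows (a sketch; the function
name and default file name are ours):</p>
      <preformat>
// Browser-side sketch of the "download annotations" action: serialize the
// current annotation object and trigger a file download.
function downloadAnnotations(annotations, filename = 'annotations.json') {
  const blob = new Blob([JSON.stringify(annotations, null, 2)],
                        { type: 'application/json' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = filename;
  a.click();
  URL.revokeObjectURL(url); // release the temporary object URL
}
      </preformat>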
      <p>The tool is shown in Figure 1, with an example annotation of shot size (i.e., how close the camera is to
the actor) in the labels panel. The side-by-side design of the video, synchronized with the display of annotation
labels and timeline, makes it possible to be highly precise in attributing labels to a specific shot. It
also makes changing and removing annotations easy, and all user operations are immediately recorded.</p>
      <p>The workflow of the tool is all seamlessly connected in one window, and is depicted in Figure 3. First, the
user can choose either to upload a new video, upload an existing JSON annotation, or select a previously saved
annotation project. In the case of a new video, the tool will process the video to detect shot boundaries, as
sketched below. Once finished, the interface is fully loaded with the video, the shot timeline, and annotation labels.</p>
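      <p>A hypothetical sketch of this upload-then-process step, reusing the detectShotBoundaries sketch from above
(the route, the multer middleware configuration, and the response shape are assumptions):</p>
      <preformat>
// upload_video.js: a sketch of the new-video workflow. The upload is stored,
// shot detection runs on it, and the boundaries seed the shot timeline.
const express = require('express');
const multer = require('multer');
const { detectShotBoundaries } = require('./detect_shots');

const app = express();
const upload = multer({ dest: 'uploads/' }); // temporary storage for uploads

app.post('/upload', upload.single('video'), async (req, res) => {
  try {
    const times = await detectShotBoundaries(req.file.path);
    res.json({ shot_boundaries: times });
  } catch (err) {
    res.status(500).send(err.message);
  }
});

app.listen(3000);
      </preformat>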
      <p>After the video has been processed, the user can then perform a number of operations, such as modifying
the annotation labels, navigating the video through the video player controls or directly on the timeline, and
applying or removing labels to each shot. Finally, the annotation file can be downloaded in JSON format, and it
is also automatically backed up onto the server. A sample annotation of a shot is formatted as such:</p>
1 f " a n n o t a t i o n s " : [
2 f " s t a r t _ t i m e " : 0 , " e n d _ t i m e " : 9 . 0 8 g ,
3 ...
4 ...
5 f " s t a r t _ t i m e " : 2 4 . 9 2 , " e n d _ t i m e " : 2 6 ,
6 " l a b e l s " : f
7 " Shot size " : [ " MCU " ] ,
8 " S c e n e Type " : [ " a c t i o n " ]
9 g g ,
10 f " s t a r t _ t i m e " : 2 6 , " e n d _ t i m e " : 2 6 . 8 4 ,
11 " l a b e l s " : f
12 " shot - size " : [ " medium - c l o s e u p " ]
13
14
15
16
17
18
19 g ] g
g g ,
...
" l a b e l s " : f
" s p e c i a l " : [ " over - the - s h o u l d e r " , " c o w b o y " , " s i l h o u e t t e " ] ,
" Shot size " : [ " MS " , " MLS " , " LS " , " MCU " , " CU " ] ,
" S c e n e Type " : [ " a c t i o n " , " d i a l o g u e " ]</p>
      <p>As an improvement over existing annotation tools, MIDB has integrated shot boundary detection and no
limitation on video length. The annotation labels can be customized directly in the tool, requiring no
programming experience whatsoever, and for advanced users, the JavaScript framework makes it easy to design
and add additional functionality through plugins.</p>
    </sec>
    <sec id="sec-3">
      <title>Future Work</title>
      <p>Our goal is to publish MIDB and the collected data on a server for public or limited academic use, and to eventually
establish a high-quality film dataset that can be used for virtual cinematography, film analysis, and data science.
To this end, the next steps for this project involve creating an interface for user-designed plugins, such as automatic
face detection or shot-type assignment, integrating a selection of public domain films from the Internet Archive
(https://archive.org), and running pilot studies on the tool's ease of use.</p>
      <p>Another goal is to establish a set of vocabulary validated by film practice and theory, and to allow the selection of
these vocabulary sets as the default annotation labels. This also follows the recent development of
film analysis and directing languages for virtual cinematography [WC15][RGB13].</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [MWS+15]
          <string-name>
            <surname>Billal</surname>
            <given-names>Merabti</given-names>
          </string-name>
          , hui-yin
          <string-name>
            <surname>Wu</surname>
            , Cunka Bassirou Sanokho, Quentin Galvane, Christophe Lino, and
            <given-names>Marc</given-names>
          </string-name>
          <string-name>
            <surname>Christie</surname>
          </string-name>
          .
          <article-title>Insight: An annotation tool and format for lm analysis</article-title>
          .
          <source>In Eurographics Workshop on Intelligent Cinematography and Editing</source>
          , page
          <volume>57</volume>
          ,
          <string-name>
            <surname>Zurich</surname>
          </string-name>
          , Switzerland, May
          <year>2015</year>
          .
          <article-title>The Eurographics Association</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Hui-Yin Wu</surname>
            and
            <given-names>Marc</given-names>
          </string-name>
          <string-name>
            <surname>Christie</surname>
          </string-name>
          .
          <article-title>Stylistic Patterns for Generating Cinematographic Sequences</article-title>
          .
          <source>In 4th Workshop on Intelligent Cinematography and Editing Co-Located w/ Eurographics</source>
          <year>2015</year>
          , pages
          <fpage>47</fpage>
          {
          <fpage>53</fpage>
          ,
          <string-name>
            <surname>Zurich</surname>
          </string-name>
          , Switzerland, May
          <year>2015</year>
          . Eurographics Association. The de nitive version is available at http://diglib.eg.org/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>