Evaluation of a Video Annotation Tool Based on the LSCOM Ontology

Emilie Garnaud, Alan F. Smeaton and Markus Koskela

E. Garnaud is with Institut EURECOM, 2229 Route des Crêtes, BP 193, 06904 Sophia Antipolis Cedex, France. A. Smeaton and M. Koskela are with the Centre for Digital Video Processing, Dublin City University, Glasnevin, Dublin 9, Ireland. Email: Alan.Smeaton@dcu.ie

Abstract— In this paper we present a video annotation tool based on the LSCOM ontology [1], which contains more than 800 semantic concepts. The tool provides four different ways for the user to locate appropriate concepts, namely basic search, search by theme, tree traversal, and one which uses pre-computed concept similarities to recommend concepts for the annotator to use. A set of user experiments is reported demonstrating the relative effectiveness of the different approaches.

Index Terms— Video annotation, ontology, LSCOM, semantic concept distances.

I. INTRODUCTION

In visual media processing, much progress has been made in automatically analysing low-level visual features in order to obtain a description of the content. However, annotations by humans are still often needed to extract accurate, deep semantic information. Indeed, manual tagging of visual content has become widespread on the internet through what is known as "folksonomy", in which human annotators provide descriptive content tags.

One of the challenges in the area of human annotation is achieving consistency across annotations in terms of both the vocabulary used and the way it is used. The common approach here is to provide users with an ontology, an organisation of allowable semantic tags or concepts. This is popular in enterprises such as photo and video stock archives, where only a small number of people actually perform the annotation and thus they are familiar with the ontology and the way it is used. In more open-ended applications, such as social tagging or tagging by untrained users, ontologies are regarded as too restrictive and too hard to learn in a short period of time, so such applications favour free-form tagging at the expense of the consistency that the use of an ontology brings.

Here, we address the issue of how an untrained user could use a pre-defined ontology to index video in the domain of broadcast TV news. Specifically, we use the LSCOM ontology [1] of about 850 concepts to help index media by semantics.

II. VIDEO ANNOTATION TOOL

Traditional annotation tools based on a lexicon or ontology usually provide a full list of concepts with no, or very poor, ways to navigate it. This works quite well for a small lexicon or for users who are trained to use it, but it does not scale to a larger ontology or to the case where the users are untrained. Thus, in order to use LSCOM or any other large ontology to index video, we need to support different ways for the user to navigate it in order to complete the annotation process. In our annotation tool there are four distinct ways to annotate content, described as follows.

A. Basic search

An alphabetically-ordered list of the ontology and a search box to find matching concepts is provided. This is simple but effective when users have a good knowledge of the ontology.

B. Search by themes

More than 700 concepts of the ontology have been arranged into 19 different themes such as Arts & Entertainment, Business & Commerce, News, Politics, Wars & Conflicts, etc., so an annotator can search for a concept by first selecting a theme that seems to fit the shot.

C. Recommended concepts

In previous work [2] we computed the similarity among all pairs of concepts in the LSCOM ontology using a combination of usage co-occurrence, gathered as the ontology was used to index a corpus of 80 hours of video, and visual shot-shot (and by implication, annotation-annotation) similarities. We used these concept-concept similarities to generate "recommended concepts" at any point after a shot has been annotated with at least one concept. This works by determining the 15 concepts most similar to the set of concepts already used to annotate the shot, and this top-15 is refreshed every time an additional concept is used in annotating the shot.

D. Tree organization

A hierarchical version of the ontology has recently been completed, so we introduced some of its elements into our tool by creating an area where the user can navigate among different trees of the ontology.
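As an illustration, the recommendation step described above can be sketched as follows. The similarity values and concept names here are hypothetical, and scoring a candidate by its summed similarity to the already-used concepts is an assumption; the paper does not specify how similarity to a *set* of concepts is aggregated.

```python
def recommend(used, similarity, all_concepts, k=15):
    """Return the k concepts most similar to the set of concepts
    already used to annotate a shot.  Aggregation by summed pairwise
    similarity is an assumption, not taken from the paper."""
    candidates = [c for c in all_concepts if c not in used]

    def score(c):
        # Unordered concept pairs are keyed by frozenset; unknown
        # pairs default to zero similarity.
        return sum(similarity.get(frozenset((c, u)), 0.0) for u in used)

    return sorted(candidates, key=score, reverse=True)[:k]

# Hypothetical concept-concept similarities for illustration only.
sim = {
    frozenset(("administrative assistant", "office")): 0.9,
    frozenset(("administrative assistant", "store")): 0.7,
    frozenset(("office", "landlines")): 0.6,
    frozenset(("store", "landlines")): 0.2,
}
concepts = ["office", "store", "landlines", "canal"]

# After the shot receives its first annotation, refresh the top-k list.
print(recommend({"administrative assistant"}, sim, concepts, k=2))
# → ['office', 'store']
```

Each time the annotator adds a concept to the shot, the tool would call `recommend` again with the enlarged `used` set, so the suggestion list adapts as the annotation grows.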
III. EXPERIMENTS AND ANALYSIS

We performed preliminary experiments involving 10 native English-speaking users who each annotated 40 shots using different functionalities of the tool, either within a restricted timeframe or with unlimited time to complete. To replicate the scenario of an untrained user annotating material on the internet, our users did not receive any special training in using the annotation tool. Shots to be annotated were selected randomly, and users were assigned functionalities in a Latin squares protocol so as not to bias the results. We analysed four different aspects of the annotation process, namely the overall time spent on annotating, the number of annotations per shot, the shot annotation rate, and the number of annotations during the first minute. Results are shown below.

                               Search   Search +   Search +   Entire
                               Only     Themes     Recmd.     Tool
Average time per shot          1m 53s   2m 06s     1m 53s     1m 59s
# annotations per shot (avg)   6.9      7.2        11.3       10.9
Annotation rate                6.1      5.8        10.1       9.2
Avg annotations in 1st minute  6.3      5.2        7.7        7.7

The best annotation performance is obtained using the "recommended concepts" feature: the time spent in free annotation is the same as for the "search only" version (representing the traditional approach), but the number of annotations is greater when recommendations are used. Using the "themes" feature seems to slow down the annotation process without increasing the number of annotations, probably due to a lack of knowledge of the ontology and of the way concepts had been organised into different themes. Also, some shots are well suited to annotation by themes while others are not, which is why themes are a good complement to searching for concepts to annotate.

We also found an unexpected result from the "entire tool" experiment, which surprisingly does not seem to be the most effective. Once more, this seems to be due to a lack of knowledge of the tool by users; our whole point in using untrained users is to replicate the common situation of untrained users annotating resources on the internet. If we examine the number of annotations made during the first minute, then "recommended concepts" and "entire tool" have the same performance, but after the first minute people lost time searching the ontology for additional concepts, as they did not have enough knowledge to know when to stop; searching the ontology does not provide any kind of closure to the process.

IV. CONCLUSIONS AND FUTURE WORK

The approach of using recommended concepts as a way of annotating seems to be promising, though the size of our experiment is small. The "recommended concepts" could be improved by collecting more data to link associated concepts. Indeed, some associated concepts are really good (like "store", "landlines", "bank", "office" and "female person" for "administrative assistant") but others are not, such as "harbors", "boat ship", "business people", "canal" and "lakes" for "house of worship".

The tool seems to be useful for various user profiles. For beginners, it helps them to learn the ontology, and for experts it provides a way to annotate concepts that they are not used to annotating, which improves their knowledge of the ontology.

ACKNOWLEDGMENT

This work was supported by Science Foundation Ireland under grant 03/IN.3/I361, by the EC under contract FP6-027026 (K-Space) and by IRCSET.

REFERENCES

[1] M. Naphade, J.R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann and J. Curtis. Large-Scale Concept Ontology for Multimedia. IEEE Multimedia, 13(3), July-Sept. 2006, pp. 86-91.
[2] M. Koskela and A.F. Smeaton. Clustering-Based Analysis of Semantic Concept Models for Video Shots. In Proc. IEEE International Conference on Multimedia & Expo (ICME 2006), Toronto, Canada, July 2006.