<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Collaborative Video Indexing, Annotation and Discussion System For Broadband Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ronald Schroeter</string-name>
          <email>ronalds@dstc.edu.au</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jane Hunter</string-name>
          <email>jane@dstc.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Douglas Kosovic</string-name>
          <email>douglask@dstc.edu.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DSTC</institution>
          ,
          <addr-line>Brisbane, Queensland</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The University of Queensland</institution>
          ,
          <addr-line>Brisbane, Queensland</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
      </contrib-group>
      <pub-date>
<year>2003</year>
      </pub-date>
      <abstract>
<p>A number of research groups and software companies have developed digital annotation tools for textual documents, web pages, images, audio and video resources. By annotations we mean subjective comments, notes, explanations or external remarks that can be attached to a document or a selected part of a document without actually modifying the document. When a user retrieves a document, they can also download the annotations attached to it from an annotation server to view their peers' opinions and perspectives on the particular document or to add, edit or update their own annotations. The ability to do this collaboratively and in real time during group discussions is of great interest to the educational, medical, scientific, cultural, defense and media communities. But it is extremely challenging technically and demands significant bandwidth, particularly for video documents. In this paper we describe a unique prototype application developed over the Australian GrangeNet broadband research network, which combines videoconferencing over access grid nodes with collaborative, real-time sharing of an application that enables the indexing, browsing, annotation and discussion of video content between multiple groups at remote locations.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>This paper describes a unique prototype system
developed at the Distributed Systems Technology Centre,
at the University of Queensland, which enables the
real-time collaborative indexing, browsing, description,
annotation and discussion of high quality digital film or
video content. Using the GrangeNet broadband research
network [1] and access grid nodes [2] which support
large-scale group-to-group collaboration and high quality
audio/video, users are able to open an MPEG-2 file and
share the tools which enable the group to collaboratively
segment, browse, describe, annotate and discuss the
particular film or video of interest. Although annotation
tools do exist for textual documents, web pages, images,
audio and video resources, they have been designed for
use within stand-alone environments. The descriptions
and annotations can be shared by saving them to a server,
but the actual annotation applications have not been
designed to be shared in real-time collaborative
videoconferencing sessions. Hence, the Vannotea system is of
great interest to many communities, including the
educational, medical, scientific, defense and media
communities, to enable collaborative online discussions
about particular film or video content and real-time
annotation of segments, key frames or regions within
keyframes between distributed groups.</p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>Indexing and annotation systems for digital video files
have been developed in the past - but only for use within
stand-alone environments in which the annotations can be
saved and shared asynchronously. Our first task was to
carry out a detailed survey of these existing systems,
determine their best and worst features and integrate the
best features in a prototype which could be shared within
a collaborative real-time high-quality video-conferencing
environment.</p>
      <p>A survey of existing video annotation systems
revealed that the following systems were the most
advanced:
• IBM – MPEG-7 Annotation Tool [3]
• Ricoh – Movie Tool [4]
• ZGDV – VIDETO [5]
• COALA – LogCreator [6]
• CSIRO’s CMWeb tools [7]
• Microsoft’s MRAS [8]</p>
      <p>IBM's MPEG-7 Annotation Tool provides support
for both MPEG-1 and MPEG-2 files as well as regional
annotations. It also comes with a shot detection algorithm,
an easy-to-use interface and a customisable lexicon.
However, the UI is restricted to a pre-set video size and
aspect ratio; if a video has a different format, it
cannot be displayed correctly. The lexicon is also
restricted to three default categories (event, static scene
and key objects), although free text keywords can also be
added. IBM’s system doesn’t support hierarchical video
segmentation.</p>
      <p>Ricoh's MovieTool does support hierarchical
segmentation within a timeline-based representation of
the video. The automatic shot boundary detection
algorithm permits changes to threshold settings. The
MovieTool is the most mature and complete of the
systems, but has a complicated user interface which is
closely tied to the MPEG-7 specification. The user has to
have a good knowledge of the large and complex XML
Schema definition of MPEG-7 in order to browse using
the MPEG-7 Editor.</p>
      <p>In contrast, ZGDV’s VIDETO hides the complexity
of MPEG-7 by basing the description properties on a simple
description template, which can then be mapped to
MPEG-7 using XSLT. Domain-specific description
templates together with their corresponding XSLT
mappings are generated. The resulting flexibility,
customisability and user-friendliness of this approach are
VIDETO's biggest advantages. VIDETO was developed
as a research tool to generate video (XML) metadata for
testing a video server and retrieval module.</p>
      <p>The LogCreator of the COALA project is a
web-based tool which supports video descriptions. It offers
automatic shot detection and a good interface for
hierarchical segmentation of videos; the resulting
description can be uploaded to the server, where it is
saved as MPEG-7 in a native XML database. However, it
is a domain-specific tool,
developed specifically for TV news documents with a
predefined structure. The descriptors that are used to
annotate the different video segments are predefined as
well.</p>
      <p>Two other web-based video annotation systems are: CSIRO’s
Continuous Media (CM) Web Browser, which generates a proprietary
HTML-format Annodex file [9]; and Microsoft’s Research Annotation
System (MRAS) [8], a Web-based application designed to enable students
to asynchronously annotate web-based lecture videos and
to share their annotations.</p>
        <p>None of the systems described above are designed to
be used within a collaborative video-conferencing
environment. Microsoft’s Distributed Tutored Video
Instruction (DTVI) [10] system does allow students to
replay and discuss videos of lectures collaboratively.
However, it does not support real-time synchronous
annotations. It is also based on a combination of
Windows Media Player and Microsoft’s NetMeeting [11].
NetMeeting is based on the T.120 protocol [12] for
application sharing. Because T.120 was designed for
low bandwidth and only supports low frame rates (e.g.,
10fps), the capture and transfer of mouse events,
keyboard events and screen updates to the display devices
of the participants is too slow to adequately handle high
quality MPEG-2 video (24-30fps).</p>
        <p>Consequently we were unable to use the NetMeeting
application-sharing capabilities and had to develop our
own collaborative application sharing environment from
scratch using .NET Remoting. The sub-section on .NET
Remoting describes this in more detail.</p>
    </sec>
    <sec id="sec-3">
      <title>OBJECTIVES</title>
      <p>An analysis of existing systems enabled us to
determine the objectives of this project in more detail.
Our primary goal was to develop a system to enable the
collaborative indexing, browsing, annotation and
discussion of video content between multiple groups at
remote locations. In addition the system must support:
• User/group participation via access grid nodes;
• Delivery over the GrangeNet broadband research
network;
• High quality video – MPEG-2 files;
• Automatic shot detection;
• Hierarchical video segmentation;
• Simple user interfaces;
• Flexibility – different domains, communities and
metadata application profiles;
• International video metadata standards such as
MPEG-7;
• Annotation of segments, shots, frames and regions
within frames;
• The ability to save, browse, retrieve and share
both the authorized, structured, objective
metadata/descriptions as well as the subjective
annotations and their source (who said what and
when).</p>
    </sec>
    <sec id="sec-4">
      <title>ARCHITECTURE</title>
      <p>Figure 1 illustrates the overall system architecture –
assuming deployment within an educational context. The
scenario is a live discussion between students and
lecturers from tertiary Film/Media Studies Departments,
communicating with curators, archivists and film/media
analysts from leading audiovisual archives and the
creative industries in Australia - via access grid nodes
over the GrangeNet broadband research network. All of
the participants of this hypothetical online
videoconference are sharing an application which enables
the retrieval of an MPEG-2 video and real-time
collaborative, synchronous indexing, browsing,
annotation and discussion of the video.</p>
      <p>Our assumption is that there are two separate metadata
stores: one store is for the search and retrieval of video
content from the servers (we assume that this will be
provided and maintained by the custodial organization);
and a separate metadata store for logging the shared
personal annotations. Our distinction is based on the
premise that the first one stores objective, authorized
descriptions of the content, provided by trained
cataloguers using controlled vocabularies, whilst the
second store contains personal and highly subjective
views, expressed in free text, which are clearly attributed
to specific individuals rather than organizations. In the
real world and within the Internet, this distinction often
becomes highly fuzzy. Our software enables both types of
metadata to be entered and saved.</p>
      <p>The video content is being streamed from multiple
video servers located at different custodial organizations
e.g., ScreenSound Australia [13] (the Australian National
Audiovisual Archive) or the Australian Centre for the
Moving Image [14].</p>
      <p>[Figure 1: Overall system architecture - AV libraries with
streaming servers at ScreenSound Australia and the Australian Centre
for the Moving Image, separate stores for indexing/description data
and annotation/discussion data, and a GrangeNet annotation server and
application server (the FilmEd application, VIC/RAT), supporting
synchronous and collaborative indexing, annotation and discussion
between Access Grid nodes, e.g. students at the University of QLD and
a curator at ScreenSound Australia.]</p>
    </sec>
    <sec id="sec-5">
      <title>COMPONENTS</title>
      <p>The first phase of the project consisted of the
development of a simple stand-alone video indexing,
browsing and annotation prototype which supported the
features described in the Objectives section. The second phase
consisted of integrating this as a shared application within
the collaborative videoconferencing environment. The
development environment chosen was Visual Studio
.NET and the C# programming language. Java Media
Framework was unsuitable because of its lack of support
for MPEG-2. Figure 1 illustrates the four major
components of the system which needed to be developed
and which are described in more detail below:
• Search and Retrieval Database;
• Annotation Database;
• Application Server;
• MPEG-2 Streaming.</p>
      <sec id="sec-5-1">
        <title>Search and Retrieval Database</title>
        <p>The first task in developing this database was to
specify the underlying metadata schema(s) necessary to
enable the search, retrieval and browsing of video files
stored on the streaming video servers connected to the
network. A simple application profile which combines
Dublin Core [15] and MPEG-7 [16] was developed to
enable both the resource discovery of atomic video files
as well as the fine-grained retrieval of relevant video
segments [17]. Figure 2 below illustrates the data model
for the search, retrieval and browsing metadata. An
automatic shot detection module provided by Mediaware
[<xref ref-type="bibr" rid="ref27">27</xref>] was integrated to automate the segmentation, and
hence the metadata generation, as much as possible. A
hypothetical record following this data model is sketched
below.</p>
        <p>[Figure 2: Data model for the search, retrieval and
browsing metadata - a Video carries a DCMES bibliographic
description and mpeg7:MediaFormat formatting information, and an
mpeg7:TemporalDecomposition breaks it into Video Segments, each
with a description, keyframe, clip, startTime and duration.]</p>
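        <p>As a rough illustration only - the element names below are
inferred from the Figure 2 data model rather than taken from the
actual application profile schema [17] - a conforming record might
look like:</p>
        <p>&lt;!-- Hypothetical record: Dublin Core elements describe the
whole video; an MPEG-7 temporal decomposition locates segments --&gt;
&lt;Video&gt;
&lt;DCMES&gt;
&lt;dc:title&gt;Example Film&lt;/dc:title&gt;
&lt;dc:creator&gt;Example Director&lt;/dc:creator&gt;
&lt;/DCMES&gt;
&lt;mpeg7:MediaFormat&gt;MPEG-2&lt;/mpeg7:MediaFormat&gt;
&lt;mpeg7:TemporalDecomposition&gt;
&lt;VideoSegment id="scene_1"&gt;
&lt;description&gt;Opening scene&lt;/description&gt;
&lt;keyframe&gt;frame_1024&lt;/keyframe&gt;
&lt;startTime&gt;00:00:00&lt;/startTime&gt;
&lt;duration&gt;00:02:15&lt;/duration&gt;
&lt;/VideoSegment&gt;
&lt;/mpeg7:TemporalDecomposition&gt;
&lt;/Video&gt;</p>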
      </sec>
      <sec id="sec-5-2">
        <title>Annotation Database</title>
        <p>
          The annotation database stores the annotations (which
may be associated with segments, keyframes or still
regions within frames), as well as the source of the
annotations (who, when, where). Annotations can be
notes, explanations, or other types of external subjective
remarks. We decided to base the annotation component of
our software on Annotea [<xref ref-type="bibr" rid="ref18">18</xref>], an open system developed
by the W3C, which enables shared annotations to be
attached to any Web document or a part of the document.
        </p>
        <sec id="sec-5-2-1">
          <title>Ralph</title>
          <p>Annotation
dc:title
dc:creator
rdf:type</p>
          <p>3ACF6D754
created
dc:date
annotates
context</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>Xdoc.html</title>
          <p>body
postit.html
2000-01-10T17:20Z 2000-01-10T17:20Z</p>
          <p>Annotea uses an RDF-based annotation schema [19]
and XPointer [20] for linking the annotations to the
document. Figure 3 illustrates the basic annotation
schema employed by Annotea. We have extended this to
mpeg7: Temporal
Decomposition
bibliographic</p>
          <p>Video
Segment</p>
          <p>DCMES
description
keyframe
clip
startTime
duration
support the annotation of audiovisual documents
“context” is specified through extensions to XPointer
which enable the location of specific segments, keyframes
or regions. This approach also allows us to utilize and test
prototypical annotation server implementations such as
Zope [22] or the W3C Perllib [23] server. These are RDF
databases which sit on top of MySQL and provide their
own query language, Algae [24].</p>
          <p>The "body" of an annotation is usually text or HTML.
But our architecture allows us to generate, attach and
store audiovisual annotations - small audio or video clips
captured during the video conferencing discussion.</p>
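        <p>To make this concrete, the following is a hypothetical sketch
of such an extended annotation in Annotea's RDF/XML (the Annotea and
Dublin Core namespaces are real; the time/region XPointer syntax and
the resource URLs are illustrative only):</p>
        <p>&lt;r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:a="http://www.w3.org/2000/10/annotation-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"&gt;
&lt;r:Description&gt;
&lt;r:type r:resource="http://www.w3.org/2000/10/annotation-ns#Annotation"/&gt;
&lt;a:annotates r:resource="rtsp://archive.example.org/film.mpg"/&gt;
&lt;!-- hypothetical XPointer extension locating a region within a keyframe --&gt;
&lt;a:context&gt;film.mpg#xpointer(time(00:12:41)/region(120,80,40,30))&lt;/a:context&gt;
&lt;dc:title&gt;This is great&lt;/dc:title&gt;
&lt;dc:creator&gt;Ralph&lt;/dc:creator&gt;
&lt;a:created&gt;2003-01-10T17:20Z&lt;/a:created&gt;
&lt;dc:date&gt;2003-01-10T17:20Z&lt;/dc:date&gt;
&lt;a:body r:resource="http://annotations.example.org/note42.html"/&gt;
&lt;/r:Description&gt;
&lt;/r:RDF&gt;</p>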
      </sec>
      <sec id="sec-5-3">
        <title>Application Server</title>
        <sec id="sec-5-3-1">
          <title>Application Sharing Protocols</title>
        <p>The approach adopted by application sharing protocols
such as T.120 (NetMeeting) or VNC-Protocol [25] makes
them unsuitable for our application. In such protocols, the
shared application runs on a master client or server,
which receives the keyboard and mouse events from the
participants and sends captured screen/window updates
back to the participants (Figure 4).</p>
          <p>[Figure 4: Application sharing in the FilmEd application via
NetMeeting or VNC - the master client runs a NetMeeting or VNC server
which receives mouse/keyboard events from each participant's
NetMeeting or VNC client and sends the captured screen changes back.]</p>
          <p>The main advantage of this approach is that a single
framework can be used to share different applications.
However, these protocols were designed for low-bandwidth
networks and cannot handle the high frame rates required
by MPEG-2. They also restrict the application sharing to a
single user being in control at any one time. Because of
our need to support high frame rates and MPEG-2, such
ready-made application-sharing frameworks are unsuitable.
We have had to build a collaborative environment from
scratch, using .NET Remoting. This is described in detail
in the next section.</p>
        </sec>
        <sec id="sec-5-3-2">
          <title>.NET Remoting</title>
        <p>Because the Vannotea prototype is implemented in C#
within the .NET development framework, the most
flexible, modular and integrated approach to application
sharing was to develop it using .NET Remoting. .NET
Remoting provides a framework that allows objects to
interact with each other across application domains or on
different servers. All of the language constructs, such as
constructors, delegates, interfaces, methods, properties
and fields can be applied to remote objects. Calling a
remote object is the same as calling a local object. When
combined with the mechanisms of delegates and events,
remote objects can also call methods on the client. Even
arguments can be passed as long as they are serializable.</p>
          <p>Figure 5 illustrates the event-handling architecture of
our application. In this example, the client-master is in
control of the application, while the remote clients join
the session by connecting to the same server application.</p>
          <p>[Figure 5: Event-handling architecture - the client-master's
MainForm forwards a Click to its Mediator, which calls SendClick on
the server's Coordinator; the Coordinator raises a ClickEvent to
every client's Mediator, each of which handles it by calling
DoSomething on its local Form.]</p>
          <p>The Mediator objects handle the communication
between the clients and the server. They can call methods
on the remote object (Coordinator). In return, the
Coordinator can call methods on the Mediator by raising
events that the Mediator has subscribed to and listens to.</p>
        <p>The goal is to simulate all events on all clients. In
Figure 5 the client-master clicks a button in the
MainForm, which is then reflected in another Form of all
clients. After a button click, an event is raised and
handled by forwarding this information to the Mediator
object. The Mediator checks the information and calls a
method on the server, telling it that a button has been
clicked. The server then raises a ClickEvent that each
client's Mediator object has subscribed to and listens to.
Finally all Mediators handle the event by doing
something in their Form.</p>
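          <p>A minimal sketch of this cycle in C# follows; the class and
member names mirror Figure 5, but the channel registration and the
full Vannotea event set are omitted, so this is illustrative rather
than the actual implementation:</p>
          <p>// Sketch of the Coordinator/Mediator event flow over .NET Remoting.
using System;

[Serializable]
public class ClickEventArgs : EventArgs
{
    public string ButtonName;
    public ClickEventArgs(string name) { ButtonName = name; }
}

public delegate void ClickEventHandler(object sender, ClickEventArgs e);

// Server-side remote object: clients call SendClick and the Coordinator
// re-broadcasts it by raising ClickEvent to all subscribed Mediators.
public class Coordinator : MarshalByRefObject
{
    public event ClickEventHandler ClickEvent;

    public void SendClick(string buttonName)
    {
        if (ClickEvent != null)
            ClickEvent(this, new ClickEventArgs(buttonName));
    }
}

// Client-side object: it derives from MarshalByRefObject so that the
// server can invoke OnClickEvent on it remotely when the event fires.
public class Mediator : MarshalByRefObject
{
    private readonly Coordinator coordinator;

    public Mediator(Coordinator c)
    {
        coordinator = c;
        // Subscribe: the Coordinator will call back across the network.
        coordinator.ClickEvent += new ClickEventHandler(OnClickEvent);
    }

    // Called by the local MainForm when the client-master clicks a button.
    public void SendClick(string buttonName)
    {
        coordinator.SendClick(buttonName);
    }

    // Raised remotely by the Coordinator on every client's Mediator.
    private void OnClickEvent(object sender, ClickEventArgs e)
    {
        Console.WriteLine("Button {0} clicked - updating local Form", e.ButtonName);
    }
}</p>
          <p>In practice each client would obtain its Coordinator proxy via
Activator.GetObject over a registered TCP channel; mouse-movement and
video-control events follow the same send/raise/handle cycle.</p>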
        <p>Mouse movement events are handled in the same way.
The client master's mouse position is updated and
transferred to all clients, where it is displayed as a pseudo
mouse pointer. This provides the necessary feedback to
users about what the other user did and where he/she
clicked.</p>
        <p>The approach described above assumes that one user is
in control at any time. Alternatively every remote client
could be the client-master at once, creating a truly
collaborative environment for the application in which
every participant is in control simultaneously, resulting in
several mouse pointers within one application. To
differentiate between users, the mice would be
colour-coded. Such a scenario may sound chaotic; however, in
certain situations, it may actually be useful to have
multiple users doing different tasks synchronously.</p>
        <p>One objective of the project is to evaluate users’
behavior and obtain user feedback on the different levels
of collaboration available during video analysis and
discussion and annotation processes. Although the design
approach which we have adopted is more difficult in the
short term, over the longer term it provides the required
flexibility to explore these aspects fully and easily modify
the system in response to user feedback and evaluation.</p>
        <p>Combined with the MPEG-2 streaming architecture
described in the following section, this approach also
fully utilizes the advanced bandwidth and low latency
capabilities provided by GrangeNet.</p>
        </sec>
      </sec>
      <sec id="sec-5-4">
        <title>MPEG-2 Streaming</title>
        <p>The Application Server sends VCR-like commands (play, pause,
seek, stop) to the Streaming Server, which then streams
the section of the MPEG-2 file that needs to be played
and viewed on the remote clients.</p>
        <p>For efficiency and scalability IP Multicasting is used
for the streaming. Without multicasting, the same
information would need to be carried over the network
multiple times, via separate unicast streams for each
remote client.</p>
        <p>The transport protocol being used for the MPEG-2
multicast streams is UDP (User Datagram Protocol),
which provides end-to-end delivery services for data with
real-time characteristics, such as interactive audio and
video. To receive the MPEG-2 over UDP stream, the
clients use a DirectShow Filter for UDP reception.</p>
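        <p>As a rough sketch (the multicast group and port are
hypothetical, and the real client hands the datagrams to the
DirectShow filter rather than printing them), joining and reading
such a stream in C# looks like:</p>
        <p>// Sketch: receive an MPEG-2-over-UDP multicast stream.
using System.Net;
using System.Net.Sockets;

class MulticastReceiver
{
    static void Main()
    {
        // Hypothetical multicast group/port announced by the streaming server.
        IPAddress group = IPAddress.Parse("239.255.0.1");
        UdpClient client = new UdpClient(5004);
        client.JoinMulticastGroup(group);

        IPEndPoint remote = new IPEndPoint(IPAddress.Any, 0);
        while (true)
        {
            // Each datagram carries a chunk of the MPEG-2 stream; a real
            // client pushes it into the decoder instead of logging it.
            byte[] packet = client.Receive(ref remote);
            System.Console.WriteLine("Received {0} bytes", packet.Length);
        }
    }
}</p>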
        <p>[Figure 6: VCR-like control flow - a client's Pause is
forwarded by its Mediator to the Server, which passes the Pause
command on to the Streaming Server.]</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>IMPLEMENTATION</title>
      <sec id="sec-6-1">
        <title>Description Architecture</title>
        <p>A key objective of the system was to provide
simplicity and flexibility for users in their choice of
metadata descriptions, whilst still supporting standards
and interoperability. This required a design which could
easily adapt to the different application profiles required
by different communities. We did this by providing a tool
which enables users to define and edit XML Description
Templates – simplified versions of XML Schemas. For
example, the Description Template in Figure 7 defines a
domain-specific hierarchical structure for “Film” and the
relevant descriptions, i.e., a feature film will be
segmented into scenes and shots. A Film description
would typically include: Title, Creator, Genre, Date, etc.
TV News on the other hand might be segmented into
presentations, reports and interviews, which would
require a different Description Template.</p>
        <p>...
&lt;!-- ************************************ --&gt;
&lt;!-- User-defined hierarchical structure --&gt;
&lt;!-- ************************************ --&gt;
&lt;SegmentHierarchy&gt;
&lt;Segment type="Film"&gt;
&lt;Segment type="Scene"&gt;
&lt;Segment type="Shot"/&gt;
&lt;/Segment&gt;
&lt;/Segment&gt;
&lt;/SegmentHierarchy&gt;
&lt;!-- ************************************ --&gt;
&lt;!-- User-defined Description Elements --&gt;
&lt;!-- ************************************ --&gt;
&lt;Descriptions&gt;
&lt;Description type="Film"&gt;
&lt;DescriptionElement name="Title"/&gt;
&lt;DescriptionElement name="Creator"/&gt;
&lt;DescriptionElement name="Genre"/&gt;
&lt;DescriptionElement name="Date"/&gt;
&lt;/Description&gt;</p>
        <p>The User Interface for entering metadata is
dynamically generated from the Description Template
and reflects the segment hierarchies and description
elements defined within it. The metadata for each video
file is represented as a Description DOM (Figure 8)
similar to the structure of the template, which makes it
simple to transform to different standards like Dublin
Core and MPEG-7 [17] using XSLT.</p>
        <p>&lt;Vannotea&gt;
&lt;Segment type="Film" id="media_1"&gt;
&lt;Description type="Film"&gt;
&lt;DescriptionElement name="Title"&gt;
&lt;DescriptionElement name="Creator"&gt;
&lt;DescriptionElement name="Genre"/&gt;
&lt;DescriptionElement name="Date"/&gt;
&lt;/Description&gt;
&lt;Segment type="Scene" id="scene_0"&gt;
&lt;Description type="Scene"&gt;
&lt;DescriptionElement name="SceneTitle"&gt;
&lt;DescriptionElement name="FreeText"&gt;
&lt;/Description&gt;
&lt;/Segment&gt;
&lt;/Segment&gt;
&lt;/Vannotea&gt;</p>
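        <p>A minimal sketch of such a transformation in C# follows; the
file names and the stylesheet are hypothetical, and the current
XslCompiledTransform class stands in for whatever XSLT API the
prototype actually used:</p>
        <p>// Sketch: transform a Vannotea Description DOM into Dublin Core via XSLT.
using System.Xml;
using System.Xml.Xsl;

class DescriptionExporter
{
    static void Main()
    {
        // Metadata captured through the dynamically generated UI.
        XmlDocument description = new XmlDocument();
        description.Load("film_description.xml");   // hypothetical path

        // One stylesheet per target standard; here, Dublin Core.
        XslCompiledTransform transform = new XslCompiledTransform();
        transform.Load("vannotea_to_dc.xslt");      // hypothetical mapping

        using (XmlWriter writer = XmlWriter.Create("film_dc.xml"))
        {
            transform.Transform(description, writer);
        }
    }
}</p>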
      </sec>
      <sec id="sec-6-2">
        <title>User Interface</title>
        <p>A full-size screen capture of the interface, being used
in the context of an access grid session, is available in
Appendix A. Figure 9 illustrates the three key
components of the user interface:
• The Content Player displays the video content being
streamed from the archive or custodial organization;
• The Content Description component enables the
objective and authorized segmentation and indexing
of the content, as well as search, browsing and
retrieval;
• The Annotation &amp; Discussion component enables the
input, logging, search and retrieval of shared
annotations.</p>
        <p>[Figure 9: User interface layout - the Content Description
pane (search, hierarchical segmentation &amp; browsing, segment
description), the Content Player (video screen and video controls),
and the Annotation &amp; Discussion pane (search and browsing).]</p>
        <p>
          In order to streamline the indexing and segmentation
process, an automatic shot-detection capability was
added. The Mediaware SDK [<xref ref-type="bibr" rid="ref27">27</xref>] is used to perform the
automatic shot-detection. This generates a list of shots for
the entire MPEG-2 file or a selected segment. Because the
Mediaware SDK is written in C++, C# wrappers needed
to be developed in order to integrate it.
        </p>
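        <p>A minimal sketch of such a wrapper follows; the DLL name, entry
point and signature are hypothetical stand-ins for the Mediaware SDK's
actual API:</p>
        <p>// Sketch: P/Invoke wrapper around a native C++ shot-detection routine.
using System;
using System.Runtime.InteropServices;

static class ShotDetector
{
    // Hypothetical native export: fills shotFrames with the frame numbers
    // of detected shot boundaries and returns how many were found.
    [DllImport("mediaware_sdk.dll", CharSet = CharSet.Ansi)]
    private static extern int DetectShots(
        string mpegFile, [Out] int[] shotFrames, int maxShots);

    public static int[] Detect(string mpegFile)
    {
        int[] frames = new int[4096];
        int count = DetectShots(mpegFile, frames, frames.Length);
        int[] result = new int[count];
        Array.Copy(frames, result, count);
        return result;
    }
}</p>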
        <p>Once the shot-list has been generated, the
explorer-style browser in the Content Description window
allows either further hierarchical segmentation of shots to
frames or aggregation of shots to higher-level segments
(scenes). This hierarchical structuring into segments,
scenes, shots and frames enables easy navigation through
the video.</p>
        <p>Also within the Content Description window,
selected segments or frames can be described either by
entering free text values or by selecting controlled
vocabulary terms available through pull-down menus.</p>
        <p>The Content Player features common video playback
functionalities (play, pause, seek, stop) and also allows
the annotation of the current video frame, through the use
of drawing tools to define regions. The drawing tools
support the attachment of annotations to rectangular,
point or linear regions within frames. The actual
annotation for a region is input via the Annotation and
Discussion window. Details of who attached the
annotation and the date/time of annotation are also
recorded.</p>
        <p>The Annotation and Discussion window also enables
users to browse and display past annotations, and to see to
whom each annotation is attributed and when it was
recorded.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <sec id="sec-7-1">
        <title>Conclusions</title>
        <p>In this paper we have described a unique system which
was developed to enable the collaborative, real-time,
indexing, browsing, annotation and discussion of high
quality video content by multiple, distributed groups
connected via access grid nodes on a broadband network.</p>
        <p>Although previous video annotation systems have
been developed, they have not been collaborative,
real-time, synchronous systems capable of supporting high
quality MPEG-2 content. These requirements have
demanded that the collaborative application-sharing
environment be developed from scratch using .NET
Remoting. To ensure that the system is as flexible as
possible, users are able to edit the Description Template
directly. The user interface is then dynamically generated
from the Template. For simplicity's sake, our default
metadata application profile is a simplified aggregation of
particular MPEG-7 Description Schemes which can easily
be mapped to MPEG-7. Metadata input is controlled through a
backend XML Schema as well as controlled vocabularies
associated with specific terms. Fine-grained metadata
generation is streamlined through the integration of
Mediaware’s automatic scene change detection algorithm.
In order to maximize interoperability and leverage
existing servers, we have chosen to extend the existing
W3C Annotea tools for annotating web pages, to enable
the annotation of audiovisual content.</p>
        <p>There is enormous interest in this application – in
particular from the medical and biological imaging
domain for the annotation of bio-medical video content.
Our goal is to use this tool to assist with the manual
indexing by domain-experts of example databases which
can then be used for machine learning to enable automatic
domain-specific video recognition.</p>
      </sec>
      <sec id="sec-7-2">
        <title>Future Work</title>
        <p>In the next 12-18 months we intend to continue the
development of the Vannotea system. In particular we
would like to improve and extend it by implementing the
following functionalities and carrying out the following
tasks:
• Enable the attachment of audio/video annotations;
• Perform user evaluations and usability studies to
obtain user feedback and refine and modify the
software accordingly;
• Enable the sharing and annotation of documents of
all media types (not just video), e.g., Word documents,
web pages, images, presentations, texts;
• Enable collaborative editing of documents of all
media types;
• Investigate software and standards (MPEG-21) for
managing the digital rights associated with the video
content being delivered and annotated.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>ACKNOWLEDGEMENTS</title>
      <p>The work described in this paper has been funded by
the Cooperative Research Centre for Enterprise
Distributed Systems Technology (DSTC) through the
Australian Federal Government's CRC Programme
(Department of Industry, Science and Resources), as well
as the GrangeNet program funded by the Australian
Federal Government's BITS Advanced Network Initiative
(Department of Communications, Information
Technology and the Arts).</p>
    </sec>
    <sec id="sec-9">
      <title>APPENDIX A</title>
      <p>A Screen Shot of the FilmEd Application within an
Access Grid session.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1"><mixed-citation>[1] GrangeNet. http://www.grangenet.net/</mixed-citation></ref>
      <ref id="ref2"><mixed-citation>[2] Access Grid. http://www.accessgrid.org/</mixed-citation></ref>
      <ref id="ref3"><mixed-citation>[3] IBM MPEG-7 Annotation Tool. http://www.alphaworks.ibm.com/tech/videoannex</mixed-citation></ref>
      <ref id="ref4"><mixed-citation>[4] Ricoh MovieTool. http://www.ricoh.co.jp/src/multimedia/MovieTool/</mixed-citation></ref>
      <ref id="ref5"><mixed-citation>[5] Zentrum fuer Graphische Datenverarbeitung e.V. (ZGDV). VIDETO - Video Description Tool. http://www.rostock.zgdv.de/ZGDV/Abteilungen/zr2/Produkte/videto/index_html_en</mixed-citation></ref>
      <ref id="ref6"><mixed-citation>[6] Swiss Federal Institute of Technology (EPFL). COALA (Content-Oriented Audiovisual Library Access) - LogCreator. http://coala.epfl.ch/demos/demosFrameset.html</mixed-citation></ref>
      <ref id="ref7"><mixed-citation>[7] CSIRO. The Continuous Media Web (CMWeb). http://www.cmis.csiro.au/cmweb/</mixed-citation></ref>
      <ref id="ref8"><mixed-citation>[8] David Bargeron, Anoop Gupta, Jonathan Grudin, and Elizabeth Sanocki. "Annotations for Streaming Video on the Web: System Design and Usage Studies". Microsoft Research, Redmond. http://www.research.microsoft.com/research/coet/MRAS/WWW8/paper.htm</mixed-citation></ref>
      <ref id="ref9"><mixed-citation>[9] Annodex. http://annodex.nsw.cmis.csiro.au/index.html</mixed-citation></ref>
      <ref id="ref11"><mixed-citation>[11] NetMeeting. http://www.microsoft.com/windows/netmeeting/</mixed-citation></ref>
      <ref id="ref12"><mixed-citation>[12] Asim Karim. "H.323 and Associated Protocols", November 26, 1999. ftp://ftp.netlab.ohio-state.edu/pub/jain/courses/cis788-99/h323/index.html</mixed-citation></ref>
      <ref id="ref13"><mixed-citation>[13] ScreenSound Australia. http://www.screensound.gov.au</mixed-citation></ref>
      <ref id="ref14"><mixed-citation>[14] Australian Centre for the Moving Image (ACMI). http://www.acmi.net.au</mixed-citation></ref>
      <ref id="ref15"><mixed-citation>[15] Dublin Core. http://dublincore.org/</mixed-citation></ref>
      <ref id="ref16"><mixed-citation>[16] MPEG-7. http://www.mpeg-industry.com/</mixed-citation></ref>
      <ref id="ref17"><mixed-citation>[17] J. Hunter. "An Application Profile which combines Dublin Core and MPEG-7 Metadata Terms for Simple Video Description", July 2002. http://www.metadata.net/harmony/video_appln_profile.html</mixed-citation></ref>
      <ref id="ref18"><mixed-citation>[18] W3C Annotea Project. http://www.w3.org/2001/Annotea/</mixed-citation></ref>
      <ref id="ref19"><mixed-citation>[19] Annotea Annotation Schema. http://www.w3.org/2000/10/annotation-ns#</mixed-citation></ref>
      <ref id="ref20"><mixed-citation>[20] W3C XML Pointer, XML Base and XML Linking. http://www.w3.org/XML/Linking</mixed-citation></ref>
      <ref id="ref21"><mixed-citation>[21] Marja-Riitta Koivunen and Ralph R. Swick. "Metadata Based Annotation Infrastructure offers Flexibility and Extensibility for Collaborative Applications and Beyond", W3C, MIT Laboratory for Computer Science, 26 September 2001. http://www.w3.org/2001/Annotea/Papers/KCAP01/annotea.html</mixed-citation></ref>
      <ref id="ref22"><mixed-citation>[22] Zope Annotation Server. http://www.zope.org/Members/Crouton/ZAnnot/</mixed-citation></ref>
      <ref id="ref23"><mixed-citation>[23] W3C Perllib Annotations Server. http://www.w3.org/1999/02/26-modules/User/Annotations-HOWTO</mixed-citation></ref>
      <ref id="ref24"><mixed-citation>[24] Algae HOWTO. http://www.w3.org/1999/02/26-modules/User/Algae-HOWTO.html</mixed-citation></ref>
      <ref id="ref25"><mixed-citation>[25] Virtual Network Computing (VNC). http://www.uk.research.att.com/vnc/</mixed-citation></ref>
      <ref id="ref26"><mixed-citation>[26] Simon Robinson, Burt Harvey, Christian Nagel, Ollie Cornes, Karli Watson, Morgan Skinner, Jay Glynn, Zach Greenvoss and Scott Allen. Professional C# (2nd Edition), Wrox Press, 2002, pp. 981-1025.</mixed-citation></ref>
      <ref id="ref27"><mixed-citation>[27] Mediaware Solutions. http://www.mediaware.com.au/</mixed-citation></ref>
    </ref-list>
  </back>
</article>