=Paper=
{{Paper
|id=Vol-100/paper-8
|storemode=property
|title=Interactive Authoring Tool for Extensible MPEG-4 Textual Format (XMT)
|pdfUrl=https://ceur-ws.org/Vol-100/Kyungae_Cha-et-al.pdf
|volume=Vol-100
|dblpUrl=https://dblp.org/rec/conf/ecai/ChaK02
}}
==Interactive Authoring Tool for Extensible MPEG-4 Textual Format (XMT)==
Kyungae Cha¹ and Sangwook Kim²
¹ Department of Computer Science, Kyungpook National University, Daegu, Korea, email: chaka@woorisol.knu.ac.kr
² Department of Computer Science, Kyungpook National University, Daegu, Korea, email: swkim@cs.knu.ac.kr

Abstract. MPEG-4 is an ISO/IEC standard which defines a multimedia system for communicating interactive scenes containing various types of media objects. The Extensible MPEG-4 Textual Format (XMT) framework provides interoperability between MPEG-4 and existing practices such as Extensible 3D (X3D). This paper introduces an XMT authoring tool that supports a visual environment for building a spatio-temporal scenario of the media objects comprising a multimedia scene. The authoring tool provides a comprehensive set of editing tools for composing a multimedia scene, as well as tools for the automatic generation of XMT documents and MPEG-4 contents. This paper also describes the functionality of the developed system and shows an example of its use.

1 INTRODUCTION

MPEG-4, one of the leading streaming media formats, is an ISO/IEC standard which defines a multimedia system for communicating interactive scenes with various types of media objects. In MPEG-4, a scene is accompanied by a description specifying how the objects should be combined in time and space in order to form the scene intended by the author. The scene description is coded in a binary format called Binary Format for Scenes, or BIFS[1,4,7,8,10,11], which is built on several concepts from the Virtual Reality Modeling Language (VRML)[5]. This binary form is suitable for low-overhead transmission, so BIFS basically provides an efficient application for the sender and the receiver[1,7]. On the other hand, the Extensible MPEG-4 Textual Format (XMT) is a framework for representing an MPEG-4 scene description using a textual syntax.

This paper presents an XMT document authoring tool that enables visual composition of an MPEG-4 scene and generates the corresponding XMT document and MPEG-4 contents. XMT is designed to provide a high-level abstraction of MPEG-4 functionalities and easy interoperability with the existing practices of content authors, such as Extensible 3D (X3D), being developed by the Web3D Consortium, and the Synchronized Multimedia Integration Language (SMIL) from the W3C[7,11]. Thus, using the XMT authoring tool, authors can obtain multimedia contents which are exchangeable and interoperable with X3D and SMIL.

In the authoring system, authors can visually make a spatial arrangement of media objects and compose the temporal behavior of the objects with a timeline approach. Authors can also modify the material characteristics of each object using interactive and visual tools. Moreover, the visual scene is automatically transformed into an XMT-α or XMT-Ω format document.

In section 2, the XMT formats are briefly discussed. In section 3, the various functions of the XMT authoring tool are described. The implementation of the proposed system is then presented in section 4. Finally, section 5 gives a conclusion and presents our future plans.

2 XMT-α AND XMT-Ω FORMATS

The XMT framework consists of two levels of textual syntax and semantics: the XMT-α and XMT-Ω formats[7,10].

XMT-α is an XML-based version of MPEG-4 content which provides a straightforward, one-to-one mapping between the textual and binary formats of an MPEG-4 scene description. XMT-α also provides interoperability with X3D[5], which improves upon VRML with new features such as a flexible XML encoding and a modularization approach[6]. It contains a subset of X3D as well as X3D-like representations of MPEG-4 features such as Object Descriptors (OD), BIFS update commands and 2D composition[7].

XMT-Ω is a high-level abstraction of MPEG-4 features based on SMIL[9]. It specifies objects and their relationships in terms of the author's intention rather than the coded nodes and route mechanism of BIFS. With respect to reusing SMIL, XMT-Ω defines a subset of the modules used in SMIL whose semantics are compatible. Moreover, an XMT-Ω document can be parsed and played directly by a W3C SMIL player, preprocessed into the corresponding X3D nodes and played by a VRML player, or compiled to an MPEG-4 representation such as mp4, which can then be played by an MPEG-4 player. Figure 1 shows the interoperability of XMT between the SMIL player, the VRML player and the MPEG-4 player.
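To make the contrast between the two levels concrete, the following markup sketch shows how the same content might look at each level. It is illustrative only: the element and attribute names are patterned after SMIL (for XMT-Ω) and after the BIFS/VRML node names appearing later in this paper (for XMT-α), not copied verbatim from the XMT specification.

```xml
<!-- Illustrative sketch only; element names assumed, not verbatim XMT. -->

<!-- XMT-Omega level: author intent, SMIL-like timing -->
<seq>
  <text begin="1s" dur="19s" textLines="MPEG-4 ......"/>
  <img dur="23s" src="picture.jpg"/>
</seq>

<!-- XMT-Alpha level: a one-to-one textual mirror of the coded BIFS scene -->
<Transform2D translation="-163.00 17.00">
  <Shape>
    <Material2D emissiveColor="0.75 0.75 0.75" filled="true"/>
    <Rectangle size="162.00 110.00"/>
  </Shape>
</Transform2D>
```

The XMT-Ω fragment says only *what* the author wants (play the text, then the image), while the XMT-α fragment mirrors the concrete scene-graph nodes that will be binary-encoded as BIFS.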
Figure 1. The interoperability of XMT [diagram not reproduced: XMT is parsed for a SMIL player, compiled for a VRML browser, or compiled to an MPEG-4 representation (e.g. an mp4 file) for an MPEG-4 player]

3 XMT AUTHORING SYSTEM

This section shows the XMT document authoring environment of our system and the authoring process for creating an MPEG-4 scene and an XMT document. The main functionalities of the system are also described.

3.1 System Structure

The following figure shows the system structure and the components of the XMT authoring tool.

Figure 2. System structure [diagram not reproduced; its components are the media decoders, the graphical user interface, the parser, the scene composition tree manager with the scene composition tree, and the XMT-α and XMT-Ω generators with their output documents]

Authors compose an MPEG-4 scene with the various editing tools provided in the graphical user interface. Following the authoring process, the scene composition tree, which represents the visual scene as an internal data structure, is built and modified. Using the scene composition tree, the XMT-α or XMT-Ω generator produces the corresponding XMT format document; at this point the author can choose the output format he/she wants. The XMT format files can be parsed and then displayed in the user interface as a visual scene. The author can also modify the visual scene and recreate the XMT file.

3.2 Graphical User Interface

The graphical user interface provides a set of drawing and editing tools for various media types such as JPEG images, MPEG-1 video, G.723 audio and graphical objects (Rectangle, Circle, and others). These tools enable authors to compose audio-visual scenes with a direct manipulation technique and see them immediately. Figure 3 presents an overview of the graphical user interface and a simple example of a scene.

Authors first select from the toolbar one of the tools they want to add to the scene and then draw the selected object. For image objects, the object is drawn in the interface window. For video objects, the first frame of the video is drawn in the interface window.

Whenever a new media object is added to the scene, the system automatically assigns the object ID, start time and end time of the object with default values. The bottom portion of figure 3 shows the timeline window where the timelines of the objects are arranged. The layer of a timeline represents the drawing order of the corresponding object, which is determined by the object addition sequence. Here the timeline window shows the initial state, i.e. no modification has occurred.

Figure 3. Graphical user interface

3.2.1 Spatial composition

In the user interface, each object participating in a scene is contained in a rectangular tracker so that the objects are treated individually. Thus the author can move, resize or remove the objects directly to compose a spatial arrangement of the scene.

The spatial attributes of an object are specified in terms of the spatial position of the object's bounding rectangle, which is represented as a rectangular tracker containing the object in the user interface. The spatial position of the bounding rectangle of an object (i.e. the spatial attribute of the object) is specified in the form (x, y, h, w), where w denotes the width of the bounding rectangle, h denotes the height, and x and y denote the coordinates of the center of the rectangle, with the center of the whole presentation rectangle as the origin of the coordinate system.

The author can also apply material characteristics such as color, transparency, and border type using the editing tools. These material properties of an object are specified as a property node in the internal form of our authoring system.
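The center-origin (x, y, h, w) convention above can be sketched in a few lines. This is a minimal illustration, not the authors' code; the names `BoundingRect` and `to_screen_top_left` are hypothetical, and it assumes y grows downward on screen.

```python
from dataclasses import dataclass

@dataclass
class BoundingRect:
    x: float  # center x, relative to the presentation center
    y: float  # center y, relative to the presentation center
    h: float  # height of the bounding rectangle
    w: float  # width of the bounding rectangle

def to_screen_top_left(rect: BoundingRect, pres_w: float, pres_h: float):
    """Convert the center-origin (x, y) of a bounding rectangle to a
    conventional top-left screen coordinate inside the presentation."""
    left = pres_w / 2 + rect.x - rect.w / 2
    top = pres_h / 2 + rect.y - rect.h / 2
    return (left, top)

# A 162x110 rectangle centered in a 640x480 presentation has its
# top-left corner at (239.0, 185.0).
```

The point of the convention is that moving or resizing the tracker in the interface only changes one (x, y, h, w) quadruple, which maps directly onto the object's spatial attribute.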
The spatial and material attributes of each object are automatically specified by the system from the visual scene.

3.2.2 Interactive scenario composition

In the presentation of an MPEG-4 scene, user interaction is possible within what is set in the scene description. Assume that the author designs the following scenario for the scene in figure 3.

Example 1. If an end user clicks the circle object, the fill color of the rectangle object will be changed through a gradient from red to green.

Here, the circle object and the rectangle object are the source object and the destination object, respectively. To make an interactive scenario, the event type (e.g. a user's click), the source and destination objects, the responding action type (e.g. change fill color), etc., should be specified. We denote this interactive information as an event object, represented as a quadruple (destination object ID, event type, action type, key values). The key values are an array of values used to change the parameters of the action type field. The event object for the above example is specified as (3000, click, fill color, ((1.00 0.00 0.00), (0.00 0.50 0.00))), if the rectangle object as the destination has the number 3000 as its object ID.

We provide a dialog-based interface in order to facilitate the interactive scenario authoring process. The event object is specified by selecting an event type and the attributes of the destination object that the author wants the event to change, without the need for any extra description.

3.2.3 Temporal scenario composition

For composing a temporal scenario of objects, the author can modify the timeline of each object, i.e., the author directly modifies the length and position of the timelines in the timeline window. Moreover, the author can declare temporal relationships among objects, which are maintained throughout the authoring process. Consider the following scenario for the scene in figure 3.

Example 2. The text object is rendered at the end of the image object.

The scenario can be specified if the author modifies the timelines of the two objects as in figure 4 and declares the two objects as a sequence group, which makes the objects play sequentially.

Figure 4. An example of timeline modification and temporal relationship declaration [diagram not reproduced: the image and text timelines are grouped as a sequence]

The timeline of the image object is automatically updated to maintain the relationship each time the duration of the text object is modified.

3.3 Scene Composition Tree

The resulting graphical user interface is represented as a scene composition tree designed to organize the composed scene into a hierarchical structural form. Whenever a new object is created in the user interface, the corresponding object node is also created.

An object node has its corresponding object type, object ID and values specifying its spatio-temporal attributes. The scene composition tree is modified through the attachment of the new object node. At the same time, the property node of the object is attached as a child node of the new object node. The property node, as well as the tree structure, can be changed throughout the authoring process; the tree structure changes as objects are added, replaced, or removed. If the author creates event information, an event object, which contains the destination object ID, the event type and the values of the transition status, is created and attached to the source object's node as its child node. Thus an event object does not need to specify its source object ID.

3.4 Generation of XMT Document

The resulting graphical user interface is represented as the scene composition tree. From the scene composition tree, both the XMT-α and XMT-Ω documents corresponding to the visual scene are directly generated.

3.4.1 XMT-α generation

In the XMT-α format, each object is represented as an element similar to the object node described in BIFS. Thus, the XMT-α format document can be generated following the BIFS generation rules.

The XMT-α generator searches the scene composition tree until it meets an audio or visual object node. It then creates the corresponding object element of the XMT-α document using the spatio-temporal attributes of the object node. With the values specified in the object's property node in the scene composition tree, the XMT-α generator can describe geometric attributes such as the position, size and shape of the object, or material attributes such as fill color and border style.

Figure 5 and figure 6 show a portion of the XMT-α and BIFS text for the scene of example 1, respectively. In this case, when the XMT-α generator finds the circle object node in the scene composition tree, it also meets the circle object's property node as well as its event node among the object node's children. Using the information written in the event node, the route and sensor nodes can be described.

3.4.2 XMT-Ω generation

XMT-Ω syntax and semantics have been designed using extensible media (xMedia) objects as basic building blocks[7]. The elements within XMT-Ω abstract the geometry and the behavior of the corresponding object in the visual scene. Thus, if an object is associated with an event object node, its behavior should be defined by a set of animation and timing elements. Figure 7 shows the XMT-Ω format document corresponding to the XMT-α format in figure 5. The rectangle object is defined with the elements describing the object's spatial and material attributes as well as the animate elements describing a change of fill color in response to a click on the circle object.
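The scene composition tree and the event quadruple described in sections 3.2.2 and 3.3 can be sketched as follows. This is an assumed structure for illustration, not the authors' implementation; the class name `ObjectNode` and the attribute values are hypothetical, except the event quadruple, which follows example 1.

```python
# Minimal sketch of a scene composition tree node: an object node carries
# an object ID, an object type and temporal attributes, and collects its
# property node and event objects as children.
class ObjectNode:
    def __init__(self, obj_id, obj_type, start, end):
        self.obj_id = obj_id
        self.obj_type = obj_type
        self.start, self.end = start, end
        self.children = []  # property node, event objects, child objects

    def attach(self, child):
        self.children.append(child)

# Event object for example 1: clicking the circle changes the rectangle's
# fill color from red to green. The quadruple is
# (destination object ID, event type, action type, key values).
event = (3000, "click", "fill color", ((1.0, 0.0, 0.0), (0.0, 0.5, 0.0)))

circle = ObjectNode(3001, "Circle", start=0.0, end=20.0)  # illustrative times
circle.attach({"radius": 57.0})  # property node (illustrative)
circle.attach(event)  # attached to the SOURCE node, so the quadruple
                      # need not name its source object ID
```

Because the event object hangs off the source node, the generators can later emit the sensor on the source and the route toward the destination without any extra bookkeeping.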
Likewise, figure 8 shows a portion of the XMT-Ω document specifying the scenario of example 2. It represents a temporal relationship and a synchronization-module expression using SMIL timing constraints. A 'seq' container defines a sequence of elements in which the elements play one after the other. The text object starts one second after the presentation begins and disappears 19 seconds later. When the text object disappears, the image object, whose temporal duration is 23 seconds, starts. Figure 9 shows the BIFS text corresponding to the XMT-Ω in figure 8.

Figure 5. A portion of XMT-α for the scene of example 1 [code listing not reproduced]

Figure 6. A portion of BIFS text for the scene of example 1 [code listing not reproduced]

Figure 7. A portion of XMT-Ω corresponding to the XMT-α in figure 5 [code listing not reproduced]

Figure 8. A portion of XMT-Ω for example 2 [code listing not reproduced]

Figure 9. A portion of BIFS text corresponding to the XMT-Ω in figure 8 [code listing not reproduced]

All the XMT and BIFS text shown above is generated automatically from the visual scene.

3.5 XMT Parsing

The XMT framework is based on XML; thus valid XMT element nesting can be defined in a Document Type Declaration (DTD) and parsed with an XML parser. XML4C[3], a validating XML parser written in a portable subset of C++, is used for parsing XMT documents. The parsed document is represented as a scene composition tree using the DOM API[2], which provides access to the tree structure and navigation of the tree.

Media elements described within the parsed XMT document are represented as object nodes with their corresponding property nodes. Thus the scene described in the XMT document can be visualized by rendering the corresponding media object nodes using the scene composition tree. The visualized scene can also be modified and rewritten as an XMT document.

4 IMPLEMENTATION

The proposed XMT authoring tool is developed in C++ under the Windows 95/98/NT platform. The system supports the Complete2D profile for MPEG-4 contents.

5 CONCLUSION

The XMT document authoring tool provides a visual, direct-manipulation authoring technique. With the system, users can create an MPEG-4 scene and its XMT format document even though they are not familiar with XMT syntax and semantics. Moreover, the visual scene is automatically transformed into an XMT-α or XMT-Ω document without syntax errors. Likewise, a sophisticated scene, which may be very difficult to create using a text description, can be generated. In the future, it is necessary to support more types of media data, more scene nodes such as 3D objects, and a more facilitative authoring interface.

REFERENCES

[1] A. Puri and A. Eleftheriadis, "MPEG-4: An Object-Based Multimedia Coding Standard Supporting Mobile Applications," Mobile Networks and Applications, vol. 3, pp. 5-32, 1998.
[2] Document Object Model (DOM) Level 1 Specification, W3C Recommendation, October 1998. http://www.w3.org/TR/REC-DOM-Level-1/
[3] http://www.alphaworks.ibm.com/tech/xml4c/
[4] ISO/IEC 14496-1:1999, Information technology - Coding of audio-visual objects - Part 1: Systems, ISO/IEC JTC1/SC29/WG11 N2501, 1999.
[5] ISO/IEC FDIS 14772:200x, Information technology - Computer graphics and image processing - The Virtual Reality Modeling Language (VRML).
[6] ISO/IEC xxxxx:200x, Information technology - Computer graphics and image processing - X3D.
[7] M. Kim, S. Wood and L.T. Cheok, "Extensible MPEG-4 textual format (XMT)," in Proc. ACM Multimedia 2000 Workshops, Los Angeles, California, United States, 2000, pp. 71-74.
[8] S. Battista, F. Casalino and C. Lande, "MPEG-4: A Multimedia Standard for the Third Millennium, Part 1," IEEE Multimedia, vol. 6, no. 4, pp. 74-83, 1999.
[9] Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, W3C Recommendation, June 1998. http://www.w3.org/TR/1998/REC-smil-19980615
[10] WG11 (MPEG), MPEG-4 Overview (V.16 La Baule Version), ISO/IEC JTC1/SC29/WG11 N3747, October 2000.
[11] WG11 (MPEG), MPEG-4 Overview (V.18 Singapore Version), ISO/IEC JTC1/SC29/WG11 N4030, March 2001.
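The DOM-based reconstruction described in section 3.5 can be sketched as follows. This is a minimal sketch using Python's standard DOM implementation (xml.dom.minidom) and a made-up XMT-like fragment; the actual tool uses XML4C in C++ and validates against a DTD, neither of which is reproduced here.

```python
from xml.dom import minidom

# A made-up XMT-like fragment standing in for a parsed XMT document;
# the element and attribute names are illustrative only.
doc_text = """<scene>
  <rectangle id="3000" width="162" height="110"/>
  <circle id="3001" radius="57"/>
</scene>"""

def build_object_nodes(xml_text):
    """Walk the DOM tree and collect one (tag, id) entry per media
    element, analogous to rebuilding object nodes in the scene
    composition tree from a parsed XMT document."""
    dom = minidom.parseString(xml_text)
    nodes = []
    for el in dom.documentElement.childNodes:
        if el.nodeType == el.ELEMENT_NODE:
            nodes.append((el.tagName, el.getAttribute("id")))
    return nodes

# build_object_nodes(doc_text) -> [("rectangle", "3000"), ("circle", "3001")]
```

Each collected entry would then become an object node with its property node, so the parsed scene can be rendered, edited, and rewritten as an XMT document, closing the round trip described in section 3.5.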