A Framework and User Interface for Automatic Region Based Segmentation Algorithms

Kevin McGuinness, Gordon Keenan, Tomasz Adamek, Noel O’Connor

Centre for Digital Video Processing, Dublin City University, Glasnevin, Dublin 9, Ireland.

Abstract— In this paper we describe a framework and tool developed for running and evaluating automatic region-based segmentation algorithms. The tool was designed to allow simple integration of existing and future segmentation algorithms, both single-image algorithms and those that operate on video data. Our framework supports plug-in segmenters, media decoders and region-map codecs. We provide several sophisticated implementations of these plug-ins, including a video decoder capable of frame-accurate decoding of a large variety of video formats, an image decoder that also handles a comprehensive collection of formats, and an efficient implementation of a region-map codec. The tool includes both a graphical user interface, which allows users to browse, visually inspect and evaluate the algorithm output, and a batch processing interface for segmenting large data collections. The application allows researchers to focus on the development and evaluation of segmentation methods, relying on the framework for encoding and decoding input and output, and on the front end for visualization.

Index Terms— Image Segmentation, Video Segmentation, Framework, User Interface, Integration, Evaluation.

I. INTRODUCTION

Several different approaches to segmentation were developed and contributed by each of the partners in the K-Space¹ project. Each method has its own particular merits and limitations, often as a result of being designed with a different application domain in mind. Generally, each tool has its own unique interface and can accept only one or two input formats; output formats also tend to vary across tools. With such a rich set of tools, selecting and integrating the best tool for a given experiment or domain is time-consuming and non-trivial.

¹ K-Space: Knowledge Space of semantic inference for automatic annotation and retrieval of multimedia content.

Automatic evaluation of segmentation algorithms is a very difficult task. The effectiveness of an algorithm in a given domain (e.g. semantic reasoning applications or search tasks) often cannot be evaluated automatically. Most automatic evaluation methods compare, in some way, a manual human segmentation with an automatic segmentation and produce a measure of the match. This is not usually an adequate representation of the usefulness of a segmentation in an application context. A user, however, may be able to intuitively determine which algorithm would be best for a particular domain simply by examining some segmentation results.

As one of our research activities is the development, testing and evaluation of segmentation algorithms, we decided that a tool allowing us to easily integrate currently available algorithms, and to develop future ones, would be invaluable.

II. FEATURES AND FUNCTIONALITY

The following is an overview of the main features of the platform.

Image and Video Formats: The framework provides an interface for seekable, frame-accurate video decoding. The built-in video decoder supports many video formats, including MPEG-1, MPEG-2, MPEG-4, Motion-JPEG, QuickTime and WMF. We also provide an image decoder capable of transparently decoding both individual images and sequences of key-frames. It supports a large range of image formats, including JPEG, PNG, PNM, GIF and BMP.

Region-Map Format: The framework encodes region maps using an efficient, portable format based on a subset of PNG. This allows video sequences to be segmented with minimal storage overhead.

User Interface: The user interface provides a wide range of functionality, including automatic decoder selection, concurrent browsing of video frames and segmented images, segmentation of a selected frame range, useful visualization methods, and a simple interface for selecting algorithms and their parameters.

Batch Processing Interface: The batch processing interface allows command-line segmentation of large image and video collections. All the parameters that can be selected in the graphical user interface can also be specified in a parameter file. Input files, ranges and increments can be selected as well, making segmentation highly configurable.

III. ARCHITECTURAL OVERVIEW

Fig. 2. High Level Overview of Software Architecture.

The framework is arranged into three main layers. The top-level module, the Application, hosts the user interface, user preferences, batch processing interface and integration logic. The Application layer performs all of its encoding, decoding and segmentation through interfaces specified in the module below it, the External API. This API consists of a set of interfaces for plug-in developers, as well as commonly required utilities that simplify development. The bottom layer contains all the plug-ins; built-in and externally developed plug-ins are treated in the same way.

Application: The main components hosted by the Application module are the user interface and the batch interface. The user interface provides a convenient and powerful way to perform segmentation operations, parameter selection, frame browsing, region visualization and plug-in configuration. It offers two visualization modes for viewing region maps: contrast stretching and color averaging.

The batch interface is designed for off-line processing of larger data sets. It is completely configurable from a parameter file, including decoder, segmenter and output selection and their parameters, input files, ranges and increments.
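The syntax of the batch parameter file is not given here. As a minimal sketch, assuming a hypothetical key=value file read with `java.util.Properties`, the class below shows how a batch job might name its decoder, segmenter and output codec, carry segmenter parameters, and expand a frame range with an increment into frame indices. The class name `BatchJob` and the keys `decoder`, `segmenter`, `output`, `input`, `range`, `increment` and `param.*` are illustrative assumptions, not the tool's actual format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/**
 * Hypothetical batch job read from a key=value parameter file.
 * The keys are assumptions made for illustration only; they mirror the kinds of
 * settings (decoder/segmenter/output selection, parameters, inputs, ranges,
 * increments) that the batch interface is described as accepting.
 */
public final class BatchJob {
    public final String decoder;             // decoder plug-in name, "auto" by default
    public final String segmenter;           // segmenter plug-in name (required)
    public final String output;              // region-map codec name, "png" by default
    public final List<String> inputs;        // input files, comma-separated in the file
    public final Map<String, String> params; // segmenter parameters from "param.<name>" keys
    public final int first, last, increment; // inclusive frame range and step

    private BatchJob(Properties p) {
        decoder = p.getProperty("decoder", "auto").trim();
        segmenter = p.getProperty("segmenter", "").trim();
        output = p.getProperty("output", "png").trim();
        inputs = List.of(p.getProperty("input", "").trim().split("\\s*,\\s*"));

        // Collect every "param.<name>" entry as a segmenter parameter.
        Map<String, String> m = new TreeMap<>();
        for (String key : p.stringPropertyNames()) {
            if (key.startsWith("param.")) {
                m.put(key.substring("param.".length()), p.getProperty(key).trim());
            }
        }
        params = Collections.unmodifiableMap(m);

        // "range=first-last"; this sketch only accepts non-negative frame indices.
        String[] range = p.getProperty("range", "0-0").split("-");
        if (range.length != 2) throw new IllegalArgumentException("range must be first-last");
        first = Integer.parseInt(range[0].trim());
        last = Integer.parseInt(range[1].trim());
        increment = Integer.parseInt(p.getProperty("increment", "1").trim());

        if (segmenter.isEmpty()) throw new IllegalArgumentException("missing segmenter");
        if (increment < 1 || last < first) throw new IllegalArgumentException("invalid range or increment");
    }

    /** Parses parameter-file text into a job description. */
    public static BatchJob parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return new BatchJob(p);
    }

    /** Expands the range into first, first + increment, ..., up to and including last. */
    public List<Integer> frames() {
        List<Integer> out = new ArrayList<>();
        // A long counter avoids int overflow when last is close to Integer.MAX_VALUE.
        for (long f = first; f <= last; f += increment) out.add((int) f);
        return out;
    }
}
```

For instance, `range=10-20` with `increment=5` yields frames 10, 15 and 20. Keeping the job description separate from execution means the same object could, in principle, drive either the batch interface or a range segmentation started from the user interface.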
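The two visualization modes offered by the user interface, contrast stretching and color averaging, can be illustrated with a short sketch. It assumes a region map stored as an `int` array of non-negative labels (0 to N-1) and an image stored as packed 0xRRGGBB pixels in an array of the same length; the class `RegionViz` and its methods are hypothetical and are not part of the framework's API.

```java
import java.util.Arrays;

/** Hypothetical helpers illustrating two region-map visualization modes. */
public final class RegionViz {
    private RegionViz() {}

    /**
     * Contrast stretching: maps labels 0..maxLabel linearly onto gray levels 0..255,
     * so neighbouring regions become visibly distinct. With more than 256 regions,
     * several labels necessarily share a gray level.
     */
    public static int[] contrastStretch(int[] labels) {
        int max = Arrays.stream(labels).max().orElse(0);
        int[] gray = new int[labels.length];
        for (int i = 0; i < labels.length; i++) {
            gray[i] = max == 0 ? 0 : (int) Math.round(labels[i] * 255.0 / max);
        }
        return gray;
    }

    /** Color averaging: paints every pixel with the mean RGB color of its region. */
    public static int[] colorAverage(int[] labels, int[] rgb) {
        if (labels.length != rgb.length) throw new IllegalArgumentException("size mismatch");
        int n = Arrays.stream(labels).max().orElse(-1) + 1; // number of labels
        long[] r = new long[n], g = new long[n], b = new long[n];
        int[] count = new int[n];
        // Accumulate per-region channel sums and pixel counts.
        for (int i = 0; i < labels.length; i++) {
            int l = labels[i], p = rgb[i];
            r[l] += (p >> 16) & 0xFF;
            g[l] += (p >> 8) & 0xFF;
            b[l] += p & 0xFF;
            count[l]++;
        }
        // Integer mean color per region (labels with no pixels stay black).
        int[] mean = new int[n];
        for (int l = 0; l < n; l++) {
            if (count[l] == 0) continue;
            mean[l] = (int) (r[l] / count[l]) << 16
                    | (int) (g[l] / count[l]) << 8
                    | (int) (b[l] / count[l]);
        }
        int[] out = new int[labels.length];
        for (int i = 0; i < labels.length; i++) out[i] = mean[labels[i]];
        return out;
    }
}
```

Contrast stretching needs only the region map, whereas color averaging also needs the decoded frame, which is why the latter tends to give a more natural-looking preview of how well regions follow object boundaries.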
Output of batch operations can later be loaded and browsed in the user interface.

Fig. 1. Screenshot of the Application User Interface

Segmentation: Developers wishing to integrate segmenters must implement the Segmenter interface. This includes all the functions required to configure parameters and perform the segmentation. When a segmenter is implemented and added to the platform, the algorithm name and parameter configuration appear in the user interface. The segmentation interface contains a segment method responsible for performing segmentation on a single frame. For each frame, the segment method is passed a context object containing the information that may be required to perform the operation: the frame and its index, a frame decoder, a region-map object, and an interface for acquiring previously segmented frames. This design makes each segmentation a single operation while still providing enough contextual information for segmenters that require previous segmentations or frames. It keeps the integration of single-frame segmenters simple, yet supports segmenters that operate in the temporal domain. Of course, the internal implementation of a segmenter is entirely up to the developer, who may decide to buffer previous segmentations internally; in this case, no runtime overhead is incurred by the segmentation.

Image and Video Decoder: As the tool is frame based, a single interface is provided for both image and video decoders. This way the segmenter can handle single images (sequences of length 1), multiple images (e.g. key-frames) and videos in the same way. A powerful set of decoders is provided with the application, and the framework's plug-in mechanism ensures that additional decoders can easily be added.

The tool's integrated video decoder provides frame-accurate decoding of multiple video formats. To achieve this, we decided to use the FFmpeg audio/video codec library [7] as a base for the video decoder. FFmpeg supports many video formats and was therefore ideal for our purposes. However, FFmpeg does not natively support frame-accurate video seeking, and a frame-accurate decoder is required both to ensure consistency across runs and for frame-accurate segmentation.

To attain fast, frame-accurate decoding from an arbitrary stream index, it was necessary to add a video packet parsing layer that determines (and sometimes interpolates) packet presentation timestamps, durations and other necessary information in advance of seeking in a stream. This, along with some additional functionality, is provided by the ffmpeg proxy layer. Standalone C++ and Java interfaces were built for this layer, and both are fully reusable.

One advantage of using FFmpeg as a base for the video decoder is that new codecs and improvements are constantly being added to it. As FFmpeg grows to support more formats, a simple recompilation of the ffmpeg proxy layer automatically adds this support to the tool.

The provided image and key-frame decoder plug-ins use the built-in Java image decoders as well as the JAI Image I/O library [6], which together support a comprehensive collection of image formats.

Region Storage: For the standard region-map codec provided, we decided to use the open and widely accepted PNG format [1], specifically its 8-bit and 16-bit gray-level encodings. For region maps with fewer than 256 regions, we employ the 8-bit gray-level encoding; for more regions, the 16-bit gray-level format. The codec can thus support up to 65536 regions. Our experiments revealed that the compression rate of the codec was quite favorable: a typical segmentation of 10 seconds of MPEG-1 video (352x240 resolution, 29.97 fps), roughly 300 frames, required less than 500 KB of storage, i.e. under 2 KB per frame. An advantage of the chosen format is that region maps can be viewed in various imaging applications simply by stretching the contrast between the regions. Several freely available software libraries decode PNG images, such as libpng [5], ImageMagick and JAI Image I/O [6], making the format suitable for interchange.

IV. INTEGRATED ALGORITHMS

For our experiments, we integrated the Syntactic Modified RSST algorithm [2], a fast Mean-Shift algorithm [3], and a version of the Normalized Cuts algorithm [4]. Work is currently in progress to integrate more algorithms into the framework.

V. FUTURE WORK

Possible enhancements for the framework include more visualization algorithms, MPEG-7 region description output, an API for integrating automatic evaluation tools, and the ability to label regions for semantic reasoning applications. We would also like to use the framework components to develop a semi-automatic segmentation tool for ground-truth generation.

VI. ACKNOWLEDGMENT

This material is based upon work supported by the European Commission under contract FP6-027026, K-Space: Knowledge Space of semantic inference for automatic annotation and retrieval of multimedia content.

REFERENCES

[1] Portable Network Graphics (PNG): Functional specification, ISO/IEC 15948:2004, Mar. 2004.
[2] N. O’Connor, T. Adamek, S. Sav, N. Murphy, S. Marlow, "Qimera: a software platform for video object segmentation and tracking," WIAMIS 2003, London, pp. 204-209, Apr. 2003.
[3] W. Bailer, P. Schallauer, H. B. Haraldsson, H. Rehatschek, "Optimized mean shift algorithm for color segmentation in image sequences," Proceedings of the SPIE, vol. 5685, pp. 522-529, 2005.
[4] J. Shi, J. Malik, "Normalized Cuts and Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 888-905, Aug. 2000.
[5] libpng, PNG reference library: http://www.libpng.org/.
[6] JAI Image I/O: https://jai-imageio.dev.java.net/.
[7] FFmpeg Multimedia System: http://ffmpeg.mplayerhq.hu/.