A MULTI-CHANNEL OBJECTIVE MODEL
FOR THE FULL REFERENCE ASSESSMENT OF COLOR PICTURES

Francesca De Simone, Michael Ansorge and Touradj Ebrahimi

Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

ABSTRACT

This paper presents a new approach for the design of a full
reference objective quality metric for the assessment of
color pictures. Our goal is to build a multi-channel metric
based on the perceptual weighting of single-channel metrics.
A psycho-visual experiment is thus designed in order to
determine the values of the weighting factors. This metric is
expected to provide a new useful tool for the quality
assessment of compressed pictures in the framework of
codec performance evaluation.

1. INTRODUCTION

The compression efficiency of an image coding algorithm
expresses its ability to maximize the visual quality of the
compressed data while minimizing the number of bits used
to store the data, for a range of compression rates. Since
human subjects are the end users of the digital data, subjective
tests can be performed, where a significant sample of human
subjects is asked to rate the quality of the processed
material. Since these tests are time-consuming and
expensive, objective metrics are usually used instead to
assess the quality of the compressed images. These metrics
are called Full-Reference (FR) quality metrics, because they
take as input both the original image (i.e. the reference) and
its compressed version. A substantial effort has recently been
made by the research community to design
objective visual quality metrics which achieve a good
correlation with the subjective quality evaluation.
Nevertheless, most of the well-known and widely used FR
quality metrics take into account only the luminance channel
of the picture under analysis. Hence, in current practice,
the quality performance evaluation and optimization of full
color image algorithms are usually done by means of
methods which are applied only to the luminance component
of the signals. Given the evident influence of the color
information on the human perception of the quality of visual
data, this approach limits a priori the correlation with the
subjective judgment that such a single-channel metric can
achieve.
This paper presents a general model for designing a metric
for the assessment of color pictures, by exploiting data
collected from human subjects by means of a properly
designed psycho-visual experiment. The idea is to perform
an experiment with human subjects so as to understand
“how the color information affects the overall assessment of
the distorted image with respect to the perceived quality on
the luminance-only version of the same distorted image”.
The test methodology is described in Section 2. The task is
then to fit an objective model to the subjective data resulting
from this test, as described in Section 3. We refer to this
objective model as “multi-channel metric”, since it consists
of the weighted average of single-channel quality measures
on the luminance channel and the two chrominance channels
of the picture. Conclusions are presented in Section 4.

2. PSYCHO-VISUAL TEST METHODOLOGY

In our experiment the test material is presented to the
subjects according to a slightly modified version of the
Double Stimulus Continuous Quality Scale (DSCQS)
method [1]. Pairs of pictures are shown, with the reference
picture on one side of the screen and a distorted version on
the other side. The assessor is asked to judge the quality of
the distorted picture with respect to the reference, and to rate
it by choosing a score on a continuous scale ranging
from 0 (Very bad quality) to 100 (Excellent quality).
This test is performed by considering pairs where the true
color images are shown (i.e. reference image and distorted
version on the same screen), and the corresponding pairs
where just the luminance components of each true color
image previously considered are displayed (i.e. luminance
component of the reference shown together with the
luminance component of the distorted image). The
luminance-only and true color pairs are randomly mixed
while ensuring that luminance-only and full color displays
are always consecutive. The test room conditions, as well as
the number of subjects, the test duration, the training session
details and the processing of the subjective data are assumed
to be compliant with the standard [1].
By computing for each test condition the mean score over
the entire set of subjects’ ratings, two sets of Mean Opinion
Score (MOS) results are obtained: one set of subjective
scores refers to the quality evaluation of the luminance-only
stimulus and we call it MOSluma; the other set is
related to the quality assessment of the full color stimulus
and we call it MOScolor. The first interesting analysis made
at this stage is related to the influence of the color
information on the overall quality assessment by the human
subject: the same distorted picture can be rated in different
ways if the subject sees the true color image or just the
luminance component of the true color image.

2.1. Dataset selection

To choose the input data for the psycho-visual
experiment, as a starting step we restrict our field of
investigation by considering a dataset of pictures including
only natural images and the distortions introduced by three
different JPEG compression algorithms specified hereafter.
Five different contents, having different spatial and color
features, have been selected for building our test set. The
selected pictures are 8 bits per channel high resolution
pictures chosen from the dataset established by Microsoft
and T. Richter [2]. The test pictures are produced by 4:4:4
coding using i) JPEG with a conventional visually optimized
quantization matrix, ii) JPEG 2000 with frequency
weighting of quantization steps, and iii) the new coding
algorithm recently proposed by Microsoft and currently
under evaluation by the JPEG standardization body under
the name of JPEG XR [3]. For each content and coding
technique, five quality levels have been considered. Our
dataset is thus composed of 5 x 3 x 5 = 75 test pictures,
corresponding to 5 different original contents, 3 different
coding techniques and 5 different samples in the perceivable
quality difference range of interest. Finally, since we are
using very high resolution input data, the subjective data are
collected by presenting just a selected representative crop of
each picture in order to fit the data in the native resolution
of the screen (e.g. 1920x1440 pixels screen resolution).

3. SUBJECTIVE TO OBJECTIVE DATA FITTING
METHODOLOGY: A BASIC APPROACH

The results of the subjective experiment described in the
previous section can be used to extend any generic
mono-channel metric to the multi-channel case. In
particular, to illustrate the proposed approach, we
consider, as an example, the application of our method to
the widely used Peak Signal to Noise Ratio (PSNR) metric.
Considering the representation of an image in the Y’CbCr
color space [4], the PSNR metric can be applied to each
color channel by computing the PSNR index on the
luminance (PSNRy) and on the two chrominance
components (PSNRcb and PSNRcr). A multi-channel metric
can thus be built by simply considering the weighted
average of the PSNR indexes computed on the single
channels (i.e. multi-channel PSNR = w1 PSNRy +
w2 PSNRcb + w3 PSNRcr). The values of the weighting
factors w1, w2 and w3 are defined by maximizing the
correlation between the multi-channel model and the
subjective results collected on the full color inputs
(MOScolor). In particular, we assume as “reference values”
the values of the Pearson correlation coefficient, the
Spearman rank order correlation coefficient, and the outlier
ratio evaluated between the MOSluma and the PSNRy.
These three measures are the standard tools used to compare
the performance of an objective metric to the subjective
scores. They provide an estimate of the accuracy,
monotonicity and consistency of the objective metric under
analysis [1]. A system of equations is thus defined,
imposing that the values of these three indexes computed
between the MOScolor and the multi-channel metric equal the
known values computed for the luma-only objective and
subjective data. By solving the system of equations, the
values of the weighting factors are defined.
Further conditions could be that the sum of the values of the
three weighting factors is equal to 1 and that the weights of
the two chrominance channels are the same. The
investigation of more sophisticated conditions is also
currently in progress.

4. CONCLUSIONS

The final goal of the proposed approach is to design a
general method for the extension of a generic mono-channel
metric to a multi-channel case, in order to achieve a better
correlation with the end user quality judgment compared to
the objective metric applied on the luminance channel only.
One of the main advantages of our approach is that it allows
the usage of well-known single-channel metrics (e.g. PSNR,
SSIM) in the multi-channel scenario, thus providing a new
tool to researchers who are already applying these metrics
for the luminance-only objective quality assessment, by
simply suggesting the right weights to be used for the
weighting of the three channels.
The psycho-visual experiment described in this paper is
currently running; the analysis of the results and the
proposition of more sophisticated fitting methodologies will
be reported in future publications.

5. REFERENCES

[1] Recommendation ITU-R BT.500-11, Methodology for the
subjective assessment of the quality of television pictures, Geneva,
June 2002.

[2] Test dataset available on the JPEG committee sftp server,
http://www.jpeg.org.

[3] S. Srinivasan, C. Tu, S.L. Regunathan, R.A. Rossi, G.J.
Sullivan, HD Photo: a new image coding technology for digital
photography, Proceedings of SPIE, Applications of Digital Image
Processing XXX, vol. 6696, San Diego, CA USA, August 2007.

[4] Recommendation ITU-R BT.601-6, Studio encoding
parameters of digital television for standard 4:3 and wide screen
16:9 aspect ratio, Geneva, January 2007.
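
As an illustration of the basic approach of Section 3, the weighted
multi-channel PSNR and a simple weight search could be sketched in
Python as follows. This is a minimal sketch under stated assumptions,
not the authors' implementation: the function names (psnr,
multi_channel_psnr, pearson, fit_weights), the 8-bit peak value of 255
and the grid search are hypothetical choices made for illustration.
The search maximizes only the Pearson coefficient; the Spearman
coefficient, the outlier ratio and the system of equations described
in Section 3 are not implemented. The two optional conditions of
Section 3 are applied (weights summing to 1, equal chrominance
weights).

```python
import math


def psnr(ref, dist, peak=255.0):
    """PSNR in dB between two equal-length flat sequences of samples."""
    if len(ref) != len(dist) or len(ref) == 0:
        raise ValueError("channels must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    if mse == 0:
        return math.inf  # identical channels
    return 10.0 * math.log10(peak ** 2 / mse)


def multi_channel_psnr(ref_ycbcr, dist_ycbcr, weights):
    """w1*PSNRy + w2*PSNRcb + w3*PSNRcr for (Y, Cb, Cr) channel tuples."""
    return sum(w * psnr(r, d)
               for w, r, d in zip(weights, ref_ycbcr, dist_ycbcr))


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant data; treated as 0 here
    return cov / math.sqrt(var_x * var_y)


def fit_weights(psnr_triplets, mos_color, steps=20):
    """Grid search over w1 in [0, 1] with w2 = w3 = (1 - w1) / 2.

    psnr_triplets: list of (PSNRy, PSNRcb, PSNRcr), one per test picture.
    mos_color: MOScolor values in the same order.
    Returns (weights, correlation) maximizing the Pearson coefficient.
    Assumption: all PSNR values are finite.
    """
    best_w, best_r = None, -math.inf
    for i in range(steps + 1):
        w1 = i / steps
        wc = (1.0 - w1) / 2.0
        w = (w1, wc, wc)
        scores = [sum(wi * p for wi, p in zip(w, t)) for t in psnr_triplets]
        r = pearson(scores, mos_color)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

Given one (PSNRy, PSNRcb, PSNRcr) triplet and one MOScolor value per
test picture, fit_weights returns the selected weights together with
the correlation they achieve. The grid resolution (steps) trades
precision of the weights against computation time.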