A MULTI-CHANNEL OBJECTIVE MODEL FOR THE FULL REFERENCE ASSESSMENT OF COLOR PICTURES

Francesca De Simone, Michael Ansorge and Touradj Ebrahimi

Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland

ABSTRACT

This paper presents a new approach for the design of a full reference objective quality metric for the assessment of color pictures. Our goal is to build a multi-channel metric based on the perceptual weighting of single-channel metrics. A psycho-visual experiment is thus designed in order to determine the values of the weighting factors. This metric is expected to provide a new useful tool for the quality assessment of compressed pictures in the framework of codec performance evaluation.

1. INTRODUCTION

The compression efficiency of an image coding algorithm expresses its ability to maximize the visual quality of the compressed data while minimizing the number of bits used to store the data, for a range of compression rates. Since human subjects are the end users of the digital data, subjective tests can be performed, where a significant sample of human subjects is asked to rate the quality of the processed material. Since these tests are time consuming and expensive, objective metrics are usually used to assess the quality of the compressed images. These metrics are called Full-Reference (FR) quality metrics, because they take as input both the original image (i.e. the reference) and the compressed version of it. A substantial effort has recently been made by the research community to design objective visual quality metrics which achieve a good correlation with the subjective quality evaluation. Nevertheless, most of the well-known and widely used FR quality metrics take into account only the luminance channel of the picture under analysis. Hence, in current practice, the quality performance evaluation and optimization of full color image algorithms are usually done by means of methods which are applied to the luminance component only of the signals. Given the evident influence of the color information on the human perception of the quality of visual data, this approach limits a priori the correlation that such a single-channel metric can achieve with the subjective judgment.

This paper presents a general model for designing a metric for the assessment of color pictures, by exploiting data collected from human subjects by means of a properly designed psycho-visual experiment. The idea is to perform an experiment with human subjects so as to understand “how the color information affects the overall assessment of the distorted image with respect to the perceived quality on the luminance-only version of the same distorted image”. The test methodology is described in Section 2. The task is then to fit an objective model to the subjective data resulting from this test, as described in Section 3. We refer to this objective model as “multi-channel metric”, since it consists of the weighted average of single-channel quality measures on the luminance channel and the two chrominance channels of the picture. Conclusions are presented in Section 4.

2. PSYCHO-VISUAL TEST METHODOLOGY

In our experiment the test material is presented to the subjects according to a slightly modified version of the Double Stimulus Continuous Quality Scale (DSCQS) method [1]. Pairs of pictures are shown, with the reference picture on one side of the screen and a distorted version on the other side. The assessor is asked to judge the quality of the distorted picture with respect to the reference, and to rate it by choosing a score on a continuous scale ranging from 0 (Very bad quality) to 100 (Excellent quality). This test is performed on two kinds of pairs: pairs where the true color images are shown (i.e. reference image and distorted version on the same screen), and the corresponding pairs where just the luminance components of each true color image previously considered are displayed (i.e. the luminance component of the reference shown together with the luminance component of the distorted image). The luminance-only and true color pairs are randomly mixed, while ensuring that the luminance-only and full color displays are always consecutive. The test room conditions, as well as the number of subjects, the test duration, the training session details and the processing of the subjective data are assumed to be standard compliant [1].

By computing for each test condition the mean score over the entire set of subjects' ratings, two sets of Mean Opinion Score (MOS) results are obtained: one set of subjective scores refers to the quality evaluation of the luminance-only stimulus, and we call it MOSluma; the other set relates to the quality assessment of the full color stimulus, and we call it MOScolor. The first interesting analysis made at this stage concerns the influence of the color information on the overall quality assessment by the human subject: the same distorted picture can be rated differently depending on whether the subject sees the true color image or just the luminance component of the true color image.

2.1. Dataset selection

To choose the input data for the psycho-visual experiment, as a starting step we restrict our field of investigation to a dataset of pictures including only natural images and the distortions introduced by three different JPEG compression algorithms, specified hereafter. Five different contents, having different spatial and color features, have been selected for building our test set. The selected pictures are 8 bits per channel high resolution pictures chosen from the dataset established by Microsoft and T. Richter [2]. The test pictures are produced by 4:4:4 coding using i) JPEG with a conventional visually optimized quantization matrix, ii) JPEG 2000 with frequency weighting of quantization steps, and iii) the new coding algorithm recently proposed by Microsoft and currently under evaluation by the JPEG standardization body under the name of JPEG XR [3]. Five quality levels have been considered for each compressed picture. Our dataset is thus composed of 5 x 3 x 5 = 75 test pictures, corresponding to 5 different original contents, 3 different coding techniques and 5 different samples in the perceivable quality difference range of interest. Finally, since we are using very high resolution input data, the subjective data are collected by presenting just a selected representative crop of each picture, so that it fits the native resolution of the screen (e.g. 1920x1440 pixels screen resolution).

3. SUBJECTIVE TO OBJECTIVE DATA FITTING METHODOLOGY: A BASIC APPROACH

The results of the subjective experiment described in the previous section can be used to extend any generic mono-channel metric to the multi-channel case. In particular, to illustrate the proposed approach, we consider, as an example, the application of our method to the widely used Peak Signal to Noise Ratio (PSNR) metric. Considering the representation of an image in the Y’CbCr color space [4], the PSNR metric can be applied to each color channel by computing the PSNR index on the luminance component (PSNRy) and on the two chrominance components (PSNRcb and PSNRcr). A multi-channel metric can thus be built by simply considering the weighted average of the PSNR indexes computed on the single channels, i.e.

multi-channel PSNR = w1 PSNRy + w2 PSNRcb + w3 PSNRcr.

The values of the weighting factors w1, w2 and w3 are defined by maximizing the correlation between the multi-channel model and the subjective results collected on the full color inputs (MOScolor). In particular, we take as “reference values” the values of the Pearson correlation coefficient, the Spearman rank order correlation coefficient, and the outlier ratio evaluated between MOSluma and PSNRy. These three measures are the standard tools used to compare the performance of an objective metric to the subjective scores. They provide an estimate of the accuracy, monotonicity and consistency of the objective metric under analysis [1]. A system of equations is thus defined, imposing that the values of these three indexes computed between MOScolor and the multi-channel metric are the known values computed for the luma-only objective and subjective data. By solving the system of equations, the values of the weighting factors are defined.

Further conditions could be that the sum of the values of the three weighting factors is equal to 1 and that the weights of the two chrominance channels are the same. The investigation of more sophisticated conditions is also currently in progress.

4. CONCLUSIONS

The final goal of the proposed approach is to design a general method for the extension of a generic mono-channel metric to the multi-channel case, in order to achieve a better correlation with the end user quality judgment compared to the objective metric applied to the luminance channel only. One of the main advantages of our approach is that it allows the use of well-known single-channel metrics (e.g. PSNR, SSIM) in the multi-channel scenario, thus providing a new tool to researchers who are already applying these metrics for luminance-only objective quality assessment, by simply suggesting the right weights to be used for the weighting of the three channels.

The psycho-visual experiment described in this paper is currently running, so the analysis of the results and the proposal of more sophisticated fitting methodologies will be reported in future publications.

5. REFERENCES

[1] Recommendation ITU-R BT.500-11, Methodology for the subjective assessment of the quality of television pictures, Geneva, June 2002.

[2] Test dataset available on the JPEG committee sftp server, http://www.jpeg.org.

[3] S. Srinivasan, C. Tu, S.L. Regunathan, R.A. Rossi, G.J. Sullivan, HD Photo: a new image coding technology for digital photography, Proceedings of SPIE, Applications of Digital Image Processing XXX, vol. 6696, San Diego, CA USA, August 2007.

[4] Recommendation ITU-R BT.601-6, Studio encoding parameters of digital television for standard 4:3 and wide screen 16:9 aspect ratio, Geneva, January 2007.
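
As an illustration of the multi-channel PSNR of Section 3, a minimal Python sketch is given here. It is not the authors' implementation. It assumes a full-range Y’CbCr conversion with the BT.601 luma coefficients, it imposes the two optional conditions mentioned in the text (weights summing to 1 and equal chrominance weights), and it replaces the system of equations on the three performance indexes with a simple grid search that maximizes only the Pearson correlation with MOScolor. All function and parameter names (rgb_to_ycbcr, psnr, multichannel_psnr, fit_luma_weight, w_luma) are hypothetical.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Convert an 8-bit RGB image (H, W, 3) to full-range Y'CbCr.

    Assumption: BT.601 luma coefficients with the full-range (JFIF-style)
    chroma scaling; the paper does not state which variant is used.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 + (b - y) / 1.772
    cr = 128.0 + (r - y) / 1.402
    return np.stack([y, cb, cr], axis=-1)


def psnr(ref, dist, peak=255.0):
    """PSNR in dB between two single-channel arrays."""
    diff = ref.astype(np.float64) - dist.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)


def multichannel_psnr(ref_rgb, dist_rgb, w_luma):
    """Return (w1*PSNRy + w2*PSNRcb + w3*PSNRcr, [PSNRy, PSNRcb, PSNRcr]).

    Assumption: w2 = w3 and w1 + w2 + w3 = 1, so only w_luma (w1) is free.
    """
    w_chroma = (1.0 - w_luma) / 2.0
    ref = rgb_to_ycbcr(ref_rgb)
    dist = rgb_to_ycbcr(dist_rgb)
    per_channel = [psnr(ref[..., c], dist[..., c]) for c in range(3)]
    score = (w_luma * per_channel[0]
             + w_chroma * per_channel[1]
             + w_chroma * per_channel[2])
    return score, per_channel


def fit_luma_weight(channel_psnrs, mos_color, steps=101):
    """Grid-search w1 in [0, 1] maximizing Pearson correlation with MOScolor.

    channel_psnrs: array of shape (N, 3) with finite PSNRy, PSNRcb, PSNRcr
    values for N test pictures. This is a simplification of the fitting
    described in the text, which also involves the Spearman coefficient
    and the outlier ratio.
    """
    channel_psnrs = np.asarray(channel_psnrs, dtype=np.float64)
    mos_color = np.asarray(mos_color, dtype=np.float64)
    best_w, best_r = None, -np.inf
    for w in np.linspace(0.0, 1.0, steps):
        w_chroma = (1.0 - w) / 2.0
        scores = channel_psnrs @ np.array([w, w_chroma, w_chroma])
        r = np.corrcoef(scores, mos_color)[0, 1]
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r
```

In use, multichannel_psnr would be evaluated on each reference/distorted pair of the test set, the per-channel values collected into an (N, 3) array, and fit_luma_weight called with the corresponding MOScolor values. Setting w_luma to 1 reduces the metric to the usual luminance-only PSNR.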