Using fuzzy logic for analysis and optimization of color-difference databases
                                Rafael Huertas1,∗, Daniel Arranz2, Pedro Latorre-Carmona3 and Samuel Morillas4

                                1 Dpto. de Óptica, Universidad de Granada, Avda. de Fuentenueva s/n, 18071, Granada (Spain)
                                2 Escuela de Ing. Informática, Universidad de Valladolid, P.º de Belén, 15, 47011 Valladolid (Spain)
                                3 Dpto. de Ingeniería Informática, Universidad de Burgos, Avda. Cantabria s/n, 09006 Burgos (Spain)
                                4 Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, Camino de Vera s/n, 46022, Valencia (Spain)


                                                Abstract
                                                The precise measurement and specification of color, as well as the measurement of the color difference
                                                between the two samples of a color pair, are very important issues in Colorimetry, with applications in
                                                fields such as the automotive industry, textiles, agriculture, and healthcare. This relevance is
                                                especially pronounced in areas where color is not only an attribute but also adds significant value to
                                                the final product. Research on this topic is mainly based on psychophysical experiments on the
                                                perception of color differences. In this work, an exhaustive analysis of several color-difference
                                                databases has been carried out using fuzzy-logic data analysis techniques. The ultimate goal of
                                                this analysis has been to identify color pairs that are inconsistent with the other pairs, in order
                                                to improve the quality and consistency of the original databases. To achieve this, a fuzzy-logic
                                                methodology has been implemented that detects discrepancies between visually perceived and
                                                computed color differences relative to the rest of the data. The result of this work is the
                                                identification, analysis, and elimination of color pairs considered inconsistent in several databases.
                                                This data cleansing significantly contributes to improving the quality of color-difference databases,
                                                which play a fundamental role in the development and evaluation of new color-difference formulas.
                                                In this way, the predictions of these formulas can align more closely with the perception of color
                                                differences by the human visual system.

                                                Keywords
                                                Fuzzy logic, color differences, STRESS index.


                                1. Introduction
                                In the field of color differences, databases are essential, both for the development of new
                                color-difference formulas and for testing their performance [1]. Each entry in these databases
                                consists of the color coordinates of a pair of color stimuli and the perceived color difference
                                between them, ΔV, which is measured through psychophysical experiments as the average over
                                a considerable number of observers and/or several repetitions [2]. Thus, producing these
                                databases is both time- and resource-consuming.
                                   From the color coordinates of the two stimuli in a pair, a color difference between them
                                can be computed using different mathematical formulas, ranging from the simple Euclidean
                                distance to more sophisticated expressions. The computed color difference is generally called
                                ΔE, and it is desirable that ΔV and ΔE be as highly correlated as possible throughout the entire
                                database.


                                ∗ Corresponding author.

                                   rhuertas@ugr.es (Rafael Huertas); daniel.arranz.ort@gmail.com (Daniel Arranz); plcarmona@ubu.es (Pedro Latorre-Carmona); smorillas@mat.upv.es (Samuel Morillas)
                                   0000-0001-6606-0151 (Rafael Huertas); 0000-0001-6984-5173 (Pedro Latorre-Carmona); 0000-0001-9262-6139 (Samuel Morillas)
                                           © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
   The measurement of agreement between perceived and computed color differences
constitutes another field of study within Colorimetry. Different statistical measures have been
proposed, and today the STRESS index [3] is one of the most widely used, owing to its favorable properties.
   As mentioned above, the consistency of a database is important for its reliability. We
consider two data inconsistent when they are close in the color space and have similar color
differences according to one measure (perceived or computed) but very different color differences
according to the other. The CIELAB color space is used in this work [4]. The closeness of two data
in the color space is computed by the Euclidean distance in CIELAB (ΔE*ab), while the color difference
between the two colors of each pair, ΔE, is computed with CIEDE2000 (ΔE00) [5], the current
CIE/ISO-recommended color-difference formula. Other color spaces and color-difference formulas
could also be explored. To study the consistency of the databases we use fuzzy logic techniques,
as described in previous works [6, 7], where two fuzzy logic rules are proposed.
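The two distances mentioned above can be sketched as follows. This is a minimal Python implementation of the standard CIEDE2000 equations with the parametric factors kL = kC = kH = 1, together with the CIELAB Euclidean distance; the function names `delta_e_ab` and `ciede2000` are our own, and this is not the code used by the authors. It should be checked against published CIEDE2000 test data before any real use.

```python
import math


def delta_e_ab(lab1, lab2):
    """CIELAB Euclidean distance (ΔE*ab) between two (L*, a*, b*) triplets."""
    return math.dist(lab1, lab2)


def ciede2000(lab1, lab2, kL=1.0, kC=1.0, kH=1.0):
    """CIEDE2000 colour difference (ΔE00) between two (L*, a*, b*) triplets."""
    L1, a1, b1 = lab1
    L2, a2, b2 = lab2
    # Chroma-dependent rescaling G of the a* axis
    c_bar = (math.hypot(a1, b1) + math.hypot(a2, b2)) / 2
    g = 0.5 * (1 - math.sqrt(c_bar**7 / (c_bar**7 + 25**7)))
    a1p, a2p = (1 + g) * a1, (1 + g) * a2
    c1p, c2p = math.hypot(a1p, b1), math.hypot(a2p, b2)
    # Hue angles in degrees in [0, 360); set to 0 for achromatic colours
    h1p = math.degrees(math.atan2(b1, a1p)) % 360 if c1p else 0.0
    h2p = math.degrees(math.atan2(b2, a2p)) % 360 if c2p else 0.0

    # Differences in lightness, chroma and hue
    dLp = L2 - L1
    dCp = c2p - c1p
    if c1p * c2p == 0:
        dhp = 0.0
    else:
        dhp = h2p - h1p
        if dhp > 180:
            dhp -= 360
        elif dhp < -180:
            dhp += 360
    dHp = 2 * math.sqrt(c1p * c2p) * math.sin(math.radians(dhp) / 2)

    # Mean lightness, chroma and hue of the pair
    L_bar = (L1 + L2) / 2
    c_bar_p = (c1p + c2p) / 2
    if c1p * c2p == 0:
        h_bar = h1p + h2p
    elif abs(h1p - h2p) <= 180:
        h_bar = (h1p + h2p) / 2
    elif h1p + h2p < 360:
        h_bar = (h1p + h2p + 360) / 2
    else:
        h_bar = (h1p + h2p - 360) / 2

    # Weighting functions and rotation term
    t = (1 - 0.17 * math.cos(math.radians(h_bar - 30))
         + 0.24 * math.cos(math.radians(2 * h_bar))
         + 0.32 * math.cos(math.radians(3 * h_bar + 6))
         - 0.20 * math.cos(math.radians(4 * h_bar - 63)))
    d_theta = 30 * math.exp(-(((h_bar - 275) / 25) ** 2))
    r_c = 2 * math.sqrt(c_bar_p**7 / (c_bar_p**7 + 25**7))
    s_l = 1 + 0.015 * (L_bar - 50) ** 2 / math.sqrt(20 + (L_bar - 50) ** 2)
    s_c = 1 + 0.045 * c_bar_p
    s_h = 1 + 0.015 * c_bar_p * t
    r_t = -math.sin(math.radians(2 * d_theta)) * r_c

    l_term = dLp / (kL * s_l)
    c_term = dCp / (kC * s_c)
    h_term = dHp / (kH * s_h)
    return math.sqrt(l_term**2 + c_term**2 + h_term**2 + r_t * c_term * h_term)
```

For instance, `ciede2000((50, 2.6772, -79.7751), (50, 0, -82.7485))` should give approximately 2.04.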
   The objective of this work is to analyze the effects of the different parameters and variables
involved in the definition of this consistency analysis method. Preliminary results are shown
here. A deeper analysis of the databases is desirable in order to eventually provide cleaned
versions of them.

2. Method
   In [6, 7], the authors developed fuzzy rules to identify inconsistent data in a
color-difference dataset. Here we apply this method to several experimental datasets.
In addition, instead of the fixed threshold used in those papers, we study the effect of
varying the threshold over its whole range.

2.1. Color-differences databases
In this work, a set of databases obtained from a series of psychophysical experiments is
considered. Specifically, four databases with different structure and composition are
studied. Each datum in these databases consists of a pair of colors, specified by their color
coordinates in a color space (here, CIELAB), the color difference perceived by a panel of
observers, and sometimes other relevant data.
   The considered databases are:

   •   COMData

   The combined dataset was used in the development and performance testing of the CIEDE2000
color-difference formula [5]. Since it was built, combining (COM) four different datasets
(BFD-P, Witt, Leeds, and RIT-DuPont) [8], it has been used extensively for formula development
and testing. Thus, it is an important and well-known database in Colorimetry, used and studied
in many works. It is one of the most extensive databases, with 3813 data points. The color
differences, in CIELAB units, range from 0.04 to 18.21, with an average of 2.67 and a standard
deviation of 2.30. A complete description of this database can be found in [8].

   •   Pointer

   Developed by Michael Pointer and Geoffrey G. Attridge in the United Kingdom. It has
1308 color pairs distributed across 27 color centers. For each center, approximately 48 pairs,
varying in L*, a*, and b*, were tested. The color differences, in CIELAB units, range from 0.88
to 26.21, with an average of 8.91 and a standard deviation of 4.47. Some details can be found in [9].

   •   LCAM-WDC

   Developed by Michal Vik at the Faculty of Textile Engineering of the University of
Liberec, Czech Republic. It has 284 color pairs around 9 color centers. The color differences,
in CIELAB units, range from 0.05 to 13.72, with an average of 2.53 and a standard deviation of 2.64.

   •   RIT-Dupont-Ind

   The full name of the database is RIT-DuPont Supra-Threshold Color-Tolerance
Individual Color-Difference Pair dataset. It is the complete database of an extensive experiment
conducted at the Rochester Institute of Technology (RIT) in the USA. The original judgments
were transformed, by probit analysis, into 156 color pairs in the so-called RIT-DuPont dataset,
which is one of the four included in the COMData. Thus, it is the extensive original dataset, which
is worth analyzing independently of the synthesized version inside the COMData. It has 828
color pairs, distributed across 19 color centers. Each pair was assessed by a panel
of 50 observers; consequently, this database is considered one of the most reliable. The color
differences, in CIELAB units, range from 0.03 to 5.42, with an average of 1.50 and a standard
deviation of 0.64. A complete description of this database can be found in [10].

2.2. Standardized Residual Sum of Squares (STRESS)
The figure of merit known as STRESS (Standardized Residual Sum of Squares) [3] is used in a
considerable number of contexts [11] to check the accuracy of a model, e.g., a color-difference
formula. In this work, STRESS is used to evaluate the agreement between the set of visual data
(ΔV) and the corresponding color differences (ΔE) computed with different formulas. The lower
the STRESS value, the better the agreement between the two sets, ΔV and ΔE.
   Assuming that a color-difference formula is accurate enough, STRESS can be considered a
measure of the consistency of a given dataset. Therefore, removing inconsistent data should
reduce the STRESS value.
   As shown in [3], to compute STRESS a scaling factor, F, must first be computed to scale
either ΔV or ΔE so that both are on the same scale. Note that in Colorimetry it does not matter
that the scales of ΔV and ΔE differ, as long as ΔV and ΔE are correlated. In [3] it can also be
seen that the STRESS index can be computed with three different equations, each with a
different scaling factor: F1 (applied to ΔV), F2 (applied to ΔE), and F3 (applied to ΔV).

2.3. Fuzzy rules
Each datum of the dataset is compared with the rest of the data to assess its consistency.
Let us consider two data (i.e., two color pairs) that are close in the color space. The two
data are considered inconsistent if either of the following cases applies:

   •   The two pairs are very close in color space and have similar ΔE values but very different
       ΔV values.
   •   The two pairs are very close in color space and have similar ΔV values but very different
       ΔE values.

   Thus, the relevant variables are the distance between the color pairs and the comparison
between the perceived and computed color differences. All this reasoning is formulated by
means of vague linguistic terms; thus, fuzzy logic can be used for its numerical representation.
These two cases were implemented as two fuzzy rules, known as Fuzzy Rule 1 (FR1) and
Fuzzy Rule 2 (FR2), which detect inconsistencies in a color-difference dataset. The details can
be found in [7]. Note that the two fuzzy rules are identical except that ΔE and ΔV are
exchanged. FR1 evaluates inconsistency mainly on the basis of differences in the perceived
color difference (ΔV) between data pairs, while FR2 does so for the computed color differences
(ΔE). Usually, data found inconsistent by one fuzzy rule are also found inconsistent by the
other, but this is not always the case.
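To make the structure of the two rules concrete, the following Python sketch encodes them with product conjunction. The Gaussian membership functions `close` and `similar`, their widths, the `Datum` structure, and the use of a single representative CIELAB point per color pair are hypothetical choices made only for illustration; the actual membership functions and parameters are those defined in [7].

```python
import math
from dataclasses import dataclass


@dataclass
class Datum:
    """One colour-difference datum (hypothetical structure)."""
    center: tuple  # representative CIELAB point of the colour pair (assumed)
    dE: float      # computed colour difference
    dV: float      # visual colour difference, already scaled to the dE scale


def close(d, sigma=3.0):
    """Hypothetical membership of 'close in colour space' for a distance d."""
    return math.exp(-((d / sigma) ** 2))


def similar(x, y, sigma=0.5):
    """Hypothetical membership of 'similar colour differences'."""
    return math.exp(-(((x - y) / sigma) ** 2))


def fr1(p, q):
    """FR1: close AND similar dE AND very different dV (product t-norm)."""
    d = math.dist(p.center, q.center)
    return close(d) * similar(p.dE, q.dE) * (1 - similar(p.dV, q.dV))


def fr2(p, q):
    """FR2: FR1 with the roles of dE and dV exchanged."""
    d = math.dist(p.center, q.center)
    return close(d) * similar(p.dV, q.dV) * (1 - similar(p.dE, q.dE))
```

Both functions return a degree of inconsistency in [0, 1]; two data at the same location with equal ΔE but very different ΔV get an FR1 degree close to 1 and an FR2 degree of 0.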
    The first condition, that the two pairs are close, is related to the distance in the corresponding
color space. In fuzzy logic, an inference rule can compute the certainty of its consequent only
when its antecedent has a non-null certainty; otherwise, no inference is carried out. The
certainty of the antecedent for two color pairs is called the fuzzy degree of neighborhood, and it
is defined for each pair with respect to the rest of the pairs in the database. It takes into account
the distance and the similarity of ΔV between the two data.
    Therefore, these fuzzy rules are applied to each color pair with respect to all other color pairs
in the set to find out whether inconsistencies exist. Eventually, all possible couples of color pairs
are compared. For some color pairs there may be no other sufficiently similar color pairs to
compare them with. This should not be confused with color pairs for which similar color pairs do
exist but no inconsistency is observed. To distinguish between these two cases, we propose to
compute the so-called number of fuzzy neighbors for each color pair, defined as the sum of the
certainties of the antecedent of the fuzzy rule computed against all other data. For the result of
the fuzzy rule for a color pair to be considered representative, its number of fuzzy neighbors
must be at least 1. Otherwise, the procedure considers that there is not enough evidence to draw
conclusions about that color pair.
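The number of fuzzy neighbors can be sketched in the same hypothetical setting: the antecedent certainty for data i and j is taken as closeness in CIELAB combined (by product) with the similarity of ΔV, and it is summed over all j ≠ i. The membership functions, their widths, and the representative point per pair are again assumptions, not the definitions of [7].

```python
import math


def close(d, sigma=3.0):
    """Hypothetical membership of 'close in colour space'."""
    return math.exp(-((d / sigma) ** 2))


def similar(x, y, sigma=0.5):
    """Hypothetical membership of 'similar dV'."""
    return math.exp(-(((x - y) / sigma) ** 2))


def fuzzy_neighbors(centers, dV):
    """Number of fuzzy neighbours of each datum: the sum, over all other data,
    of the antecedent certainty (closeness AND similar dV, product t-norm)."""
    n = len(centers)
    counts = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j != i:
                total += close(math.dist(centers[i], centers[j])) * similar(dV[i], dV[j])
        counts.append(total)
    return counts


def representative(counts, minimum=1.0):
    """A datum's fuzzy-rule result is considered representative if it has
    at least `minimum` fuzzy neighbours (1 in the text)."""
    return [c >= minimum for c in counts]
```

An isolated datum, far from all others in CIELAB, accumulates almost no neighborhood certainty and is therefore marked as not representative rather than as consistent.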
    To apply either fuzzy rule, ΔV and ΔE must be on the same scale, so ΔV is normalized
following the procedure in [3]. There are two possibilities to scale ΔV: F1 or F3. Similar results
are obtained with both, but F3 is preferable, as shown in a previous work [12], and it is the one
used in this study.
    Finally, these fuzzy rules provide a number, I_ij, in the interval [0, 1], representing the degree of
inconsistency between the two data i and j. Higher values (up to a maximum of 1) indicate greater
inconsistency between the two color pairs. We analyze and compare the results of applying each
fuzzy rule separately and of their combination, i.e., joining the inconsistent data found by both
fuzzy rules, which corresponds to the logical operator FR1 OR FR2.
    Once the degree of inconsistency is computed, a threshold value must be defined to decide
whether an inconsistency exists: two data are considered inconsistent when their degree of
inconsistency is higher than the threshold. The threshold thus measures how similar two color
pairs need to be to be considered consistent. When the threshold is low, such as 0.05, a very
high similarity between the color pairs is required for them to be considered consistent;
therefore, inconsistent pairs are more likely to be detected. The authors of [7] considered 0.5
a reasonable value for a specific database. In this work, the results have been computed for
thresholds between 0 and 1 with a step of 0.05.
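Assuming the degrees I_ij of each rule are stored in dictionaries keyed by couples (i, j), the threshold sweep and the FR1 OR FR2 combination can be sketched as below. Taking the union of the couples flagged by each rule is equivalent to thresholding max(I_ij of FR1, I_ij of FR2), i.e., the fuzzy OR with the maximum t-conorm. The helper names are hypothetical.

```python
def thresholds(step=0.05):
    """Threshold grid 0, 0.05, ..., 1 (rounded to avoid float drift)."""
    n = round(1 / step)
    return [round(k * step, 2) for k in range(n + 1)]


def flagged(degrees, t):
    """Couples (i, j) whose degree of inconsistency is higher than threshold t."""
    return {couple for couple, value in degrees.items() if value > t}


def sweep(deg_fr1, deg_fr2):
    """Rows (t, #FR1, #FR2, #(FR1 OR FR2)) of inconsistent couples per threshold."""
    rows = []
    for t in thresholds():
        s1, s2 = flagged(deg_fr1, t), flagged(deg_fr2, t)
        rows.append((t, len(s1), len(s2), len(s1 | s2)))
    return rows
```

Because the combination is a union, its count is never lower than that of either rule at the same threshold.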
    In the next step, in order to clean the database, it is decided which of the two color pairs must
be removed when their degree of inconsistency is higher than the threshold. The pair removed is
the one with the larger difference between ΔV and ΔE.
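The removal decision can be sketched as follows. The order in which couples are processed (from the highest degree of inconsistency down) and skipping couples with a member that has already been removed are our assumptions, since the text does not specify them; ΔV is assumed to be already scaled to the ΔE scale with F3, and `data_to_remove` is a hypothetical name.

```python
def data_to_remove(degrees, dE, dV, t):
    """Indices to remove: for each couple (i, j) with degree > t, drop the member
    with the larger |dV - dE|. Couples are visited from the most inconsistent
    down, and a couple is skipped if one member was already removed (assumed)."""
    removed = set()
    for (i, j), value in sorted(degrees.items(), key=lambda kv: kv[1], reverse=True):
        if value <= t or i in removed or j in removed:
            continue
        worse = i if abs(dV[i] - dE[i]) >= abs(dV[j] - dE[j]) else j
        removed.add(worse)
    return removed
```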

3. Results and conclusions
Firstly, the distribution of experimental data based on the number of fuzzy neighbors is analyzed
for the different databases.
Figure 1: Number of fuzzy neighbors for COM (top left), Pointer (top right), LCAM-WDC (bottom
left) and RIT-Dupont-Ind (bottom right).

    Fig. 1 shows the distribution of the experimental data of the four databases according to their
number of fuzzy neighbors (the sum of the fuzzy degrees of neighborhood). When the number of
fuzzy neighbors is lower than 0.5, there is a significant lack of surrounding information to
determine the quality of an experimental datum. In this scenario, the evaluation can be highly
uncertain owing to the scarcity of nearby or similar reference points. A number of fuzzy
neighbors of at least 1 is desirable to make informed decisions about the consistency of the data.
Fig. 1 shows that most of the data in the Pointer database have too few fuzzy neighbors, while in
the COM, RIT-Dupont-Ind and LCAM-WDC databases most of the data have enough fuzzy
neighbors for the comparison.
    Once the number of fuzzy neighbors is known, the degree of inconsistency can be computed
for each combination of two data in each database. Fig. 2 shows the number of inconsistent
pairs as a function of the threshold. These plots provide valuable insight into how the choice of
threshold affects the identification of inconsistent experimental data.
    As expected, the number of inconsistent pairs increases as the threshold decreases, and vice
versa. The number of inconsistent pairs as a function of the threshold follows a trend between
linear and exponential, depending on the database. In some databases, the number of
inconsistent pairs reaches zero as the threshold increases, suggesting that the data are highly
consistent within that threshold range. However, this point varies with the specific dataset,
which highlights the importance of adapting the threshold to the context and requirements of
the study. Accordingly, the COMData initially appears to be the most consistent database. In
general, some differences can be found between the two fuzzy rules, also depending on the
database. A particular case is RIT-Dupont-Ind, where only FR2 finds inconsistencies: owing to the
accuracy of this database, FR1 finds no inconsistencies at all, and FR2 finds only a few.
Figure 2: Number of inconsistent pairs as a function of the threshold for COMData (upper left),
Pointer (upper right), LCAM-WDC (bottom left) and RIT-Dupont-Ind (bottom right).

   The next step, after identifying the inconsistencies, is cleaning and optimizing the database.
To check the efficiency of the process, the STRESS value is compared with that of the initial,
uncleaned version. This methodology can provide the criterion for choosing the threshold: a
threshold can be accepted as long as STRESS improves considerably when the corresponding
inconsistent data are removed. In principle, the quality of the database improves as more
inconsistent data are removed. However, there is a critical point in this process: a point is
reached where the experimental data remaining in the database are as consistent as those being
removed. At this point, the STRESS value stops improving and remains practically constant. This
approach seeks a balance between improving the quality of the database by removing
inconsistent data and preserving its original nature. If too many data are removed, the database
could lose its original representativeness and become a smaller, but highly consistent, set.
Therefore, this criterion aims to find the point at which the quality of the database is maximized
without losing its initial integrity.
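The threshold-selection criterion can be sketched as a STRESS curve computed after each removal and scanned from high to low thresholds. STRESS is evaluated here with its closed form 100 [1 − (ΣΔEΔV)² / (ΣΔE² ΣΔV²)]^(1/2), which is algebraically equivalent to the F1 expression of [3]. The stopping rule in `choose_threshold`, with its hypothetical `min_gain` parameter (minimum STRESS improvement per threshold step), is only one possible way to formalize "STRESS stops improving" and is not taken from the paper.

```python
import math


def stress(dE, dV):
    """STRESS via its closed form (equivalent to the F1 expression)."""
    sum_ee = sum(e * e for e in dE)
    sum_ev = sum(e * v for e, v in zip(dE, dV))
    sum_vv = sum(v * v for v in dV)
    # max() guards against tiny negative values from rounding
    return 100 * math.sqrt(max(0.0, 1 - sum_ev**2 / (sum_ee * sum_vv)))


def stress_curve(dE, dV, removed_by_threshold):
    """(threshold, #removed, STRESS of the remaining data), from high to low threshold."""
    curve = []
    for t in sorted(removed_by_threshold, reverse=True):
        removed = removed_by_threshold[t]
        keep = [k for k in range(len(dE)) if k not in removed]
        curve.append((t, len(removed), stress([dE[k] for k in keep], [dV[k] for k in keep])))
    return curve


def choose_threshold(curve, min_gain=0.5):
    """Hypothetical rule: keep lowering the threshold while each step still
    improves STRESS by at least min_gain units; return the last such threshold."""
    best = curve[0][0]
    for (_, _, s_prev), (t, _, s) in zip(curve, curve[1:]):
        if s_prev - s < min_gain:
            break
        best = t
    return best
```

In practice, `min_gain` would have to be tuned per database, consistent with the observation that the suitable threshold depends on the dataset.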
   Fig. 3 shows the results for the COMData. The combination of the two fuzzy rules is the most
effective approach, removing more data and achieving the lowest STRESS values. This result is
expected to be general for any database. In this case, thresholds between 0.55 and 0.50 mark the
point beyond which the improvement in STRESS becomes marginal, so these values represent the
desired balance.
Figure 3: STRESS values for the COMData after removing the inconsistent pairs determined by
FR1, FR2 and their combination, as a function of the threshold. For each threshold, the STRESS
value and the number of removed data (in brackets) are given.


Figure 4: STRESS values for the Pointer database after removing the inconsistent pairs determined
by FR1, FR2 and their combination, as a function of the threshold. For each threshold, the STRESS
value and the number of removed data (in brackets) are given.

    For the Pointer database, similar results are obtained, but FR1 performs better than FR2. In this
case, a threshold of 0.5 is a suitable candidate: removing the experimental data flagged as
inconsistent at this threshold achieves a substantial improvement in STRESS, as can be seen in
Fig. 4. Less than 4.2% of the data would be removed with FR2, 6.0% with FR1, and less than 9.5%
with the combination.
Figure 5: STRESS values for the LCAM-WDC database after removing the inconsistent pairs determined
by FR1, FR2 and their combination, as a function of the threshold. For each threshold, the
STRESS value and the number of removed data (in brackets) are given.

   For the LCAM-WDC database, as shown in Fig. 5, the results are less regular. Three
different threshold values could be considered. The most conservative one is 0.95, which
achieves an important improvement in STRESS by removing only 2 data (0.7%). This suggests
that these 2 data are highly inconsistent with the rest of the dataset and could be due to typos.
The threshold must decrease to 0.75 to find more inconsistent data (3 data). Again, a threshold of
0.5 seems to offer a good balance, removing 10 data (3.5%) and reducing STRESS by 1.2 units.


Figure 6: STRESS values for the RIT-Dupont-Ind after removing the inconsistent pairs
determined by the FR1, FR2 and the combination of both, as a function of the threshold. For each
threshold the STRESS value and the number of removed data, in brackets, are given.

   Fig. 6 shows the results for the RIT-Dupont-Ind database, which is a special case, as commented
above. FR1 is not able to detect inconsistencies; thus, FR2 and the combination give the same
results. This is a highly consistent dataset, since inconsistencies only appear for threshold values
lower than 0.80 (3 data, 0.36%) and 0.55 (4 data, 0.48%). Even with the extremely low threshold
value of 0.1, only 25 data (3%) are considered inconsistent.
   As a result of this analysis, we have observed that applying fuzzy logic to color-difference
databases provides substantial information about their consistency. It is also possible to
identify and remove inconsistencies until a balance is reached between the improvement in
STRESS and the reduction in the number of data. These results indicate that optimal threshold
values vary significantly depending on the database and the applied rule. Finding a balance is
crucial, as overly low thresholds can lead to excessive removal of useful data, while overly high
thresholds may fail to detect inconsistencies.

References
[1] M. Melgosa, "Request for existing experimental datasets on color differences," Color
     Research & Application 32, 159 (2007).
[2] E. Kirchner, N. Dekker, M. Lucassen, L. Njo, I. van der Lans, P. Koeckhoven, P. Urban, and R.
     Huertas, "How color difference formulas depend on reference pairs in the underlying
     constant stimuli experiment," JOSA A 32, 2373-2383 (2015).
[3] P. A. Garcia, R. Huertas, M. Melgosa, and G. Cui, "Measurement of the relationship between
     perceived and computed color differences," JOSA A 24, 1823-1829 (2007).
[4] CIE, "Colorimetry," CIE Publication No. 15.2, Central Bureau of the CIE, Vienna (1986);
     http://www.cie.co.at.
[5] ISO/CIE 11664-6:2014 (formerly CIE S 014-6/E:2013), "Colorimetry – Part 6: CIEDE2000
     Colour-Difference Formula" (2014).
[6] S. Morillas, L. Gómez-Robledo, R. Huertas, and M. Melgosa, "Fuzzy analysis for detection of
     inconsistent data in experimental datasets employed at the development of the CIEDE2000
     colour-difference formula," Journal of Modern Optics 56, 1447-1456 (2009).
[7] S. Morillas, L. Gómez-Robledo, R. Huertas, and M. Melgosa, "Method to determine the
     degrees of consistency in experimental datasets of perceptual color differences," JOSA A 33,
     2289-2296 (2016).
[8] M. Melgosa, R. Huertas, and R. S. Berns, "Performance of recent advanced color-difference
     formulas using the standardized residual sum of squares index," JOSA A 25, 1828-1834
     (2008).
[9] M. R. Pointer and G. G. Attridge, "Large colour differences in colour reproduction–the
     relationship between print acceptability and colour difference," The Journal of Photographic
     Science 44, 155-164 (1996).
[10] R. S. Berns and B. Hou, "RIT‐DuPont supra‐threshold color‐tolerance individual color‐difference
     pair dataset," Color Research & Application 35, 274-283 (2010).
[11] P. Latorre-Carmona, R. Huertas, M. Pedersen, and S. Morillas, "Proposal of a new fidelity
     measure between computed image quality and observers quality scores accounting for
     scores variability," Journal of Visual Communication and Image Representation 90, 103704
     (2023).
[12] D. Arranz, R. Huertas, P. Latorre-Carmona, and S. Morillas, "Optimization of a color
     difference database by fuzzy logic," in XIV Reunión Nacional de Óptica (accepted).