<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data bias measurement: a geometrical approach through frames</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Trenta UNINFO UNI CT</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>In previous papers [8], [9] we discussed the application of ISO/IEC 25000 when new quality measures are defined. Continuing that work, we show, through the definition of new data quality measures for bias, how to handle additional and new measures in a SQuaRE perspective. The proposed method is intended to be generally applicable. In the present paper: data bias is identified as a quality issue; some notions of frame theory are recalled; two quality measures for data bias are proposed; and one of them is proposed as an ISO/IEC 25024 conforming measure.</p>
      </abstract>
      <kwd-group>
        <kwd>data quality</kwd>
        <kwd>measures</kwd>
        <kwd>eigenvalue</kwd>
        <kwd>bias</kwd>
        <kwd>fairness</kwd>
        <kwd>ISO</kwd>
        <kwd>ISO/IEC 25024</kwd>
        <kwd>frame</kwd>
        <kwd>metric</kwd>
        <kwd>AI</kwd>
        <kwd>ML</kwd>
        <kwd>PCA</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        A well-known problem in ML is the bias-variance
dilemma: finding the optimal model complexity that
minimizes output errors while remaining independent of
changes in the training dataset (i.e. balancing underfitting and
overfitting) [
        <xref ref-type="bibr" rid="ref17">19</xref>
        ]:
        <disp-formula><tex-math><![CDATA[E_D\big[\{y(x;D) - h(x)\}^2\big] = \{E_D[y(x;D)] - h(x)\}^2 + E_D\big[\{y(x;D) - E_D[y(x;D)]\}^2\big] = (\mathrm{bias})^2 + \mathrm{variance}]]></tex-math></disp-formula>
      </p>
      <p>where:
the first term on the right-hand side is the squared bias;
the second term is the variance;
x is the input vector;
E<sub>D</sub> is the expected squared output error;
h(x) is the regression function characterizing the model;
y(x;D) is the prediction function of x over the dataset D.
So, the expected squared error E<sub>D</sub> is due to the squared
error generated by the adopted regression function (bias)
and also to the behavior of the prediction function
around its average over the dataset D (variance), in other
words its sensitivity to variations of the dataset.</p>
      <p>
        The bias-variance decomposition is of limited practical
value, because it requires knowing all the datasets the
machine will handle, whereas in practice we have only a
single observed dataset and we need to predict/train the
behavior of the machine as well as possible. Moreover, the U-shaped
error function of the bias-variance optimization does not
hold for deep neural networks [
        <xref ref-type="bibr" rid="ref27">29</xref>
        ].
      </p>
      <p>
        For those reasons, in the following we do not refer to the
bias-variance dilemma, but simply refer to data bias as
statistical features of a dataset in an ML context<sup>i</sup>. Such
statistical features can be measured by several indexes (e.g.
Gini, Shannon, see [
        <xref ref-type="bibr" rid="ref14">16</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">17</xref>
        ], [
        <xref ref-type="bibr" rid="ref16">18</xref>
        ]) and the new ones that
we are going to introduce in this paper.
      </p>
      <p>
        Moreover, this paper recalls a wider issue: how to
address the manifold of measures that are continuously
being discovered, including, but not limited to, AI measures. In our
view they can all be addressed under the ISO/IEC 25000
umbrella [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        In the following we introduce an application and some
considerations taken from frame theory and close to the
Principal Component Analysis [
        <xref ref-type="bibr" rid="ref17">19</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Intuitively, PCA
finds the (hyper)ellipsoid that best fits the dataset, after
centering the dataset at the origin, and the axes of the
(hyper)ellipsoid are the eigenvectors of the covariance
matrix. In a similar view, we reshape the (hyper)ellipsoid
into a (hyper)sphere and translate the dataset onto its
surface to assess its spread with the help of frame theory.
      </p>
      <p>Figure 1 PCA of a multivariate gaussian distribution
(source: Nicoguaro - wikimedia)</p>
      <p>In our application, we consider each sample point as the
endpoint of a vector whose other end is at the origin, and
measure the overall span of such vectors.</p>
      <p>
        Firstly, we recall the definition of frame and frame
bounds with an example taken from [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
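      <p>Definition 1: For a Hilbert space H<sub>m</sub> of dimension m
with inner product ⟨·,·⟩ on H<sub>m</sub>, a finite or countable collection
of vectors (Φ<sub>i</sub>)<sub>i∈I</sub> ⊂ H<sub>m</sub> is said to be a frame of H<sub>m</sub> if there
exist constants 0 &lt; c ≤ C such that
        <disp-formula><tex-math><![CDATA[c\,\|v\|^2 \le \sum_{i \in I} |\langle v, \Phi_i \rangle|^2 \le C\,\|v\|^2 \qquad (1)]]></tex-math></disp-formula>
for all v ∈ H<sub>m</sub>.
A frame is said to be tight if c = C.</p>
      <p>Figure 2, panels (A) and (B): the Φ<sub>i</sub> vectors are the black ones in (A) and (B) and
form frames of R<sup>2</sup>. The blue and green vectors are instances of v.</p>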
      <p>With reference to figure 2 above, the black vectors are
our frame, and it is easy to see that in (A) they are
more spread out in the space than in (B).</p>
      <p>It is also clear at a glance that in (B) the green
vector, which forms narrow angles with the black vectors, maximizes the sum of the dot
products of each black vector with the green one, which leads
to C; the blue vector, which forms wide
angles, minimizes the sum of the dot products of each black vector with
the blue one, which leads to c.</p>
      <p>In the same way, it is easy to check that in (A) the sum
of the dot products of the green vector with the black ones
has the same value as the sum<sup>1</sup> of the dot products of the
blue vector with the black ones, leading to c = C, so the
frame in (A) is tight.</p>
      <p>Fortunately, tightness, that is
the difference between C and c, can be calculated not by trial and error but
from the covariance matrix generated by the mutual dot
products among the vectors, as C and c are respectively its
maximum and minimum eigenvalues.</p>
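      <p>As a minimal sketch of the statement above, and not part of the original method, the following Python code computes the frame bounds of two hypothetical frames of R<sup>2</sup> as the extreme eigenvalues of the matrix of mutual dot products, and checks inequality (1) on random vectors. The chosen angles, the helper names frame_bounds and unit_vectors, and the use of NumPy are assumptions made for illustration only.</p>
      <preformat>
```python
import numpy as np


def frame_bounds(phi):
    """Return (c, C) for a frame whose vectors are the rows of phi (N x M)."""
    # For any v: sum_i (phi_i . v)^2 = v^T (phi^T phi) v, so the bounds are
    # the extreme eigenvalues of the M x M matrix phi^T phi.
    eigvals = np.linalg.eigvalsh(phi.T @ phi)  # returned in ascending order
    return float(eigvals[0]), float(eigvals[-1])


def unit_vectors(degrees):
    """Rows are unit vectors of R^2 at the given angles (hypothetical helper)."""
    rad = np.deg2rad(degrees)
    return np.column_stack([np.cos(rad), np.sin(rad)])


spread = unit_vectors([90.0, 210.0, 330.0])  # evenly spread, in the spirit of (A)
bunched = unit_vectors([0.0, 20.0, 40.0])    # narrow fan, in the spirit of (B)

c_a, C_a = frame_bounds(spread)
c_b, C_b = frame_bounds(bunched)
print(f"spread:  c={c_a:.3f}, C={C_a:.3f}")
print(f"bunched: c={c_b:.3f}, C={C_b:.3f}")

# Inequality (1) checked on random vectors for the bunched frame.
rng = np.random.default_rng(0)
for v in rng.normal(size=(100, 2)):
    energy = float(np.sum((bunched @ v) ** 2))
    norm2 = float(v @ v)
    assert energy >= c_b * norm2 - 1e-9
    assert C_b * norm2 + 1e-9 >= energy
```
      </preformat>
      <p>For the three vectors at 120 degrees the two bounds coincide (c = C = 1.5), so that frame is tight, whereas the narrow fan yields clearly different bounds.</p>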
    </sec>
    <sec id="sec-2">
      <title>DATASET AND THE FRAME MODEL</title>
      <p>The first proposed bias measure is based on the
following theorems and definitions [14]:</p>
      <p>-when the frame bounds c and C are equal, a frame is
said to be tight.</p>
    </sec>
    <sec id="sec-3">
      <title>1 We mean the ∥v∥<sup>2</sup>-normalized squared sum according to (1)</title>
      <p>Let Φ be the N×M matrix that collects the
vectors Φ<sub>i</sub> of the frame:
        <disp-formula><tex-math><![CDATA[\Phi = \begin{pmatrix} \Phi_1 \\ \Phi_2 \\ \vdots \\ \Phi_N \end{pmatrix}]]></tex-math></disp-formula>
then:</p>
      <p>- the upper and lower frame bounds (see C and c above)
of a frame are given by the largest and smallest eigenvalues
of the frame operator S = ΦΦ<sup>T</sup>, respectively;</p>
      <p>- the non-zero eigenvalues of the frame operator S are
the same as the non-zero eigenvalues of the Gram matrix
G = Φ<sup>T</sup>Φ;</p>
      <p>- the rank of G is M, and the M eigenvalues of G are
positive.</p>
      <p>In this proposal we:
(a) handle a numeric dataset as if it were a frame: for a set
of N tuples over a set of M attributes, in the frame
view the number M of attributes is the space dimension and
the number N of tuples is the number of vectors of the
frame;</p>
      <p>(b) then measure data bias in the same way we
measure frame tightness, and in particular:</p>
      <p>(b.1) measure the difference between the upper and lower
bounds of the frame;</p>
      <p>(b.2) measure the Frame Potential.</p>
      <p>From these assumptions it follows that a non-biased dataset
is found when the corresponding frame is tight.</p>
      <p>Measures b.1 and b.2 can be considered equivalent
for the purpose of evaluating the bias of a dataset; in this case
only one of them needs to be adopted, and the choice
between the two can be driven by the computational effort
required. In this paper we mainly explore measure b.1.</p>
      <sec id="sec-3-1">
        <title>Measure b.1</title>
        <p>The basis of our analysis is the calculation of the lower and
upper frame bounds with the following steps:</p>
        <p>1. Collect a numeric data table with N tuples and M
attributes and define it as a set of N row vectors {Φ<sub>1</sub>, Φ<sub>2</sub>, …, Φ<sub>N</sub>};</p>
        <p>2. Build the N×M matrix
          <disp-formula><tex-math><![CDATA[\Phi = \begin{pmatrix} \Phi_1 \\ \Phi_2 \\ \vdots \\ \Phi_N \end{pmatrix}]]></tex-math></disp-formula>
        </p>
        <p>3. Compute the M×M Gramian matrix G = Φ<sup>T</sup>Φ;</p>
        <p>4. Compute the M eigenvalues λ<sub>i</sub> (i = 1, …, M) of G;</p>
        <p>5. Sort the (non-zero) eigenvalues of G in descending
order;</p>
        <p>6. Find the upper eigenvalue λ<sub>max</sub> and the lower
eigenvalue λ<sub>min</sub>;</p>
        <p>7. Compute the difference D = λ<sub>max</sub> - λ<sub>min</sub>;</p>
        <p>8. Assess the value of D, considering that D = 0 means
a tight frame.</p>
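        <p>The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name tightness_b1, the tolerance used to discard zero eigenvalues, and the two toy tables are hypothetical and do not come from the paper.</p>
        <preformat>
```python
import numpy as np


def tightness_b1(table):
    """Measure b.1 sketch: D = lambda_max - lambda_min of G = Phi^T Phi."""
    phi = np.asarray(table, dtype=float)         # steps 1-2: N x M matrix
    gram = phi.T @ phi                           # step 3: M x M Gramian
    eigvals = np.linalg.eigvalsh(gram)           # step 4: eigenvalues of G
    # Assumed tolerance below which an eigenvalue is treated as zero.
    tol = 1e-12 * max(float(eigvals.max()), 1.0)
    nonzero = np.sort(eigvals[eigvals > tol])[::-1]  # step 5: descending order
    if nonzero.size == 0:
        raise ValueError("all eigenvalues are zero")
    lam_max, lam_min = nonzero[0], nonzero[-1]   # step 6
    return float(lam_max - lam_min)              # step 7


# Hypothetical toy tables (not taken from the paper).
balanced = [[1, 0], [0, 1], [1, 0], [0, 1]]  # G = 2*I, so D = 0 (tight)
skewed = [[2, 0], [0, 1]]                    # G = diag(4, 1), so D = 3

d_balanced = tightness_b1(balanced)
d_skewed = tightness_b1(skewed)
print(d_balanced, d_skewed)
```
        </preformat>
        <p>Step 8, the assessment of D, is left to the reader: in this sketch the balanced table gives D = 0 and the skewed one gives D = 3.</p>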
      </sec>
      <sec id="sec-3-2">
        <title>Measure b.2</title>
        <p>The second proposed bias measure is based on the
following theorems and definitions.</p>
        <p>
          The frame potential FP is defined as [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]:
          <disp-formula><tex-math><![CDATA[FP = \sum_{i,j} |\langle \Phi_i, \Phi_j \rangle|^2]]></tex-math></disp-formula>
where Φ<sub>i</sub> are the frame vectors.
        </p>
        <p>In this measure, step 4 above and the following ones are
replaced by:</p>
        <p>4. Compute FP from the matrix G = Φ<sup>T</sup>Φ;</p>
        <p>5. Assess the value of FP, considering that the minimum
value of FP is reached when the frame is tight.</p>
        <p>To compute step 4, consider that the products ⟨Φ<sub>i</sub>, Φ<sub>j</sub>⟩
are the diagonal and upper-right
elements (or lower-left, as G is symmetric) of the Gramian
matrix; in other words, FP is the sum of the squared upper
(or lower) elements of the Gramian, including the diagonal
ones. This measure may be easier to compute than the
previous one.</p>
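        <p>A minimal Python sketch of this computation follows. It implements the description given in the text (sum of the squared diagonal and upper-right elements of the Gramian), which is an assumption about the intended calculation; the function names are hypothetical. The 2×2 matrix used in the check is the Gramian of the example that follows.</p>
        <preformat>
```python
import numpy as np


def frame_potential_from_gram(gram):
    """Sum of the squared diagonal and upper-right elements of the Gramian."""
    upper = np.triu(np.asarray(gram, dtype=float))  # zero out the lower-left part
    return float(np.sum(upper ** 2))


def frame_potential_b2(table):
    """Measure b.2 sketch: build G = Phi^T Phi, then apply the sum above."""
    phi = np.asarray(table, dtype=float)
    return frame_potential_from_gram(phi.T @ phi)


# Gramian of the Age/Income example in the text: 46^2 + 49^2 + 66^2.
fp_example = frame_potential_from_gram([[46, 49], [49, 66]])

# Hypothetical toy table (not from the paper): G = I, so FP = 1 + 1.
fp_identity = frame_potential_b2([[1, 0], [0, 1]])
print(fp_example, fp_identity)
```
        </preformat>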
        <p>A first example for measures b.1 and b.2</p>
        <p>Consider a dataset with attributes “Age” and “Income”;
the domains are 6 age groups [20-30), [30-40), [40-50),
[50-60), [60-70), [70-80) and 7 income categories [10-20k€),
[20-30k€), [30-40k€), [40-50k€), [50-60k€), [60-70k€),
[70-80k€); here three samples of (Age, Income) are
collected:</p>
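        <p>
          <disp-formula><tex-math><![CDATA[\Phi = \begin{pmatrix} 3 & 1 \\ 1 & 4 \\ 6 & 7 \end{pmatrix} \quad (\text{columns: Age, Income}), \qquad G = \Phi^{T}\Phi = \begin{pmatrix} 46 & 49 \\ 49 & 66 \end{pmatrix}]]></tex-math></disp-formula>
        </p>
        <p>From G we calculate measure b.1:
D = λ<sub>max</sub> - λ<sub>min</sub> = 106 - 6 = 100
and measure b.2:
FP = 46<sup>2</sup> + 66<sup>2</sup> + 49<sup>2</sup> = 8873</p>
        <p>
          2 As for equiangular tight frames G = (N/M)·I holds, where I is
the identity matrix [
          <xref ref-type="bibr" rid="ref23">25</xref>
          ].
        </p>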
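        <p>As a worked check of the value of D in the example, assuming only the 2×2 matrix G with entries 46, 49, 49, 66 given there, the eigenvalues follow from the characteristic polynomial:</p>
        <preformat>
```latex
\begin{gather*}
\det(G - \lambda I) = (46 - \lambda)(66 - \lambda) - 49^2
  = \lambda^2 - 112\,\lambda + 635 = 0 \\
\lambda_{\max,\min} = 56 \pm \sqrt{56^2 - 635} = 56 \pm \sqrt{2501} \approx 56 \pm 50.01 \\
\lambda_{\max} \approx 106.01, \qquad \lambda_{\min} \approx 5.99, \qquad
D = \lambda_{\max} - \lambda_{\min} \approx 100.02
\end{gather*}
```
        </preformat>
        <p>Rounded to integers, these are the values 106, 6 and 100 reported in the example.</p>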
        <p>Measure b.1 is responsive to the order of the tuples (e.g.
swapping tuples in general leads to different measure
values), and so it gives a measure of the tightness of ordered
tuples, where tightness is defined according to (1). As we
want a measure that is not responsive to the order of the tuples, in the
following we explain how to solve this issue.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>A COMPARISON WITH PCA</title>
      <p>To explain the approach visually, we compare PCA
(figure 1) with our method:</p>
      <p>(i) in PCA, if we find equal eigenvalues of G,
λ<sub>1</sub> = λ<sub>2</sub> = … = λ<sub>M</sub>, we can conclude that there is no
dominant component and the volume fitting the data
is a (hyper)sphere;</p>
      <p>(ii) similarly, in our method, if we first project the
data onto a (hyper)sphere surface, and if we then
find equal eigenvalues of G′, λ′<sub>1</sub> = λ′<sub>2</sub> = … = λ′<sub>M</sub>, we
can conclude that the projected data are evenly
spread over the (hyper)sphere surface because
they are the endpoints of an equiangular tight frame<sup>2</sup>.</p>
      <p>Possibly, to gain more information about data bias, both
the approaches (i) and (ii) can be adopted. In this paper we
consider only the approach (ii).</p>
    </sec>
    <sec id="sec-5">
      <title>APPLICATION REMARKS</title>
      <p>According to method (ii), before applying steps 1-8,
we apply the following steps 0.a, 0.b, 0.c:</p>
      <p>0.a discretize the domains of the vector coordinates;</p>
      <p>0.b translate the vertices by their mean, so that the barycenter (average
of the translated vertices) is zero;</p>
      <p>0.c normalize the modulus of each vector to one.</p>
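      <p>The three preparatory steps can be sketched in Python as below. This is only an illustration under explicit assumptions: the text does not specify how the domains are discretized, so equal-width binning is assumed here; the function name normalize_dataset, the bins parameter and the small example table are hypothetical.</p>
      <preformat>
```python
import numpy as np


def normalize_dataset(table, bins=9):
    """Sketch of steps 0.a-0.c (equal-width binning is an assumption)."""
    data = np.asarray(table, dtype=float)
    # 0.a: discretize each attribute into `bins` equal-width intervals,
    # coded 1..bins.
    low = data.min(axis=0)
    span = data.max(axis=0) - low
    span[span == 0] = 1.0  # constant attribute: avoid division by zero
    codes = np.minimum(np.floor((data - low) / span * bins), bins - 1) + 1
    # 0.b: translate so that the barycenter of the points is the origin.
    centered = codes - codes.mean(axis=0)
    # 0.c: scale every vector to unit length.
    norms = np.linalg.norm(centered, axis=1)
    if np.any(norms == 0):
        raise ValueError("a point lies on the barycenter and cannot be normalized")
    return centered / norms[:, None]


# Hypothetical table with N = 4 tuples and M = 2 attributes.
example = [[16, 100], [30, 400], [50, 900], [80, 200]]
phi_norm = normalize_dataset(example)
gram = phi_norm.T @ phi_norm
print(np.linalg.eigvalsh(gram))
```
      </preformat>
      <p>Because every row of the normalized matrix has unit length, the trace of its Gramian, and hence the sum of its eigenvalues, equals the number of tuples N.</p>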
      <p>As an example, we apply measure b.1 to the
dataset of COVID-19 vaccinated people in Italy
(https://github.com/italia/covid19-opendata-vaccini).</p>
      <p>The dataset contains the number of COVID-19 vaccine
injections grouped by 9 age ranges: [16-20), [20-30),
[30-…</p>
      <p>[Figure: histograms of vaccine injections by age group; horizontal axis: age groups 1 to 9, vertical axis: number of injections]</p>
      <p>Starting from elderly people, vaccination was progressively
extended to middle-aged people, so about one and a half months
later a differently shaped histogram is found in figure 5.</p>
      <p>Figure 5: injections 23.5.2021</p>
      <p>As elderly people were vaccinated first (generally 2
injections are required for vaccination), the histogram of
injections shows the desired polarization in the
higher age groups.
As we said before, measure b.1 is not understandable if
directly applied to the original dataset: as shown in figure
4, it leads to a sort of evaluation of the shape of the
histogram. We therefore apply step 0.a, dividing the
domain of #vaccine_injections into 9 intervals, and then
apply steps 0.b and 0.c. After normalization, we process
Φ<sub>norm8.4.2021</sub> and Φ<sub>norm23.5.2021</sub> and obtain, respectively, the
results (figure 6):
λ<sub>norm8.4.2021_1</sub> = 2.92,
λ<sub>norm8.4.2021_2</sub> = 6.10,
D<sub>norm8.4.2021</sub> = 3.18</p>
      <p>
        3 Note that the normalized frames condition
N = ∑<sub>i=1…M</sub> λ<sub>i</sub> is also fulfilled [
        <xref ref-type="bibr" rid="ref22">24</xref>
        ].
      </p>
      <sec id="sec-5-1">
        <title>Normalized group age</title>
        <p>[Plot of the normalized dataset; axis: normalized group age, from -1.5 to 1.5]</p>
      </sec>
      <sec id="sec-5-2">
        <title>Normalized group age</title>
        <p>“tight”). Visually, in figure 7 some points got closer than in
figure 6 and two more collapsed; and this is what we
expected, as moving towards a uniform distribution means
that even more points get closer or collapse.</p>
        <p>
          It is interesting to transpose, for the purpose of assessing bias,
some results from frame theory, for example:
- care should be taken in choosing M and N, because some
pairs (M, N) do not correspond to Equiangular Tight
Frames<sup>4</sup> [
          <xref ref-type="bibr" rid="ref23">25</xref>
          ] and/or do not correspond to “highly symmetric
frames”<sup>5</sup> [
          <xref ref-type="bibr" rid="ref24">26</xref>
          ].
        </p>
        <p>
          From a bias point of view, this could mean that for some
(M, N) pairs it is easier to build non-biased data.
Moreover, the minimum value of the Frame Potential FP for
unit-norm tight frames [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is:
          <disp-formula><tex-math><![CDATA[FP = N \ \text{ if } M \le N, \qquad FP = M^2/N \ \text{ if } M \ge N]]></tex-math></disp-formula>
and this helps to assess data whose FP is close to
the minimum, as it means that the vectors are “as orthogonal” to each
other as possible.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>PROPOSAL</title>
      <p>To sum up, with this proposal we address the issue of
finding a data quality measurement function (i.e. metric)
through geometrical calculation.</p>
      <p>
        Its application is envisaged for, but not limited to, the evaluation
of sampling bias across multiple attributes, for example
the protected ones [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]: inequality is a well-known issue in modern
societies, and with the measure above we
can assess the overall bias of a population dataset over
multiple attributes like “income”, “ethnicity”, “group age”,
instead of assessing bias against single attributes or pairs of
attributes.
      </p>
      <p>
        At the same time, we highlight the need to handle the
manifold of measures that are discovered by the community
of researchers with the approach explained in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: the new
measure b.1 “tightness” can be defined in terms of a new
measure conforming to [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and/or to [
        <xref ref-type="bibr" rid="ref26">28</xref>
        ].
      </p>
      <p>
        Measure b.1 is documented in SQuaRE format in Table
1. For the time being, we assume that
“tightness” is relevant to the completeness characteristic;
further refinements of the characteristic relevance will
depend on the progress of the work in [
        <xref ref-type="bibr" rid="ref26">28</xref>
        ]. For the scope of this
paper, Table 1 shows a simplified measure documentation;
a comprehensive way of documenting measures is
described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        4 From [
        <xref ref-type="bibr" rid="ref23">25</xref>
        ]: ∄ RETF (19, 76) and ∄ RETF (20, 96); the notation
means “there does not exist a real equiangular tight frame
with parameters (M, N)”.
      </p>
      <sec id="sec-6-1">
        <title>5 E.g.: there are no “highly symmetric” tight frames of five</title>
        <p>
          vectors in ℂ<sup>3</sup>, but there are tight frames of five vectors, like the
vertices of a trigonal bipyramid.
        </p>
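        <table-wrap>
          <label>Table 1</label>
          <table>
            <tbody>
              <tr><td>ID</td><td>Com-I-4-IT-10</td></tr>
              <tr><td>Name</td><td>Data values completeness</td></tr>
              <tr><td>Description</td><td>Tightness of normalized dataset</td></tr>
              <tr><td>Measurement function</td><td>X = A - B = λ<sub>max</sub> - λ<sub>min</sub>, where λ<sub>max</sub> and λ<sub>min</sub> are the maximum and minimum eigenvalues of G = Φ<sub>norm</sub><sup>T</sup> Φ<sub>norm</sub>; the matrix Φ<sub>norm</sub> is built from the dataset normalized according to steps 0.a, 0.b, 0.c</td></tr>
              <tr><td>DLC</td><td>All Data Life Cycle</td></tr>
              <tr><td>Target Entity</td><td>Dataset with N tuples and M attributes</td></tr>
              <tr><td>Property</td><td>Data value</td></tr>
            </tbody>
          </table>
          <table-wrap-foot>
            <p>NOTE X=0 means “tight” according to the definition of frame theory</p>
            <p>NOTE ID includes additional part IT-10 [<xref ref-type="bibr" rid="ref3">3</xref>]</p>
          </table-wrap-foot>
        </table-wrap>
        <p>VI. FURTHER STUDIES</p>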
        <p>Whereas measuring data bias starting from a given
dataset appears relatively easy, designing a dataset (frame)
starting from a level of bias (tightness) is not so simple
[13]; this result, as well as possibly others from frame theory,
can be taken into consideration when looking for an optimal
training dataset for Machine Learning.</p>
        <p>Dataset spread measurement appears useful in
conjunction with classification<sup>6</sup>, and this also holds for machines that are not
pre-trained, such as SVMs (Support Vector Machines).
Due to the use of ML models in many fields (see the Perceptron
in statistical mechanics [31]), we cannot exclude other fields
of application for this early study.</p>
        <p>
          The metric is applicable with some tricks [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] also to
images and it will be analyzed in a future paper.
        </p>
        <p>VII. CONCLUSION</p>
        <p>
          In this early study the measures b.1 and b.2 appear
suitable to measure data sample bias [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which in turn is
mainly related to the accuracy and completeness characteristics of the data quality
model [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], [
          <xref ref-type="bibr" rid="ref26">28</xref>
          ]. They can be considered in
SC7 WG6 and appear relevant to SC42 WG2 and WG3
work in progress on A.I. [
          <xref ref-type="bibr" rid="ref25">27</xref>
          ], [
          <xref ref-type="bibr" rid="ref26">28</xref>
          ].
        </p>
        <p>
          The manifold of metrics available for industry and
research7, including the one introduced in this paper, can be
addressed in the ISO/IEC 25000 perspective: applying the
process described in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], all the measures, including b.1 and
b.2 presented in this paper, can be defined as ISO/IEC
25000 conforming measures.
        </p>
      </sec>
      <sec id="sec-6-2">
        <title>6 In general classification is easier when data are not</title>
        <p>spread.</p>
      </sec>
      <sec id="sec-6-3">
        <title>7 See an example of the manifold of benchmarks in</title>
        <p>https://paperswithcode.com/sota and [30].</p>
        <p>VIII. ACKNOWLEDGEMENTS</p>
        <p>The author would like to thank Antonio Vetrò, who
encouraged this work, Alessandro Simonetta, who
suggested some enhancements, Domenico Natale, for his
foundational contributions in the field of data quality, and
Roberto Li Voti, who believed in this project.</p>
        <p>[13] https://www3.math.tu-berlin.de/numerik/www.fusionframe.org/index_application.html</p>
        <p>[14] Shailesh Kumar - Results on equiangular tight frames
https://www.slideshare.net/</p>
        <p>i Some definitions:
Data bias: “data properties that if unaddressed lead to AI
systems that perform better or worse for different
groups”
Bias: “systematic difference in treatment of certain
objects, people, or groups in comparison to others” and</p>
        <p>[30] ISO/IEC DIS 23053 Framework for Artificial Intelligence (AI)
Systems Using Machine Learning (ML)</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] ISO/IEC 25010:
          <article-title>2011 Systems and Software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - System and software quality models</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2] ISO/IEC 25012:
          <article-title>2008 Systems and Software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Data quality model</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] ISO/IEC 25020:
          <year>2019</year>
          ,
          <article-title>Systems and Software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Quality measurement framework</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] ISO/IEC 25022:
          <year>2016</year>
          ,
          <article-title>Systems and Software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of quality in use.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] ISO/IEC 25023:
          <year>2016</year>
          ,
          <article-title>Systems and Software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of system and software product quality</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] ISO/IEC 25024:
          <year>2015</year>
          ,
          <article-title>Systems and Software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Measurement of data quality</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Trenta</surname>
          </string-name>
          :
          <article-title>ISO/IEC 25000 quality measures for A.I.: a geometrical approach</article-title>
          .
          <source>Proceedings APSEC IWESQ</source>
          <year>2020</year>
          (http://ceur-ws.org/Vol-2800/, ISSN 1613-0073)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Natale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Trenta</surname>
          </string-name>
          :
          <article-title>Examples of practical use of ISO/IEC 25000</article-title>
          .
          <source>Proceedings APSEC IWESQ</source>
          <year>2019</year>
          (http://ceur-ws.org/Vol-2545/, ISSN 1613-0073)
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jelena</given-names>
            <surname>Kovacevic</surname>
          </string-name>
          , Amina Chebira:
          <article-title>An Introduction to Frames</article-title>
          . Found.
          <source>Trends Signal Process</source>
          .
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>94</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Mishal</surname>
            <given-names>Assif P. K.</given-names>
          </string-name>
          , Mohammed Rayyan Sheriff, Debasish Chatterjee:
          <article-title>Measure of quality of finite-dimensional linear systems: A frame-theoretic view</article-title>
          . CoRR abs/
          <year>1902</year>
          .04548 (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>John J Benedetto and Joseph D Kolesar</surname>
          </string-name>
          .
          <article-title>Geometric properties of Grassmannian frames for R2 and</article-title>
          R3
          <source>EURASIP Journal on Advances in Signal Processing</source>
          ,
          <year>2006</year>
          (1):
          <fpage>049850</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <source>[12] ISO/IEC 24027 draft Information technology - Artificial Intelligence (AI</source>
          )
          <article-title>- Bias in AI systems and AI-aided decision making</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Heusel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramsauer</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unterthiner</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nessler</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>GANs trained by a two time-scale update rule converge to a local nash equilibrium</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          (pp.
          <fpage>6626</fpage>
          -
          <lpage>6637</lpage>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Mecati</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cannavò</surname>
            ,
            <given-names>F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vetrò</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Torchiano</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2020</year>
          , August).
          <article-title>Identifying Risks in Datasets for Automated Decision-Making</article-title>
          . In International Conference on Electronic Government (pp.
          <fpage>332</fpage>
          -
          <lpage>344</lpage>
          ). Springer, Cham.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Beretta</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vetrò</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lepri</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Martin J. C. D.</surname>
          </string-name>
          (
          <year>2021</year>
          )
          <article-title>Detecting discriminatory risk through data annotation based on Bayesian inferences</article-title>
          .
          <source>In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency</source>
          (pp.
          <fpage>794</fpage>
          -
          <lpage>804</lpage>
          ). https://doi.org/10.1145/3442188.3445940
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Beretta</surname>
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vetrò</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lepri</surname>
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Martin J. C.</surname>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>Ethical and socially-aware data labels</article-title>
          .
          <source>In Annual International Symposium on Information Management and Big Data</source>
          (pp.
          <fpage>320</fpage>
          -
          <lpage>327</lpage>
          ). Springer, Cham. https://link.springer.com/chapter/10.1007%2F978-3-030-11680-4_30
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Bishop</surname>
            <given-names>C.</given-names>
          </string-name>
          , (
          <year>2006</year>
          )
          <source>Pattern recognition and machine learning</source>
          , chapter 7 Sparse Kernel Machines. ISBN-13: 978-0387-31073-2
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Kailash</given-names>
            <surname>Ahirwar (2019) Generative Adversarial Networks Projects</surname>
          </string-name>
          ISBN-
          <volume>13</volume>
          :
          <fpage>978</fpage>
          -
          <lpage>1789136678</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Borji</surname>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>Pros and Cons of GAN Evaluation Measures https://arxiv</article-title>
          .org/pdf/
          <year>1802</year>
          .03446.pdf
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <source>[22] ISO/IEC 22989 Information technology - Artificial intelligence - Artificial intelligence concepts</source>
          and
          <source>terminology (2021 draft)</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Charan</surname>
            <given-names>Reddy</given-names>
          </string-name>
          , Deepak Sharma, Soroush Mehri, Adriana Romero, Samira Shabanian, Sina
          <string-name>
            <surname>Honari</surname>
          </string-name>
          (
          <year>2021</year>
          )
          <article-title>Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics https</article-title>
          ://openreview.net/forum?id=OTnqQUEwPKu
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Ole</given-names>
            <surname>Christensen</surname>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>An Introduction to Frames and Riesz Bases 2nd edition</article-title>
          <source>ISBN: 978-3-319-25613-9</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Matthew</surname>
            <given-names>Fickus</given-names>
          </string-name>
          , Dustin G. Mixon (
          <year>2016</year>
          )
          <article-title>Tables of the existence of equiangular tight frames https://arxiv</article-title>
          .org/abs/1504.00253
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Helen</surname>
            <given-names>Broome</given-names>
          </string-name>
          , Shayne
          <string-name>
            <surname>Waldron</surname>
          </string-name>
          (
          <year>2013</year>
          )
          <article-title>On the construction of highly symmetric tight frames and complex polytopes http://dx</article-title>
          .doi.org/10.1016/j.laa.
          <year>2013</year>
          .
          <volume>10</volume>
          .003
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [27]
          <article-title>ISO/IEC 25059 (draft) Software engineering - Systems and software Quality Requirements and Evaluation (SQuaRE) - Quality Model for AI-based systems</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [28] ISO/IEC 5259-
          <article-title>2 (draft) Artificial intelligence - Data quality for analytics and ML - Part 2: Data quality measures</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [29]
          <article-title>Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime (</article-title>
          <year>2020</year>
          )
          <article-title>Stephane d'Ascoli, Maria Refinetti</article-title>
          , Giulio Biroli, Florent Krzakala https://arxiv.org/pdf/
          <year>2003</year>
          .01054.pdf
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>