<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Ital-IA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Fairness, Debiasing and Privacy in Computer Vision and Medical Imaging</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Carlo Alberto Barbano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edouard Duchesnay</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benoit Dufumier</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pietro Gori</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Grangetto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Dept., University of Turin</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LTCI</institution>
          ,
          <addr-line>Télécom Paris, IP Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>NeuroSpin, CEA, Université Paris-Saclay</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>3</volume>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>Deep Learning (DL) has become one of the predominant tools for solving a variety of issues, often with superior performance compared to previous state-of-the-art methods. DL models are often able to learn meaningful and abstract representations of the underlying data; however, they have also been shown to often learn additional features in the data, which are not necessarily relevant or required for the desired task. This can pose a number of issues, as the additional features can contain bias, sensitive or private information that should not be taken into account by the model (e.g. gender, race, age, etc.). We refer to this information as collateral. The presence of collateral information translates into practical issues when deploying DL models, especially when they involve users' data. Learning robust representations which are free of biased, private, and collateral information can be very relevant for a variety of fields and applications, for example medical applications and decision support systems. In this work we present our group's activities aimed at devising methods to ensure that the representations learned by DL models are robust to collateral features and biases, and privacy-preserving with respect to sensitive information.</p>
      </abstract>
      <kwd-group>
        <kwd>Fairness</kwd>
        <kwd>Debiasing</kwd>
        <kwd>Privacy</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Representation Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        associated with the information they learn and how it is handled. Referring to all the above cases, we define as collateral any information that is not necessarily required for the desired task, but that is picked up by the model. This concept, which was conceptualized by John Dewey as Collateral Learning, describes the accidental learning that occurs in and outside the classroom [<xref ref-type="bibr" rid="ref3">3</xref>]. Based on this definition, and extending it to the deep learning context, we say that collateral learning occurs when a model learns more information than intended. In order to be robust, DL models should not be affected by the collateral learning problem.
      </p>
      <sec id="sec-1-1">
        <title>1.1. Representation Learning</title>
        <p>
          A more thorough understanding of how deep models can learn powerful representations can certainly be helpful in all the above cases. Learning fair and robust representations of the underlying samples, especially when dealing with biased data or sensitive information, is the main objective of the activities described in this work. In recent years, the topic of representation learning has increasingly gained traction in the deep learning community. Contrastive learning has become the most widespread approach for this purpose, and many losses and frameworks have been proposed [<xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4, 5, 6, 7</xref>]. Contrastive learning approaches aim at pulling representations of positive samples (e.g. of the same class) closer together, while repelling representations of negative ones (e.g. of different classes) away from each other. It has also been shown that, in a supervised setting, this kind of optimization can sometimes yield better results than standard cross-entropy [<xref ref-type="bibr" rid="ref5">5</xref>], and is also more robust against label corruption [<xref ref-type="bibr" rid="ref8">8</xref>], which can be seen as an instance of collateral features. However, a lot remains to be done on this matter, and research should focus on how to provide reliable guarantees for avoiding the learning of collateral features. Furthermore, another relevant line of research is addressing this issue from an unsupervised perspective (i.e. automatically recognizing and excluding all bias and collateral information without any prior knowledge).
        </p>
        <p>
          In summary, there is a need for a reliable way to learn robust representations which are free of biased, private and collateral information.
        </p>
      </sec>
    </sec>
    <sec id="sec-1-2">
      <title>2. Metric framework for contrastive learning</title>
      <p>
        In our research activities, we explore representation learning from a theoretical perspective. We propose a metric-learning based framework for supervised representation learning, which allows us to derive and formalize a more robust set of debiasing constraints, along with novel contrastive losses that show increased robustness compared to the current literature [<xref ref-type="bibr" rid="ref9">9</xref>]. We provide a unified framework to analyze and compare existing formulations of contrastive losses, such as the InfoNCE loss [<xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>], the InfoL1O loss [<xref ref-type="bibr" rid="ref7">7</xref>], and the SupCon loss [<xref ref-type="bibr" rid="ref5">5</xref>]. Using our proposed metric learning approach, we can reformulate each loss as a set of highly explainable metric conditions. Our analysis provides a comprehensive understanding of the different loss functions, explaining their behavior from a metric point of view. Furthermore, leveraging our metric learning approach, we investigate the issue of biased learning. We point out the limitations of the studied contrastive loss functions when dealing with biased data, especially when the loss on the training set is apparently minimized. By analyzing such cases, we provide a more formal characterization of bias, which eventually allows us to derive a new set of general regularization constraints for debiasing that can be added to any contrastive or non-contrastive loss.
      </p>
      <sec id="sec-1-3">
        <title>Fundamentals</title>
        <p>
          Let x ∈ X be an original sample (i.e., the anchor), x^+ a similar (positive) sample, x^- a dissimilar (negative) sample, and P and N the number of positive and negative samples respectively. Contrastive learning methods look for a parametric mapping function f : X → S^{d−1} that maps “semantically” similar samples close together in the representation space, a (d−1)-sphere, and dissimilar samples far away from each other. Once pre-trained, f is fixed and its representation is evaluated on a downstream task, such as classification, through linear evaluation on a test set. In general, positive samples x^+ can be defined in different ways depending on the problem: using transformations of x (unsupervised setting), samples belonging to the same class as x (supervised), or samples with image attributes similar to those of x (weakly-supervised). The definition of negative samples x^- varies accordingly. Here, we focus on the supervised case, thus samples belonging to the same/different class, but the proposed framework could easily be applied to the other cases. We define s(f(x), f(y)) as a similarity measure (e.g., cosine similarity) between the representations of two samples x and y. Please note that since ||f(x)||_2 = ||f(y)||_2 = 1, using a cosine similarity is equivalent to using an L2 distance (d(f(x), f(y)) = ||f(x) − f(y)||_2^2). Similarly to [<xref ref-type="bibr" rid="ref10 ref11">10, 11, 12, 13, 14</xref>], we propose to use a metric learning approach which allows us to better formalize recent contrastive losses, such as InfoNCE [<xref ref-type="bibr" rid="ref4 ref6">4, 6</xref>], InfoL1O [<xref ref-type="bibr" rid="ref7">7</xref>] and SupCon [<xref ref-type="bibr" rid="ref5">5</xref>], and to derive new losses that better approximate the mutual information and can take data biases into account.
        </p>
      </sec>
      <sec id="sec-1-4">
        <title>Derivation of ε-SupInfoNCE</title>
        <p>
          Using an ε-margin metric learning point of view, probably the simplest contrastive learning formulation is looking for a mapping function f such that the following ε-condition is always satisfied:
          s(f(x), f(x^-_i)) − s(f(x), f(x^+_j)) ≤ −ε   ∀i, j   (1)
          where we denote s^-_i = s(f(x), f(x^-_i)) and s^+_j = s(f(x), f(x^+_j)), and ε ≥ 0 represents a margin between positive and negative samples, as shown in Fig. 1. The constraint of Eq. 1 can be transformed into an optimization problem using, as is common in contrastive learning, the max operator and its smooth approximation LogSumExp. This can lead to the derivation of different loss functions; some of them can be found in [9]. We propose to use the following one, which we call ε-SupInfoNCE:
          Σ_j max(−ε, {s^-_i − s^+_j}_{i=1,…,N}) ≈ Σ_j log( exp(−ε) + Σ_i exp(s^-_i − s^+_j) )
        </p>
      </sec>
      <sec id="sec-1-5">
        <title>Experiments and Results</title>
        <p>
          Results on general computer vision datasets are presented in Tab. 1, in terms of top-1 accuracy. We report the performance for the best value of ε; the complete results can be found in [<xref ref-type="bibr" rid="ref9">9</xref>]. The results are averaged across 3 trials for every configuration, and we also report the standard deviation. We obtain significant improvements with respect to all baselines and, most importantly, SupCon, on all benchmarks: on CIFAR-10 (+0.5%), on CIFAR-100 (+0.63%), and on ImageNet-100 (+1.31%). For the experiments, we use the original setup from SupCon [<xref ref-type="bibr" rid="ref5">5</xref>], employing a ResNet-50. The complete experimental setup is provided in [9].
        </p>
      </sec>
    </sec>
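    <sec id="sec-ex-1">
      <title>Worked example: computing ε-SupInfoNCE</title>
      <p>To make the objective above concrete, the following NumPy sketch evaluates the ε-SupInfoNCE value for a single anchor. It is an illustrative example, not the implementation of [9]: the function names, embedding dimension, margin and the randomly generated embeddings are all arbitrary choices for this sketch.</p>
      <preformat>
```python
import numpy as np

def normalize(v):
    # Project embeddings onto the unit sphere S^(d-1).
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def eps_sup_infonce(anchor, positives, negatives, eps=0.25):
    """epsilon-SupInfoNCE for a single anchor (illustrative sketch).

    anchor: (d,) embedding; positives: (P, d); negatives: (N, d).
    All embeddings are assumed L2-normalized, so the cosine similarity
    s(x, y) reduces to a dot product.
    """
    s_pos = positives @ anchor  # s+_j, shape (P,)
    s_neg = negatives @ anchor  # s-_i, shape (N,)
    # loss_j = -log( exp(s+_j) / (exp(s+_j - eps) + sum_i exp(s-_i)) ),
    # computed stably via logaddexp, then summed over the positives j.
    log_denom = np.logaddexp(s_pos - eps, np.log(np.sum(np.exp(s_neg))))
    return float(np.sum(-(s_pos - log_denom)))

rng = np.random.default_rng(0)
anchor = normalize(rng.normal(size=8))
pos = normalize(rng.normal(size=(3, 8)))
neg = normalize(rng.normal(size=(5, 8)))
print(eps_sup_infonce(anchor, pos, neg))
```
      </preformat>
      <p>On the unit sphere the dot product is exactly the cosine similarity, so the sketch needs no separate similarity function; note also that the loss value decreases as the margin ε grows, consistently with the shrinking term exp(s^+_j − ε) in the denominator.</p>
    </sec>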
    <sec id="sec-2">
      <title>3. Debiasing with FairKL</title>
      <p>Satisfying the ε-condition (1) can generally guarantee good downstream performance; however, it does not take into account the presence of biases (e.g. selection biases). To tackle this issue, we propose FairKL, a set of debiasing constraints that prevent the use of the bias features within the proposed metric learning approach. In order to give a more in-depth explanation of the ε-SupInfoNCE failure case, we employ the notion of bias-aligned and bias-conflicting samples as in Nam et al. [15]. In our context, a bias-aligned sample shares the same bias attribute of the anchor, while a bias-conflicting sample does not. In this work, we assume that the bias attributes are either known a priori or that they can be estimated using a bias-capturing model, such as in [16].</p>
      <p>[Fig. 1: The minimal margin ε between the distance of a positive sample x^+ (+ symbol inside) from an anchor x and the distance of the closest negative sample x^- (− symbol inside). By increasing the margin, we can achieve a better separation between positive and negative samples.]</p>
      <p>Rearranging the LogSumExp approximation, the ε-SupInfoNCE loss can be written as:
      ℒ_{ε-SupInfoNCE} = − Σ_j log [ exp(s^+_j) / (exp(s^+_j − ε) + Σ_i exp(s^-_i)) ]   (2)
      Here, we can notice that when ε = 0 we retrieve a generalization of the InfoNCE loss, whereas when ε → ∞ we obtain a generalization of the InfoL1O loss. It has been shown in [<xref ref-type="bibr" rid="ref7">7</xref>] that these two losses are the lower and upper bound of the Mutual Information I(x^+, x), respectively:
      ℒ_InfoNCE ≤ I(x^+, x) ≤ ℒ_InfoL1O   (3)
      By using a value of ε ∈ [0, ∞), one might find a tighter approximation of I(x^+, x), since the exponential function at the denominator, exp(−ε), monotonically decreases as ε increases.</p>
      <sec id="sec-2-1">
        <title>Characterization of bias</title>
        <p>We denote bias-aligned samples with ·_{,b} and bias-conflicting samples with ·_{,b'}. Given an anchor x, if the bias is “strong” and easy to learn, a positive bias-aligned sample x^+_{k,b} will probably be closer to the anchor x in the representation space than a positive bias-conflicting sample (of course, the same reasoning can be applied to the negative samples). This is why, even in the case in which the ε-condition is satisfied and the ε-SupInfoNCE loss (2) is minimized, we could still be able to distinguish between bias-aligned and bias-conflicting samples. Hence, we say that there is a bias if we can identify an ordering on the learned representations, e.g.:
        s^-_i + ε ≤ s^+_{j,b'} &lt; s^+_{k,b}   ∀i, j, k   (4)</p>
      </sec>
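      <sec id="sec-ex-2">
        <title>Worked example: the ε-condition under bias</title>
        <p>The ordering of Eq. 4 is easy to verify numerically. In the following sketch (toy similarity values chosen by hand, not learned representations), the ε-condition holds for every positive–negative pair, yet bias-aligned positives are consistently more similar to the anchor than bias-conflicting ones, so the bias remains identifiable even though the contrastive objective is satisfied:</p>
        <preformat>
```python
import numpy as np

# Toy cosine similarities in [-1, 1], chosen by hand to reproduce the
# failure case of Eq. (4).
eps = 0.1
s_neg = np.array([-0.60, -0.45, -0.52])        # s-_i
s_pos_conflicting = np.array([0.35, 0.42])     # s+_{j,b'}
s_pos_aligned = np.array([0.80, 0.88, 0.85])   # s+_{k,b}

# epsilon-condition (1), rearranged: s+_j - s-_i ≥ eps for every pair (i, j).
all_pos = np.concatenate([s_pos_conflicting, s_pos_aligned])
eps_condition = bool(np.all(all_pos[None, :] - s_neg[:, None] >= eps))

# Ordering of Eq. (4): s-_i + eps ≤ s+_{j,b'}, and every bias-conflicting
# positive stays strictly below every bias-aligned one.
ordering = bool(np.min(s_pos_conflicting) - (np.max(s_neg) + eps) >= 0.0
                and np.min(s_pos_aligned) - np.max(s_pos_conflicting) > 0.0)

print(eps_condition, ordering)  # → True True
```
        </preformat>
        <p>Both checks pass: the margin constraint is met, yet the representations still expose the bias attribute, which is precisely what the FairKL constraints below are designed to suppress.</p>
      </sec>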
      <sec id="sec-2-3">
        <title>FairKL regularization for debiasing</title>
        <p>The ordering of Eq. 4 represents the worst-case scenario, where the ordering is total (i.e., it holds ∀i, j, k). Of course, there can also be cases in which the bias is not as strong, and the ordering may only be partial.</p>
        <p>Ideally, we would enforce the condition d^+_{j,b'} − d^+_{k,b} = 0 ∀j, k, meaning that every positive bias-conflicting sample should have the same distance from the anchor as any other positive bias-aligned sample. However, in practice, this condition is very strict, as it would enforce a uniform distance among all positive samples. A more relaxed condition would instead force the distributions of distances, {d_{·,b'}} and {d_{·,b}}, to be similar. Here, we propose two new debiasing constraints, using either the first moment (mean) of the distributions or the first two moments (mean and variance). Using only the average of the distributions, we obtain:
        (1/P_{b'}) Σ_j d^+_{j,b'} − (1/P_b) Σ_k d^+_{k,b} = 0   (5)
        where P_b and P_{b'} are the number of positive bias-aligned and bias-conflicting samples, respectively (the same reasoning can be applied to the negative samples, omitted for brevity). Coincidentally, this constraint is also known as EnD [17], which we proposed in 2021. Denoting the first moments of the distance distributions with μ^+_b = (1/P_b) Σ_k d^+_{k,b} and μ^+_{b'} = (1/P_{b'}) Σ_j d^+_{j,b'}, and the second moments with σ^{+2}_b = (1/P_b) Σ_k (d^+_{k,b} − μ^+_b)^2 and σ^{+2}_{b'} = (1/P_{b'}) Σ_j (d^+_{j,b'} − μ^+_{b'})^2, and making the hypothesis that the distance distributions follow a normal distribution, we can define a new debiasing constraint ℛ_KL using, for example, the Kullback–Leibler divergence:
        (1/2) [ (σ^{+2}_b + (μ^+_b − μ^+_{b'})^2) / σ^{+2}_{b'} − log(σ^{+2}_b / σ^{+2}_{b'}) − 1 ] = 0   (6)
        The proposed debiasing constraint can easily be added to any contrastive loss using the method of Lagrange multipliers, as a regularization term. Thus, our final loss function is:
        ℒ = α ℒ_{ε-SupInfoNCE} + λ ℛ_KL   (7)
        where α and λ are positive hyperparameters.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Experiments and results</title>
        <p>We perform experiments with our proposed loss on five biased datasets: Biased-MNIST, Corrupted-CIFAR10, bFFHQ, and 9-Class ImageNet along with ImageNet-A. For brevity, in this presentation we report Biased-MNIST only; the results are reported in Tab. 2.</p>
      </sec>
      <sec id="sec-2-5">
        <title>4. Multi-site acquisition noise in brain age prediction</title>
        <p>In this section, we present our recent work in the field of MRI, focusing on brain age prediction. This is a challenging task, as it requires models able to generalize across different imaging sites. Dealing with multi-site datasets is a delicate matter in biomedical imaging in general, as the collateral noise related to the different acquisition sites often limits the generalization capability of DL models. In this context, together with our partners at Télécom Paris (IP Paris) and NeuroSpin (CEA), we have developed a novel contrastive learning loss for the regression of brain age from MRI [19], which is based on our metric learning framework. We validated it on the OpenBHB challenge [20], a recently released³ public challenge, which provides one of the largest datasets of healthy brain MRIs. Based on the framework presented in Sec. 2, we propose a novel contrastive learning regression loss for brain age prediction, achieving state-of-the-art performance on the OpenBHB challenge.</p>
      </sec>
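      <sec id="sec-ex-3">
        <title>Worked example: the FairKL constraint</title>
        <p>The constraint of Eq. 6 amounts to a closed-form KL divergence between two Gaussians fitted to the distance distributions of bias-aligned and bias-conflicting positives. A minimal NumPy sketch (function name and sample values are ours, for illustration only, not the training code of [9]):</p>
        <preformat>
```python
import numpy as np

def fairkl_penalty(d_aligned, d_conflicting, tiny=1e-8):
    """KL( N(mu_b, sigma_b^2) || N(mu_b', sigma_b'^2) ) between the
    distance distributions of bias-aligned and bias-conflicting
    positives, as in Eq. (6). `tiny` avoids division by zero."""
    mu_b, var_b = np.mean(d_aligned), np.var(d_aligned) + tiny
    mu_c, var_c = np.mean(d_conflicting), np.var(d_conflicting) + tiny
    return 0.5 * ((var_b + (mu_b - mu_c) ** 2) / var_c
                  - np.log(var_b / var_c) - 1.0)

rng = np.random.default_rng(1)
# Distances of positive samples from the anchor: bias-aligned ones end up
# much closer than bias-conflicting ones, i.e. a biased representation.
d_aligned = rng.normal(loc=0.2, scale=0.05, size=64)
d_conflicting = rng.normal(loc=0.8, scale=0.05, size=64)
print(fairkl_penalty(d_aligned, d_conflicting))  # large value: biased
print(fairkl_penalty(d_aligned, d_aligned))      # → 0.0 (identical)
```
        </preformat>
        <p>The penalty is zero when the two distributions coincide and grows with the gap between their moments, which is exactly what ℛ_KL penalizes when added to the contrastive loss in Eq. 7.</p>
      </sec>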
      <sec id="sec-2-9">
        <title>Contrastive Learning Regression Loss</title>
        <p>The notion of negative and positive samples is rooted in the contrastive learning framework. The loss formulation of Sec. 2 is thus not adapted to regression (i.e. continuous labels), as it is not possible to determine a hard boundary between positive and negative samples: all samples are somehow positive and negative at the same time. Given the continuous label y for the anchor and y_k for a sample k, one could threshold the difference Δ between y and y_k at a certain value τ in order to create positive and negative samples (i.e. k is positive if Δ(y, y_k) &lt; τ). The problem would then be how to choose τ. Differently, we propose to define a degree of “positiveness” between samples using a kernel function w_k = K(y − y_k), where 0 ≤ w_k ≤ 1. Our goal is thus to learn a parametric function f : X → S^{d−1} that maps samples with a high degree of positiveness (w_k ∼ 1) close in the latent space, and samples with a low degree (w_k ∼ 0) far away from each other. To adapt such a framework to continuous labels, we propose to use a kernel function w, and we develop multiple formulations. A first approach is to consider as “positive” only the samples that have a degree of positiveness greater than 0, and to align them with a strength proportional to the degree:
        w_k (s_t − s_k) ≤ 0   ∀t, k, t ≠ k ∈ A(i)   (8)
        From Eq. 8 we can derive the following loss, where we normalize the kernel so that the sum of the weights over all samples is equal to 1 and we denote with A(i) the indices of the samples in the minibatch distinct from i:
        ℒ = − Σ_k (w_k / Σ_t w_t) log [ exp(s_k) / Σ_t exp(s_t) ]   (9)
        Interestingly, this is exactly the y-aware loss proposed in [21] for classification with weak continuous attributes.</p>
        <p>Due to the non-hard boundary between positive and negative samples, both w and s are defined over the entire minibatch. The kernel w is used to avoid aligning samples that are not similar to the anchor (i.e. w ≈ 0). It can be noted that, while the numerator aligns the sample k, in the denominator the uniformity term (as defined in [22]) focuses more on the closest samples in the representation space: this could be undesirable, as these samples might have a greater degree of positiveness than the considered k. To avoid that, we formulate a first extension (ℒ^threshold) of (8), which limits the uniformity term (i.e., the denominator) to the samples that are at least more distant from the anchor than the considered k in the kernel space (omitting the normalization in the starting condition):
        w_k (s_t − s_k) ≤ 0  if  w_t − w_k ≤ 0   ∀t, k ≠ t ∈ A(i)
        ℒ^threshold = − Σ_k (w_k / Σ_t w_t) log [ exp(s_k) / Σ_{t ≠ k, w_t &lt; w_k} exp(s_t) ]   (10)</p>
        <p>However, ℒ^threshold still focuses more on the closest sample “less positive” than k, i.e. the sample t with w_t ≤ w_k that has the highest similarity s_t. As noted in [<xref ref-type="bibr" rid="ref5 ref9">9, 5</xref>], increasing the margin with respect to the closest “negative” sample works well for classification; however, we argue it might not be best suited for regression. For this reason, we propose a second formulation (ℒ^exp) that takes the opposite approach: instead of focusing on repelling the closest “less positive” sample, we increase the repulsion strength for samples proportionally to their distance from the anchor in the kernel space:
        w_k [(1 − w_t) s_t − s_k] ≤ 0   ∀t, k ≠ t ∈ A(i)
        ℒ^exp = − (1 / Σ_{t ∈ A(i)} w_t) Σ_k w_k log [ exp(s_k) / Σ_{t ≠ k} exp((1 − w_t) s_t) ]   (11)
        In the resulting ℒ^exp formulation, the weighting factor (1 − w_t) acts like a temperature value, giving more weight to the samples which are farther away from the anchor in the kernel space. Also, for a proper kernel choice, samples close to the anchor in the kernel space will be repelled with very low strength (∼ 0). We argue that this approach is more suited for continuous attributes (i.e., regression tasks), as it enforces that samples close in the kernel space will be close in the representation space.</p>
      </sec>
      <sec id="sec-2-12">
        <title>Results</title>
        <p>With our proposed loss, we achieve the best results (at this time) [<xref ref-type="bibr" rid="ref9">9</xref>] on the OpenBHB leaderboard, as shown in Tab. 3 (ℒ^exp). Compared to the L1 and ComBat baselines [19], we achieve a lower generalization error on unseen sites (Ext. MAE), meaning that our method is more robust to the collateral information related to the site noise. We are currently carrying out further research to gain more insight into the reasons for this behavior.</p>
        <p>³ https://baobablab.github.io/bhb/</p>
      </sec>
    </sec>
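    <sec id="sec-ex-4">
      <title>Worked example: kernel-based regression loss</title>
      <p>The kernel-based formulation can be sketched in a few lines of NumPy. The snippet below assumes a Gaussian kernel for the degree of positiveness (the kernel choice, bandwidth, and all toy values are our illustrative assumptions, not those of [19]) and evaluates an ℒ^exp-style loss (Eq. 11) for one anchor over a toy minibatch of ages:</p>
      <preformat>
```python
import numpy as np

def degree_of_positiveness(y_anchor, y, sigma=2.0):
    """Gaussian kernel w_k = K(y_anchor - y_k), bounded in [0, 1].
    The Gaussian form and bandwidth `sigma` are illustrative choices."""
    return np.exp(-((y_anchor - y) ** 2) / (2.0 * sigma ** 2))

def l_exp(sim, w):
    """Sketch of the exp-weighted regression loss (Eq. 11) for one anchor:
    repulsion strength grows with the kernel-space distance (1 - w_t)."""
    total = 0.0
    for k in range(len(sim)):
        others = np.delete(np.arange(len(sim)), k)
        denom = np.sum(np.exp((1.0 - w[others]) * sim[others]))
        total += w[k] * (sim[k] - np.log(denom))
    return -total / np.sum(w)

rng = np.random.default_rng(2)
y_anchor = 30.0
y = rng.uniform(20.0, 80.0, size=16)            # ages in the minibatch
w = degree_of_positiveness(y_anchor, y)         # w_k ~ 1 for similar ages
sim = rng.uniform(-1.0, 1.0, size=16)           # s_k = s(f(x), f(x_k))
print(l_exp(sim, w))
```
      </preformat>
      <p>For samples with w_t ∼ 1 the repulsion term exp((1 − w_t) s_t) is flat in s_t, so they are barely repelled, while samples far away in label space (w_t ∼ 0) are repelled at full strength, which is the behavior argued for above.</p>
    </sec>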
    <sec id="sec-3">
      <title>5. Privacy in deep learning</title>
      <p>We investigated the possibility of utilizing debiasing techniques also to prevent privacy leakage. In this context, we are interested in recovering some private attributes of the data, starting from the model outputs or embeddings. Such private attributes can be, in the example of natural or facial images, age, gender, race, etc. We observed that, under certain conditions, some of the debiasing approaches are also suitable for privacy preservation. We found the determining condition to be the capability of effectively suppressing the bias-related information inside the model, rather than simply re-weighting it. We show in [23] that debiasing techniques can be used for privacy preservation purposes when they allow retaining a high accuracy on the target class, while making it harder to determine the private attributes. In our work, we successfully remove collateral private information, e.g. gender or age, from the latent representations of DL models on a variety of datasets, including medical images, thus ensuring that it cannot leak from the model outputs.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] EidosLab, Image processing, computer vision and virtual reality, https://eidos.di.unito.it, 2021.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] CVPL, Italian Association for Computer Vision, Pattern Recognition and Machine Learning, http://www.cvpl.it, 2021.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] J. Dewey, Experience And Education, Free Press, 1997. URL: https://books.google.fr/books?id=UWbuAAAAMAAJ.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A Simple Framework for Contrastive Learning of Visual Representations, in: International Conference on Machine Learning, PMLR, 2020, pp. 1597–1607. URL: http://proceedings.mlr.press/v119/chen20j.html. ISSN: 2640-3498.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] P. Khosla, et al., Supervised contrastive learning, in: NeurIPS, 2020.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] A. v. d. Oord, Y. Li, O. Vinyals, Representation Learning with Contrastive Predictive Coding, arXiv:1807.03748 [cs, stat] (2019). URL: http://arxiv.org/abs/1807.03748.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] B. Poole, S. Ozair, A. v. d. Oord, A. A. Alemi, G. Tucker, On Variational Bounds of Mutual Information, in: ICML, 2019.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] F. Graf, et al., Dissecting supervised contrastive learning, in: ICML, 2021. URL: https://proceedings.mlr.press/v139/graf21a.html.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] C. A. Barbano, B. Dufumier, E. Tartaglione, M. Grangetto, P. Gori, Unbiased supervised contrastive learning, in: The Eleventh International Conference on Learning Representations (ICLR), 2023. URL: https://openreview.net/forum?id=Ph5cJSfD2XN.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] S. Chopra, R. Hadsell, Y. LeCun, Learning a Similarity Metric Discriminatively, with Application to Face Verification, in: CVPR, volume 1, IEEE, 2005, pp. 539–546. URL: http://ieeexplore.ieee.org/document/1467314/. doi:10.1109/CVPR.2005.202.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] K. Sohn, Improved Deep Metric Learning with Multi-class N-pair Loss Objective, in: Advances in Neural Information Processing Systems, volume 29, Curran Associates, Inc., 2016. URL: https://papers.nips.cc/paper/2016/hash/6b180037abbebea991d8b1232f8a8ca9-Abstract.html.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, Y. Wu, Learning Fine-grained Image Similarity with Deep Ranking, in: CVPR, 2014.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] X. Wang, Y. Hua, E. Kodirov, N. M. Robertson, Ranked List Loss for Deep Metric Learning, in: CVPR, 2019.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] B. Yu, D. Tao, Deep Metric Learning With Tuplet Margin Loss, in: IEEE ICCV, 2019, pp. 6489–6498.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] J. Nam, H. Cha, S. Ahn, J. Lee, J. Shin, Learning from failure: Training debiased classifier from biased classifier, in: Advances in Neural Information Processing Systems, 2020.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Y. Hong, E. Yang, Unbiased classification through bias-contrastive and bias-balanced learning, in: Thirty-Fifth Conference on Neural Information Processing Systems, 2021. URL: https://openreview.net/forum?id=2OqZZAqxnn.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] E. Tartaglione, C. A. Barbano, M. Grangetto, EnD: Entangling and disentangling deep representations for bias correction, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] B. Kim, H. Kim, K. Kim, S. Kim, J. Kim, Learning not to learn: Training deep neural networks with biased data, in: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] C. A. Barbano, B. Dufumier, E. Duchesnay, M. Grangetto, P. Gori, Contrastive learning for regression in multi-site brain age prediction, in: International Symposium on Biomedical Imaging (ISBI), 2023.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] B. Dufumier, et al., OpenBHB: a large-scale multi-site brain MRI data-set for age prediction and debiasing, NeuroImage (2022).</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] B. Dufumier, et al., Contrastive learning with continuous proxy meta-data for 3D MRI classification, in: MICCAI, Springer, 2021.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] T. Wang, P. Isola, Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere, ICML (2020). URL: http://arxiv.org/abs/2005.10242.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] C. A. Barbano, E. Tartaglione, M. Grangetto, Bridging the gap between debiasing and privacy for deep learning, in: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), IEEE, 2021, pp. 3799–3808.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>