=Paper=
{{Paper
|id=Vol-3908/paper_67
|storemode=property
|title=Proxy Fairness under the GDPR and the AI Act: A Perspective of Sensitivity and Necessity
|pdfUrl=https://ceur-ws.org/Vol-3908/paper_67.pdf
|volume=Vol-3908
|authors=Ioanna Papageorgiou
|dblpUrl=https://dblp.org/rec/conf/ewaf/Papageorgiou24
}}
==Proxy Fairness under the GDPR and the AI Act: A Perspective of Sensitivity and Necessity==
Ioanna Papageorgiou, Leibniz University Hannover, Institute for Legal Informatics, Germany

EWAF'24: European Workshop on Algorithmic Fairness, July 01–03, 2024, Mainz, Germany. This is an extended abstract; the full paper is forthcoming in the 7th AAAI/ACM Conference on AI, Ethics, and Society (AIES-24).

Introduction

The increasing adoption of AI systems in high-stakes areas of public life, together with extensive studies on the discriminatory potential of AI [1], has prompted a proliferation of algorithmic methods that study and pursue fairness in AI systems (Fair-AI) [2, 3, 4]. These methods are centered on the detection, mitigation and evaluation of bias across legally protected groups, and almost invariably require access to sensitive attributes, such as demographics, that determine group membership. This, however, often implies the processing of sensitive personal data, which is in principle prohibited or extensively protected under EU data protection law, posing challenges to the feasibility of Fair-AI approaches. In response to this challenge, a growing line of AI research [5, 6, 7, 8, 9, 10, 11] has studied computational methods that enable the operationalization of fairness in the absence of demographic data, notably through the use of proxy variables and inferential techniques (Proxy Fairness).
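To make the computational setting concrete, the sketch below illustrates one common pattern behind such approaches: a proxy model is trained on auxiliary data in which demographics are observed, group membership is then inferred for the data to be audited, and a group fairness metric for the AI system's decisions is computed from the inferred labels. This is a minimal illustration, not a method from the paper; the synthetic data, the variable names and the use of numpy/scikit-learn are assumptions made only for the example.

```python
# Illustrative sketch: proxy-based fairness auditing when the sensitive
# attribute is unobserved in the audit data. All data are synthetic and all
# names are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Auxiliary dataset: proxy variables AND the sensitive attribute are observed.
n_aux = 5000
zip_region = rng.integers(0, 10, n_aux)      # proxy variable (coarse location)
surname_code = rng.integers(0, 5, n_aux)     # second proxy variable
group = (rng.random(n_aux) < 0.2 + 0.06 * zip_region).astype(int)  # sensitive attribute

# Proxy model: predicts group membership from the proxy variables.
proxy_model = LogisticRegression().fit(
    np.column_stack([zip_region, surname_code]), group
)

# Audit dataset: only the proxies and the AI system's decisions are observed.
n_audit = 2000
zip_a = rng.integers(0, 10, n_audit)
surname_a = rng.integers(0, 5, n_audit)
decisions = rng.integers(0, 2, n_audit)      # stand-in for the system's outputs

# Infer group membership from the proxies (hard labels; probabilistic
# weighting of the proxy scores is a common refinement in the literature).
inferred_group = proxy_model.predict(np.column_stack([zip_a, surname_a]))

# Demographic parity gap estimated from the inferred membership.
rate_1 = decisions[inferred_group == 1].mean()
rate_0 = decisions[inferred_group == 0].mean()
print(f"Estimated demographic parity gap: {abs(rate_1 - rate_0):.3f}")
```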
However, scant attention has been given thus far to the interaction of these methods with existing data protection regulations, creating significant legal uncertainty regarding their legitimacy. This uncertainty intensifies in the face of ongoing regulatory developments. In particular, the upcoming AI Act also addresses the challenge of data scarcity in the context of fairness by enabling, on grounds of public interest, the processing of sensitive personal data for the purposes of bias detection and correction in high-risk AI systems. Specifically, according to Article 10(5) AI Act, the processing of sensitive personal data is permitted only "to the extent that it is strictly necessary for the purposes of ensuring bias detection and correction in relation to the high-risk AI systems" [emphasis added]. While the enabling provision appears to be method-agnostic, meaning that it is not restricted to a particular fairness approach, the stipulated necessity requirement significantly influences the choice of fairness methods and, to a greater extent, the scope of Proxy Fairness.

By utilizing the legal notions of data sensitivity and processing necessity, the paper examines the legal implications of Proxy Fairness under the General Data Protection Regulation and the AI Act, providing a normative foundation for this line of Fair-AI approaches. Specifically, the paper scrutinizes the nature of the data involved in Proxy Fairness approaches, including proxy variables and data inferences, demonstrating that inferential methods are in principle not exempt from the reach of the GDPR and its extensive regime for sensitive data. Subsequently, the paper examines the lawfulness of processing sensitive data for Proxy Fairness under Article 10(5) of the AI Act through a comparative assessment of proxy fairness approaches versus default alternatives along the necessity axes of intrusiveness, effectiveness, and reasonableness.

Proxy Fairness under the GDPR: a sensitivity perspective

In order to assess Proxy Fairness under Article 10(5) of the AI Act, it is necessary to first investigate the extent to which it involves the processing of sensitive data within the meaning of the GDPR. For this purpose, the paper distinguishes between the two main data pillars involved in Proxy Fairness, namely proxy and inferred data, and assesses them under the legal notion of sensitivity. In particular, through a grammatical and systematic interpretation of Article 9(1) GDPR, which defines sensitive personal data, and by consulting the jurisprudence of the European Court of Justice [12, 13], guidelines of the Article 29 Working Party [14, 15, 16, 17] and a substantial corpus of legal scholarship [18, 19, 20, 21, 22, 23, 24, 25], the paper argues that both the proxy and the inferred data used in the context of Proxy Fairness may be considered sensitive within the meaning of the GDPR.

Proxy Fairness under the AI Act: a necessity perspective

As mentioned above, according to Article 10(5) AI Act, the processing of sensitive data is permitted only "to the extent that it is strictly necessary for the purposes of ensuring negative bias detection and correction in relation to the high-risk AI systems" [emphasis added], i.e. only under the requirement of legal necessity. The necessity principle, which has been a recurrent condition for the processing of personal data, essentially dictates that data processing is permissible only to the extent that there is no less intrusive but similarly effective alternative available which can reasonably achieve the objective at hand [26, 27]. AI providers seeking to rely on the exception of the AI Act and to process sensitive personal data for bias detection and correction must thus conduct a necessity test, which involves comparing the available alternatives based on their levels of a) intrusiveness, b) effectiveness and c) reasonableness. The paper examines proxy fairness approaches under the necessity requirement, in particular by comparing them with default approaches that directly collect and use real sensitive attributes, along these necessity axes.

a. Intrusiveness

Core criteria for assessing the intrusiveness of a data processing operation, i.e. the severity of the interference with the right to data protection, include the volume and type of data processed and the associated risks of data misuse [27]. Examining these criteria, the paper argues that Proxy Fairness not only de facto involves a larger volume of personal data compared to default approaches, but also a larger volume of de jure sensitive data, thereby being more intrusive under the first two criteria. Subsequently, the paper discusses the lack of data subjects' control over their personal data and the risk of discrimination as relevant instances of data misuse in the cases of Proxy and Default Fairness respectively, highlighting the complexity of comparing different methods in terms of data misuse risks.

b. Effectiveness

Compliance with the requirement of necessity does not require prioritizing any kind of milder alternative, but only those milder alternatives that can attain the pursued objective in a comparably effective manner. In a second step, AI providers must thus compare the identified alternatives with respect to their effectiveness in detecting and correcting bias, relying on theoretical and/or empirical evidence regarding the utility and limitations of the fairness methods under consideration. This includes qualitative and quantitative arguments about the way the relevant demographic groups would be better served by the planned intervention, such as performance and fairness metrics, the accuracy of fairness estimates and the associated trade-offs. Accordingly, the paper conducts a high-level effectiveness comparison between Default and Proxy Fairness approaches based on evidence discussed in the fairness literature.
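One quantitative input to such a comparison is how accurately a proxy-based approach recovers the fairness metric of interest relative to measuring it on directly collected attributes. The sketch below is a stylized illustration of that point on fully synthetic data (an assumption made for the example, not an evaluation from the paper): the same demographic parity gap is computed once from the directly observed sensitive attribute and once from a noisy proxy of it, and the resulting estimation error is the kind of evidence an AI provider could weigh at this step.

```python
# Illustrative sketch: how far a proxy-based fairness estimate can drift from
# the value measured on directly collected sensitive attributes. Synthetic data.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

true_group = rng.integers(0, 2, n)                        # directly collected attribute
# Biased synthetic decisions: group 1 receives positive outcomes less often.
decisions = (rng.random(n) < np.where(true_group == 1, 0.45, 0.60)).astype(int)

# Proxy-inferred attribute: agrees with the true attribute only ~85% of the time.
flip = rng.random(n) > 0.85
proxy_group = np.where(flip, 1 - true_group, true_group)

def dp_gap(groups, outcomes):
    """Absolute difference in positive-outcome rates between the two groups."""
    return abs(outcomes[groups == 1].mean() - outcomes[groups == 0].mean())

true_gap = dp_gap(true_group, decisions)
proxy_gap = dp_gap(proxy_group, decisions)
print(f"Gap from collected attributes: {true_gap:.3f}")
print(f"Gap from proxy attributes:     {proxy_gap:.3f}")
print(f"Estimation error:              {abs(true_gap - proxy_gap):.3f}")
```

In this stylized setup the misclassification of group membership shrinks the measured gap; how such errors behave on real data and real systems is precisely the kind of evidence the effectiveness step asks providers to weigh.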
c. Reasonableness

According to the last element of the necessity test, AI providers are required to prioritize milder effective alternatives only if those are reasonable in terms of financial, legal and operational feasibility; in particular, nothing prohibitively costly, practically impossible or illegal shall be demanded. The paper argues that this step provides space not only for a utility-based calculus but also for ethical considerations, demonstrating how current research on critical ethics can gain normative relevance in the context of the GDPR and the AI Act.

Conclusion

In the face of the increasing popularity of proxy fairness approaches and the lack of a thorough corresponding legal framework, this paper explored aspects of Proxy Fairness under the General Data Protection Regulation and the AI Act. By shedding light on the regulatory nuances involved in Proxy Fairness and providing interpretational tools for the lawful processing of sensitive data in this context, the paper aims to assist AI providers in regulatory compliance and to safeguard the data protection rights of data subjects, while laying the groundwork for further research at the intersection of data protection law, ethics, and Fair-AI.

Acknowledgments

This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Actions (Grant Agreement Number 860630) for the project "NoBIAS – Artificial Intelligence without Bias". This work reflects only the author's views, and the European Research Executive Agency (REA) is not responsible for any use that may be made of it.

References

[1] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, A. Galstyan, A survey on bias and fairness in machine learning, ACM Comput. Surv. 54 (2021) 115:1–115:35.
[2] E. Ntoutsi, et al., Bias in data-driven artificial intelligence systems - an introductory survey, WIREs Data Mining Knowl. Discov. 10 (2020).
[3] R. Schwartz, A. Vassilev, K. Greene, L. Perine, A. Burt, P. Hall, Towards a Standard for Identifying and Managing Bias in Artificial Intelligence, Technical Report 1270, NIST Special Publication, 2022.
[4] S. Mitchell, E. Potash, S. Barocas, A. D'Amour, K. Lum, Algorithmic fairness: Choices, assumptions, and definitions, Annual Review of Statistics and Its Application 8 (2021) 141–163. URL: https://ssrn.com/abstract=3800687. doi:10.1146/annurev-statistics-042720-125902.
[5] C. Ashurst, A. Weller, Fairness without demographic data: A survey of approaches, in: Proceedings of the 3rd ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization, Association for Computing Machinery, New York, NY, USA, 2023. URL: https://doi.org/10.1145/3617694.3623234. doi:10.1145/3617694.3623234.
[6] Centre for Data Ethics and Innovation and Department for Science, Innovation and Technology, Enabling responsible access to demographic data to make AI systems fairer, Research and analysis report, 2023. URL: https://www.gov.uk/government/publications/enabling-responsible-access-to-demographic-data-to-make-ai-systems-fairer/report-enabling-responsible-access-to-demographic-data-to-make-ai-systems-fairer, published on 14 June 2023.
[7] R. Awasthi, A. Beutel, M. Kleindessner, J. Morgenstern, X. Wang, Evaluating fairness of machine learning models under uncertain and incomplete information, in: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, pp. 206–214.
[8] J. Chen, N. Kallus, X. Mao, G. Svacha, M. Udell, Fairness under unawareness: Assessing disparity when protected class is unobserved, in: Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* '19, Association for Computing Machinery, New York, NY, USA, 2019, pp. 339–348. URL: https://doi.org/10.1145/3287560.3287594. doi:10.1145/3287560.3287594.
[9] S. Yan, H.-t. Kao, E. Ferrara, Fair class balancing: Enhancing model fairness without observing sensitive attributes, in: Proceedings of the 29th ACM International Conference on Information and Knowledge Management, 2020.
[10] Z. Zhu, Y. Yao, J. Sun, H. Li, Y. Liu, Weak proxies are sufficient and preferable for fairness with missing sensitive attributes, in: Proceedings of the 40th International Conference on Machine Learning, ICML'23, JMLR.org, 2023.
[11] M. R. Gupta, A. Cotter, M. M. Fard, S. L. Wang, Proxy fairness, CoRR abs/1806.11212 (2018). URL: http://arxiv.org/abs/1806.11212. arXiv:1806.11212.
[12] Nowak, Judgment of the Court (Second Chamber) of 20 December 2017, Court of Justice of the European Union, C-434/16, 2017.
[13] K. Egan and M. Hackett v European Parliament, Judgment of the General Court (Fifth Chamber) of 28 March 2012, ECLI:EU:C:2019:1064.
[14] Article 29 Data Protection Working Party, Opinion on the concept of personal data, 2007.
[15] Article 29 Data Protection Working Party, Opinion 3/2012 on developments in biometric technologies, 2012. URL: https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2012/wp193_en.pdf.
[16] Article 29 Data Protection Working Party, Guidelines on the right to data portability, 2016, pp. 9–11. URL: https://ec.europa.eu/newsroom/document.cfm?doc_id=44099.
[17] Article 29 Data Protection Working Party, Guidelines on automated individual decision-making and profiling for the purposes of Regulation 2016/679, 2017. URL: https://ec.europa.eu/newsroom/article29/items/612053, document No. 17/EN, WP251rev.01.
[18] S. Wachter, B. Mittelstadt, A right to reasonable inferences: Re-thinking data protection law in the age of big data and AI, Columbia Business Law Review 2019 (2018). URL: https://ssrn.com/abstract=3248829, October 5, 2018.
[19] P. Quinn, G. Malgieri, The difficulty of defining sensitive data – the concept of sensitive data in the EU data protection framework, German Law Journal (2020). URL: https://ssrn.com/abstract=3713134. doi:10.2139/ssrn.3713134, forthcoming.
[20] G. Malgieri, G. Comandè, Sensitive-by-distance: quasi-health data in the algorithmic era, Information & Communications Technology Law 26 (2017) 229–249. URL: https://doi.org/10.1080/13600834.2017.1335468. doi:10.1080/13600834.2017.1335468.
[21] D. J. Solove, Data is what data does: Regulating based on harm and risk instead of sensitive data, Northwestern University Law Review 118 (2024) 1081. URL: https://ssrn.com/abstract=4322198. doi:10.2139/ssrn.4322198, GWU Legal Studies Research Paper No. 2023-22, GWU Law School Public Law Research Paper No. 2023-22.
[22] A. Schiff, Ehmann, Selmayr, Datenschutz-Grundverordnung DS-GVO Kommentar, 3. Auflage, C.H. Beck, 2017.
[23] P. Gola, D. Heckmann, Datenschutz-Grundverordnung, Bundesdatenschutzgesetz: DS-GVO / BDSG, 3rd ed., C.H. Beck, 2022.
[24] M. Finck, Hidden Personal Insights and Entangled in the Algorithmic Model: The Limits of the GDPR in the Personalisation Context, Cambridge University Press, 2021, pp. 95–107.
[25] D. Hallinan, F. Zuiderveen Borgesius, Opinions can be incorrect! In our opinion. On the accuracy principle in data protection law, International Data Privacy Law ipz025 (2020). doi:10.1093/idpl/ipz025.
[26] European Data Protection Supervisor, Assessing the necessity of measures that limit the fundamental right to the protection of personal data: A toolkit, 2023. URL: https://www.edps.europa.eu/data-protection/our-work/publications/papers/necessity-toolkit_en.
[27] P. Schantz, H. A. Wolff, Das neue Datenschutzrecht: Datenschutz-Grundverordnung und Bundesdatenschutzgesetz in der Praxis, C.H. Beck, 2017.