GDPR Privacy Policies in CLAUDETTE: Challenges of Omission, Context and Multilingualism Rūta Liepin, a Giuseppe Contissa Kasper Drazewski Law Department, EUI, Florence, Italy CIRSFID, University of Bologna, Italy Law Department, EUI, Florence, Italy Francesca Lagioia Marco Lippi Hans-Wolfgang Micklitz EUI, Florence, Italy DISMI, University of Modena and Law Department, EUI, Florence, Italy CIRSFID, University of Bologna, Italy Reggio Emilia, Italy Przemysław Pałka Giovanni Sartor Paolo Torroni Yale Law School EUI, Florence, Italy DISI, University of Bologna, Italy New Haven, United States CIRSFID, University of Bologna, Italy Abstract: The latest developments in natural language process- linguistic and legal complexity, and the need for methodologies ing and machine learning have created new opportunities in legal that can be transferred between different European languages. text analysis. In particular, we look at the texts of online privacy policies after the implementation of the European General Data 2 BACKGROUND Protection Regulation (GDPR). We analyse 32 privacy policies to Legal texts, such as regulations, contracts, privacy policies, and design a methodology for automated detection and assessment of cases, provide a rich source for different formal analyses, due to compliance of these documents. Preliminary results confirm the the complexity of language and legal norms within those texts. pressing issues with current privacy policies and the beneficial One of the aims of artificial intelligence and law research [8, 10] use of this approach in empowering consumers in making more is to find methods for accurately and efficiently extracting the informed decisions. However, we also encountered several serious knowledge from legal texts and for providing a level of evaluation issues in the process. This paper introduces the challenges through for the extracted data. 
This paper focuses on the legal texts of online concrete examples of context dependence, omission of information, privacy policies. We identified three main dimensions for evaluation and multilingualism. based on the GDPR and its guidelines: completeness, compliance with the data processing rules, and level of readability. A selection 1 INTRODUCTION of the research studies in these fields is introduced below. The changes in online privacy policies following the European Completeness: one of the core criticisms against unfair privacy General Data Protection Regulation (GDPR) have further high- policies regards withheld or missing information on the data pro- lighted the increasing information asymmetry between online ser- cessing, such as the purpose and retention time of personal data, vice providers and consumers. Studies [3, 5] in consumer behaviour including sensitive data. Constante et al. [7] use machine learning in reading privacy policies show that long and complex legal doc- and pre-annotated privacy policies to check for the completeness uments are seldom read and understood by users. Moreover, [13] of information pre-GDPR. To this end, they designed a client-end show that comprehending the rights and obligations outlined in solution, allowing consumers to read summarised policies on pri- these online documents is costly both in terms of time and monetary vacy categories of their choice (6 core categories and 11 additional value. categories). This paper presents a work in progress that includes the latest de- velopments of our methodology [12] in designing the Gold Standard Compliance: service providers, consumers and law enforcement of privacy policy compliance that could be used to build a platform authorities are interested in assessing the compliance of online empowering consumers to gain easier access and support in un- privacy policies. However, it has proven to be a challenging task. derstanding their rights and obligations. 
We aim to provide such a Research in this area focuses on formalising legal norms [4] and solution through the use of legal analysis, natural language process- designing methodologies [17] for automating the assessment of ing, and machine learning. In Section 4, we describe three challenges privacy policies. One of the risks identified [10] relates to the misin- faced by the AI and Law researchers working on automating eval- terpretation of norms as well as to the failure in connecting different uation of legal documents and illustrate them through examples specifications of norms within a legal document. found in the privacy policies analysed in our study. Among other Readability: a different area of research focuses on the language issues, we focus on the problem of context dependence of (legal) and accessibility of privacy policies. A new study [5] provides terms, the challenges in formalising the privacy policies due to their empirical evidence on the readability levels of privacy policies post- In: Proceedings of the Third Workshop on Automated Semantic Analysis of Information GDPR, concluding that “these policies are often unreadable”.1 Fol- in Legal Text (ASAIL 2019), June 21, 2019, Montreal, QC, Canada. lowing previous work by [14], their results support the conclusion © 2019 Copyright held by the owner/author(s). Copying permitted for private and academic purposes. 1 For readability scores the study employed the Flesch Reading Ease (FRE) test and the Published at https://ceur-ws.org Flesch-Kincaid (F-K) test. ASAIL 2019, June 21, 2019, Montreal, QC, Canada R. Liepina et al. that an unreasonable level of expertise is required to comprehend Each of the top-level dimensions has been further divided into the privacy policies. The average score, among the 300 analysed the relevant categories and corresponding evaluation criteria. 
Dia- policies, was at a level of “the usual score of articles in academic gram 1 shows the layered structure of the methodology by exem- journals” [5], supporting the claim that policies are not written plifying a good privacy policy: one that satisfies all the criteria.2 to be accessible and understandable by the general public. Such To meet the requirements of comprehensiveness, a privacy policy barriers further discourage consumers from reading privacy poli- should declare the purposes of the processing precisely and exhaus- cies [16]. Some solutions, such as automatically generated privacy tively. Thus, clauses providing only examples must be considered policy summaries [19] and interactive solutions of privacy analysis as insufficiently informative. In the dimension of substantive com- through apps [1], are emerging to provide consumers with tools pliance, using personal data for targeted advertising is fair only to better understand the contents of agreements and exercise their if based on the data subject’s consent and whenever an opt-out rights. is possible. Regarding the clarity of expression, i.e. whether a pri- vacy policy is framed in understandable, precise, and intelligible 3 DESIGNING METHODOLOGY language, certain unspecific language qualifiers should be avoided This project aims to design a methodology for creating an open (e.g. indeterminate conditioners, creating a dependency of a stated and high quality annotated corpus of online privacy policies. Such action or activity on a variable trigger such as “as necessary”, “from a data set could be used for automated detection and evaluation of time to time”, etc). We have designed detailed annotation guidelines problematic privacy clauses given the GDPR as the basis for inte- that are being further tested with a new data set of policies. grated normative guidelines. Here, we present an overview of the (1) Comprehensiveness of Information. 
The clause satisfies the crite- current methodology for detecting and assessing the problematic ria if the privacy policy includes sufficient information on the 23 privacy clauses, and how the new guidelines have improved on categories defined in the annotation guidelines. These include: previous versions [6]. identity of the data controller, categories of personal data concerned, and the period for which the personal data will be 3.1 The Gold Standard stored. Where ‘sufficiency’ is defined as fully informative privacy We designed a methodology that reflects the overall aims of the clauses that include all the details required by the regulation (e.g. GDPR in regards to collection and processing of personal data. In ). Everything that does not satisfy the given criteria, as speci- particular, we focus on three ways a privacy policy can be deemed fied in the guidelines has been marked as sub-optimal (e.g. ). unlawful according to articles 13 and 14 of the GDPR: (1) if the pol- We use the numerical values of 1 and 2 in the XML tags to refer to icy omits information required by the regulation, (2) if the policy the level of comprehensiveness of the information given. The earlier defines data processing beyond the prescribed limits, and (3) if it is version of the methodology distinguished 12 relevant categories. written in unclear language. The number of categories was increased to 23 to provide a more fine-grained annotation of functions. The improvements from the previous annotation guidelines [6] consist of the further specifica- tion of the different functions of the rights granted to consumers, The Gold Standard and the steps needed to exercise them. In particular, the clauses implementing the duty to inform the data subject about their rights, under article 13.2(b) and 14.2(c) of the GDPR, initially falling under a single category of required information[6] and identified with Comprehensive Substantive the tag, have been distinguished in multiple categories. 
information provided compliance Clear expression The reason for further differentiating between such categories is twofold. Firstly, from the legal point of view, the right to request access to, and rectification or erasure of, personal data or restriction 23 categories 11 categories clear language not tagged; for of processing and to object to processing, as well as the right to data e.g. for the e.g. for the use of purposes of data processing personal data for ads unclear expressions portability, are conceptually distinct and independent. Secondly, in analysing the privacy policies, we noted that the different rights and steps needed to exercise these rights are usually addressed in Optimal or sub-optimal Fair processing, 4 indicators: conditionals, separate clauses. Thus we chose the units for our tagging method depending on whether problematic processing, generalisations, modality, as single phrases. Indeed, with clauses covering multiple sentences, sufficient info included or unfair processing non-spec. quantifiers we chose to tag each sentence separately, by treating statements independently from one another. Hence, also the clauses contain- Figure 1: Dimensions - categories - criteria ing information about the rights are now classified separately from those outlining the steps needed to exercise these rights. Consider, for instance, the following example: We chose three top-level dimensions for the evaluation: You can request access to your personal in- (1) comprehensiveness of information formation, or correct or update out-of-date (2) substantive compliance (3) clarity of expression 2 In the diagram, the underlined criteria illustrate a good privacy policy. GDPR Privacy Policies in CLAUDETTE ASAIL 2019, June 21, 2019, Montreal, QC, Canada or inaccurate personal information we hold We identified 11 categories of clauses based on how issues per- about you. You can most easily do this by taining to such categories might affect individual rights. 
For in- visiting the "Account" portion of our web- stance, the unfair processing of sensitive () data, or unau- site, where you have the ability to access thorised transfer of data to third parties (tp) can have negative and update a broad range of information consequences for the consumer. Other categories pertain to the about your account, including your contact consent by using practice, the take it or leave it approach, policy information, your Netflix payment informa- changes and whether there has been a fair warning, cross-border tion, and various related information about data transfer, consent for processing children’s data, licensing data, your account (such as the content you have advertising, any other types of consent, as well as one category for viewed and rated, and your reviews. tracking any other types of problematic clauses. Under the previous version of the tagging guidelines, the two (3) Clarity of Expression. Art 12 specifies that a privacy policy clauses, considered separately, were not deemed as exhaustive with should be framed “in a concise, transparent, intelligible and eas- regard to the initial category and were marked as insuf- ily accessible form, using clear and plain language”. To integrate ficiently informative (for instance, the first clause fails to inform the this requirement into the assessment criteria, four indicators for data subject about the existence of the right to object to processing, vagueness (categories of linguistic expressions possibly generating as well as about the right to data portability). 
In the example below, indeterminacy, depending on the context) were defined [18]: (1) we illustrate how we now further distinguish for the right indeterminate conditioners, creating a dependency of a stated ac- to request access to personal data from the data controller, tion or activity on a variable trigger, such as “as necessary”, “from the right to request the rectification of personal data, the time to time”, etc.; (2) expression generalisations, abstracting ac- categories of personal data concerned, and the steps needed tions and activities under unclear conditions and contexts, such as to exercise the right to access their personal data. “generally”, “normally”, “ largely”, “often”, etc.; (3) modality, includ- [Current version]You can ing adverbs and non-specific adjectives, which create uncertainty request access to your personal information, with respect to the possibility of certain actions and events, and or correct or update out-of-date or inac- (4) nonspecific numeric quantifiers, creating ambiguity as to the curate personal information we hold about actual measure of a certain action and activity, such as “numerous”, you. “some”, “most”, “many”, “including (but not limited to)”, etc. Note that a single clause may fall into different categories, in different You can most easily do dimensions, and consequently may have multiple tags. For example, this by visiting the "Account" portion of if the clause allows for a problematic processing of sensitive data our website, where you have the ability to and includes vague terms, it is marked as: access and update a broad range of infor- The sentence. mation about your account, including your contact information, your Netflix payment 3.2 A Preliminary Corpus information, and various related informa- In the privacy policy assessment, we worked with a corpus of 32 tion about your account (such as the con- policies, manually tagged by two independent annotators. 
Privacy tent you have viewed and rated, and your policies were selected on the basis of the number of users and reviews). the platform’s global relevance, as well as taking into account our The 23 category guidelines for comprehensiveness of informa- previous work [6, 12] analysing Terms of Services for the same tion are currently being tested against the hypothesis that the added online services. We used XML mark-up language for annotations. categories will enhance the precision of answers given to the con- The data set contains 6,275 sentences. As we observed above, sumers. the sentences were tagged according to 35 categories (23 under the comprehensiveness of information dimension, 11 under substantive (2) Substantive Compliance. In dimension of substantive compliance, compliance, and 1 under clarity of expression). In the remainder of we distinguish 11 categories of clauses pertaining to the types of the paper we will only mention some of these categories and we processing. A clause is considered fair if the defined data processing will report on experiments concerning three categories (, practices are permitted by, and thus compliant with, the GDPR , and ): one for each dimension of the Gold Standard (Art.5, 6, and 9). We assumed that each clause can be classified either defined in Section 3.1. for the comprehensiveness of in- as a fair processing clause , problematic processing , formation, for substantive compliance, and finally for or unfair processing clause. We used the numerical values of unclear language. The corpus contains 773 sentences tagged with 1, 2, and 3 for each XML tag to indicate the level of fairness. In this , out of which 281 and 492 sentences refer to cases of suf- dimension, the two levels of sub-optimal achievement of the Gold ficient () and partial () information, respectively. 
Standard distinguish between problematic clauses, where it may be As for advertising, 91 sentences in the corpus are tagged as prob- reasonably doubted that the clause meets the GDPR requirements, lematic () whereas 95 are tagged as unfair (). Finally, and unfair clauses, where the data processing clearly fails to meet 714 sentences are tagged as unclear (). the GDPR requirements, i.e. the data processing defined in the We hereby remark that, in this paper, we are presenting a pre- policy document is forbidden by the regulation. liminary version of the corpus for which the tagging guidelines ASAIL 2019, June 21, 2019, Montreal, QC, Canada R. Liepina et al. directed to annotators have been revised multiple times. We plan third parties. to make these guidelines stable and publicly available in the near We process this information given our le- future, once the corpus is finalised. At that stage, we also intend gitimate interest in protecting the Airbnb to measure the inter-annotator agreement in order to assess the Platform, to measure the adequate perfor- quality of the deployed data set. mance of our contract with you, and to com- ply with applicable laws. 4 CHALLENGES In this section, we describe the challenges that we envision when As it can be seen, the last sentence taken separately fails to aiming to develop an automatic system for the assessment of com- specify the legitimate interest at stake, the specification there pro- pliance of privacy policies according to the GDPR. All examples vided “protecting the Airbnb Platform, to measure the adequate have been extracted from the Airbnb Privacy Policy document, last performance of our contract with you, and to comply with appli- updated 16 April 2018. cable laws", which is very generic. However, the sentence offers an adequate specification when it is read in conjunction with the 4.1 Context preceding list. 
This means that for the detector to identify defec- One of the earliest challenges encountered in the automated de- tiveness of a clause, it should evaluate the whole section, rather tection of problematic clauses in privacy policies is the fact that than the individual sentences. the examination of single sentences is insufficient for the deter- mination of their defectiveness within the three dimensions. For 4.2 Omission of Information this purpose we need to link several sentences. Conversely, our In our previous work [12] on Terms of Service, we used machine previous experiments showed that the analysis of single sentences learning and natural language processing techniques for the detec- is adequate to identify unlawful or unfair clause in terms of services. tion of (potentially) unfair clauses. In the context of privacy policies For instance, consider the following example taken from the Airbnb we have different goals, which are defined in the Gold Standard privacy policy. guidelines (see Section 3.1). In particular, our purpose lies not only in detecting the unfairness, and the unclear language,3 but also in [Line 80] 2.2 Create and Maintain a Trusted checking whether certain information is present and sufficient in and Safer Environment. Detect and prevent view of the regulatory framework. fraud, spam, abuse, security incidents, and The latter is conceptually a completely different task for two other harmful activity. main reasons: (i) we aim to identify the presence of a sentence, Conduct security investigations and risk as- rather than the fact that its content is not compliant with the law, sessments. and (ii) we need to verify whether some information is sufficient, Verify or authenticate information or iden- or not, with respect to the Gold Standard. 
tifications provided by you (such as to ver- In case of Terms of Service, classic NLP approaches, such as ify your Accommodation address or compare statistical classifiers or neural networks, worked quite well since your identification photo to another photo the detection of unfair clauses can be easily framed as a sentence you provide). classification problem, where (potential) unfairness is clearly de- Conduct checks against databases and other fined and statistics collected from a wide corpus can be sufficient information sources, including background to identify target clauses. In contrast, in the privacy policy analysis or police checks, to the extent permitted our goal is not pure detection of content, since it also involves the by applicable laws and with your consent capability to spot some missing, hidden, or insufficient information. where required. For humans, this problem is typically addressed with a number of Comply with our legal obligations. reasoning steps. Therefore, we argue that more sophisticated artifi- Resolve any disputes with any of our Mem- cial intelligence approaches are needed, for example coming from bers and enforce our agreements with third the neural-symbolic community [9], or from the neural architec- parties. tures that have been specifically developed to deal with reasoning Enforce our Terms of Service and other poli- tasks [11]. Another path for development could be explored by cies. adding contextual information to the classifier. For instance, when In connection with the activities above, we classifying a single sentence, taking into account also the informa- may conduct profiling based on your inter- tion regarding surrounding sentences, or even the whole document, actions with the Airbnb Platform, your pro- could in fact provide crucial information for a correct classification file information and other content you sub- of the clause. 
mit to the Airbnb Platform, and information As an example of the complexity of such a task, we hereby report obtained from third parties. In limited some clauses related to the purpose of processing () within cases, automated processes may restrict or the comprehensiveness dimension. Following the GDPR, the data suspend access to the Airbnb Platform if controller is required to provide clear information on the purposes such processes detect a Member or activity that we think poses a safety or other risk 3 The detection of unclear language is also per se a slightly different task, as it moves to the Airbnb Platform, other Members, or the attention towards a purely linguistic perspective. GDPR Privacy Policies in CLAUDETTE ASAIL 2019, June 21, 2019, Montreal, QC, Canada as to why data are collected and how such data will be used. These [ENGLISH] We may retain information processes should be transparent and within the limits prescribed in as required or permitted by applicable laws articles 13(1)(c) and 14(1)(c). To assess whether the privacy policy and regulations, including to honor your is compliant in this regard, we distinguish between optimal (fully choices, for our billing or records pur- informative) and sub-optimal (missing some information) clauses. poses and to fulfill the purposes described For example the following clause satisfies the criteria since it in this Privacy Statement. provides an exhaustive list of the purposes for data processing. We take reasonable measures to destroy or de-identify personal information If you are a Host, the Payments Data in a secure manner when it is no longer re- Controller may require identity verifica- quired. tion information (such as images of your government issued ID, passport, national ID card, or driving license) or other authen- Let us now consider the corresponding clauses in German as tication translated and marked. 
information, your date of birth, your ad- [GERMAN] Wir können Informationen, wie dress, email address, phone number and other gemäß geltenden Gesetzen und Bestimmungen information in order to verify your iden- erforderlich oder zugelassen, einschließlich tity, provide the Payment Services to you, unter Einbeziehung ihrer Auswahl, zu zwecken and to comply with applicable law. der Rechnungstellung oder Buchführung und um den zwecken dieser Datenschutz Erklärung In contrast, clauses that use vague language and only give general nachzukommen, speichern. examples are considered problematic, since they can be interpreted Wir ergreifen angemessene Maß- to justify the use of personal data beyond what the consumer might nahmen, um personenbezogene Daten auf eine have intended when consenting to the policy. It raises concerns sichere Weise zu zerstören oder unkenntlich around informed consent. Consider, for instance the following ex- zu machen, wenn diese nicht länger erforder- ample from the Airbnb Privacy Policy. lich sind. We may use your personal data to de- In this test case, the machine translation reference file was gen- velop new services erated in an accurate manner and the tags were successfully trans- ferred, given that the English and German language versions did 4.3 Multilingualism not bear discrepancies in the clauses used. Considering that the GDPR governs data processing in all European Clearly, there would be major challenges involved with transfer- Union states, it is important to take into account its 24 official ring tags in cases where the text in English is different from the text languages. Linguistic diversity and equal legal status between the in target language, not only in terms of syntax, but also regarding different European languages are among the core values in access the legal obligations that might be unique to a certain jurisdiction. to justice in the EU. 
Therefore, when offering any solution aimed at Moreover, English is by far the most widely studied language in informing and protecting consumers, researchers should also design natural language processing, thus the existing resources in other its methodology to preserve the original functions and accuracy languages are often not as accurate or rich as those developed across these many different languages. This task is particularly for English. Nevertheless, a lot of effort in artificial intelligence relevant for NGOs and consumer organisations that very often is currently being dedicated to tools and platforms dealing with struggle with the diversity of language and the comparison of multilingualism (e.g., see [2, 15] and references therein). different versions of the same documents. In our project, we have chosen English as the base language, and 5 EXPERIMENTS have started experimenting with transfer of tags from annotated In this section we present some preliminary results, based on the documents in English to privacy policies in German. This process data set of 32 annotated privacy policies, as described in Section 3.2. involves the use of three types of documents: (1) the original, an- We focus on the task of sentence detection only, leaving to future notated text in English, (2) the original text in German, and (3) the work the challenges related to multilingualism. automatic translation of the original English text into German. In particular, in our experimental evaluation we used SVMHMM, Consider, for instance, the following examples of original, an- a machine learning approach that combines Support Vector Ma- notated clauses in English. The first clause pertains to the period chines (SVM) and Hidden Markov Models (HMM) [20], and which for which the personal data will be stored. It has been marked as enables to collectively classify all the sentences in a document, thus , i.e. 
insufficiently informative, since it does not clearly de- taking into account the order of the examples. We started with fine the retention period of the personal data. The second clause a very basic set of features, namely the bag-of-words (unigrams pertains to both the data retention and the categories of data col- and bigrams) describing each sentence, leaving to future research a lected. It has been marked as insufficient since the retention period deeper investigation of richer feature sets, possibly exploiting deep and the categories of personal data are not defined, as indicated by learning in order to directly learn sentence representations. the expressions ‘reasonable measures’ and ‘when it is no longer In all the experiments we used the leave-one-document-out required’. (LOO) procedure, where each document is used, in turn, as the ASAIL 2019, June 21, 2019, Montreal, QC, Canada R. Liepina et al. Table 1: Macro-averaged results achieved by SVMHMM on 6 DISCUSSION AND FUTURE WORK the LOO setting. To highlight the difficulty of the task, we Considering the number of independent research projects working also report the performance of a random predictor, and a in this area, an identification of the current problems aims to estab- trivial classifier always predicting the positive class. lish a common ground for fruitful discussions of the future work. In this paper, we have presented a work in progress of a methodology Tag Method P R F1 (the Gold Standard) for annotating post-GDPR privacy policies to SVMHMM 0.408 0.565 0.421 identify and assess the compliance with the regulation. We have Random 0.034 0.034 0.034 identified three challenges that should be addressed to progress Always Positive 0.032 1.000 0.061 in assessing the privacy policies with NLP and ML tools. While SVMHMM 0.602 0.586 0.552 we have made some progress in each of the identified areas, there Random 0.126 0.126 0.126 remains a lot of work to reach the overall objectives of the project. 
Always Positive 0.126 1.000 0.221 The first challenge concerns the fact that the privacy policies are SVMHMM 0.412 0.612 0.460 written in a language that tends to be more broad in its possible Random 0.112 0.112 0.112 interpretations, and it is not uncommon to define the meaning of Always Positive 0.112 1.000 0.196 certain terms early in the document and use such terms without direct references back to the original definitions. Such references can be both internal and external, increasing the complexity for comprehension of the consumer’s rights and duties based on the signed agreement. Since our project aims at providing consumers test set, and all the remaining are merged into the training set. We with a tool that would facilitate an increased understanding of the consider the following performance measures: (i) precision P, that privacy policies, it is essential that the automated evaluation of is the fraction of sentences predicted as positive, which are actually clauses is able to build context for such an understanding. positive; (ii) recall R, that is the fraction of positive sentences that The second challenge focused on the omission of information, are correctly detected; (iii) F -measure F 1 , that is the harmonic mean which requires both the knowledge of what information should between P and R. For each measure, we report the macro-average, be included in the document and a way to identify the absence of that is the average computed over the measures obtained for each the required information. Such a task requires exploring methods single document. beyond pure text mining approaches. 
We consider the tasks of detecting the clauses concerning the Lastly, we looked at the need to consider an approach that is purpose of processing (thus considering the union of and able to use the results achieved in working with privacy policies in as the positive class), those problematic or unfair related English and transfer the annotations to different language versions to advertising (with the union of and as the positive without losing the accuracy and efficiency. class), and finally those that contain unclear language (the In sum, with ever more scientific research going open-access, the tag only). Results are reported in Table 1. To highlight the difficulty need for clear and transparent annotation guidelines and shared of the task, we compare the results achieved by SVMHMM against corpora is increasingly pressing. As part of our future work, we two trivial baselines: a random classifier, which predicts the positive aim to publish the annotated privacy policy corpora online, as we class accordingly to class distribution, and a second system that have done with the Terms of Service agreements. Future work also always predicts the positive class. SVMHMM achieves a value of includes moving beyond pure language processing and introducing F 1 equal to 0.552 for the detection of clauses regarding the purpose a level of reasoning that allows context comprehension by machines. of processing (against 0.126 and 0.221 of the two baselines, respec- We maintain our overall objective to design a methodology and tively) and 0.421 for advertising (against 0.034 and 0.061 for the provide a tool for consumers and NGOs that would empower them two baselines, respectively). A similar trend is shown for unclear through more informed decision making in the digital environment. language, which achieves F 1 equal to 0.460. 
The very low values of the baselines, as well as the confusion matrices reported in Table 2, clearly show the large imbalance between the positive and negative classes: for example, only 3% of sentences carry either of the two tags that make up one of the positive classes. This imbalance makes all the considered tasks particularly challenging. The F1 values obtained, in the range 0.42–0.55, can therefore be considered encouraging.

In addition, the results are very heterogeneous across different documents. For example, for one of the tags, the SVMHMM approach achieves an F1 of 0.86 and 0.89 on the Dropbox and Couchsurfing policies, respectively, whereas the CrowdTangle policy is even perfectly predicted, with three positive clauses correctly predicted and no false positives. We plan to analyse and discuss these more fine-grained results in depth once our final corpus is released.

GDPR Privacy Policies in CLAUDETTE    ASAIL 2019, June 21, 2019, Montreal, QC, Canada

            (left)             (center)           (right)
            Predicted          Predicted          Predicted
            0        1         0        1         0        1
True 0      5,893    196       5,199    303       4,969    592
True 1      88       98        327      446       297      417

Table 2: Micro-averaged confusion matrices for the three considered detection tasks (left, center, and right). The positive class (1) represents sentences with the specific tag, whereas the negative class (0) represents all the other sentences. Large class imbalance is evident in all cases. Note that the precision and recall values obtained from these tables differ slightly from the results in Table 1 because here we report the micro-average rather than the macro-average.

7 ACKNOWLEDGEMENTS
We would like to thank all the members of the Project Claudette and our funding authorities at the European University Institute Research Council, Bureau Européen des Unions de Consommateurs, and the Zeppelin Universität.

REFERENCES
[1] Lisa M Austin, David Lie, Peter Yi Ping Sun, Robin Spillette, Michelle Wong, and Mariana D’Angelo. Towards dynamic transparency: The apptrans (transparency for android applications) project. http://dx.doi.org/10.2139/ssrn.3203601, 2018.
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
[3] Yannis Bakos, Florencia Marotta-Wurgler, and David R Trossen. Does anyone read the fine print? consumer attention to standard-form contracts. The Journal of Legal Studies, 43(1):1–35, 2014.
[4] Cesare Bartolini, Gabriele Lenzini, and Cristiana Santos. A legal validation of a formal representation of gdpr articles. In CEUR Workshop Proceedings, http://ceur-ws.org/Vol-2309/10.pdf.
[5] Shmuel I Becher and Uri Benoliel. Law in books and law in action: The readability of privacy policies and the gdpr. CONSUMER LAW & ECONOMICS, Klaus Mathis & Avishalom Tor, eds., Springer (forthcoming, 2019), 2019.
[6] Giuseppe Contissa, Koen Docter, Francesca Lagioia, Marco Lippi, Hans-W Micklitz, Przemysław Pałka, Giovanni Sartor, and Paolo Torroni. Claudette meets gdpr: Automating the evaluation of privacy policies using artificial intelligence. https://ssrn.com/abstract=3208596, 2018.
[7] Elisa Costante, Yuanhao Sun, Milan Petković, and Jerry den Hartog. A machine learning solution to assess privacy policy completeness: (short paper). In Proceedings of the 2012 ACM workshop on Privacy in the electronic society, pages 91–96. ACM, 2012.
[8] Mauro Dragoni, Serena Villata, Williams Rizzi, and Guido Governatori. Combining nlp approaches for rule extraction from legal documents. In 1st Workshop on MIning and REasoning with Legal texts (MIREL 2016), 2016.
[9] Artur d’Avila Garcez, Tarek R Besold, Luc De Raedt, Peter Földiak, Pascal Hitzler, Thomas Icard, Kai-Uwe Kühnberger, Luis C Lamb, Risto Miikkulainen, and Daniel L Silver. Neural-symbolic learning and reasoning: contributions and challenges. In 2015 AAAI Spring Symposium Series, 2015.
[10] Mustafa Hashmi. A methodology for extracting legal norms from regulatory documents. In 2015 IEEE 19th International Enterprise Distributed Object Computing Workshop, pages 41–50. IEEE, 2015.
[11] Herbert Jaeger. Artificial intelligence: Deep neural reasoning. Nature, 538(7626):467, 2016.
[12] Marco Lippi, Przemysław Pałka, Giuseppe Contissa, Francesca Lagioia, Hans-Wolfgang Micklitz, Giovanni Sartor, and Paolo Torroni. Claudette: an automated detector of potentially unfair clauses in online terms of service. Artificial Intelligence and Law, pages 1–23, 2018.
[13] Aleecia M McDonald and Lorrie Faith Cranor. The cost of reading privacy policies. ISJLP, 4:543, 2008.
[14] George R Milne, Mary J Culnan, and Henry Greene. A longitudinal assessment of online privacy notice readability. Journal of Public Policy & Marketing, 25(2):238–249, 2006.
[15] Roberto Navigli and Simone Paolo Ponzetto. Babelnet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250, 2012.
[16] Jonathan A Obar and Anne Oeldorf-Hirsch. The biggest lie on the internet: Ignoring the privacy policies and terms of service policies of social networking services. Information, Communication & Society, pages 1–20, 2018.
[17] Monica Palmirani, Michele Martoni, Arianna Rossi, Cesare Bartolini, and Livio Robaldo. Pronto: Privacy ontology for legal reasoning. In International Conference on Electronic Government and the Information Systems Perspective, pages 139–152. Springer, 2018.
[18] Joel R Reidenberg, Jaspreet Bhatia, Travis D Breaux, and Thomas B Norton. Ambiguity in privacy policies and the impact of regulation. The Journal of Legal Studies, 45(S2):S163–S190, 2016.
[19] Welderufael B Tesfay, Peter Hofmann, Toru Nakamura, Shinsaku Kiyomoto, and Jetzabel Serna. I read but don’t agree: Privacy policy benchmarking using machine learning and the eu gdpr. In Companion of the The Web Conference 2018 on The Web Conference 2018, pages 163–166. International World Wide Web Conferences Steering Committee, 2018.
[20] Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, and Yasemin Altun. Support vector machine learning for interdependent and structured output spaces. In Proceedings of the twenty-first international conference on Machine learning, page 104. ACM, 2004.