=Paper=
{{Paper
|id=Vol-2385/paper9
|storemode=property
|title=GDPR Privacy Policies in CLAUDETTE: Challenges of Omission, Context and Multilingualism
|pdfUrl=https://ceur-ws.org/Vol-2385/paper9.pdf
|volume=Vol-2385
|authors=Rūta Liepina,Giuseppe Contissa,Kasper Drazewski,Francesca Lagioia,Marco Lippi,Hans-Wolfgang Micklitz,Przemysław Pałka,Giovanni Sartor,Paolo Torroni
|dblpUrl=https://dblp.org/rec/conf/icail/LiepinaCDL0MPST19
}}
==GDPR Privacy Policies in CLAUDETTE: Challenges of Omission, Context and Multilingualism==
<pdf width="1500px">https://ceur-ws.org/Vol-2385/paper9.pdf</pdf>
<pre>
                     GDPR Privacy Policies in CLAUDETTE:
              Challenges of Omission, Context and Multilingualism
                    Rūta Liepiņa                                      Giuseppe Contissa                                    Kasper Drazewski
    Law Department, EUI, Florence, Italy                     CIRSFID, University of Bologna, Italy                 Law Department, EUI, Florence, Italy

               Francesca Lagioia                                            Marco Lippi                                Hans-Wolfgang Micklitz
            EUI, Florence, Italy                               DISMI, University of Modena and                     Law Department, EUI, Florence, Italy
    CIRSFID, University of Bologna, Italy                            Reggio Emilia, Italy

               Przemysław Pałka                                          Giovanni Sartor                                        Paolo Torroni
               Yale Law School                                       EUI, Florence, Italy                           DISI, University of Bologna, Italy
            New Haven, United States                         CIRSFID, University of Bologna, Italy

Abstract: The latest developments in natural language process-                        linguistic and legal complexity, and the need for methodologies
ing and machine learning have created new opportunities in legal                      that can be transferred between different European languages.
text analysis. In particular, we look at the texts of online privacy
policies after the implementation of the European General Data                        2     BACKGROUND
Protection Regulation (GDPR). We analyse 32 privacy policies to                       Legal texts, such as regulations, contracts, privacy policies, and
design a methodology for automated detection and assessment of                        cases, provide a rich source for different formal analyses, due to
compliance of these documents. Preliminary results confirm the                        the complexity of language and legal norms within those texts.
pressing issues with current privacy policies and the beneficial                      One of the aims of artificial intelligence and law research [8, 10]
use of this approach in empowering consumers in making more                           is to find methods for accurately and efficiently extracting the
informed decisions. However, we also encountered several serious                      knowledge from legal texts and for providing a level of evaluation
issues in the process. This paper introduces the challenges through                   for the extracted data. This paper focuses on the legal texts of online
concrete examples of context dependence, omission of information,                     privacy policies. We identified three main dimensions for evaluation
and multilingualism.                                                                  based on the GDPR and its guidelines: completeness, compliance
                                                                                      with the data processing rules, and level of readability. A selection
1    INTRODUCTION                                                                     of the research studies in these fields is introduced below.
The changes in online privacy policies following the European                            Completeness: one of the core criticisms against unfair privacy
General Data Protection Regulation (GDPR) have further high-                          policies regards withheld or missing information on the data pro-
lighted the increasing information asymmetry between online ser-                      cessing, such as the purpose and retention time of personal data,
vice providers and consumers. Studies [3, 5] in consumer behaviour                    including sensitive data. Constante et al. [7] use machine learning
in reading privacy policies show that long and complex legal doc-                     and pre-annotated privacy policies to check for the completeness
uments are seldom read and understood by users. Moreover, [13]                        of information pre-GDPR. To this end, they designed a client-end
show that comprehending the rights and obligations outlined in                        solution, allowing consumers to read summarised policies on pri-
these online documents is costly both in terms of time and monetary                   vacy categories of their choice (6 core categories and 11 additional
value.                                                                                categories).
   This paper presents a work in progress that includes the latest de-
velopments of our methodology [12] in designing the Gold Standard                        Compliance: service providers, consumers and law enforcement
of privacy policy compliance that could be used to build a platform                   authorities are interested in assessing the compliance of online
empowering consumers to gain easier access and support in un-                         privacy policies. However, it has proven to be a challenging task.
derstanding their rights and obligations. We aim to provide such a                    Research in this area focuses on formalising legal norms [4] and
solution through the use of legal analysis, natural language process-                 designing methodologies [17] for automating the assessment of
ing, and machine learning. In Section 4, we describe three challenges                 privacy policies. One of the risks identified [10] relates to the misin-
faced by the AI and Law researchers working on automating eval-                       terpretation of norms as well as to the failure in connecting different
uation of legal documents and illustrate them through examples                        specifications of norms within a legal document.
found in the privacy policies analysed in our study. Among other                        Readability: a different area of research focuses on the language
issues, we focus on the problem of context dependence of (legal)                      and accessibility of privacy policies. A new study [5] provides
terms, the challenges in formalising the privacy policies due to their                empirical evidence on the readability levels of privacy policies post-
In: Proceedings of the Third Workshop on Automated Semantic Analysis of Information   GDPR, concluding that “these policies are often unreadable”.1 Fol-
in Legal Text (ASAIL 2019), June 21, 2019, Montreal, QC, Canada.                      lowing previous work by [14], their results support the conclusion
© 2019 Copyright held by the owner/author(s). Copying permitted for private and
academic purposes.                                                                    1 For readability scores the study employed the Flesch Reading Ease (FRE) test and the
Published at https://ceur-ws.org                                                      Flesch-Kincaid (F-K) test.
ASAIL 2019, June 21, 2019, Montreal, QC, Canada                                                                                                           R. Liepina et al.
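
The two readability tests named in footnote 1 can be computed from word, sentence and
syllable counts. The sketch below is an illustration only: the two formulas are the
standard Flesch Reading Ease and Flesch-Kincaid grade-level definitions, whereas the
syllable counter is a crude vowel-group heuristic assumed here, not the procedure used
in the study [5]. The function names are hypothetical.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (assumption: one syllable per vowel group)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent 'e' as non-syllabic, except in '-le' endings.
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid grade level) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(count_syllables(w) for w in words) / len(words)
    fre = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    fk_grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fre, fk_grade
```

A higher FRE score means easier text, while a higher F-K value means that more years of
schooling are needed. Calling readability() on a clause from a policy and on a short
plain-language paraphrase of it gives a quick, if approximate, comparison.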


that an unreasonable level of expertise is required to comprehend                          Each of the top-level dimensions has been further divided into
the privacy policies. The average score, among the 300 analysed                         the relevant categories and corresponding evaluation criteria. Dia-
policies, was at a level of “the usual score of articles in academic                    gram 1 shows the layered structure of the methodology by exem-
journals” [5], supporting the claim that policies are not written                       plifying a good privacy policy: one that satisfies all the criteria.2
to be accessible and understandable by the general public. Such                         To meet the requirements of comprehensiveness, a privacy policy
barriers further discourage consumers from reading privacy poli-                        should declare the purposes of the processing precisely and exhaus-
cies [16]. Some solutions, such as automatically generated privacy                      tively. Thus, clauses providing only examples must be considered
policy summaries [19] and interactive solutions of privacy analysis                     as insufficiently informative. In the dimension of substantive com-
through apps [1], are emerging to provide consumers with tools                          pliance, using personal data for targeted advertising is fair only
to better understand the contents of agreements and exercise their                      if based on the data subject’s consent and whenever an opt-out
rights.                                                                                 is possible. Regarding the clarity of expression, i.e. whether a pri-
                                                                                        vacy policy is framed in understandable, precise, and intelligible
3     DESIGNING METHODOLOGY                                                             language, certain unspecific language qualifiers should be avoided
This project aims to design a methodology for creating an open                          (e.g. indeterminate conditioners, creating a dependency of a stated
and high quality annotated corpus of online privacy policies. Such                      action or activity on a variable trigger such as “as necessary”, “from
a data set could be used for automated detection and evaluation of                      time to time”, etc). We have designed detailed annotation guidelines
problematic privacy clauses given the GDPR as the basis for inte-                       that are being further tested with a new data set of policies.
grated normative guidelines. Here, we present an overview of the                        (1) Comprehensiveness of Information. The clause satisfies the crite-
current methodology for detecting and assessing the problematic                         ria if the privacy policy includes sufficient information on the 23
privacy clauses, and how the new guidelines have improved on                            categories defined in the annotation guidelines. These include: <id>
previous versions [6].                                                                  identity of the data controller, <cat> categories of personal data
                                                                                        concerned, and <ret> the period for which the personal data will be
3.1      The Gold Standard                                                              stored. Here, ‘sufficiency’ is defined as fully informative privacy
We designed a methodology that reflects the overall aims of the                         clauses that include all the details required by the regulation (e.g.
GDPR with regard to collection and processing of personal data. In                      <id1>). Everything that does not satisfy the given criteria, as speci-
particular, we focus on three ways a privacy policy can be deemed                       fied in the guidelines, has been marked as sub-optimal (e.g. <id2>).
unlawful according to articles 13 and 14 of the GDPR: (1) if the pol-                   We use the numerical values of 1 and 2 in the XML tags to refer to
icy omits information required by the regulation, (2) if the policy                     the level of comprehensiveness of the information given. The earlier
defines data processing beyond the prescribed limits, and (3) if it is                  version of the methodology distinguished 12 relevant categories.
written in unclear language.                                                            The number of categories was increased to 23 to provide a more
                                                                                        fine-grained annotation of functions. The improvements from the
                                                                                        previous annotation guidelines [6] consist of the further specifica-
                                                                                        tion of the different functions of the rights granted to consumers,
                               The Gold Standard                                        and the steps needed to exercise them. In particular, the clauses
                                                                                        implementing the duty to inform the data subject about their rights,
                                                                                        under article 13.2(b) and 14.2(c) of the GDPR, initially falling under
                                                                                        a single category of required information [6] and identified with
        Comprehensive               Substantive                                         the <correct> tag, have been distinguished in multiple categories.
     information provided           compliance                Clear expression          The reason for further differentiating between such categories is
                                                                                        twofold. Firstly, from the legal point of view, the right to request
                                                                                        access to, and rectification or erasure of, personal data or restriction
        23 categories                11 categories          clear language not          of processing and to object to processing, as well as the right to data
     e.g. <purp> for the       e.g. <ad> for the use of      tagged; <vag> for          portability, are conceptually distinct and independent. Secondly, in
 purposes of data processing    personal data for ads       unclear expressions
                                                                                        analysing the privacy policies, we noted that the different rights
                                                                                        and steps needed to exercise these rights are usually addressed in
    Optimal or sub-optimal        Fair processing,        4 indicators: conditionals,   separate clauses. Thus we chose the units for our tagging method
     depending on whether      problematic processing,    generalisations, modality,    as single phrases. Indeed, with clauses covering multiple sentences,
    sufficient info included     or unfair processing        non-spec. quantifiers      we chose to tag each sentence separately, by treating statements
                                                                                        independently from one another. Hence, also the clauses contain-
            Figure 1: Dimensions - categories - criteria                                ing information about the rights are now classified separately from
                                                                                        those outlining the steps needed to exercise these rights. Consider,
                                                                                        for instance, the following example:
    We chose three top-level dimensions for the evaluation:                                     You can request access to your personal in-
    (1) comprehensiveness of information                                                        formation, or correct or update out-of-date
    (2) substantive compliance
    (3) clarity of expression                                                           2 In the diagram, the underlined criteria illustrate a good privacy policy.


       or inaccurate personal information we hold                              We identified 11 categories of clauses based on how issues per-
       about you. You can most easily do this by                            taining to such categories might affect individual rights. For in-
       visiting the "Account" portion of our web-                           stance, the unfair processing of sensitive (<sens>) data, or unau-
       site, where you have the ability to access                           thorised transfer of data to third parties (tp) can have negative
       and update a broad range of information                              consequences for the consumer. Other categories pertain to the
       about your account, including your contact                           consent by using practice, the take it or leave it approach, policy
       information, your Netflix payment informa-                           changes and whether there has been a fair warning, cross-border
       tion, and various related information about                          data transfer, consent for processing children’s data, licensing data,
       your account (such as the content you have                           advertising, any other types of consent, as well as one category for
       viewed and rated, and your reviews).                                 tracking any other types of problematic clauses.
Under the previous version of the tagging guidelines, the two               (3) Clarity of Expression. Art. 12 specifies that a privacy policy
clauses, considered separately, were not deemed as exhaustive with          should be framed “in a concise, transparent, intelligible and eas-
regard to the initial <correct> category and were marked as insuf-          ily accessible form, using clear and plain language”. To integrate
ficiently informative (for instance, the first clause fails to inform the   this requirement into the assessment criteria, four indicators for
data subject about the existence of the right to object to processing,      vagueness (categories of linguistic expressions possibly generating
as well as about the right to data portability). In the example below,      indeterminacy, depending on the context) were defined [18]: (1)
we illustrate how we now further distinguish <acc> for the right            indeterminate conditioners, creating a dependency of a stated ac-
to request access to personal data from the data controller, <corr>         tion or activity on a variable trigger, such as “as necessary”, “from
the right to request the rectification of personal data, <cat> the          time to time”, etc.; (2) expression generalisations, abstracting ac-
categories of personal data concerned, and <sacc> the steps needed          tions and activities under unclear conditions and contexts, such as
to exercise the right to access their personal data.                        “generally”, “normally”, “largely”, “often”, etc.; (3) modality, includ-
       [Current version]<acc2><corr2><cat2>You can                          ing adverbs and non-specific adjectives, which create uncertainty
       request access to your personal information,                         with respect to the possibility of certain actions and events, and
       or correct or update out-of-date or inac-                            (4) nonspecific numeric quantifiers, creating ambiguity as to the
       curate personal information we hold about                            actual measure of a certain action and activity, such as “numerous”,
       you.</cat2>                                                          “some”, “most”, “many”, “including (but not limited to)”, etc. Note
       </corr2></acc2>                                                      that a single clause may fall into different categories, in different
       <sacc1><acc1><corr1>You can most easily do                           dimensions, and consequently may have multiple tags. For example,
       this by visiting the "Account" portion of                            if the clause allows for a problematic processing of sensitive data
       our website, where you have the ability to                           and includes vague terms, it is marked as:
       access and update a broad range of infor-                                    <sens><vag>The sentence.</vag></sens>
       mation about your account, including your
       contact information, your Netflix payment                            3.2    A Preliminary Corpus
       information, and various related informa-                            In the privacy policy assessment, we worked with a corpus of 32
       tion about your account (such as the con-                            policies, manually tagged by two independent annotators. Privacy
       tent you have viewed and rated, and your                             policies were selected on the basis of the number of users and
       reviews).</corr1></acc1></sacc1>                                     the platform’s global relevance, as well as taking into account our
   The 23-category guidelines for comprehensiveness of informa-             previous work [6, 12] analysing Terms of Service for the same
tion are currently being tested against the hypothesis that the added       online services. We used the XML mark-up language for annotations.
categories will enhance the precision of answers given to the con-             The data set contains 6,275 sentences. As we observed above,
sumers.                                                                     the sentences were tagged according to 35 categories (23 under the
                                                                            comprehensiveness of information dimension, 11 under substantive
(2) Substantive Compliance. In the dimension of substantive compliance,     compliance, and 1 under clarity of expression). In the remainder of
we distinguish 11 categories of clauses pertaining to the types of          the paper we will only mention some of these categories and we
processing. A clause is considered fair if the defined data processing      will report on experiments concerning three categories (<purp>,
practices are permitted by, and thus compliant with, the GDPR               <ad>, and <vag>): one for each dimension of the Gold Standard
(Art. 5, 6, and 9). We assumed that each clause can be classified either    defined in Section 3.1, namely <purp> for the comprehensiveness of in-
as a fair processing clause <tag1>, problematic processing <tag2>,          formation, <ad> for substantive compliance, and finally <vag> for
or unfair processing <tag3> clause. We used the numerical values of         unclear language. The corpus contains 773 sentences tagged with
1, 2, and 3 for each XML tag to indicate the level of fairness. In this     <purp>, out of which 281 and 492 sentences refer to cases of suf-
dimension, the two levels of sub-optimal achievement of the Gold            ficient (<purp1>) and partial (<purp2>) information, respectively.
Standard distinguish between problematic clauses, where it may be           As for advertising, 91 sentences in the corpus are tagged as prob-
reasonably doubted that the clause meets the GDPR requirements,             lematic (<ad2>) whereas 95 are tagged as unfair (<ad3>). Finally,
and unfair clauses, where the data processing clearly fails to meet         714 sentences are tagged as unclear (<vag>).
the GDPR requirements, i.e. the data processing defined in the                 We hereby remark that, in this paper, we are presenting a pre-
policy document is forbidden by the regulation.                             liminary version of the corpus for which the tagging guidelines
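
The XML annotations described in Section 3.1 can be read back mechanically. The sketch
below is an assumption about how such a reader might look, not the project's own tooling.
It wraps an annotated fragment in a synthetic root element, splits each tag name into a
category and an optional numeric level, and attaches the level label of the matching
dimension. The DIMENSION table lists only tags named in the text and is not the full set
of 35 categories; all function and variable names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Partial lookup (assumption): only tags that are named in the paper's text.
DIMENSION = {
    "id": "comprehensiveness", "cat": "comprehensiveness",
    "ret": "comprehensiveness", "purp": "comprehensiveness",
    "acc": "comprehensiveness", "corr": "comprehensiveness",
    "sacc": "comprehensiveness",
    "ad": "compliance", "sens": "compliance",
    "vag": "clarity",
}

# Meaning of the numeric suffix in each dimension, as described in Section 3.1.
LEVELS = {
    "comprehensiveness": {1: "sufficient", 2: "sub-optimal"},
    "compliance": {1: "fair", 2: "problematic", 3: "unfair"},
}


def read_annotations(fragment: str) -> list[tuple]:
    """Return (category, level, dimension, label, text) for every tag."""
    # A synthetic root lets a fragment with several top-level tags parse.
    root = ET.fromstring(f"<doc>{fragment}</doc>")
    rows = []
    for element in root.iter():
        if element is root:
            continue
        match = re.fullmatch(r"([a-z]+)(\d?)", element.tag)
        if match is None:
            continue  # not a tag in the category-plus-level naming scheme
        category = match.group(1)
        level = int(match.group(2)) if match.group(2) else None
        dimension = DIMENSION.get(category, "unknown")
        label = LEVELS.get(dimension, {}).get(level)
        # Collapse line breaks inside the annotated clause.
        text = " ".join("".join(element.itertext()).split())
        rows.append((category, level, dimension, label, text))
    return rows
```

For the multi-tag example <sens><vag>The sentence.</vag></sens>, read_annotations()
yields one row for sens and one for vag, both carrying the text "The sentence.". Because
the fragment is parsed as XML, policy text containing a bare "&" or "<" would need to be
escaped first.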


directed to annotators have been revised multiple times. We plan                  third parties.
to make these guidelines stable and publicly available in the near                We process this information given our le-
future, once the corpus is finalised. At that stage, we also intend               gitimate interest in protecting the Airbnb
to measure the inter-annotator agreement in order to assess the                   Platform, to measure the adequate perfor-
quality of the deployed data set.                                                 mance of our contract with you, and to com-
                                                                                  ply with applicable laws.
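
The planned measurement of inter-annotator agreement can be illustrated with a small
sketch. The text does not say which agreement measure will be used; Cohen's kappa for two
annotators is assumed here purely as one common choice, and the label values in the usage
note are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same sentences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set")
    n = len(labels_a)
    # Observed agreement: share of sentences given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        freq_a[label] * freq_b[label] for label in freq_a.keys() | freq_b.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)
```

With hypothetical labels such as ["purp1", "purp2", "purp2", "none"] for one annotator and
["purp1", "purp2", "none", "none"] for the other, the function returns the agreement
corrected for chance; a value of 1.0 means perfect agreement and 0.0 means agreement no
better than chance.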
4     CHALLENGES
In this section, we describe the challenges that we envision when            As can be seen, the last sentence, taken separately, fails to
aiming to develop an automatic system for the assessment of com-          specify the legitimate interest at stake: the specification provided
pliance of privacy policies according to the GDPR. All examples           there, “protecting the Airbnb Platform, to measure the adequate
have been extracted from the Airbnb Privacy Policy document, last         performance of our contract with you, and to comply with applicable
updated 16 April 2018.                                                    laws”, is very generic. However, the sentence offers
                                                                          an adequate specification when it is read in conjunction with the
4.1    Context                                                            preceding list. This means that for the detector to identify defec-
One of the earliest challenges encountered in the automated de-           tiveness of a clause, it should evaluate the whole section, rather
tection of problematic clauses in privacy policies is the fact that       than the individual sentences.
the examination of single sentences is insufficient for the deter-
mination of their defectiveness within the three dimensions. For          4.2      Omission of Information
this purpose we need to link several sentences. By contrast, our          In our previous work [12] on Terms of Service, we used machine
previous experiments showed that the analysis of single sentences         learning and natural language processing techniques for the detec-
is adequate to identify unlawful or unfair clauses in Terms of Service.   tion of (potentially) unfair clauses. In the context of privacy policies
For instance, consider the following example taken from the Airbnb        we have different goals, which are defined in the Gold Standard
privacy policy.                                                           guidelines (see Section 3.1). In particular, our purpose lies not only
                                                                          in detecting the unfairness, and the unclear language,3 but also in
       [Line 80] 2.2 Create and Maintain a Trusted                        checking whether certain information is present and sufficient in
       and Safer Environment. Detect and prevent                          view of the regulatory framework.
       fraud, spam, abuse, security incidents, and                           The latter is conceptually a completely different task for two
       other harmful activity.                                            main reasons: (i) we aim to identify the presence of a sentence,
       Conduct security investigations and risk as-                       rather than the fact that its content is not compliant with the law,
       sessments.                                                         and (ii) we need to verify whether some information is sufficient,
       Verify or authenticate information or iden-                        or not, with respect to the Gold Standard.
       tifications provided by you (such as to ver-                          In case of Terms of Service, classic NLP approaches, such as
       ify your Accommodation address or compare                          statistical classifiers or neural networks, worked quite well since
       your identification photo to another photo                         the detection of unfair clauses can be easily framed as a sentence
       you provide).                                                      classification problem, where (potential) unfairness is clearly de-
       Conduct checks against databases and other                         fined and statistics collected from a wide corpus can be sufficient
       information sources, including background                          to identify target clauses. In contrast, in the privacy policy analysis
       or police checks, to the extent permitted                          our goal is not pure detection of content, since it also involves the
       by applicable laws and with your consent                           capability to spot some missing, hidden, or insufficient information.
       where required.                                                    For humans, this problem is typically addressed with a number of
       Comply with our legal obligations.                                 reasoning steps. Therefore, we argue that more sophisticated artifi-
       Resolve any disputes with any of our Mem-                          cial intelligence approaches are needed, for example coming from
       bers and enforce our agreements with third                         the neural-symbolic community [9], or from the neural architec-
       parties.                                                           tures that have been specifically developed to deal with reasoning
       Enforce our Terms of Service and other poli-                       tasks [11]. Another path for development could be explored by
       cies.                                                              adding contextual information to the classifier. For instance, when
       In connection with the activities above, we                        classifying a single sentence, taking into account also the informa-
       may conduct profiling based on your inter-                         tion regarding surrounding sentences, or even the whole document,
       actions with the Airbnb Platform, your pro-                        could in fact provide crucial information for a correct classification
       file information and other content you sub-                        of the clause.
       mit to the Airbnb Platform, and information                           As an example of the complexity of such a task, we hereby report
       obtained from third parties. In limited                            some clauses related to the purpose of processing (<purp>) within
       cases, automated processes may restrict or                         the comprehensiveness dimension. Following the GDPR, the data
       suspend access to the Airbnb Platform if                           controller is required to provide clear information on the purposes
       such processes detect a Member or activity
       that we think poses a safety or other risk                         3 The detection of unclear language is also per se a slightly different task, as it moves
       to the Airbnb Platform, other Members, or                          the attention towards a purely linguistic perspective.
GDPR Privacy Policies in CLAUDETTE                                                        ASAIL 2019, June 21, 2019, Montreal, QC, Canada


as to why data are collected and how such data will be used. These              [ENGLISH] <ret2>We may retain information
processes should be transparent and within the limits prescribed in             as required or permitted by applicable laws
articles 13(1)(c) and 14(1)(c). To assess whether the privacy policy            and regulations, including to honor your
is compliant in this regard, we distinguish between optimal (fully              choices, for our billing or records pur-
informative) and sub-optimal (missing some information) clauses.                poses and to fulfill the purposes described
   For example, the following clause satisfies the criteria since it            in this Privacy Statement.</ret2>
provides an exhaustive list of the purposes for data processing.                <ret2><cat2>We take reasonable measures to
                                                                                destroy or de-identify personal information
       <purp1>If you are a Host, the Payments Data
                                                                                in a secure manner when it is no longer re-
       Controller may require identity verifica-
                                                                                quired.</cat2></ret2>
       tion information (such as images of your
       government issued ID, passport, national ID
       card, or driving license) or other authen-                           Let us now consider the corresponding clauses in German as
       tication                                                          translated and marked.
       information, your date of birth, your ad-                                [GERMAN] <ret2>Wir können Informationen, wie
       dress, email address, phone number and other                             gemäß geltenden Gesetzen und Bestimmungen
       information in order to verify your iden-                                erforderlich oder zugelassen, einschließlich
       tity, provide the Payment Services to you,                               unter Einbeziehung ihrer Auswahl, zu zwecken
       and to comply with applicable law.</purp1>                               der Rechnungstellung oder Buchführung und
                                                                                um den zwecken dieser Datenschutz Erklärung
   In contrast, clauses that use vague language and only give general
                                                                                nachzukommen, speichern.</ret2>
examples are considered problematic, since they can be interpreted
                                                                                <cat2><ret2>Wir ergreifen angemessene Maß-
to justify the use of personal data beyond what the consumer might
                                                                                nahmen, um personenbezogene Daten auf eine
have intended when consenting to the policy. This raises concerns
                                                                                sichere Weise zu zerstören oder unkenntlich
around informed consent. Consider, for instance, the following ex-
                                                                                zu machen, wenn diese nicht länger erforder-
ample from the Airbnb Privacy Policy.
                                                                                lich sind.</ret2></cat2>
       <purp2>We may use your personal data to de-                          In this test case, the machine translation reference file was gen-
       velop new services</purp2>                                        erated in an accurate manner and the tags were successfully trans-
                                                                         ferred, given that the English and German language versions did
4.3    Multilingualism                                                   not bear discrepancies in the clauses used.
Considering that the GDPR governs data processing in all European           Clearly, there would be major challenges involved with transfer-
Union states, it is important to take into account its 24 official       ring tags in cases where the text in English is different from the text
languages. Linguistic diversity and equal legal status between the       in target language, not only in terms of syntax, but also regarding
different European languages are among the core values in access         the legal obligations that might be unique to a certain jurisdiction.
to justice in the EU. Therefore, when offering any solution aimed at     Moreover, English is by far the most widely studied language in
informing and protecting consumers, researchers should also design       natural language processing, thus the existing resources in other
its methodology to preserve the original functions and accuracy          languages are often not as accurate or rich as those developed
across these many different languages. This task is particularly         for English. Nevertheless, a lot of effort in artificial intelligence
relevant for NGOs and consumer organisations that very often             is currently being dedicated to tools and platforms dealing with
struggle with the diversity of language and the comparison of            multilingualism (e.g., see [2, 15] and references therein).
different versions of the same documents.
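
As a rough illustration of how tags might be carried over from the
machine-translated reference file to the original German text, the
following sketch matches each German sentence to its most similar
translated sentence using character-level similarity. The matching
strategy, the 0.6 threshold, and the names strip_tags, tag_names and
transfer_tags are assumptions made for this example only; they are not
the procedure used in the project. The second German sentence is a toy
example added to show the behaviour on unmatched text.

```python
import re
from difflib import SequenceMatcher

# Matches opening and closing annotation tags such as <ret2> or </cat2>.
TAG_PATTERN = re.compile(r"</?[a-z]+\d*>")


def strip_tags(text):
    """Remove annotation tags, keeping only the clause text."""
    return TAG_PATTERN.sub("", text)


def tag_names(text):
    """Return the names of the tags opened in text, in order of appearance."""
    return re.findall(r"<([a-z]+\d*)>", text)


def transfer_tags(translated_annotated, german_original, threshold=0.6):
    """Copy tags from the most similar translated sentence (hypothetical rule).

    Sentences whose best similarity score is below the threshold get no tags.
    """
    transferred = []
    for sentence in german_original:
        best_tags, best_score = [], 0.0
        for candidate in translated_annotated:
            score = SequenceMatcher(
                None, strip_tags(candidate).lower(), sentence.lower()
            ).ratio()
            if score > best_score:
                best_tags, best_score = tag_names(candidate), score
        transferred.append((sentence, best_tags if best_score >= threshold else []))
    return transferred


# Machine-translated reference with tags (clause taken from the example above).
mt_reference = [
    "<cat2><ret2>Wir ergreifen angemessene Maßnahmen, um personenbezogene "
    "Daten auf eine sichere Weise zu zerstören oder unkenntlich zu machen, "
    "wenn diese nicht länger erforderlich sind.</ret2></cat2>",
]
# Original German policy: one matching clause and one unrelated toy sentence.
german_policy = [
    "Wir ergreifen angemessene Maßnahmen, um personenbezogene "
    "Daten auf eine sichere Weise zu zerstören oder unkenntlich zu machen, "
    "wenn diese nicht länger erforderlich sind.",
    "Kontaktieren Sie uns per E-Mail.",
]
result = transfer_tags(mt_reference, german_policy)
for sentence, tags in result:
    print(tags, sentence[:40])
```

A purely surface-based match of this kind only works when the two
language versions contain the same clauses; it would not handle the
jurisdiction-specific differences discussed in this section.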
   In our project, we have chosen English as the base language, and      5   EXPERIMENTS
have started experimenting with transfer of tags from annotated          In this section we present some preliminary results, based on the
documents in English to privacy policies in German. This process         data set of 32 annotated privacy policies, as described in Section 3.2.
involves the use of three types of documents: (1) the original, an-      We focus on the task of sentence detection only, leaving to future
notated text in English, (2) the original text in German, and (3) the    work the challenges related to multilingualism.
automatic translation of the original English text into German.             In particular, in our experimental evaluation we used SVMHMM,
   Consider, for instance, the following examples of original, an-       a machine learning approach that combines Support Vector Ma-
notated clauses in English. The first clause pertains to the period      chines (SVM) and Hidden Markov Models (HMM) [20], and which
for which the personal data will be stored. It has been marked as        allows all the sentences in a document to be classified collectively, thus
<ret2>, i.e. insufficiently informative, since it does not clearly de-   taking into account the order of the examples. We started with
fine the retention period of the personal data. The second clause        a very basic set of features, namely the bag-of-words (unigrams
pertains to both the data retention and the categories of data col-      and bigrams) describing each sentence, leaving to future research a
lected. It has been marked as insufficient since the retention period    deeper investigation of richer feature sets, possibly exploiting deep
and the categories of personal data are not defined, as indicated by     learning in order to directly learn sentence representations.
the expressions ‘reasonable measures’ and ‘when it is no longer             In all the experiments we used the leave-one-document-out
required’.                                                               (LOO) procedure, where each document is used, in turn, as the

Table 1: Macro-averaged results achieved by SVMHMM on                      6    DISCUSSION AND FUTURE WORK
the LOO setting. To highlight the difficulty of the task, we
                                                                           Considering the number of independent research projects working
also report the performance of a random predictor, and a
                                                                           in this area, an identification of the current problems aims to estab-
trivial classifier always predicting the positive class.
                                                                           lish a common ground for fruitful discussions of the future work. In
                                                                           this paper, we have presented work in progress on a methodology
          Tag         Method              P       R        F1              (the Gold Standard) for annotating post-GDPR privacy policies to
                     SVMHMM             0.408   0.565    0.421             identify and assess the compliance with the regulation. We have
          <ad>        Random            0.034   0.034    0.034             identified three challenges that should be addressed to progress
                   Always Positive      0.032   1.000    0.061             in assessing the privacy policies with NLP and ML tools. While
                     SVMHMM             0.602   0.586    0.552             we have made some progress in each of the identified areas, there
        <purp>        Random            0.126   0.126    0.126             remains a lot of work to reach the overall objectives of the project.
                   Always Positive      0.126   1.000    0.221                 The first challenge concerns the fact that the privacy policies are
                     SVMHMM             0.412   0.612    0.460             written in a language that tends to be more broad in its possible
         <vag>        Random            0.112   0.112    0.112             interpretations, and it is not uncommon to define the meaning of
                   Always Positive      0.112   1.000    0.196             certain terms early in the document and use such terms without
                                                                           direct references back to the original definitions. Such references
                                                                           can be both internal and external, increasing the complexity for
                                                                           comprehension of the consumer’s rights and duties based on the
                                                                           signed agreement. Since our project aims at providing consumers
test set, and all the remaining ones are merged into the training set. We  with a tool that would facilitate an increased understanding of the
consider the following performance measures: (i) precision P, that         privacy policies, it is essential that the automated evaluation of
is, the fraction of sentences predicted as positive that are actually      clauses is able to build context for such an understanding.
positive; (ii) recall R, that is, the fraction of positive sentences that      The second challenge concerns the omission of information,
are correctly detected; (iii) F-measure F1, that is, the harmonic mean     which requires both the knowledge of what information should
between P and R. For each measure, we report the macro-average,            be included in the document and a way to identify the absence of
that is, the average computed over the measures obtained for each          the required information. Such a task requires exploring methods
single document.                                                           beyond pure text mining approaches.
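
To make the macro-averaging step concrete, the following minimal
sketch computes P, R and F1 separately for each document and then
averages them over documents. The function names, the toy label lists,
and the convention of scoring 0 when a denominator is empty are
assumptions made for this illustration; the text does not specify how
such cases were handled.

```python
def precision_recall_f1(gold, predicted):
    """Per-document precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    # Assumed convention: a measure with an empty denominator counts as 0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(documents):
    """Average each measure over documents, one (gold, predicted) pair per document."""
    scores = [precision_recall_f1(gold, predicted) for gold, predicted in documents]
    return tuple(sum(score[i] for score in scores) / len(scores) for i in range(3))


# Toy data: each document is the held-out test set of one LOO fold.
documents = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # P = 1.0, R = 0.5
    ([1, 0, 0], [1, 1, 0]),        # P = 0.5, R = 1.0
]
macro_p, macro_r, macro_f1 = macro_average(documents)
print(round(macro_p, 3), round(macro_r, 3), round(macro_f1, 3))
```

Because F1 is averaged per document, the macro-averaged F1 is in
general not the harmonic mean of the macro-averaged P and R, as the toy
data shows.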
    We consider the tasks of detecting the clauses concerning the              Lastly, we looked at the need to consider an approach that is
purpose of processing (thus considering the union of <purp1> and           able to use the results achieved in working with privacy policies in
<purp2> as the positive class), those related to advertising that are      English and transfer the annotations to different language versions
problematic or unfair (with the union of <ad2> and <ad3> as the positive   without losing accuracy or efficiency.
class), and finally those that contain unclear language (the <vag>             In sum, with ever more scientific research going open-access, the
tag only). Results are reported in Table 1. To highlight the difficulty    need for clear and transparent annotation guidelines and shared
of the task, we compare the results achieved by SVMHMM against             corpora is increasingly pressing. As part of our future work, we
two trivial baselines: a random classifier, which predicts the positive    aim to publish the annotated privacy policy corpora online, as we
class according to the class distribution, and a second system that        have done with the Terms of Service agreements. Future work also
always predicts the positive class. SVMHMM achieves a value of             includes moving beyond pure language processing and introducing
F1 equal to 0.552 for the detection of clauses regarding the purpose       a level of reasoning that allows context comprehension by machines.
of processing (against 0.126 and 0.221 of the two baselines, respec-       We maintain our overall objective to design a methodology and
tively) and 0.421 for advertising (against 0.034 and 0.061 for the         provide a tool for consumers and NGOs that would empower them
two baselines, respectively). A similar trend is shown for unclear         through more informed decision making in the digital environment.
language, for which SVMHMM achieves F1 equal to 0.460. The very low values
of the baselines, as well as the confusion matrices reported in Ta-        7    ACKNOWLEDGEMENTS
ble 2, clearly show the large imbalance between the positive and           We would like to thank all the members of the Project Claudette
negative classes: for example, only 3% of sentences are annotated          and our funding authorities at the European University Institute
as either <ad2> or <ad3>. This imbalance makes all the considered          Research Council, Bureau Européen des Unions de Consommateurs,
tasks particularly challenging. Therefore, the F1 values obtained, in      and the Zeppelin Universität.
the range 0.42 – 0.55, can be considered encouraging.
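
The scores of the two trivial baselines follow almost directly from
the fraction of positive sentences. The sketch below is a
back-of-the-envelope check, not the evaluation code: it assumes a
single prevalence value per tag, read from the precision of the Always
Positive rows in Table 1, and it treats the random predictor in
expectation. The function names are hypothetical.

```python
def random_baseline(prevalence):
    """Approximate expected P, R, F1 when each sentence is labelled positive
    independently with probability equal to the class prevalence."""
    return prevalence, prevalence, prevalence


def always_positive_baseline(prevalence):
    """P, R, F1 when every sentence is labelled positive."""
    precision = prevalence  # only the truly positive share is correct
    recall = 1.0            # no positive sentence is missed
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Assumed prevalence values, taken from the Always Positive rows of Table 1.
for tag, prevalence in [("<ad>", 0.032), ("<purp>", 0.126), ("<vag>", 0.112)]:
    _, _, f1 = always_positive_baseline(prevalence)
    print(tag, round(f1, 3))
```

The resulting values are close to, but not identical with, the F1
column of Table 1, plausibly because the table averages per-document
scores rather than applying the formula to a single prevalence value.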
    In addition, we note that the results are very heterogeneous
                                                                           REFERENCES
across different documents. For example, for the <ad>                       [1] Lisa M Austin, David Lie, Peter Yi Ping Sun, Robin Spillette, Michelle Wong, and
tag, for the Dropbox and Couchsurfing policies, the SVMHMM                      Mariana D’Angelo. Towards dynamic transparency: The apptrans (transparency
approach achieves F1 equal to 0.86 and 0.89, respectively, whereas              for android applications) project. http://dx.doi.org/10.2139/ssrn.3203601, 2018.
                                                                            [2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine trans-
the CrowdTangle policy is even perfectly predicted, with three                  lation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473,
positive clauses correctly predicted and no false positives. We plan to         2014.
analyse and discuss these more fine-grained results in greater depth        [3] Yannis Bakos, Florencia Marotta-Wurgler, and David R Trossen. Does anyone
                                                                                read the fine print? consumer attention to standard-form contracts. The Journal
once our final corpus is released.                                              of Legal Studies, 43(1):1–35, 2014.

                          <ad>                          <purp>                           <vag>
                        Predicted                      Predicted                       Predicted
                         0      1                       0      1                        0      1
             True   0  5,893    196        True   0  5,199    303         True   0  4,969    592
                    1     88     98               1    327    446                1    297    417
Table 2: Micro-averaged confusion matrices for the three considered detection tasks: <ad> (left), <purp> (center), and <vag>
(right). The positive class (1) represents sentences of that specific tag, whereas the negative class (0) represents all the other
sentences. Large class imbalance is evident in all cases. Note that the precision and recall metrics obtained from these tables
slightly differ from the results in Table 1 because here we are reporting micro-average rather than macro-average.
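
The micro-averaged measures implied by Table 2 can be recomputed
directly from the four cells of each matrix. The following sketch does
so; the dictionary layout and the function name are assumptions made
for this illustration, while the counts are those reported in the table
(rows are true classes, columns are predicted classes).

```python
# Counts from Table 2: tn = true 0 / predicted 0, fp = true 0 / predicted 1,
# fn = true 1 / predicted 0, tp = true 1 / predicted 1.
CONFUSION = {
    "<ad>": {"tn": 5893, "fp": 196, "fn": 88, "tp": 98},
    "<purp>": {"tn": 5199, "fp": 303, "fn": 327, "tp": 446},
    "<vag>": {"tn": 4969, "fp": 592, "fn": 297, "tp": 417},
}


def micro_scores(cells):
    """Precision, recall, F1 and positive-class share from pooled counts."""
    precision = cells["tp"] / (cells["tp"] + cells["fp"])
    recall = cells["tp"] / (cells["tp"] + cells["fn"])
    f1 = 2 * precision * recall / (precision + recall)
    prevalence = (cells["tp"] + cells["fn"]) / sum(cells.values())
    return precision, recall, f1, prevalence


for tag, cells in CONFUSION.items():
    precision, recall, f1, prevalence = micro_scores(cells)
    print(tag, round(precision, 3), round(recall, 3), round(f1, 3), round(prevalence, 3))
```

For the <ad> tag the positive-class share comes out at about 3% of all
sentences, in line with the figure quoted in Section 5.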


 [4] Cesare Bartolini, Gabriele Lenzini, and Cristiana Santos. A legal validation
     of a formal representation of gdpr articles. In CEUR Workshop Proceedings,
     http://ceur-ws.org/Vol-2309/10.pdf.
 [5] Shmuel I Becher and Uri Benoliel. Law in books and law in action: The readability
     of privacy policies and the gdpr. CONSUMER LAW & ECONOMICS, Klaus Mathis
     & Avishalom Tor, eds., Springer (forthcoming, 2019), 2019.
 [6] Giuseppe Contissa, Koen Docter, Francesca Lagioia, Marco Lippi, Hans-W Mick-
     litz, Przemysław Pałka, Giovanni Sartor, and Paolo Torroni. Claudette meets
     gdpr: Automating the evaluation of privacy policies using artificial intelligence.
     https://ssrn.com/abstract=3208596, 2018.
 [7] Elisa Costante, Yuanhao Sun, Milan Petković, and Jerry den Hartog. A machine
     learning solution to assess privacy policy completeness:(short paper). In Proceed-
     ings of the 2012 ACM workshop on Privacy in the electronic society, pages 91–96.
     ACM, 2012.
 [8] Mauro Dragoni, Serena Villata, Williams Rizzi, and Guido Governatori. Combin-
     ing nlp approaches for rule extraction from legal documents. In 1st Workshop on
     MIning and REasoning with Legal texts (MIREL 2016), 2016.
 [9] Artur d’Avila Garcez, Tarek R Besold, Luc De Raedt, Peter Földiak, Pascal Hit-
     zler, Thomas Icard, Kai-Uwe Kühnberger, Luis C Lamb, Risto Miikkulainen, and
     Daniel L Silver. Neural-symbolic learning and reasoning: contributions and
     challenges. In 2015 AAAI Spring Symposium Series, 2015.
[10] Mustafa Hashmi. A methodology for extracting legal norms from regulatory doc-
     uments. In 2015 IEEE 19th International Enterprise Distributed Object Computing
     Workshop, pages 41–50. IEEE, 2015.
[11] Herbert Jaeger. Artificial intelligence: Deep neural reasoning. Nature,
     538(7626):467, 2016.
[12] Marco Lippi, Przemysław Pałka, Giuseppe Contissa, Francesca Lagioia, Hans-
     Wolfgang Micklitz, Giovanni Sartor, and Paolo Torroni. Claudette: an automated
     detector of potentially unfair clauses in online terms of service. Artificial Intelli-
     gence and Law, pages 1–23, 2018.
[13] Aleecia M McDonald and Lorrie Faith Cranor. The cost of reading privacy policies.
     ISJLP, 4:543, 2008.
[14] George R Milne, Mary J Culnan, and Henry Greene. A longitudinal assessment of
     online privacy notice readability. Journal of Public Policy & Marketing, 25(2):238–
     249, 2006.
[15] Roberto Navigli and Simone Paolo Ponzetto. Babelnet: The automatic con-
     struction, evaluation and application of a wide-coverage multilingual semantic
     network. Artificial Intelligence, 193:217–250, 2012.
[16] Jonathan A Obar and Anne Oeldorf-Hirsch. The biggest lie on the internet:
     Ignoring the privacy policies and terms of service policies of social networking
     services. Information, Communication & Society, pages 1–20, 2018.
[17] Monica Palmirani, Michele Martoni, Arianna Rossi, Cesare Bartolini, and Livio
     Robaldo. Pronto: Privacy ontology for legal reasoning. In International Conference
     on Electronic Government and the Information Systems Perspective, pages 139–152.
     Springer, 2018.
[18] Joel R Reidenberg, Jaspreet Bhatia, Travis D Breaux, and Thomas B Norton.
     Ambiguity in privacy policies and the impact of regulation. The Journal of Legal
     Studies, 45(S2):S163–S190, 2016.
[19] Welderufael B Tesfay, Peter Hofmann, Toru Nakamura, Shinsaku Kiyomoto,
     and Jetzabel Serna. I read but don’t agree: Privacy policy benchmarking using
     machine learning and the eu gdpr. In Companion of the The Web Conference
     2018 on The Web Conference 2018, pages 163–166. International World Wide Web
     Conferences Steering Committee, 2018.
[20] Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, and Yasemin
     Altun. Support vector machine learning for interdependent and structured output
     spaces. In Proceedings of the twenty-first international conference on Machine
     learning, page 104. ACM, 2004.

</pre>