<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Natural Language Processing for Mobile App Privacy Compliance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Story</string-name>
          <email>pstory@andrew.cmu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Zimmeck</string-name>
          <email>szimmeck@wesleyan.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abhilasha Ravichander</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel Smullen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ziqi Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joel Reidenberg</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N. Cameron Russell</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Norman Sadeh</string-name>
          <email>sadeh@cs.cmu.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, Wesleyan University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science, Carnegie Mellon University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>School of Law, Fordham University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p />
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Many Internet services collect a flurry of data from their
users. Privacy policies are intended to describe the
services’ privacy practices. However, due to their length
and complexity, reading privacy policies is a challenge
for end users, government regulators, and companies.
Natural language processing holds the promise of
helping address this challenge. Specifically, we focus on
comparing the practices described in privacy policies to
the practices performed by smartphone apps covered by
those policies. Government regulators are interested in
comparing apps to their privacy policies in order to
detect non-compliance with laws, and companies are
interested for the same reason.</p>
      <p>We frame the identification of privacy practice
statements in privacy policies as a classification problem,
which we address with a three-tiered approach: a
privacy practice statement is classified based on a data
type (e.g., location), party (i.e., first or third party), and
modality (i.e., whether a practice is explicitly described
as being performed or not performed). Privacy policies
omit discussion of many practices. With negative F1
scores ranging from 78% to 100%, the performance
results of this three-tiered classification methodology
suggest an improvement over the state of the art.</p>
      <p>
        Our NLP analysis of privacy policies is an integral part
of our Mobile App Privacy System (MAPS), which
we used to analyze 1,035,853 free apps on the Google
Play Store. Potential compliance issues appeared to be
widespread, and those involving third parties were
particularly common.
      </p>
      <p>
        In the absence of a general privacy law in the United States,
the Federal Trade Commission (FTC) is stepping into the
void and is creating a “common law of privacy”
        <xref ref-type="bibr" rid="ref28 ref35">(Solove and
Hartzog 2014)</xref>
        , which, to a large extent, is based on the
notice and choice paradigm. In this paradigm, users are
notified of a service’s privacy practices and are given a choice
to consent to those practices; if the user does not consent
to the service’s practices, they are not allowed to use the
service. Natural language privacy policies are intended to
notify users of privacy practices. Privacy policies are
complex and lengthy documents: they are often vague, internally
contradictory, offer little protection, or are silent on critical
points
        <xref ref-type="bibr" rid="ref12">(Marotta-Wurgler 2015)</xref>
        . While there are other forms
of privacy notification, such as mobile app permission
requests, these are not a replacement for privacy policies;
permission requests are generally insufficient to express what
users agree to with sufficient clarity. Machine-readable
privacy policies, such as P3P policies
        <xref ref-type="bibr" rid="ref2">(Cranor et al. 2002)</xref>
        ,
were suggested as replacements for natural language privacy
policies. However, none of these replacements have gained
widespread adoption. Thus, despite their shortcomings,
natural language privacy policies are the standard instrument
for effectuating notice and choice.
      </p>
      <p>
        The FTC engages in enforcement actions against
operators of apps that are non-compliant with their privacy
policies. Such non-compliance is considered an unfair or
deceptive act or practice in or affecting commerce in
violation of Section 5(a) of the FTC Act
        <xref ref-type="bibr" rid="ref4">(FTC 2014)</xref>
        . In order
to detect whether an app is potentially not compliant with
its privacy policy, we built the Mobile App Privacy System
(MAPS)
        <xref ref-type="bibr" rid="ref37">(Zimmeck et al. 2019)</xref>
        . MAPS is of interest to both
government regulators and companies. For government
regulators, MAPS can identify potential compliance issues,
reducing the cost of investigations. For companies, MAPS can
help them ensure that their privacy policies fully describe
their apps’ practices.
      </p>
      <p>Our focus in this article is on the natural language
analysis component of MAPS. We provide a detailed description
of the design and performance of our three-tiered classifier
for identifying privacy practice statements (§ 3). We
also provide a summary of findings from our recent scan of
over 1 million mobile apps on the Google Play Store (§ 4).
In this large-scale analysis of policies and apps, we found
widespread evidence of potential privacy compliance issues.
In particular, it appears that many apps’ privacy policies do
not sufficiently disclose identifier and location data access
practices performed by ad networks and other third parties.</p>
    </sec>
    <sec id="sec-2">
      <title>Automated Privacy Policy Text Analysis</title>
      <p>
        Privacy policies are the main instruments for disclosing and
describing apps’ or other software’s privacy practices.
However, the sheer volume of text an individual user would need
to read for the software he or she is using makes privacy
policies impractical for meaningfully conveying privacy
practices
        <xref ref-type="bibr" rid="ref13">(McDonald and Cranor 2008)</xref>
        . Some research has
focused on the structure of privacy policies, for example,
on the problem of identifying policy sections relating to different
topics
        <xref ref-type="bibr" rid="ref17">(Ramanath et al. 2014; Liu et al. 2018)</xref>
        . Sathyendra
et al. classified advertising opt outs and similar consumer
choice options on websites
        <xref ref-type="bibr" rid="ref21">(Sathyendra et al. 2017)</xref>
        . Other
work has focused on building tools for users. Using a
simple naive Bayes classifier, Zimmeck and Bellovin provided
a browser extension for identifying common privacy
practices in policy text
        <xref ref-type="bibr" rid="ref28 ref35">(Zimmeck and Bellovin 2014)</xref>
        . Tesfay et
al. used a machine learning-based approach to identify text
addressing various GDPR provisions
        <xref ref-type="bibr" rid="ref30">(Tesfay et al. 2018)</xref>
        .
Harkous et al. developed PriBot, a chatbot for answering
questions about privacy policies
        <xref ref-type="bibr" rid="ref8">(Harkous et al. 2018)</xref>
        .
Unlike those studies, however, our domain consists of
app policies instead of website policies.
      </p>
      <p>
        Various studies analyzed privacy policies in specific
domains. Cranor et al. evaluated financial institutions’ privacy
notices, which, in the US, nominally adhere to a model
privacy form released by federal agencies
        <xref ref-type="bibr" rid="ref3">(Cranor, Leon, and
Ur 2016)</xref>
        . They found clusters of institutions sharing
consumer data more often than others. They also found
institutions that do not follow the law because they do not allow consumers
to limit such sharing. Further, Zhuang et al. aimed to help
university researchers by automating enforcement of privacy
policies of Institutional Review Boards (Zhuang et al. 2018).
Auditing the disclosure of third party data collection
practices on 200,000 website privacy policies, Libert found that
the names of third parties are usually not explicitly disclosed
in website privacy policies
        <xref ref-type="bibr" rid="ref10">(Libert 2018)</xref>
        . We focus on
classifying first and third party access of contact, location, and
unique identifier data in smartphone apps’ privacy policies.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Android Privacy Studies</title>
      <p>
        We are extending the emerging domain of verifying
privacy practices of mobile apps against privacy requirements,
notably privacy policies. The closest related work to ours
analyzed the practices of 17,991 Android apps and
determined whether those with a privacy policy adhered to
it
        <xref ref-type="bibr" rid="ref36">(Zimmeck et al. 2017)</xref>
        . Several other studies have also
compared privacy policies to apps’ code
        <xref ref-type="bibr" rid="ref27 ref33">(Yu et al. 2016;
Slavin et al. 2016)</xref>
        . Going beyond this work, our system is
capable of large-scale analyses, which we demonstrate by
an analysis of 1,035,853 free apps on the Google Play Store.
Additionally, our analysis evaluates compliance issues at a
finer granularity. This advance is notable because the access
of coarse-grained location data (e.g., city) is far less
privacy-invasive than the access of fine-grained data (e.g., latitude
and longitude).
      </p>
      <p>
        We are motivated to study privacy in the Android
ecosystem due to numerous findings of potentially non-compliant
privacy practices. Story et al. studied the metadata of over a
million apps on the Play Store and found that many apps lack
privacy policies, even when developers describe their apps
as collecting users’ information
        <xref ref-type="bibr" rid="ref29">(Story, Zimmeck, and Sadeh
2018)</xref>
        . Analyzing close to a million Android web apps (i.e.,
Android apps that use a WebView), Mutchler et al. found
that 28% of those have at least one vulnerability, such as
data leakage through overridden URL loads
        <xref ref-type="bibr" rid="ref14">(Mutchler et al.
2015)</xref>
        . Differences in how apps are treating sensitive data
were used to identify malicious apps
        <xref ref-type="bibr" rid="ref1">(Avdiienko et al. 2015)</xref>
        .
More recently, AppCensus revealed that many Android apps
collect persistent device identifiers to track users, which
is not allowed for advertising purposes according to the
Google Play Developer Program Policy
        <xref ref-type="bibr" rid="ref30 ref6 ref7">(Reyes et al. 2018;
Google 2018b)</xref>
        . The observation of 512 popular Android
apps over eight years of version history by Ren et al. came
to the conclusion that an important factor for higher privacy
risks over time is the increased number of third party
domains receiving personally identifiable information
        <xref ref-type="bibr" rid="ref19">(Ren et
al. 2018)</xref>
        . In line with these observations, it is one of our
goals in this study to examine apps’ third party practices in
the Android ecosystem.
      </p>
      <sec id="sec-3-1">
        <title>Analysis Techniques</title>
        <p>Our Mobile App Privacy System (MAPS) comprises
separate modules for the analysis of privacy policies and
apps. Our system compares the policy and app analyses in
order to identify potential compliance issues.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Privacy Practices</title>
      <p>Our system analyzes privacy practices. A privacy practice,
or simply practice, describes a behavior of an app that
can have privacy implications. Table 3 contains the list of
practices we consider in our model.<sup>1</sup> We account for the
fact that disclosures found in privacy policies can vary in
specificity. For instance, for the access of location data our
model includes practices that pertain to location in general
(i.e., Location) as well as more specific practices that
explicitly identify the type of access (i.e., Location Cell
Tower, Location GPS, and Location WiFi). Our
model distinguishes between first party access, where data
is accessed by the code of the app itself, and third party
access, where data is accessed by advertising or other third
party libraries. Finally, our model also distinguishes between
a policy describing the performance of a practice (e.g., “We
access your location information.”) and the description that
a practice is not performed (e.g., “We do not access your
location information.”). When access is neither explicitly
described nor explicitly denied, neither modality classifier
flags the statement. Note that a given text fragment can refer
to multiple practices.</p>
      <p><sup>1</sup> In preliminary tests we also considered city, ZIP code, postal
address, username, password, ad ID, address book, Bluetooth, IP
address (identifier and location), age, and gender practices.
However, we ultimately decided against further pursuing those as we
had insufficient data, unreliable annotations, or difficulty
identifying a corresponding API for the app analysis.</p>
    </sec>
    <sec id="sec-5">
      <title>Privacy Policy Analysis</title>
      <p>
        We characterize the detection of privacy practice
descriptions in privacy policy text as a classification problem.
      </p>
      <p>
        Dataset. We used the APP-350 corpus of 350 annotated
mobile app privacy policies to train and test our
classifiers
        <xref ref-type="bibr" rid="ref37">(Zimmeck et al. 2019)</xref>
        .<sup>2</sup> The corpus’s policies were
selected from the most popular apps on the Google Play Store.
The policies were annotated by legal experts using a set of
privacy practice annotation labels. As they were annotating
the policies, the experts also identified the policy text
fragments corresponding to the practice annotation labels they
applied. All policies were comprehensively annotated.
Consequently, it is assumed that all unannotated portions of text
do not describe any of the practices and can be used as
training, validation, and test data to detect the absence of
statements on respective practices.
      </p>
      <p>
        We randomly split the annotated privacy policies into
training (n = 188), validation (n = 62), and test (n = 100)
sets. We used the training and validation sets to develop our
classifiers. The test set was set aside in order to prevent
overfitting. We did not calculate performance using the test set
until after we finished developing our classifiers.
      </p>
      <p>
        Classification Task. The goal of the classification task is
to assign annotation labels to policy segments, that is,
structurally related parts of policy text that loosely correspond to
paragraphs
        <xref ref-type="bibr" rid="ref32">(Wilson et al. 2016; Liu et al. 2018)</xref>
        . We focus
on segments instead of entire policies to make effective use
of the annotated data and to identify the specific policy text
locations that describe a certain practice.
      </p>
      <p>The infrequent occurrence of certain types of statements
makes the training of classifiers for some practices more
challenging. In particular, statements on third party
practices and statements explicitly denying that activities are
performed are rare. For example, our training set only
includes 7 segments saying that Location Cell Tower
information is not accessed by third parties. To address this
challenge, we decompose the classification problem into
three subproblems, that is, classifying (1) data types (e.g.,
Location), (2) parties (i.e., 1stParty or 3rdParty),<sup>3</sup>
and (3) modalities (i.e., whether a practice is explicitly
described as being performed or not performed). For
example, the Location Cell Tower 3rdParty Not
Performed classification will be assigned to a segment
if the Location Cell Tower, 3rdParty, and Not
Performed classifiers all return a positive result for the
segment.</p>
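      <p>The conjunction of the three sub-classifiers can be sketched as follows. This is a minimal illustration, not the MAPS implementation: the function label_applies and the keyword-based toy classifiers are hypothetical stand-ins for the trained models.</p>
      <preformat>
```python
from typing import Callable, Dict

# Hypothetical stand-ins for trained binary classifiers: each maps a
# policy segment to True (flagged) or False (not flagged).
Classifier = Callable[[str], bool]


def label_applies(segment: str, data_type: Classifier,
                  party: Classifier, modality: Classifier) -> bool:
    """Return True only if all three sub-classifiers flag the segment."""
    return data_type(segment) and party(segment) and modality(segment)


# Toy keyword rules used purely for illustration (not the trained models).
toy: Dict[str, Classifier] = {
    "Location Cell Tower": lambda s: "cell tower" in s,
    "3rdParty": lambda s: "partners" in s,
    "Not Performed": lambda s: "do not" in s,
}

segment = "our partners do not collect cell tower location."
flagged = label_applies(segment, toy["Location Cell Tower"],
                        toy["3rdParty"], toy["Not Performed"])
```
      </preformat>
      <p>Because the combined label is a logical AND, each sub-classifier can be trained on every segment that mentions its own aspect, regardless of how rarely the full combination occurs.</p>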
      <p><sup>2</sup> The dataset is available at https://data.usableprivacy.org.</p>
      <p><sup>3</sup> Note that the Single Sign On and Single Sign On:
Facebook practices do not use a party classifier, as all data is
exchanged between the app developer as first party and the SSO
provider as third party.</p>
      <p>The decomposition of the classification task allows for
an economical use of annotated data. If the subproblems
were tackled all at once, 68 monolithic classifiers would be
needed, most of which would have to be trained on fewer
than 100 positive training samples. By dividing the problem,
only 22 classifiers are needed (18 “data type”, 2 “party”,
and 2 “modality” classifiers). These classifiers have a much
higher number of positive samples available for training, as
shown in Figure 1.</p>
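      <p>The counts of 68 and 22 can be reproduced with a short calculation. The breakdown used here is an assumption read off the rows of Table 3 and footnote 3 (16 data types that use a party classifier, plus the two Single Sign On practices that do not); the variable names are illustrative.</p>
      <preformat>
```python
# Assumed breakdown, read off Table 3 and footnote 3.
data_types_with_party = 16     # e.g., Contact, Identifier IMEI, Location GPS
data_types_without_party = 2   # Single Sign On, Single Sign On: Facebook
parties = 2                    # 1stParty, 3rdParty
modalities = 2                 # performed, not performed

# One classifier per full combination of data type, party, and modality.
monolithic = (data_types_with_party * parties * modalities
              + data_types_without_party * modalities)

# One classifier per data type, per party, and per modality.
decomposed = (data_types_with_party + data_types_without_party
              + parties + modalities)
```
      </preformat>
      <p>Under this reading, monolithic evaluates to 68 and decomposed to 22, matching the figures in the text.</p>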
      <p>
        Preprocessing. As classifier performance depends on
adequate preprocessing of policy text as well as domain-specific
feature engineering, we normalize whitespace and
punctuation, remove non-ASCII characters, and lowercase all
policy text. Because stemming did not lead to performance
improvements, we omit it. In order to run our classifiers
on the most relevant set of features, we use an optional
preprocessing step of sentence filtering. Based on a grid search,
in cases where it improves classifier performance, we
remove a segment’s sentences from further processing if they
do not contain keywords related to the classifier in
question
        <xref ref-type="bibr" rid="ref36">(Zimmeck et al. 2017)</xref>
        . For example, the Location
classifier is not trained on sentences which only describe
cookies.
      </p>
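      <p>A minimal sketch of these preprocessing steps is given below. The exact normalization rules and keyword lists are not specified in the text, so the quote mapping, the naive sentence splitter, and the Location keywords are assumptions made for illustration only.</p>
      <preformat>
```python
import re
from typing import Iterable, List

# Assumed punctuation normalization: map curly quotes to straight quotes.
QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'",
                        "\u201c": '"', "\u201d": '"'})


def normalize(text: str) -> str:
    """Normalize quotes, drop non-ASCII, collapse whitespace, lowercase."""
    text = text.translate(QUOTES)
    text = text.encode("ascii", errors="ignore").decode("ascii")
    return re.sub(r"\s+", " ", text).strip().lower()


def filter_sentences(segment: str, keywords: Iterable[str]) -> List[str]:
    """Keep only the sentences that contain at least one keyword."""
    keywords = list(keywords)
    # Naive splitter: breaks on ., !, and ? (it would also split "e.g.").
    sentences = [s.strip() for s in re.split(r"[.!?]+", normalize(segment))]
    return [s for s in sentences
            if s and any(k in s for k in keywords)]


# Hypothetical keyword list for a Location classifier.
kept = filter_sentences(
    "We use cookies.  We share your GPS location with partners.",
    ["location", "gps"])
```
      </preformat>
      <p>In this example the cookie sentence is dropped and only the location sentence is passed on, mirroring the Location example in the text.</p>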
      <p>
        Vectorizing. Prior to training, we generate vector
representations of the segments. Specifically, we take the union
of a TF-IDF vector and a vector of manually crafted
features. Our TF-IDF vector is created using the
TfidfVectorizer
        <xref ref-type="bibr" rid="ref15 ref22 ref25">(scikit-learn developers 2016a)</xref>
        configured with English
stopwords (stop_words='english'), unigrams and
bigrams (ngram_range=(1, 2)), and binary term counts
(binary=True). This configuration is similar to what was
used in prior work (Liu et al. 2018). Our vector of
manually crafted features consists of Boolean values indicating
the presence or absence of indicative strings we observed in
our training and validation data. For example, we include the
string not collect, because we realized that it would be
a strong indicator of the negative modality.
      </p>
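      <p>One way to express this union of TF-IDF and manually crafted features in scikit-learn is sketched below. The TfidfVectorizer arguments are those stated in the text; the use of FeatureUnion and FunctionTransformer, the helper handcrafted_features, and the second indicative string are assumptions, since the text names only “not collect”.</p>
      <preformat>
```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import FunctionTransformer

# Hypothetical indicative strings; only "not collect" is named in the text.
INDICATIVE_STRINGS = ["not collect", "do not share"]


def handcrafted_features(segments):
    """One Boolean (0/1) column per indicative string."""
    return np.array([[float(s in seg) for s in INDICATIVE_STRINGS]
                     for seg in segments])


# FeatureUnion concatenates the columns produced by both transformers.
vectorizer = FeatureUnion([
    ("tfidf", TfidfVectorizer(stop_words="english",
                              ngram_range=(1, 2), binary=True)),
    ("manual", FunctionTransformer(handcrafted_features, validate=False)),
])

segments = ["we do not collect your location.",
            "we collect your email address."]
X = vectorizer.fit_transform(segments)
```
      </preformat>
      <p>The last two columns of X hold the Boolean features. Such features can carry negation cues that the TF-IDF part drops: scikit-learn's built-in English stop word list contains “not”.</p>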
      <p>
        Training. Using scikit-learn, version 0.18.1
        <xref ref-type="bibr" rid="ref16">(Pedregosa
et al. 2011)</xref>
        we train binary classifiers for each data
type, party, and modality. For all but four classifiers
we use scikit-learn’s SVC implementation
        <xref ref-type="bibr" rid="ref22 ref25">(scikit-learn
developers 2016b)</xref>
        . We train those with a linear
kernel (kernel='linear'), balanced class weights
(class_weight='balanced'), and a grid search with
five-fold cross-validation over the penalty (C=[0.1, 1,
10]) and gamma (gamma=[0.001, 0.01, 0.1])
parameters. We create rule-based classifiers for four data types
(Identifier, Identifier IMSI, Identifier
SIM Serial, and Identifier SSID BSSID) due to
the limited amount of data and their superior performance.
Our rule-based classifiers identify the presence or absence
of a data type based on indicative text strings.
      </p>
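      <p>The training configuration described above might look as follows in scikit-learn. This is a sketch under stated assumptions: the use of GridSearchCV with default scoring is a guess, and the toy numeric data merely stand in for vectorized policy segments.</p>
      <preformat>
```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Parameter grid as given in the text.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]}

search = GridSearchCV(
    SVC(kernel="linear", class_weight="balanced"),
    param_grid,
    cv=5,  # five-fold cross-validation
)

# Toy, well-separated data standing in for vectorized segments.
rng = np.random.RandomState(0)
X_pos = rng.normal(loc=2.0, size=(10, 3))
X_neg = rng.normal(loc=-2.0, size=(10, 3))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 10 + [0] * 10)

search.fit(X, y)
best = search.best_params_
```
      </preformat>
      <p>Note that scikit-learn documents gamma as the kernel coefficient for the rbf, poly, and sigmoid kernels, so with a linear kernel the gamma values in the grid would not be expected to change the fitted model.</p>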
      <p>Table 1 shows the effects of our features and
preprocessing steps on the F1 scores of our non-rule-based classifiers.
The performance is calculated using our training and
validation sets. We made sentence filtering an optional part of
preprocessing because of the large detrimental effect it has on
some of our classifiers, as highlighted in Table 2. In general,
our results suggest that the chosen feature and
preprocessing steps improve classifier performance. However, ideally
they should be chosen on a per-classifier basis to avoid any
negative performance impact.</p>
      <p>
        Performance Analysis. Table 3 shows the performance of
the classifiers on the privacy policies of the test set. We say a
policy describes a practice if at least one segment is flagged
by the corresponding data type, party, and positive
modality classifiers. Since our definition of potential compliance
issues does not depend on the negative modality classifier,
we do not include it in the table. Because detecting potential
compliance issues is dependent on detecting when practices
are not described in policies
        <xref ref-type="bibr" rid="ref36">(Zimmeck et al. 2017)</xref>
        ,
negative predictive value, specificity, and negative F1 scores are
of particular importance.
      </p>
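      <p>For reference, the negative-class metrics can be computed from confusion-matrix counts as sketched below. The sketch assumes that the negative F1 score is the harmonic mean of negative predictive value and specificity, that is, an F1 score with “practice not described” treated as the target class; the counts in the example are hypothetical.</p>
      <preformat>
```python
def negative_class_metrics(tp: int, fp: int, tn: int, fn: int):
    """Return (NPV, specificity, negative F1) from confusion-matrix counts."""
    npv = tn / (tn + fn)            # precision of the "not described" class
    specificity = tn / (tn + fp)    # recall of the "not described" class
    neg_f1 = 2 * npv * specificity / (npv + specificity)
    return npv, specificity, neg_f1


# Hypothetical counts for one classifier on a 100-policy test set.
npv, specificity, neg_f1 = negative_class_metrics(tp=8, fp=2, tn=85, fn=5)
```
      </preformat>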
      <p>
        In the closest related work
        <xref ref-type="bibr" rid="ref36">(Zimmeck et al. 2017)</xref>
        ,
classifiers for contact, identifier, and location data practices
covered multiple specific practices. Thus, a direct performance
comparison to our classifiers is not possible. However, with
negative F1 scores ranging from 78% to 100%, 23 of our
specific classifiers achieve better negative F1 scores than the
corresponding coarse-grained classifiers, and 3 perform
equally. These results suggest that our approach
constitutes an overall improvement over the state of the art. We
believe that decomposing the classification task into three
subproblems increases performance as it allows for a better
exploitation of training data compared to monolithic
classifiers.
      </p>
      <p>Our results reveal that generally + support is lower for
third party practices; that is, third party practices are often
not as extensively described in privacy policies as first party
practices. It should be further noted that higher counts of
support generally correlate with higher performance scores.
Intuitively, it is easier to classify a policy that does not
describe a practice, which makes up the majority of - support
instances.</p>
      <p>
        We reviewed the errors made by our classifiers and
identified several potential areas for improvement. First,
approaching the classification task at the level of segments,
as suggested by prior work
        <xref ref-type="bibr" rid="ref32">(Wilson et al. 2016; Liu et al.
2018)</xref>
        , can pose difficulties for our subproblem classifiers.
For example, if a segment describes a 1stParty
performing the Location practice, and a 3rdParty performing
Contact, our classifiers cannot distinguish which party
should be associated with which practice. Thus,
performing classifications at the level of sentences may yield
performance improvements. Second, the variety of technical
language in privacy policies poses challenges. For
example, we observed a false positive when “location” was used
in the context of “co-location facility”, and a false negative
when “clear gifs” was used to refer to web beacons. Such
errors might be prevented by training on more data or using
domain-specific word embeddings (Kumar et al. 2019).
Finally, a more sophisticated semantic representation might be
necessary in certain cases. For example, we observed
misclassification of a sentence which said that although the first
party does not perform a practice, third parties do perform
the practice.
      </p>
    </sec>
    <sec id="sec-6">
      <title>App Analysis</title>
      <p>MAPS detects apps’ privacy practices at app store-wide
scale. Detecting which practices an app performs relies on
static code analysis, a relatively resource-efficient technique
compared to dynamic code analysis. Our system operates
on four app resources: Android APIs, permissions, strings,
and class structure. If a sensitive Android API is called, the
app has the required permissions to make the call, and
required string parameters (e.g., the GPS_PROVIDER string)
are passed in, the system will flag the existence of a first
or third party practice depending on the package name of
the class from which the call originated. We assume a threat
model which considers data as compromised from the
moment a privacy-sensitive API appears to be called (Neisse et
al. 2016).</p>
      <p>Table 3. Policy Classification rows: the 1stParty and 3rdParty
variants of Contact, Contact Email Address, Contact Phone Number,
Identifier, Identifier Cookie, Identifier Device ID, Identifier IMEI,
Identifier IMSI, Identifier MAC, Identifier Mobile Carrier,
Identifier SIM Serial, Identifier SSID BSSID, Location,
Location Cell Tower, Location GPS, and Location WiFi, as well as
Single Sign On and Single Sign On: Facebook.</p>
      <p>After downloading an app from the Google Play Store, our
system decompiles it into Smali bytecode using Apktool.<sup>4</sup>
It then searches through the bytecode, identifying APIs
indicative of a privacy practice being performed. Generally, if
a practice occurs in a package corresponding to the app’s
package ID, the practice is considered a first party practice;
otherwise, it is considered a third party practice. In order to
evaluate the performance of our system’s app analysis, we
compare its results against ground truth obtained by a
manual dynamic analysis.</p>
      <p><sup>4</sup> Apktool, https://ibotpeaches.github.io/Apktool/, accessed:
March 18, 2019.</p>
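      <p>The detection logic described in this section can be sketched as follows. This is an illustration under assumptions, not the MAPS code: the ApiCall record, the RULES table with its single entry, and the prefix test on package names are hypothetical simplifications.</p>
      <preformat>
```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ApiCall:
    api: str                          # API name found in the decompiled code
    caller_package: str               # package of the class making the call
    string_arg: Optional[str] = None  # string parameter passed, if any


# Hypothetical rule table:
# API name -> (practice, required permission, required string parameter).
RULES = {
    "LocationManager.requestLocationUpdates":
        ("Location GPS", "android.permission.ACCESS_FINE_LOCATION", "gps"),
}


def detect_practice(call: ApiCall, permissions: Set[str],
                    app_package: str) -> Optional[str]:
    """Return the flagged practice, or None if the call is not flagged."""
    rule = RULES.get(call.api)
    if rule is None:
        return None
    practice, permission, required_string = rule
    if permission not in permissions:
        return None
    if required_string is not None and call.string_arg != required_string:
        return None
    # Calls from within the app's own package count as first party.
    first_party = call.caller_package.startswith(app_package)
    party = "1stParty" if first_party else "3rdParty"
    return practice + " " + party


perms = {"android.permission.ACCESS_FINE_LOCATION"}
own = detect_practice(
    ApiCall("LocationManager.requestLocationUpdates",
            "com.example.app.ui", "gps"), perms, "com.example.app")
lib = detect_practice(
    ApiCall("LocationManager.requestLocationUpdates",
            "com.adnetwork.sdk", "gps"), perms, "com.example.app")
```
      </preformat>
      <p>Here the same API call yields a first party practice when it originates from the app's own package and a third party practice when it originates from a bundled library.</p>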
    </sec>
    <sec id="sec-7">
      <title>Compliance Analysis</title>
      <p>Our system combines policy and app analysis results to
identify potential compliance issues. We define a potential
compliance issue to mean that an app is performing a practice
(e.g., Location GPS 1stParty) while its associated
policy does not disclose it either generally (e.g., “Our app
accesses your location data.”) or specifically (e.g., “Our app
accesses your GPS data.”). We chose this definition because
we observed that policies generally either disclose that a
practice is performed or omit discussion of the practice—
statements denying practices are rare.</p>
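      <p>This definition can be written as a simple set comparison. The sketch below assumes a hypothetical mapping GENERAL_FORM from each specific practice to its general counterpart; only two entries are shown.</p>
      <preformat>
```python
from typing import Set

# Hypothetical mapping from a specific practice to its general counterpart.
GENERAL_FORM = {
    "Location GPS 1stParty": "Location 1stParty",
    "Location GPS 3rdParty": "Location 3rdParty",
}


def potential_issues(app_practices: Set[str],
                     policy_practices: Set[str]) -> Set[str]:
    """Practices the app performs that the policy discloses neither
    specifically nor generally."""
    issues = set()
    for practice in app_practices:
        disclosed = (practice in policy_practices
                     or GENERAL_FORM.get(practice) in policy_practices)
        if not disclosed:
            issues.add(practice)
    return issues


issues = potential_issues(
    {"Location GPS 1stParty", "Location GPS 3rdParty"},
    {"Location 1stParty"})
```
      </preformat>
      <p>In this example the general first party location disclosure covers the first party GPS practice, so only the third party practice remains as a potential compliance issue.</p>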
      <p>Table 4 shows our system’s identification of potential
compliance issues and its performance. For the 26 practices
for which positive ground truth instances were present, we
observe a mean F1 score of 71%. Many potential compliance
issues relate to the access of identifiers. However, the three
third party location practices Cell Tower, GPS, and WiFi
account for 15, 10, and 12 findings, respectively. Notably,
all first party practices exhibit a lower number of potential
compliance issues than their third party counterparts.</p>
      <p>Table 4. Potential Compliance Issue rows: the 1stParty and
3rdParty variants of Contact Email Address, Contact Phone Number,
Identifier Cookie, Identifier Device ID, Identifier IMEI,
Identifier IMSI, Identifier MAC, Identifier Mobile Carrier,
Identifier SIM Serial, Identifier SSID BSSID, Location Cell Tower,
Location GPS, and Location WiFi, as well as
Single Sign On: Facebook.</p>
      <sec id="sec-7-1">
        <title>Privacy Compliance in the Play Store</title>
        <p>
          Our large-scale analysis of free apps in the Google Play
Store provides us with a rich dataset for evaluating the state
of privacy in a substantial part of the Android ecosystem.
Here, we summarize our findings, with a focus on our
privacy policy analysis. For a complete description of our
findings, please see
          <xref ref-type="bibr" rid="ref37">(Zimmeck et al. 2019)</xref>
          .
        </p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Analyses at Scale</title>
      <p>
        Designing and implementing a robust system to identify
potential compliance issues for large app populations presents
challenges of scale. We address those with a pipeline of
distributed tasks implemented in a containerized software
stack. We performed our Play Store analysis from April
6 to May 15, 2018. Out of 1,049,790 retrieved free apps,
1,035,853 (98.67%) were analyzed successfully. Of the
apps which were not analyzed successfully, 1.03% failed
to download, 0.21% failed in the static analysis, 0.08%
failed in the policy analysis, and 0.01% failed during our
re-analysis.<sup>5</sup>
      </p>
      <p>
        Of the apps we analyzed, 35.3% had privacy policies.<sup>6</sup> For
apps with privacy policies, Figure 2 depicts the occurrence
of policy statements relating to the practices we examine.
It can be observed that most practices are described only
infrequently; that is, most policies do not mention a given
practice even once. Further, the statements that are present typically affirm
that a practice is occurring. This finding suggests that users
are given little assurance of potentially
objectionable practices not being performed (e.g., disclosing users’
phone numbers to third parties). Silence about privacy
practices in privacy policies is problematic because there are no
clear statutory default rules of what the privacy relationship
between a user and a service should be, in the absence of
explicit statements in the policy
        <xref ref-type="bibr" rid="ref12">(Marotta-Wurgler 2015)</xref>
        .
      </p>
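      <p>The reported success and failure rates are mutually consistent, as a quick check shows. The variable names are illustrative; the numbers are taken from the text.</p>
      <preformat>
```python
retrieved = 1_049_790   # free apps retrieved
analyzed = 1_035_853    # apps analyzed successfully

failed = retrieved - analyzed
success_rate = round(100 * analyzed / retrieved, 2)
failure_rate = round(100 * failed / retrieved, 2)

# Sum of the reported failure categories: download, static analysis,
# policy analysis, and re-analysis.
reported_failures = round(1.03 + 0.21 + 0.08 + 0.01, 2)
```
      </preformat>
      <p>Both failure_rate and reported_failures come to 1.33%, which complements the 98.67% success rate.</p>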
    </sec>
    <sec id="sec-9">
      <title>Prevalence of Potential Compliance Issues</title>
      <p>Our system detects potential compliance issues by
comparing the privacy policy analysis to the static analysis. A
potential compliance issue is detected when an app performs a
practice that is not described in the app’s privacy policy (if
the app even has a privacy policy). Note that when our
system finds multiple privacy policies for a given app, it pools
the practice descriptions discovered across all those
policies. This pooling has the effect of making our results rather
conservative. One policy may disclose a particular practice
while another policy discloses another practice, and together
they may cover all practices performed by the associated
app. Overall, the average number of potential compliance
issues per app is 2.89 and the median is 3.</p>
      <p>5 After the completion of the Play Store analysis we noticed a
bug in our static analysis code. As a result, we re-performed the
static analyses and re-calculated all statistics. 135 additional
analyses failed, yielding a final total of 1,035,853 successfully analyzed
apps.</p>
      <p>6 This only counts English-language privacy policies: our
system does not identify policies in other languages.</p>
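      <p>The detection rule described in this section, including the pooling of
multiple policies, can be sketched in a few lines of Python. This is a minimal
illustration under stated assumptions, not the MAPS implementation: the practice
labels and the function names pooled_descriptions and potential_compliance_issues
are hypothetical, and the sketch does not model how statements that deny a
practice are treated.</p>
      <preformat>
```python
from typing import Iterable, Set


def pooled_descriptions(policies: Iterable[Set[str]]) -> Set[str]:
    """Union of the practices described across all policies found for one app."""
    pooled: Set[str] = set()
    for described in policies:
        pooled |= described
    return pooled


def potential_compliance_issues(performed: Set[str],
                                policies: Iterable[Set[str]]) -> Set[str]:
    """Practices the app performs that none of its policies describes.

    An app without any policy has an empty pool, so every performed
    practice is flagged.
    """
    return performed - pooled_descriptions(policies)


# Hypothetical example: the labels are illustrative, not the paper's label set.
performed = {"location_third_party", "device_id_third_party", "cookie_first_party"}
policies = [{"location_third_party"}, {"cookie_first_party"}]
issues = potential_compliance_issues(performed, policies)
print(sorted(issues))
```
      </preformat>
      <p>In this toy case neither policy alone covers the app, but pooling leaves only
one undisclosed practice, which illustrates why pooling makes the reported
counts conservative.</p>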
      <p>
        Figure 3 shows the percentage of apps that perform
various practices and the respective percentage of apps with
potential compliance issues. The figure demonstrates that in
many cases the performance of a practice is strongly
associated with the occurrence of a potential compliance issue:
if a practice is performed, there is a good chance a potential
compliance issue exists as well. This result suggests a broad
level of potential non-compliance. Identifier-related
potential compliance issues are the most common. Three different
types of identifiers make up most potential compliance
issues: cookies, device IDs, and mobile carriers. In particular,
the use of device IDs may constitute a misuse for purposes
of ad tracking
        <xref ref-type="bibr" rid="ref30 ref6 ref7">(Google 2018b)</xref>
        . There are also
elevated levels of location-related potential compliance issues.
15.3% of apps perform at least one location-related practice,
and 12.1% of apps have at least one location-related
potential compliance issue.
      </p>
      <p>
        For all data types, third-party practices are more common
than first-party practices, and so are third-party-related
potential compliance issues. One reason for the prevalence of
potential compliance issues for third party practices could be
that app developers are unaware of the functionality of the
libraries they integrate. Perhaps they also hold the mistaken
belief that it is not their responsibility but the
responsibility of the library developers to disclose to users the
practices the libraries are performing. Some libraries’ terms of
service, for example the Google Analytics Terms of
Service
        <xref ref-type="bibr" rid="ref6 ref7">(Google 2018a)</xref>
        , obligate the developer integrating the library
to explicitly disclose the integration in the developer’s
privacy policy. However, this type of information transfer from
the third party via the developer to the user may be
susceptible to omissions and mistakes.
      </p>
      <sec id="sec-9-1">
        <title>Conclusions</title>
        <p>Natural language privacy policies are intended to
communicate how a service collects, shares, uses, and stores user
data. However, as they are generally lengthy and difficult to
read, the average user often struggles to understand which
privacy practices apply. Natural language
processing techniques in the policy domain hold the promise
of extracting policy content and converting it to a format that is
easier to comprehend. In this study, we reported on our
development of a three-tiered classification model that classifies
a variety of privacy practices and their omissions in policy
text. Compared to a monolithic classifier for each privacy
practice, using separate data type, party, and modality classifiers allows
for economical use of training and test data, which is
often expensive to obtain, while still achieving good performance.</p>
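        <p>To make the three-tiered idea concrete, the sketch below combines
separate data type, party, and modality decisions into practice labels for a
policy segment. It is only an illustration of the composition step: simple
keyword matchers stand in for trained classifiers, and all names
(keyword_classifier, classify_segment, the label strings, and the keyword
lists) are hypothetical rather than taken from the paper.</p>
        <preformat>
```python
from typing import Callable, Dict, Set, Tuple

# A classifier maps a policy segment to True or False. The keyword matcher
# below is a toy stand-in for a trained model, used only for illustration.
Classifier = Callable[[str], bool]


def keyword_classifier(*keywords: str) -> Classifier:
    """Return a toy classifier that fires if any keyword occurs in the segment."""
    lowered = [k.lower() for k in keywords]

    def classify(segment: str) -> bool:
        text = segment.lower()
        return any(k in text for k in lowered)

    return classify


# One small classifier per tier value (hypothetical labels and keywords).
DATA_TYPE: Dict[str, Classifier] = {
    "location": keyword_classifier("location", "gps"),
    "phone_number": keyword_classifier("phone number"),
}
PARTY: Dict[str, Classifier] = {
    "first_party": keyword_classifier("we collect", "we do not collect", "we use"),
    "third_party": keyword_classifier("third part", "partners", "advertisers"),
}
NOT_PERFORMED: Classifier = keyword_classifier("do not", "never", "will not")


def classify_segment(segment: str) -> Set[Tuple[str, str, str]]:
    """Combine the three tiers into (data_type, party, modality) labels."""
    modality = "not_performed" if NOT_PERFORMED(segment) else "performed"
    labels: Set[Tuple[str, str, str]] = set()
    for data_type, is_data_type in DATA_TYPE.items():
        if not is_data_type(segment):
            continue
        for party, is_party in PARTY.items():
            if is_party(segment):
                labels.add((data_type, party, modality))
    return labels


print(classify_segment("We share your location with advertisers."))
print(classify_segment("We do not collect your phone number."))
```
        </preformat>
        <p>In this sketch, five tier-level classifiers cover the eight possible
combinations of two data types, two parties, and two modalities, which hints at
why decomposing a practice can use annotated data more economically than
training one classifier per combination.</p>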
        <p>
          The classification model we propose here is an
integral part of the Mobile App Privacy System (MAPS)
          <xref ref-type="bibr" rid="ref37">(Zimmeck et al. 2019)</xref>
          . Many mobile apps rely on the
collection and use of a wide range of data for their
functionality and monetization. MAPS presents one use case
for implementing the suggested privacy policy classification
model. MAPS pairs our policy analysis with static
analysis of mobile apps to identify possible discrepancies
between the two and flag potential compliance issues. Our
results from analyzing 1,035,853 free apps on the Google Play
Store suggest that potential compliance issues are rather
common, particularly when it comes to the disclosure of
third-party practices. These and similar results may be of
interest to app developers, app stores, privacy activists, and
regulators.
        </p>
        <p>Recently enacted laws, such as the General Data
Protection Regulation, impose new obligations and provide for
substantial penalties for failing to properly disclose privacy
practices. We believe that the natural language analysis of
privacy policies, for example in tandem with mobile app analysis,
has the potential to improve privacy transparency and
enhance privacy levels overall.</p>
      </sec>
      <sec id="sec-9-2">
        <title>Acknowledgments</title>
        <p>We would like to thank Jongkyu Talan Baek, Sushain
Cherivirala, Roger Iyengar, Pingshan Li, Shaoyan Sam
Li, Kanthashree Sathyendra, Florian Schaub, Xi Sun, and
Shomir Wilson for their help with this research. This study
was supported in part by the NSF Frontier grant on Usable
Privacy Policies (CNS-1330596, CNS-1330141, and
CNS-1330214) and a DARPA Brandeis grant on Personalized
Privacy Assistants (FA8750-15-2-0277). The US Government
is authorized to reproduce and distribute reprints for
Governmental purposes notwithstanding any copyright
notation. The views and conclusions contained herein are those
of the authors and should not be interpreted as necessarily
representing the official policies or endorsements, either
expressed or implied, of the NSF, DARPA, or the US
Government. This work used the Extreme Science and Engineering
Discovery Environment (XSEDE), which is supported by
National Science Foundation grant number ACI-1548562.
The authors acknowledge the Texas Advanced Computing
Center (TACC) at The University of Texas at Austin for
providing high performance computing resources that have
contributed to the research results reported within this paper.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Avdiienko</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kuznetsov</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gorla</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zeller</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Arzt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Rasthofer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Bodden</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Mining apps for abnormal usage of sensitive data</article-title>
          .
          <source>In 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015</source>
          , Florence, Italy, May 16-24, 2015, Volume
          <volume>1</volume>
          ,
          <fpage>426</fpage>
          -
          <lpage>436</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Cranor</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Langheinrich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Marchiori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ; Presler-Marshall, M.; and
          <string-name>
            <surname>Reagle</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          <year>2002</year>
          .
          <article-title>The Platform for Privacy Preferences 1.0 (P3P1.0) specification</article-title>
          . World Wide Web Consortium,
          <source>Recommendation REC-P3P-20020416.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Cranor</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Leon</surname>
            ,
            <given-names>P. G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Ur</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>A large-scale evaluation of U.S. financial institutions standardized privacy notices</article-title>
          .
          <source>ACM Trans. Web</source>
          <volume>10</volume>
          (
          <issue>3</issue>
          ):
          <fpage>17:1</fpage>
          -
          <lpage>17:33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>FTC.</surname>
          </string-name>
          <year>2014</year>
          .
          <article-title>Complaint Goldenshores Technologies</article-title>
          . https://www.ftc.gov/system/files/documents/cases/140409goldenshorescmpt.pdf. Accessed: March 18, 2019.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Google.</surname>
          </string-name>
          2018a.
          <article-title>Google analytics terms of service</article-title>
          . https://www.google.com/analytics/terms/us.html. Accessed: March 18, 2019.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Google.</surname>
          </string-name>
          2018b.
          <article-title>Play Console Help</article-title>
          . https://support.google.com/googleplay/android-developer/answer/6048248?hl=en. Accessed: March 18, 2019.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Harkous</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fawaz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lebret</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shin</surname>
            ,
            <given-names>K. G.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Polisis: Automated analysis and presentation of privacy policies using deep learning</article-title>
          .
          <source>In USENIX Security '18.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          2019.
          <article-title>Quantifying the effect of in-domain distributed word representations: A study of privacy policies</article-title>
          .
          <source>AAAI Spring Symposium on Privacy-Enhancing Artificial Intelligence and Language Technologies.</source>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Libert</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>An automated approach to auditing disclosure of third-party data collection in website privacy policies</article-title>
          .
          <source>In WWW '18.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          2018.
          <article-title>Towards Automatic Classification of Privacy Policy Text</article-title>
          .
          <source>Technical Report CMU-ISR-17-118R and CMU-LTI-17-010</source>
          , School of Computer Science, Carnegie Mellon University, Pittsburgh, PA.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Marotta-Wurgler</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Does “notice and choice” disclosure regulation work? An empirical study of privacy policies</article-title>
          .
          Accessed: March 18, 2019.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>A. M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Cranor</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>The cost of reading privacy policies</article-title>
          .
          <source>I/S: A Journal of Law and Policy for the Information Society</source>
          <volume>4</volume>
          (
          <issue>3</issue>
          ):
          <fpage>540</fpage>
          -
          <lpage>565</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Mutchler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; Doupé, A.; Mitchell, J.; Kruegel,
          <string-name>
            <surname>C.</surname>
          </string-name>
          ; and Vigna,
          <string-name>
            <surname>G.</surname>
          </string-name>
          <year>2015</year>
          .
          <article-title>A large-scale study of mobile web app security</article-title>
          .
          <source>In MoST '15.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          2016.
          <article-title>A privacy enforcing framework for android applications</article-title>
          .
          <source>Computers &amp; Security</source>
          <volume>62</volume>
          :
          <fpage>257</fpage>
          -
          <lpage>277</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; Weiss, R.;
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ; Brucher,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Perrot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ; and
            <surname>Duchesnay</surname>
          </string-name>
          ,
          <string-name>
            <surname>E.</surname>
          </string-name>
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research.</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Ramanath</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; Liu,
          <string-name>
            <given-names>F.</given-names>
            ;
            <surname>Sadeh</surname>
          </string-name>
          , N.; and
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Unsupervised alignment of privacy policies using hidden Markov models</article-title>
          .
          <source>In ACL '14.</source>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ; Lindorfer,
          <string-name>
            <given-names>M.</given-names>
            ;
            <surname>Dubois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ;
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ;
            <surname>Choffnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ; and
            <surname>Vallina-Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>Bug fixes, improvements, ... and privacy leaks - a longitudinal study of PII leaks across android app versions</article-title>
          .
          <source>In NDSS '18.</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <year>2018</year>
          .
          <article-title>“Won't somebody think of the children?” Examining COPPA compliance at scale</article-title>
          .
          <source>In PETS '18.</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Sathyendra</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          ; Wilson,
          <string-name>
            <given-names>S.</given-names>
            ;
            <surname>Schaub</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            ;
            <surname>Zimmeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ; and
            <surname>Sadeh</surname>
          </string-name>
          ,
          <string-name>
            <surname>N.</surname>
          </string-name>
          <year>2017</year>
          .
          <article-title>Identifying the provision of choices in privacy policy text</article-title>
          .
          <source>In EMNLP '17.</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>scikit-learn developers.</surname>
          </string-name>
          2016a.
          <article-title>sklearn.feature_extraction.text.TfidfVectorizer</article-title>
          . http://scikit-learn.org/0.18/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html. Accessed: March 18, 2019.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>scikit-learn developers.</surname>
          </string-name>
          2016b.
          <article-title>sklearn.svm.SVC</article-title>
          . http://scikit-learn.org/0.18/modules/generated/sklearn.svm.SVC.html. Accessed: March 18, 2019.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <surname>Slavin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hosseini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hester</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ; Krishnan,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ; Bhatia,
          <string-name>
            <surname>J.</surname>
          </string-name>
          ; Breaux,
          <string-name>
            <given-names>T.</given-names>
            ; and
            <surname>Niu</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2016</year>
          .
          <article-title>Toward a framework for detecting privacy policy violation in android application code</article-title>
          .
          <source>In ICSE '16.</source>
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <surname>Solove</surname>
            ,
            <given-names>D. J.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hartzog</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>The FTC and the new common law of privacy</article-title>
          .
          <source>Columbia Law Review</source>
          <volume>114</volume>
          :
          <fpage>583</fpage>
          -
          <lpage>676</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <surname>Story</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zimmeck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Which apps have privacy policies?</article-title>
          <source>In APF '18.</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <surname>Tesfay</surname>
          </string-name>
          , W. B.;
          <string-name>
            <surname>Hofmann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; Nakamura,
          <string-name>
            <given-names>T.</given-names>
            ;
            <surname>Kiyomoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ; and
            <surname>Serna</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>I read but don't agree: Privacy policy benchmarking using machine learning and the EU GDPR</article-title>
          .
          <source>In WWW '18.</source>
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dara</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cherivirala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; Leon,
          <string-name>
            <surname>P. G.</surname>
          </string-name>
          ; Andersen,
          <string-name>
            <given-names>M. S.</given-names>
            ;
            <surname>Zimmeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ;
            <surname>Sathyendra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. M.</given-names>
            ;
            <surname>Russell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            ;
            <surname>Norton</surname>
          </string-name>
          , T. B.;
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Reidenberg</surname>
          </string-name>
          , J.; and
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>The creation and analysis of a website privacy policy corpus</article-title>
          .
          <source>In ACL '16.</source>
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ; and Zhang, T.
          <year>2016</year>
          .
          <article-title>Can we trust the privacy policies of android apps?</article-title>
          <source>In DSN '16.</source>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          2018.
          <article-title>Sensibility testbed: Automated IRB policy enforcement in mobile research apps</article-title>
          .
          <source>In HotMobile '18.</source>
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <surname>Zimmeck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Bellovin</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Privee: An architecture for automatically analyzing web privacy policies</article-title>
          .
          <source>In USENIX Security '14.</source>
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <surname>Zimmeck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Iyengar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Bellovin</surname>
            ,
            <given-names>S. M.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Reidenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>Automated analysis of privacy requirements for mobile apps</article-title>
          .
          <source>In NDSS '17.</source>
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          <string-name>
            <surname>Zimmeck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Story</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Smullen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ravichander</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Reidenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>N. C.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2019</year>
          .
          <article-title>MAPS: Scaling privacy compliance analysis to a million apps</article-title>
          .
          <source>In PETS '19.</source>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>