<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Quantifying the Effect of In-Domain Distributed Word Representations: A Study of Privacy Policies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vinayshekhar Bannihatti Kumar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abhilasha Ravichander</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Story</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Norman Sadeh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute for Software Research, Carnegie Mellon University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Language Technologies Institute, Carnegie Mellon University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Privacy policies are documents that describe what data is collected by a website or an app and how that data is handled. They are often long and difficult to understand. Recently, people have started to turn to Natural Language Processing (NLP) to automatically extract statements from the text of these policies. This article reports on a study evaluating the benefits of using word embeddings in this endeavor, and in particular the benefits of privacy-specific word embeddings. Specifically, we use 150,000 privacy policies to build word vectors in an unsupervised manner. Evaluation is conducted on the OPP-115 corpus of privacy policy annotations. By building privacy-specific embeddings, we hope to accelerate research at the intersection of privacy policies and language technologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Privacy policies tend to be long and complex documents that
are often difficult to understand. They are the primary
mechanism to inform users about the collection and handling of
their data. Over the past several years, there has been a
growing interest in automating the understanding of privacy
policies using Machine Learning and Natural Language
Processing techniques (e.g.,
        <xref ref-type="bibr" rid="ref13 ref14 ref15 ref17 ref7 ref8 ref8">Sadeh et al., 2013; Wilson
et al., 2016; Sathyendra et al., 2017b; Liu et al., 2016b</xref>
        ).
These techniques rely on a small amount of supervised
training data. Unfortunately, as is the case in many other
domains, supervised training data requires expert annotation
and is expensive to obtain, limiting the size of available
corpora. On the other hand, because privacy policies are
ubiquitous, unlabeled privacy policy data is plentiful and easy to
obtain. In this work, we examine leveraging this unlabeled
data to train word embeddings for the privacy domain. We
demonstrate that these word embeddings yield meaningful
improvements in performance on the popular OPP-115
        <xref ref-type="bibr" rid="ref17 ref8">(Wilson et al., 2016)</xref>
        benchmark for segment labeling in policies.
Performance improvements are observed across a diverse set
of data practices found in the OPP-115 corpus. For
evaluation we use the same test set which was provided by
        <xref ref-type="bibr" rid="ref18">Wilson
et al. (2018)</xref>
        .
      </p>
      <p>The contributions of our paper can be summarized as
follows.</p>
      <list list-type="order">
        <list-item>
          <p>We investigate the utility of in-domain word embeddings
and find that they do help over generic word
embeddings, yielding better segment-labeling performance
in the privacy domain. We observe meaningful
improvements on the OPP-115 corpus and measure an average
macro F1 of 0.803 on the test set.</p>
        </list-item>
        <list-item>
          <p>We empirically investigate the relationship between the
dimensionality of the word embeddings and segment-labeling
performance. We look at how performance across a
diverse set of important data-practice categories changes with the
dimensionality of the word embeddings.</p>
        </list-item>
        <list-item>
          <p>We investigate the number of policies required
to train expressive word embeddings. We have access to
over 300,000 privacy policies, which we scraped from the
Google Play Store; henceforth, we call this the policy corpus.
We investigate how many of these policies are needed
to get good results on the OPP-115
dataset. It should be noted that we did not need all 300,000 privacy policies,
as performance saturates fairly quickly. Incidentally,
we would have needed significantly more computational
resources to train word embeddings on all 300,000
policies. This is not to say that word embeddings trained on
all 300,000 policies would not have helped in the context
of different, possibly more subtle data practices than those
considered in the OPP-115 dataset.</p>
        </list-item>
        <list-item>
          <p>We present a qualitative analysis of the representations of
words in vector space in the privacy domain, in order to
understand how in-domain word embeddings differ from
generic, domain-independent word embeddings.</p>
        </list-item>
      </list>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <sec id="sec-2-1">
        <title>Word embeddings in NLP</title>
        <p>
          Word embeddings have a rich history in the Natural
Language Processing community, and have proven to be
useful in a wide variety of tasks. Work on neural models for
formulating word representations was reported as early as
2003 by
          <xref ref-type="bibr" rid="ref1">Bengio et al. (2003)</xref>
          . They used a simple feed-forward
neural network to learn a language model, thereby
building word embeddings.
          <xref ref-type="bibr" rid="ref10">Mikolov et al. (2013)</xref>
          introduced
Word2Vec, a fast and scalable way of computing word
vectors on large corpora by using sub-sampling and
negative sampling techniques. Pennington, Socher, and Manning
(2014) introduced GloVe embeddings, which capture both
the local context and the global context of sentences.
FastText
          <xref ref-type="bibr" rid="ref2">(Bojanowski et al., 2016)</xref>
          trains a model that
uses sub-word information instead of treating words
as discrete tokens. It thus captures local context and is
robust to perturbations. Because sub-word information helps
capture local context better, we use FastText to
train our word embeddings.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Word embeddings and privacy policies</title>
        <p>
          <xref ref-type="bibr" rid="ref5 ref6">Harkous et al. (2018a)</xref>
          have previously used FastText on
privacy policies to get word embeddings. However, these
embeddings are not publicly available, nor were ablation
studies performed to determine their utility. In addition, they
were used for question category identification, rather than
segment labeling as in
          <xref ref-type="bibr" rid="ref17 ref8">(Wilson et al., 2016)</xref>
          .
          <xref ref-type="bibr" rid="ref17">Wilson et al.
(2016)</xref>
          have done a thorough job of creating the OPP-115
dataset and establishing good baselines for it. In this
work, we fill this gap by training word embeddings from
150,000 privacy policies and demonstrating their utility over
generic word embeddings. We show that our neural models,
with the help of our privacy embeddings, outperform
previous benchmarks on the OPP-115 dataset.
          <xref ref-type="bibr" rid="ref9">Liu et al. (2018)</xref>
          have used pre-trained word embeddings with deep
models to classify the OPP-115 data. However, their performance
seems to drop on classes with lower numbers of positive examples.
To address this issue and to be more
consistent with common machine learning practice, we do
two things. First, we train in-domain word embeddings
to check whether they give better results on these sparsely
populated classes. Second, we split the data into 5 folds. We
use 4 folds to train our machine learning model (the train set)
and the remaining fold to tune it (the dev set). We choose the
hyperparameters of our model using the dev set and evaluate the
model on the test set mentioned earlier. Separating
the data in this way helps ensure that we are not
fitting our model to the test set and
makes our results more robust to unseen data. We report our
average F1 over all folds of the dev set, along with the standard
deviation. We also report our average F1 on the test set for
our best performing model. Sathyendra et al. (2017a) used
Word2Vec to train privacy-specific embeddings for an
extrinsic question-answering task. However, the authors
do not report evaluating the performance of the word
embeddings themselves. In contrast, our goal in the work reported herein is to
shed light on and quantify the benefits of word embeddings
in the privacy policy domain.
        </p>
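        <p>The fold-based protocol described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names, the random shuffling, and the fixed seed are assumptions, and the separately provided test set is never part of these folds.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import random


def make_folds(n_items, n_folds=5, seed=0):
    """Shuffle item indices and deal them into n_folds roughly equal folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed: an assumption, for repeatability
    return [indices[i::n_folds] for i in range(n_folds)]


def train_dev_splits(folds):
    """Yield (train, dev) index lists, holding out each fold once as the dev set."""
    for k, dev in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, dev


# Toy example: 10 segments split into 5 folds (4 folds train, 1 fold dev).
folds = make_folds(n_items=10, n_folds=5)
splits = list(train_dev_splits(folds))
```
]]></preformat>
        <p>Each of the five splits trains on four folds and tunes on the remaining one, so every segment serves as dev data exactly once.</p>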
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Overall Approach</title>
      <p>
        FastText was introduced by
        <xref ref-type="bibr" rid="ref2">Bojanowski et al. (2016)</xref>
        . We
leverage this algorithm to learn word embeddings for the
words found in privacy policies scraped from the Google
Play Store. We then do transfer learning by using these
embeddings in the context of website privacy policies. It should
be noted that these two domains have some meaningful
overlap, with many privacy policies written to jointly address
data practices associated with both mobile apps and
websites operated by the same entity.
      </p>
      <sec id="sec-3-1">
        <title>Embeddings Store</title>
        <p>We store our learned embeddings for future access. We
created separate word embedding stores for the different kinds
of embeddings we built. We experimented with the
dimensionality and the amount of data required to train good word
embeddings. These results are discussed in the “Results and
Discussions” section.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Text to Word Vector Converter</title>
        <p>
          We have to convert the text of the OPP-115
corpus to a real-valued representation so that our neural
models can use it. With a large vocabulary, however, it
becomes difficult for networks to fit the data, so we restrict
our vocabulary to the top 50,000 words. We also
preprocess the data to convert all words
to lower case. We use all the pre-processing techniques used
by
          <xref ref-type="bibr" rid="ref9">Liu et al. (2018)</xref>
          to get privacy policy segments.
        </p>
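        <p>A minimal sketch of the lowercasing and vocabulary restriction is given below. It assumes that the "top" words are the most frequent ones and that segments are tokenized on whitespace; both are assumptions made for illustration, and the function name is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def build_vocab(segments, max_size=50000):
    """Lowercase and whitespace-tokenize segments; keep the max_size most frequent words."""
    counts = Counter(word for seg in segments for word in seg.lower().split())
    # Map each retained word to an integer index, most frequent first.
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(max_size))}


# Toy segments, with a tiny vocabulary limit to show the truncation.
segments = ["We collect your Location data", "We share data with third parties"]
vocab = build_vocab(segments, max_size=3)
```
]]></preformat>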
      </sec>
      <sec id="sec-3-3">
        <title>Neural Models</title>
        <p>
          We tried several neural architectures. Following
          <xref ref-type="bibr" rid="ref4">Faruqui et al. (2014)</xref>
          and
          <xref ref-type="bibr" rid="ref10">Mikolov et al. (2013)</xref>
          we decided
to use a simple feed-forward network over averaged
embeddings. We take the word vectors corresponding to
every word in the segment and average them to form the
input layer of our multi-layer perceptron. Not surprisingly,
this model generally gives very good results on the OPP-115
dataset, as it acts as a bag-of-words model. This also suggests
that our word embeddings are of high quality. We also
report on performance with deep convolutional models in the
experiment section.
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Statistics Generator</title>
        <p>
          OPP-115
          <xref ref-type="bibr" rid="ref17 ref8">(Wilson et al., 2016)</xref>
          is a small dataset, and
F1 scores can easily vary between runs of the
neural network. The F1 score also varies with the fold of the
data being tested on. In this paper, we want to set a
standard for using this corpus. We take the view that one
should report the mean and standard deviation of F1 across
the folds tested on. We used 5-fold cross-validation, and this
component of our system generates these statistics after training
the network.
        </p>
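        <p>The reporting convention above amounts to the following small computation. The fold scores are hypothetical values for illustration only, and the use of the sample (rather than population) standard deviation is an assumption.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import statistics


def summarize_folds(fold_f1):
    """Return the mean and sample standard deviation of per-fold F1 scores."""
    return statistics.mean(fold_f1), statistics.stdev(fold_f1)


# Hypothetical per-fold F1 scores (not results from this study).
fold_f1 = [0.70, 0.80, 0.75, 0.85, 0.90]
mean_f1, std_f1 = summarize_folds(fold_f1)
print(f"F1 = {mean_f1:.3f} +/- {std_f1:.3f}")
```
]]></preformat>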
      </sec>
      <sec id="sec-3-5">
        <title>Embeddings Visualizer</title>
        <p>
          It is well documented that words which appear in
similar contexts tend to be close to each other in the vector space
          <xref ref-type="bibr" rid="ref10">(Mikolov et al., 2013)</xref>
          . Our word embeddings have 300
dimensions. To visualize these embeddings, we
project them to 2D vectors using Principal
Component Analysis (PCA) and plot them. We report on and
discuss differences between the plots of generic embeddings
like GloVe
          <xref ref-type="bibr" rid="ref12">(Pennington, Socher, and Manning, 2014)</xref>
          and
our in-domain word embeddings.
        </p>
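        <p>As a rough illustration of the projection step, the sketch below computes the first two principal components with power iteration and deflation, using only the Python standard library. This is one simple way to obtain a PCA projection, not necessarily the implementation used for the figures, and the toy three-dimensional vectors are hypothetical stand-ins for the 300-dimensional embeddings.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean from every row."""
    dim = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def covariance(rows):
    """Covariance matrix of already-centered rows."""
    n, dim = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(dim)]
            for a in range(dim)]


def top_eigenvector(matrix, iters=500, seed=0):
    """Approximate the dominant eigenvector by power iteration."""
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iters):
        vec = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vec = [v / norm for v in vec]
    return vec


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    rows = center(rows)
    cov = covariance(rows)
    size = len(cov)
    pc1 = top_eigenvector(cov)
    # Eigenvalue of pc1 (Rayleigh quotient), used to deflate the covariance.
    lam = sum(pc1[a] * sum(cov[a][b] * pc1[b] for b in range(size))
              for a in range(size))
    deflated = [[cov[a][b] - lam * pc1[a] * pc1[b] for b in range(size)]
                for a in range(size)]
    pc2 = top_eigenvector(deflated, seed=1)
    return [(sum(x * p for x, p in zip(r, pc1)),
             sum(x * p for x, p in zip(r, pc2))) for r in rows]


# Hypothetical toy vectors; real embeddings would have 300 dimensions.
toy_vectors = {
    "data": [1.0, 0.9, 0.1],
    "collection": [0.9, 1.0, 0.0],
    "food": [0.1, 0.0, 1.0],
    "party": [0.0, 0.2, 0.9],
}
coords = dict(zip(toy_vectors, pca_2d(list(toy_vectors.values()))))
```
]]></preformat>
        <p>The resulting 2D coordinates can be passed to any plotting tool; words whose vectors are similar in the original space remain close after projection in this toy example.</p>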
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Distribution of data practices in the OPP-115 dataset</title>
      <p>
        The OPP-115 dataset poses a multi-label classification
problem with skewed counts for some of the classes. For a
detailed description of this dataset, we refer the
reader to
        <xref ref-type="bibr" rid="ref17">Wilson et al. (2016)</xref>
        . We provide a
distribution of the classes in Figure 2.
      </p>
      <p>
        Some of the classes in this dataset have fairly low counts
of positive examples, making it difficult for machine
learning models to learn the classification. For instance,
previous publications found it hard to identify
segments associated with the “Data Retention” class, as this
class has only a few positive instances in the corpus
        <xref ref-type="bibr" rid="ref17 ref8 ref9">(Wilson
et al., 2016; Liu et al., 2018)</xref>
        . In contrast, our results suggest
that we are able to get substantially better F1 scores on these
types of classes.
      </p>
      <p>
        We performed several experiments before arriving at our
final word embeddings. Each of our experiments was run on
five folds of data. We report the mean and standard
deviation of the F1 scores on the validation set. We also report
the macro F1 score.
      </p>
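      <p>For reference, macro F1 is the unweighted mean of the per-class F1 scores, so sparsely populated classes count as much as frequent ones. The sketch below shows the computation on hypothetical labels; the function names are illustrative and the numbers are not results from this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def f1_score(gold, pred):
    """Binary F1 for one class; gold and pred are parallel lists of 0/1 labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold_by_class, pred_by_class):
    """Unweighted mean of per-class F1 scores."""
    scores = [f1_score(gold_by_class[c], pred_by_class[c]) for c in gold_by_class]
    return sum(scores) / len(scores)


# Hypothetical labels for two classes over four segments (illustration only).
gold = {"First Party Collection and Use": [1, 1, 0, 1], "Data Retention": [0, 0, 1, 0]}
pred = {"First Party Collection and Use": [1, 1, 0, 0], "Data Retention": [0, 0, 0, 0]}
score = macro_f1(gold, pred)
```
]]></preformat>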
      <sec id="sec-5-1">
        <title>Logistic Regression</title>
        <p>We averaged our word embeddings over each segment and used the result
as the input to a logistic regression model. The results of
this model are shown in Table 1.</p>
        <p>
          We see that our baseline model has some improvement
in performance over the results reported by
          <xref ref-type="bibr" rid="ref17">Wilson et al.
(2016)</xref>
          . These results suggest that our word embeddings in
the privacy domain yield important performance
improvements. We get a macro F1 of 0.76 compared to macro F1 of
0.67 for the results reported in
          <xref ref-type="bibr" rid="ref17">Wilson et al. (2016)</xref>
          .
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>Feed-forward network</title>
        <p>Feed-forward neural networks are powerful machine
learning models that perform well on classification tasks.
We used a feed-forward neural network as shown in
Figure 3. The input layer to this network was the average of the
embeddings of all the words in the segment.</p>
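        <p>More formally, given a segment as a list of words, we
compute the following:
          <disp-formula><tex-math>\mathrm{wordVectors} = \mathrm{getWordVectors}(w_1, w_2, w_3, \ldots, w_n)</tex-math></disp-formula>
          <disp-formula><tex-math>\mathrm{averageEmbeddings} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{vector}_i</tex-math></disp-formula>
        </p>
        <p>Here <inline-formula><tex-math>w_1, w_2, \ldots, w_n</tex-math></inline-formula> are the words of the segment. We convert
these words into their vector representations to get
<inline-formula><tex-math>\mathrm{vector}_i</tex-math></inline-formula>. We then average these vectors to get
<inline-formula><tex-math>\mathrm{averageEmbeddings}</tex-math></inline-formula>.</p>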
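        <p>The averaging step can be sketched as follows. The lookup table and its toy two-dimensional values are hypothetical. Skipping words without a vector (and dividing by the number of words that have one) is a simplifying assumption, since FastText can build vectors for unseen words from their sub-word units.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def average_embedding(words, vectors, dim):
    """Average the vectors of the words in a segment, skipping unknown words."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [sum(v[j] for v in known) / len(known) for j in range(dim)]


# Toy 2-dimensional vectors (hypothetical values, for illustration only).
vectors = {"we": [1.0, 0.0], "collect": [0.0, 1.0], "data": [1.0, 1.0]}
avg = average_embedding("we collect data".split(), vectors, dim=2)
```
]]></preformat>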
        <p>
          These average embeddings were then used as the input
layer to a two-layer network. Each layer of the network
had 64 Rectified Linear Unit (ReLU)
          <xref ref-type="bibr" rid="ref11">(Nair and Hinton, 2010)</xref>
          units. It is very easy to overfit the training data, as
the OPP-115 corpus is small. To mitigate
this issue, we use dropout
          <xref ref-type="bibr" rid="ref16">(Srivastava et al., 2014)</xref>
          , with a
drop rate of 0.3 in the first layer and 0.2 in the second layer.
In the final layer we used a softmax to predict whether the segment
belonged to a certain class or not. Henceforth, we call this
model CBOW (Continuous Bag Of Words). The network
architecture is shown in Figure 3.
        </p>
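        <p>To make the architecture concrete, the sketch below runs an inference-time forward pass through a network of this shape: two hidden layers of 64 ReLU units followed by a softmax. The weights are random and untrained, the two-way output (class present or absent) is an assumption about how the per-class prediction is set up, and dropout is left out because it is only applied during training.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in x]


def softmax(x):
    """Normalize scores into probabilities (max subtracted for stability)."""
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, illustration only)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Two hidden ReLU layers followed by a softmax output layer."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = relu(dense(x, w1, b1))
    h2 = relu(dense(h1, w2, b2))
    return softmax(dense(h2, w3, b3))


rng = random.Random(0)
# 300-dimensional averaged embedding in, 64 and 64 hidden units, 2 outputs.
layers = [init_layer(300, 64, rng), init_layer(64, 64, rng), init_layer(64, 2, rng)]
segment_vector = [rng.uniform(-1.0, 1.0) for _ in range(300)]
probs = forward(segment_vector, layers)
```
]]></preformat>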
        <p>
          This approach of text classification is generally
considered to be a strong baseline by the NLP community (e.g.,
          <xref ref-type="bibr" rid="ref4">Faruqui et al. (2014)</xref>
          and
          <xref ref-type="bibr" rid="ref10">Mikolov et al. (2013)</xref>
          ).
        </p>
        <p>
          Our CBOW model gave us the best results. We report our
results in Table 2. It can be seen that our results are
state of the art on this dataset.
          <xref ref-type="bibr" rid="ref5 ref6">Harkous et al. (2018b)</xref>
          report
a higher F1 than ours, but their results are reported on a
small set of user queries, which by their nature are different
from the segment classification task described in
          <xref ref-type="bibr" rid="ref17">Wilson et
al. (2016)</xref>
          .
        </p>
        <p>
          We compare our results with the F1 score of
          <xref ref-type="bibr" rid="ref17">Wilson et al.
(2016)</xref>
          and
          <xref ref-type="bibr" rid="ref9">Liu et al. (2018)</xref>
          .
        </p>
        <p>
          We get a macro F1 of 0.803, compared to their macro
F1 of 0.667, on the classes which we have chosen to
evaluate our model. We also compare our results with Liu et
al. (2016a). We see that we get better results than
          <xref ref-type="bibr" rid="ref9">Liu et
al. (2018)</xref>
          on the average F1 score. The difference is
notable when we use a neural model for classification on
classes with lower numbers of positive examples. When using
GloVe vectors, our observations were consistent with those of
          <xref ref-type="bibr" rid="ref9">Liu et al. (2018)</xref>
          : we found it hard to predict classes with lower
counts of positive examples. However, our in-domain word
embeddings help circumvent this issue by providing a
meaningful improvement in results. We get a macro F1 of 0.58 on
this class. This further suggests that in-domain word
embeddings are needed to improve results.
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>Deep Convolutional Model</title>
        <p>We also tried deep convolutional models that convolve over
the word embeddings. We used two sets of convolutional
layers, with 200 filters each but with strides of 3 and 5. Each set
applied a ReLU non-linearity and a max-pooling operation before
its output was fed to the next set of CONV-RELU-MaxPool
layers. The detailed architecture of our model is shown in
Figure 4.</p>
        <p>
          Although this model generally does better than the
baselines described by
          <xref ref-type="bibr" rid="ref17">Wilson et al. (2016)</xref>
          , in some of the
categories it fails to capture the meaning when the number of
positive examples is very small. The average macro F1 for
this task was 0.74.
        </p>
        <p>The results of our convolutional model are shown in
Table 3. It can be seen that this model comes very close to the
CBOW model but does not outperform it, as the number
of training examples in the OPP-115 corpus is fairly low.
The simple feed-forward network is able to capture the
relationship better, as the number of parameters in this model is
orders of magnitude lower than in the deeper model. This is
consistent with the general observation that as the number of model
parameters increases, one needs more data to get better results.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Deep Contextualized Word Embeddings</title>
      <p>
        We also explored the use of contextualized word
embeddings, using BERT
        <xref ref-type="bibr" rid="ref3">(Devlin et al., 2018)</xref>
        , which at the time of
writing is generally considered the state of the art model for
such embeddings. We follow Devlin et al.’s recommendation
and only train the model for 3 epochs.
      </p>
      <p>From Table 4 we see that we get good results on classes
that have a higher number of positive instances. This is
consistent with observations across many
NLP tasks, such as Question Answering or Natural
Language Inference, where authors have reported significantly
better results when BERT was used for classification tasks.
BERT does not do as well on classes with a lower number of positive
instances. Note that we do not do any hyper-parameter tuning
with BERT; we simply take an off-the-shelf model and train it on
the OPP-115 corpus for 3 epochs.</p>
    </sec>
    <sec id="sec-7">
      <title>Results and Discussions</title>
      <p>In this section we try to answer research questions that are
at the intersection of privacy policies and language
technologies.</p>
      <sec id="sec-7-1">
        <title>Do in-domain word embeddings help?</title>
        <p>
          In this section we compare our non-contextualized in-domain word
embeddings to non-contextualized generic word
embeddings. We believe it is not fair to compare our word
embeddings to BERT, because BERT is not a word embedding but
rather a contextualized language model that represents sentences.
This comparison therefore intentionally ignores the performance
of BERT. We use GloVe embeddings
          <xref ref-type="bibr" rid="ref12">(Pennington,
Socher, and Manning, 2014)</xref>
          as our generic embeddings. We
observe from Tables 2, 3, 4 and 5 that our in-domain
embeddings perform better than GloVe in both models.
        </p>
        <p>It is also worth noting that the standard deviation of F1
scores is higher for practices with lower numbers of positive
examples. In this paper, we report average F1 scores
across the different validation folds, along with their
standard deviations. This captures the variability of the F1
score that one might observe on unseen data.</p>
      </sec>
      <sec id="sec-7-2">
        <title>What should be the dimensionality of privacy embeddings?</title>
        <p>We checked the performance of both GloVe and our
in-domain embeddings with different settings of the
embedding size (100 and 300). The 300-dimensional
embeddings performed better than the 100-dimensional
embeddings. The results of this experiment are
provided in Table 7.</p>
        <p>From Table 7, it can be observed that the higher-dimensional
embeddings tend to give better results than their lower-dimensional
counterparts. Higher-dimensional embeddings take
longer to train, but with more dimensions they can
capture relationships between words in a more expressive
fashion.</p>
        <p>Performance for the different embedding sizes is
shown in Table 7. Results presented in earlier tables
are for 300-dimensional embeddings, as these embeddings
performed better than the 100-dimensional ones.</p>
      </sec>
      <sec id="sec-7-3">
        <title>Training Data vs. F1 score</title>
        <p>We now turn to the question
of how many privacy policies are needed to get good
embeddings for the OPP-115 classification task. For this,
we trained 100-dimensional FastText embeddings using
different numbers of privacy policies and looked at the F1 score
across various categories.</p>
        <p>We observe from Figure 7 that the
average F1 score plateaus after 20,000 policies.
It is also interesting to note that classes with more
positive examples in this dataset tend to plateau more quickly
than those with fewer positive
examples. For example, the “First Party Collection and Use” class
has many more positive examples than the “Data Retention”
class. We can see in Figure 7 that the “Data Retention” class
also requires many more policies than the “First Party Collection and
Use” class to reach the maximum F1 observed using our model.</p>
      </sec>
      <sec id="sec-7-4">
        <title>How do the embeddings look?</title>
        <p>After training word embeddings, we want to
visualize how the embeddings look in the high-dimensional
vector space. For this, we take the 300-dimensional
embeddings which we trained using our policy corpus and project
them onto a 2D space using Principal Component Analysis
(PCA). We compare the result of this projection with GloVe
to see if our domain-specific model captures domain
semantics better than GloVe.</p>
        <p>It can be observed in Figure 7 that the privacy-related
terms are very closely clustered. In Figure 5 we see that there
is no special semantics for privacy-related words: the terms
“food” and “party” are closely related, as that is the more
common case outside the domain of privacy policies. In
Figure 7, by contrast, “party” is more closely associated with
“privacy” because of the privacy domain-specific concept of
“third-party collection.” The terms “cookies,” “track,” and
“browser” commonly appear together
in privacy policies, and we observe that these words are closely
related in our in-domain vector space. It can also be
observed that the terms “data” and “collection” are closer to
each other in our in-domain vector space than in
the more general vector space of GloVe. This is because
privacy policies frequently discuss the collection of users’
data.</p>
        <p>While anecdotal, these observations suggest that the
improvement in performance resulting from the use of
in-domain embeddings can be attributed to these embeddings
being able to better capture unique syntactic and semantic
features of privacy policies.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>In this paper we presented distributed word vector
representations for privacy policies. We showed that in-domain
embeddings can yield performance improvements on
privacy-policy-related tasks over the use of generic embeddings such
as GloVe. We reported good F1 scores across different data
practice categories using our domain-specific embeddings.
We also showed both quantitatively and qualitatively that
our in-domain word embeddings help improve performance
of privacy policy segment labeling tasks on the OPP-115
corpus of privacy policies.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgment</title>
      <p>This study was supported in part by the NSF Frontier grant
on Usable Privacy Policies (CNS-1330596). The US
Government is authorized to reproduce and distribute reprints
for Governmental purposes notwithstanding any copyright
notation. The views and conclusions contained herein are
those of the authors and should not be interpreted as
necessarily representing the official policies or endorsements,
either expressed or implied, of the NSF or the US
Government. This work used the Extreme Science and Engineering
Discovery Environment (XSEDE), which is supported by
National Science Foundation grant number ACI-1548562.
The authors acknowledge the Texas Advanced Computing
Center (TACC) at The University of Texas at Austin for
providing high performance computing resources that have
contributed to the research results reported within this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ducharme</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Jauvin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>A neural probabilistic language model</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          (Feb):
          <fpage>1137</fpage>
          -
          <lpage>1155</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>arXiv preprint arXiv:1607.04606</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.-W.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Faruqui</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dodge</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Jauhar</surname>
            ,
            <given-names>S. K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Retrofitting word vectors to semantic lexicons</article-title>
          .
          <source>arXiv preprint arXiv:1411.4166</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Harkous</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fawaz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lebret</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shin</surname>
            ,
            <given-names>K. G.</given-names>
          </string-name>
          ;
          and
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018a</year>
          .
          <article-title>Polisis: Automated analysis and presentation of privacy policies using deep learning</article-title>
          .
          <source>arXiv preprint arXiv:1802.02561</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Harkous</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Fawaz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Lebret</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shin</surname>
            ,
            <given-names>K. G.</given-names>
          </string-name>
          ;
          and
          <string-name>
            <surname>Aberer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018b</year>
          .
          <article-title>Polisis: Automated analysis and presentation of privacy policies using deep learning</article-title>
          .
          <source>arXiv preprint arXiv:1802.02561</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Andersen</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Almuhimedi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Acquisti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <year>2016a</year>
          .
          <article-title>Follow my recommendations: A personalized privacy assistant for mobile app permissions</article-title>
          .
          <source>In Symposium on Usable Privacy and Security.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2016b</year>
          .
          <article-title>Analyzing vocabulary intersections of expert annotations and topic models for data practices in privacy policies</article-title>
          .
          <source>In 2016 AAAI Fall Symposium Series.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Story</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zimmeck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Towards automatic classification of privacy policy text</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G. S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          ,
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Nair</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G. E.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>Rectified linear units improve restricted Boltzmann machines</article-title>
          .
          <source>In Proceedings of the 27th international conference on machine learning (ICML-10)</source>
          ,
          <fpage>807</fpage>
          -
          <lpage>814</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          ,
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Acquisti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Breaux</surname>
            ,
            <given-names>T. D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cranor</surname>
            ,
            <given-names>L. F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          December
          <year>2013</year>
          .
          <article-title>The usable privacy policy project: Combining crowdsourcing, machine learning and natural language processing to semi-automatically answer those privacy questions users care about</article-title>
          .
          <source>Tech. report CMU-ISR-13-119</source>
          , School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Sathyendra</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ravichander</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Story</surname>
            ,
            <given-names>P. G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Black</surname>
            ,
            <given-names>A. W.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2017a</year>
          .
          <article-title>Helping users understand privacy notices with automated query answering functionality: An exploratory study</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Sathyendra</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zimmeck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <year>2017b</year>
          .
          <article-title>Identifying the provision of choices in privacy policy text</article-title>
          .
          <source>In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          ,
          <fpage>2774</fpage>
          -
          <lpage>2779</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Salakhutdinov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>15</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Dara</surname>
            ,
            <given-names>A. A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Cherivirala</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Leon</surname>
            ,
            <given-names>P. G.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Andersen</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zimmeck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sathyendra</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Russell</surname>
            ,
            <given-names>N. C.</given-names>
          </string-name>
          ; et al.
          <year>2016</year>
          .
          <article-title>The creation and analysis of a website privacy policy corpus</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , volume
          <volume>1</volume>
          ,
          <fpage>1330</fpage>
          -
          <lpage>1340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Schaub</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sathyendra</surname>
            ,
            <given-names>K. M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Smullen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zimmeck</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ramanath</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Story</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ; et al.
          <year>2018</year>
          .
          <article-title>Analyzing privacy policies at scale: From crowdsourcing to automated annotations</article-title>
          .
          <source>ACM Transactions on the Web (TWEB)</source>
          <volume>13</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>