Consent Recommender System: A Case Study on LinkedIn Settings

                         Rosni K V, Manish Shukla, Vijayanand Banahatti, Sachin Lodha
                                                    TCS Research Labs, India
                                {rosni.kv,mani.shukla,vijayanand.banahatti,sachin.lodha}@tcs.com


Abstract

Privacy is an increasing concern in the digital world, especially now that it has become common knowledge that even high-profile enterprises process data without the data-subject's consent. In certain cases where the data-subject's consent was taken, it was not linked to the proper purpose of processing. To address this growing concern, newer privacy regulations and laws are emerging to empower a data-subject with informed and explicit consent through which she can allow or revoke usage of her personal data. However, it has been shown that privacy self-management does not provide the expected results. This is mainly due to information overload, as data-subjects use multiple services entailing a variety of purposes, resulting in a very large number of consent requests. This may lead to consent fatigue, as the data-subject is now expected to provide informed consent for each associated purpose. Consent fatigue can lead to either incorrect decision making or opting for the default values provided by the enterprise, thus defeating the purpose of the new data privacy regulations.

In this work, we discuss the factors influencing the informed consent of a data-subject. Further, we propose a 'consent recommender system' based on Factorization Machines (FMs) to assist the data-subject and thereby avoid consent fatigue. Our consent recommender system effectively models the interaction between the different factors which influence a data-subject's informed consent. We discuss how this setup extends to cold-start data-subjects facing the decision problem with consent requests from multiple enterprises. Additionally, we demonstrate the scenario of consent recommendation as a prediction problem with the minimum attributes available from LinkedIn's privacy settings.

1 Introduction

With ever-increasing digitalization, enterprises capture consumer data to understand consumer behavior and to offer better personalized services. More often than not, the captured data contains personal and sensitive information of the consumer (also referred to as 'data-subject'), and thus leads to privacy concerns (Andrade, Kaltcheva, and Weitz 2002; Malhotra, Kim, and Agarwal 2004; Flavián and Guinalíu 2006). Until recently, the data privacy landscape was more enterprise-centric, with long and incomprehensible policy documents and default opt-in for data sharing and usage (Cranor et al. 2013). In her work, Priya Kumar (Kumar 2016) discussed the specific ways in which vague or unclear language hinders the comprehension of enterprise practices. This paradigm represented one extreme of the data privacy management landscape, where the data-subject had little or no control over her data with respect to its usage and sharing.

Some enterprises allowed data-subjects to access their data and provide consent for certain specific purposes, such as sharing of personal email or demographic data with third parties. However, such privacy preference controls provided by enterprises were either limited, disconnected from the privacy policy (Anthonysamy, Greenwood, and Rashid 2013), or hard to use (Madden 2012). Further, these controls did not stop an enterprise from analyzing the data to gain additional insights into the data-subject's behavior. More recently, these concerns were addressed by newer privacy regulations and acts in different geographies, for example, GDPR in the EU (Voigt and Von dem Bussche 2017) and CCPA in California (de la Torre 2018). These data protection regulations are designed to protect the personal information of individuals by restricting how such information can be collected, used and disclosed, requiring proper informed consent from data-subjects (Barnard-Wills, Chulvi, and De Hert 2016). For example, France's National Data Protection Commission (CNIL) penalized Google for not having a valid legal basis to process the personal data of the users of its services, especially for ads personalization purposes¹.

¹ https://www.cnil.fr/en/cnils-restricted-committee-imposes-financial-penalty-50-million-euros-against-google-llc

Informed consent is beginning to form the foundation of data protection law in many jurisdictions. It is intuitively considered an appropriate method to ensure the protection of a data-subject's autonomy, as it allows her to have control over her personal data (Voigt and Von dem Bussche 2017; Dwyer III, Weaver, and Hughes 2004). However, if a data-subject interacts with multiple services that require consent for many purposes (defined in Section 3), the result is information overload during decision making, and hence consent fatigue. In the biomedical domain, consent fatigue is a well-discussed topic (Ploug and Holm 2013). Solove (Solove 2012) and Casteren (Casteren 2017) have studied consumers' privacy self-management and their
Figure 1: Recommender System Overview. [Flow diagram: Survey Responses and Privacy Settings; Pre-process (extract the required settings, one-hot encoding, user indexing, purpose-related information); Pre-Processed Data; Model; User Preference Score; decision "score >= threshold?" with Yes leading to Allow and No leading to Deny. A New User enters as a Query Vector.]
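The decision step at the end of the flow in Figure 1 can be sketched as follows. This is only an illustration under stated assumptions: the function names, the setting names and the stand-in `toy_model` are hypothetical and are not the authors' implementation; the default threshold of 0.5 is the value reported in Section 5.

```python
from typing import Callable, Dict, Sequence


def recommend(score: float, threshold: float = 0.5) -> str:
    """Map a user preference score to a recommendation (decision node of Figure 1)."""
    return "allow" if score >= threshold else "deny"


def recommend_all(
    query_vectors: Dict[str, Sequence[float]],
    model: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Score every query vector of a new user and threshold each score."""
    return {
        setting: recommend(model(x), threshold)
        for setting, x in query_vectors.items()
    }


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical stand-in scorer (mean of the inputs); NOT a Factorization Machine."""
    return sum(x) / len(x)


# Hypothetical query vectors for two settings of one new user.
queries = {
    "ads_insights": [1, 0, 0, 0],
    "profile_visibility": [1, 1, 1, 0],
}
recommendations = recommend_all(queries, toy_model)
print(recommendations)
```

Any scoring function with the same signature, such as a trained FM, could be passed in place of `toy_model`.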


ability to make meaningful decisions with information overload. A recent study (Degeling et al. 2018) discusses the impact of GDPR on web applications and services as well as the new issues arising from it. Two key takeaways from their work are: a) the majority of websites updated their privacy policies in the last two years, and b) the average text length of policy documents rose from a mean of 2,145 words in March 2016 to 3,044 words in March 2018 (+41% in 2 years) and increased another 18% until late May (3,603 words). Consent fatigue may result either in wrong decision making by the data-subject or in implicit consent being given by not taking any action.

In this work, we explore the problem of consent fatigue due to information overload and frequent decision making. To address this issue we proposed and implemented a consent recommender system for the LinkedIn application. Our work helps a LinkedIn user identify the appropriate privacy controls and their corresponding settings. It is especially useful for cold-starting a new user for whom no prior historical privacy preferences are available. The main contribution of our work consists of a novel combination of Factorization Machines (FM) (Rendle 2010; 2012) and the factors affecting an individual's decision-making process for predicting their privacy preference. The details of our contribution are as follows:

• We conducted a survey on 50 data-subjects to identify factors that can influence their decision-making process. Further, we collected LinkedIn privacy setting data for each participant for building our recommendation model.

• We have shown that the privacy recommendation problem can be modeled as a prediction problem. For that we used Factorization Machines (FM) (Rendle 2010; 2012) for consent recommendation. This also helped in analyzing the pairwise interaction of attributes for learning reliable weights. Further, we showed that the accuracy of our proposed model is around 88%. We also discussed the change in accuracy (in terms of precision, recall and F1-score) with respect to different combinations of features.

The rest of the paper is organized as follows. Related work is presented in Section 2. Architecture and system description are given in Section 3. The survey methodology, demography details and result analysis are discussed in Section 4. The experimental results are shown in Section 5. Section 6 describes the implications of our work, future research possibilities and the limitations of our work, with some concluding remarks in Section 7.

2 Related Work

Services and applications often capture more user data than required, for analytics or for generating profit by selling it to third parties. An example of this was discussed in (Balebako et al. 2013), where the authors showed that even well-known mobile applications capture sensitive data of data-subjects and then share it with third parties without their cognizance. However, with the latest data privacy regulations a data-subject's consent becomes necessary to process her data. A substantial amount of work has been done on understanding the privacy concerns of data-subjects (Liu et al. 2016; Olejnik et al. 2017; Knijnenburg 2014; Sadeh and Hong 2014; Liu, Lin, and Sadeh 2014; Sadeh et al. 2009; Wijesekera et al. 2017).

In their work, Sadeh et al. analyzed the sensitive data requested by a mobile app and the purposes associated with it (Sadeh and Hong 2014). Liu et al. detected user profiles based on the users' application permission settings (Liu, Lin, and Sadeh 2014). They further used Singular Value Decomposition (SVD) for addressing the issues related to sparsity and dimensionality. In (Wijesekera et al. 2017), the authors reduce the burden on users by automating the decision-making process in smartphones.

Researchers have also looked into privacy preference recommender systems for social networks. Ghazinour et al. (Ghazinour, Matwin, and Sokolova 2016) proposed a recommender system for privacy settings in social networks, particularly for Facebook. They modeled a user's Facebook privacy settings of photo albums by independently considering different attributes, for example, personal profile and interests. In this paper, we additionally make use of the pairwise interaction of attributes, as it helps in learning reliable weights by taking the inner product of lower-dimensional vectors.

In a recent work, (Naeini et al. 2017) focused on privacy expectations and preferences in IoT data collection scenarios. Naeini et al. (2017) further showed that privacy preferences are diverse and context dependent, and that participants are more likely to consent to data collection if it benefits them. Additionally, they were able to predict data-subjects' preferences after three data-collection scenarios. The work presented in (Naeini et al. 2017) comes close to our work. However, their main focus is on improving the privacy notices for IoT
         | User (u)    | Data Field (d) | Purpose (p) | Purpose Category | Purpose Sub-category | Consent (c)
         | u1  u2  …   | d1  d2  …      | p1  p2  …   | pc1  pc2  …      | psc1  psc2  …        |
         | 1   0   …   | 1   0   …      | 1   0   …   | 1    0    …      | 1     0     …        | 1
         | 1   0   …   | 0   1   …      | 0   1   …   | 0    1    …      | 0     0     …        | 0
         | 0   1   …   | 1   0   …      | 1   0   …   | 1    0    …      | 0     1     …        | 1
         | 0   0   …   | 0   0   …      | 1   0   …   | 0    1    …      | 1     0     …        | 1

Figure 2: Input Matrix to Factorization Model, where α is the set of attributes, m is the number of samples (rows) and n is the number of features (columns). For further description refer to Sections 3 and 3.1.
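The layout of Figure 2 can be illustrated with a small sketch that turns survey records into binary indicator rows. It assumes one active indicator per column group for every record; the record values, `FIELDS`, `build_index` and `encode` are hypothetical names chosen for this example and do not come from the authors' code.

```python
from typing import Dict, List, Tuple

# Column groups in the order used by Figure 2 (consent is the response, not a feature).
FIELDS = ("user", "data_field", "purpose", "category", "subcategory")


def build_index(samples: List[dict]) -> Dict[Tuple[str, str], int]:
    """Assign one column to each distinct (field, value) pair, grouped by field."""
    index: Dict[Tuple[str, str], int] = {}
    for field in FIELDS:
        for sample in samples:
            key = (field, sample[field])
            if key not in index:
                index[key] = len(index)
    return index


def encode(sample: dict, index: Dict[Tuple[str, str], int]) -> List[int]:
    """Build one binary indicator row x_i with a single 1 per column group."""
    row = [0] * len(index)
    for field in FIELDS:
        row[index[(field, sample[field])]] = 1
    return row


# Hypothetical records: one row per (data-subject, privacy setting) combination.
samples = [
    {"user": "u1", "data_field": "d1", "purpose": "p1",
     "category": "pc1", "subcategory": "psc1", "consent": 1},
    {"user": "u1", "data_field": "d2", "purpose": "p2",
     "category": "pc2", "subcategory": "psc2", "consent": 0},
    {"user": "u2", "data_field": "d1", "purpose": "p1",
     "category": "pc1", "subcategory": "psc1", "consent": 1},
]

index = build_index(samples)
X = [encode(s, index) for s in samples]   # m x n input matrix
y = [s["consent"] for s in samples]       # response vector
print(X, y)
```

Because every record activates only one column per group, the resulting matrix is very sparse, which is the setting the factorized model is intended for.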


devices and developing more advanced personal privacy assistants, whereas we address the problem of information overload, and hence the issue of consent fatigue, in the post-GDPR and CCPA era.

3 System Description

Definitions: Some basic definitions of the terms as per GDPR (Voigt and Von dem Bussche 2017):

1. data-subject is an individual whose personal data is collected, held or processed. In this paper the terms consumer and data-subject are used interchangeably.
2. personal data shall mean any information relating to an identified or identifiable natural person ('data subject').
3. consent is defined as a data-subject's informed and unambiguous agreement to process her data.
4. purpose of processing data refers to the need and unambiguous reason for collecting, accessing and processing a data-subject's data.

Problem Statement: Let $U$ be the set of data-subjects such that $U = \{u_1, \dots, u_N\}$. Further, let $S$ be a service provider (LinkedIn in our case) that processes a large number of data fields $D = \{d_1, \dots, d_K\}$. Let $P = \{p_1, \dots, p_X\}$ be the set of clear and unambiguous purposes under which $S$ processes $D$. For a given purpose $p_i \in P$, there is an associated $D_i \subseteq D$. The service provider $S$ will only process $D_i$ for the purpose $p_i$. Similarly, a data field $d_j \in D$ could be linked to multiple purposes $P_j \subseteq P$. Also, purpose $p_i$ is associated with a set of attributes $\alpha_i$ (e.g., description, purpose category, sensitivity of requested data field, etc.), such that $\alpha = \bigcup_{i=1}^{X} \alpha_i$.

Figure 1 describes the overall flow of our proposed recommendation system. We selected LinkedIn for building our recommendation model because it is a popular professional networking site and we found its privacy settings very comprehensive, including the handling of GDPR-related concerns². The modification in their policy was notified via a banner on their landing page. If a data-subject keeps using the service without modifying any settings, this is considered implicit consent, as discussed by (Degeling et al. 2018). We extracted the privacy settings of each participant in our experiment. The collected data is processed to create a suitable feature vector for training the FM model using TensorFlow (Abadi et al. 2016). We tested the accuracy of the model by splitting the collected data into training and testing sets, and report the results in Section 5.

² https://www.linkedin.com/help/linkedin/topics/6701/6702

3.1 Factorization Machines (FM)

Our data is described in the matrix format $X \in \mathbb{R}^{m \times n}$, wherein $x_i \in \mathbb{R}^n$ is the $i$th row, which represents the combination of a data-subject and a particular privacy setting with additional attributes as binary indicator variables. The response variable $y_i \in \mathbb{R}$ represents the consent value for the $i$th feature vector. Figure 2 shows the input matrix representation used in this work.

Why FM for Consent Recommendation? Equation 1 shows the traditional linear regression model, where $w_0 \in \mathbb{R}$ and $w \in \mathbb{R}^n$ are the bias and the feature weights respectively. For any two given features we can independently learn the weight parameters using the model of Equation 1 with linear time complexity. However, this model is not suitable for learning the pairwise interaction of features, as discussed in (Rendle 2010; 2012). A polynomial regression model of order 2 can capture the parameters for pairwise interaction, but its time complexity is $O(n^2)$.

$$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i \qquad (1)$$

In a consent recommendation system various factors interact and influence each other, which is why we have selected FM as our model. It solves the issue by factorizing the pairwise interaction weight matrix $W$ into a lower-dimensional factor matrix. The model equation from (Rendle 2012) is given below:

$$\hat{y}(x) := w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{i'=i+1}^{n} x_i x_{i'} \sum_{j=1}^{k} v_{i,j} v_{i',j} \qquad (2)$$

In Equation 2, the model parameters are $w_0 \in \mathbb{R}$, $w \in \mathbb{R}^n$ and $V \in \mathbb{R}^{n \times k}$. Further, $v_i$ and $v_{i'}$ in $V$ represent the $i$th and $i'$th variables with $k$ latent factors. The first part of the above equation models the linear interaction, and the second
Figure 3: LinkedIn's Privacy Settings. An example of a purpose and its related attributes is highlighted and numbered: 1. Purpose Category (e.g. Account), 2. Purpose Sub-category (e.g. General advertising preferences), 3. Purpose (e.g. Insights on websites you visited), 4. Setting Information, comprising data field and consent value (e.g. toggle button representing 'yes').
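As a worked illustration of the second-order model in Equation 2, the sketch below evaluates the FM score in two ways: directly with the double sum, and with the algebraic reformulation described by Rendle (2010) that brings the cost of the pairwise term down to O(kn). It is a pure-Python sketch with made-up parameter values, not the TFFM implementation used in the experiments; the function names are hypothetical.

```python
from typing import List, Sequence


def fm_predict_naive(x: Sequence[float], w0: float, w: Sequence[float],
                     V: List[List[float]]) -> float:
    """Equation 2 evaluated literally: O(k * n^2)."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            inner = sum(V[i][f] * V[j][f] for f in range(k))
            score += inner * x[i] * x[j]
    return score


def fm_predict(x: Sequence[float], w0: float, w: Sequence[float],
               V: List[List[float]]) -> float:
    """Same score in O(k * n), using
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    """
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


# Made-up parameters for n = 3 features and k = 2 latent factors.
x = [1, 0, 1]
w0 = 0.1
w = [0.2, 0.3, -0.1]
V = [[0.5, 0.1], [0.2, 0.4], [-0.3, 0.6]]

fast = fm_predict(x, w0, w, V)
slow = fm_predict_naive(x, w0, w, V)
print(fast, slow)
```

With binary indicator inputs only the active columns contribute, so in practice the sums can be restricted to the non-zero entries of x.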


part shows the pairwise interaction of variables with low rank ($k$) using their inner product. This effectively helps to estimate the parameters in highly sparse datasets. Equation 2 is of order 2. We can have higher-order variable interactions as shown below (Rendle 2010):

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{l=2}^{d} \sum_{i_1=1}^{n} \cdots \sum_{i_l=i_{l-1}+1}^{n} \left( \prod_{j=1}^{l} x_{i_j} \right) \left( \sum_{f=1}^{k_l} \prod_{j=1}^{l} v^{(l)}_{i_j,f} \right) \qquad (3)$$

where $V^{(l)} \in \mathbb{R}^{n \times k_l}$, $k_l \in \mathbb{N}_0^{+}$, $\forall l \in \{2, \dots, d\}$, with $d$ as the order.

Prediction of Consent: Given a feature vector $x$, Equation 3 quantifies the consent. The recommendation can be generated by thresholding the value of $\hat{y}(x)$. Therefore, the predicted consent $C_p$ is defined as:

$$C_p(x) = \begin{cases} 1\ (\text{allow}) & \text{if } \hat{y}(x) \ge \theta \\ 0\ (\text{deny}) & \text{if } \hat{y}(x) < \theta \end{cases} \qquad (4)$$

4 Methodology

This section describes the steps involved in our data collection procedure. We selected participants with an active LinkedIn account whose last login activity was not older than 15 days. Prior to the survey we presented a consent form that explained to each participant the data collected, its use in our study, and the retention period of the data. Those participants who gave consent for data collection and processing were allowed to volunteer further. The data collected from participants did not contain any personally identifiable information. The study consisted of three sections: a) an online survey focused on understanding the respondent's basic demographics, b) the Internet Users' Information Privacy Concerns (IUIPC) survey (Malhotra, Kim, and Agarwal 2004), and c) some additional questions to support our design, so as to understand how active the participant is on social networking platforms, in this case especially LinkedIn (refer to Section 4.2).

The participants were asked to provide us with their privacy settings information from LinkedIn. We processed the settings information and related descriptions to build binary indicator feature vectors ($x_i \in \mathbb{R}^n$, refer to Section 3.1). We considered each section title as a purpose that comes under three categories (privacy, advertisement and communication) and 11 subcategories during our study. The purpose information comprised one or more control buttons, denoted as setting information (refer to Figure 3). Each type of variable, such as setting, purpose and its attributes, was encoded as a one-hot vector.

4.1 Additional Survey Questions

Participants were asked to rate their comfort level with services using and sharing their personal information on a 5-point Likert scale: Q1: I am comfortable with LinkedIn use/share my personal information or activity data for any purposes. Q2: I am comfortable with other social networks (example, Facebook, Twitter, Google+) use/share my personal information or activity data for any purposes.

To assess the change in a participant's behavior, we asked the questions Q1 and Q2 again as Q3 and Q4 respectively, with the following updated scenario:

The enterprise explicitly says that for what purpose it is
using the information and its privacy practice is certified by a trusted organization.

Figure 4: Survey Result. [Two bar charts of response counts per question. Left panel: Q1 to Q4, with responses Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree. Right panel: Q5 and Q6, with responses Public, Your network, Private.]

IUIPC score    Range    Mean    SD
Control        1-5      4.42    0.60
Awareness      1-5      4.65    0.54
Collection     1-5      4.29    0.68

Table 1: IUIPC Score Details

year, and 8% never changed their settings and have given implicit consent for their data use. Figure 4 shows the results from our survey. It is apparent that the 'Agree, Disagree and Neutral' count values change from 'Q1' to 'Q2' and from 'Q3' to 'Q4'. We used this insight and included purpose and its attributes in building our prediction model. In Figure 4, we can see that most of the participants tend to make their personal information visible to their social network. However, some participants kept their information visible to the public on LinkedIn but not on other social networking sites. We conjecture that a participant could benefit by disclosing professional information, as it helps them build new professional connections and hence opens the possibility of new job opportunities. This finding is coherent with the observation from Geffet et al. (Zhitomirsky-Geffet and Bratspiess 2016). These insights suggest that the reputation of an enterprise and the potential benefits to the data-subject could influence the consent decision.

5 Experiment Analysis

We surveyed 50 participants for LinkedIn, with a maximum of 174 privacy settings, 42 purposes, 4 purpose categories (3 values used here) and 11 purpose subcategories. In total we had 5584 samples ($m$) with 281 features ($n = 50 + 174 + 42 + 4 + 11$); for $m$ and $n$ refer to Section 3.1. If a participant gives her consent for a given data field and purpose then the state of the control is considered as '1', that is, the control is selected; otherwise it is '0'. Further, we utilized the TensorFlow implementation of the FM algorithm (TFFM) with the ADAM optimizer (Mikhail Trofimov 2016). The learning rate was kept at 0.001 and the threshold value ($\theta$) was set to 0.5.
   ‘Q5’ and ‘Q6’ were formulated to understand participants
                                                                                                In our experiment, we randomly divided all the partici-
opinion on visibility of their personal data on LinkedIn and
                                                                                             pants in 10 bins. We iterated over these 10 bins, using one
other social networking sites. Q5: If you are disclosing your
                                                                                             bin for testing purpose and the remaining 9 bins for train-
personal information in LinkedIn, who can see your per-
                                                                                             ing our model. Finally, We averaged out the accuracy ob-
sonal information? Q6: If you are disclosing your personal
                                                                                             tained from the 10 iterations, shown in Table 2. The sensi-
information in other social networks (example, Facebook,
                                                                                             tivity analysis of f1-score with respect to the rank is shown
Twitter, Google+), who can see your personal information?
                                                                                             in Figure 5. It can be observed that there is change in ac-
                                                                                             curacy with different degree of feature combination (order).
4.2             Survey Result Analysis                                                       Further, the size of the dataset is limited which may lead to
Dataset Demographics. Sampled population from our re-                                        the fluctuations in the line plot as rank increases. It would be
search lab consists of data-subjects with an active LinkedIn                                 interesting to use some contextual information such as text
account and an active user of at least one more social net-                                  from purpose description to understand the meaning behind
working service. The number of participants who gave their                                   latent factors (V ∈ Rn×k in Equation 2). The complexity of
consent for data collection experiment were 50. Out of these                                 different models is given in Table 3.
50 participants 54% were Male and 46% were Female. 96%                                          Mean Square Error, Precision and Recall: We analyzed
of the participants were from age group 22-30 years. The                                     the Mean Square Error (MSE), precision, recall and f1-score
minimum educational qualification within the sample pop-                                     with different order and rank combinations. The results are
ulation was under-graduate degree, whereas, the highest                                      shown in Table 2. Initially we considered all the purpose
qualification was Doctor of Philosophy (PhD). Also, 68%                                      attributes in our TFFM model. Further, we assessed the im-
of the participants were highly active (more than once in a                                  pact of purpose attributes by removing each attribute one
week) on LinkedIn’s social networking platform.                                              by one. From experiments we figured that rank(k) 17 gives
   Findings. In the entry level survey the participants scored                               better results in terms of accuracy. Moreover, we compared
relatively well on IUIPC scale for control, awareness and                                    TFFM results with Linear Support Vector Machine (SVM)
collection of personal information as reported in Table 1.                                   and polynomial SVM. Linear SVM showed marginal im-
This indicates that participants have reasonably high level of                               provement over TFFM model as linear models work better
privacy concerns. From the survey we found that 20% par-                                     with less amount of data. However, as explained in Section
ticipants have modified their privacy settings only at the time                              3.1, TFFM can work as a consent recommendation system
of registration, 42% modify once in a quarter, 30% once in a                                 given its linear complexity, scalability with larger datasets
       Group          Models                   f1-score    precision    recall   MSE
       No Rank        Linear SVM               0.89        0.87         0.94     -
       No Rank        SVM (kernel=‘poly’)      0.82        0.69         1.0      -
       No Rank        TFFM (d=1)               0.88        0.87         0.89     0.135
       TFFM           d=2                      0.87        0.85         0.89     0.167
       TFFM           d=3                      0.87        0.86         0.89     0.159
       TFFM           d=4                      0.87        0.86         0.89     0.161
       Order (d=3)    TFFMx=A                  0.80        0.85         0.76     0.231
       Order (d=3)    TFFMx=B                  0.84        0.84         0.84     0.274
       Order (d=3)    TFFMx=A+B                0.72        0.85         0.64     0.313

Table 2: Evaluation in terms of f1-score, precision, recall and mean square error (MSE) for rank = 17 (where rank = k in Equation 2) and order d. TFFMx is the TFFM model without purpose attributes ‘x’, where ‘x’ can be Purpose Category (A), Purpose Sub Category (B) or both (A+B). Variants of the TFFM model are compared with a linear SVM and an SVM with a ‘poly’ kernel. It is observed that order d=3 performs better among other orders. Linear SVM performs slightly better than TFFM. Also, TFFM with all purpose attributes performs better than the model without purpose attributes.
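
The evaluation protocol of Section 5 (participants divided into 10 bins, threshold θ = 0.5, metrics averaged over the 10 iterations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `participant_bins`, `binary_metrics` and `cross_validate`, and the `fit_predict` callback, are hypothetical. Two details are also assumed: a score at or above θ counts as consent, and MSE is computed on the raw scores before thresholding.

```python
import numpy as np


def participant_bins(participant_ids, n_bins=10, seed=0):
    """Randomly assign each distinct participant to one of n_bins bins."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(participant_ids))
    # array_split tolerates participant counts not divisible by n_bins.
    return [set(chunk.tolist()) for chunk in np.array_split(shuffled, n_bins)]


def binary_metrics(y_true, scores, theta=0.5):
    """Threshold scores at theta (assumed: >= theta means consent)."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    y_pred = (scores >= theta).astype(float)
    tp = float(np.sum((y_pred == 1) & (y_true == 1)))
    fp = float(np.sum((y_pred == 1) & (y_true == 0)))
    fn = float(np.sum((y_pred == 0) & (y_true == 1)))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    # Assumption: MSE is taken on the raw scores, not the thresholded labels.
    mse = float(np.mean((scores - y_true) ** 2))
    return {"precision": precision, "recall": recall, "f1": f1, "mse": mse}


def cross_validate(participant_ids, X, y, fit_predict, n_bins=10, seed=0):
    """Hold out one bin of participants per iteration and average the metrics.

    fit_predict(X_train, y_train, X_test) is a hypothetical callback that
    trains any model and returns one score per test row.
    """
    participant_ids = np.asarray(participant_ids)
    per_fold = []
    for test_ids in participant_bins(participant_ids, n_bins, seed):
        test_mask = np.isin(participant_ids, list(test_ids))
        scores = fit_predict(X[~test_mask], y[~test_mask], X[test_mask])
        per_fold.append(binary_metrics(y[test_mask], scores, theta=0.5))
    return {key: float(np.mean([m[key] for m in per_fold])) for key in per_fold[0]}
```

Because whole participants are held out, every test row belongs to a data-subject the model has not seen, which corresponds to the cold-start setting discussed in the paper.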


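The feature count n = 50 + 174 + 42 + 4 + 11 = 281 reported in Section 5 suggests one block of binary indicators per attribute. The sketch below shows one possible encoding. The block order, the block names and the idea that each sample activates exactly one indicator per block are assumptions made for illustration; the layout actually used is defined in Section 3.1.

```python
import numpy as np

# Block sizes taken from Section 5; the names and the order are assumed.
BLOCK_SIZES = {
    "participant": 50,
    "setting": 174,
    "purpose": 42,
    "category": 4,
    "subcategory": 11,
}


def block_offsets(sizes):
    """Return the starting column of each block and the total feature count."""
    offsets, start = {}, 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets, start


OFFSETS, N_FEATURES = block_offsets(BLOCK_SIZES)


def encode(sample):
    """Map {block name: index within block} to a binary indicator vector."""
    x = np.zeros(N_FEATURES)
    for name, index in sample.items():
        if not 0 <= index < BLOCK_SIZES[name]:
            raise ValueError(f"index {index} out of range for block {name!r}")
        x[OFFSETS[name] + index] = 1.0
    return x
```

Under this assumed layout each row has only five non-zero entries out of 281, which is the kind of sparsity that the "under sparsity" complexity of factorization machines exploits.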
       Model      Order      Complexity
         FM          d       O(k_d n^d) (straightforward)
         FM          d       O(kn) (reformulated)
         FM          d       O(k s̄_D) (under sparsity)
        SVM          2       O(n^2)

Table 3: Complexity of models (Rendle 2010) in different cases, where k is the number of latent factors, d is the order, and s̄_D denotes the number of non-zero elements in the data (s̄_D = 2 for matrix factorization).

and can accommodate different contextual factors. It can be inferred from Table 2 that SVM with ‘poly’ kernel is overfitting the data. Also, Rendle (Rendle 2010) showed that SVM with ‘poly’ kernel fails with two-way interactions.
    Cold start vs warm start: The cold-start recommendation scenario appears when there are no prior preferences for users or items, whereas warm-start arises when prior preferences are available.
    The FM model works with attributes or categories of input data represented as binary indicators (Rendle 2012). The flexibility of this model helps us to deal with cold-start users/items even when we lack prior preferences. Here, the purpose related attributes of the input data are helpful for predicting a new data-subject’s consent.

           6    Discussion and Implication
Contributions. Our work makes some useful contributions in the context of information overload and the resulting consent fatigue due to multiple purposes for which consent is needed. We have shown that consent recommendation could be modeled as a prediction problem. Our recommender system has an accuracy of 87% for data-subjects with no prior preferences or usage history. For warm-start data-subjects the system is expected to perform even better. We also identified certain factors which may heavily influence a data-subject’s decision making process for consent. Furthermore, the survey results showed that data-subjects are more comfortable in sharing information with enterprises providing professional services.
   Future Work. Informed consent from the data-subject is pivotal in data privacy regulations and in safeguarding their interests. However, privacy policies are complex, and even with relevant educational qualifications data-subjects find it difficult to make proper choices. Therefore, there is a need for a personal digital assistant that can also help a data-subject in making consent decisions. For future work we will refer to (Liu et al. 2016; Naeini et al. 2017) as our baseline. As consent is a pivotal concept in most of the regulations, we envision that it will be required even if the enterprise were to process homomorphically encrypted data (Gentry and Boneh 2009).
   Implicit consent for data collection, sharing and processing is possible due to multiple reasons. Three main reasons contributing to implicit consent are: a) consent fatigue, b) data-subjects’ unawareness, and c) complex privacy policy documents. This may lead to a false sense of compliance and security (Degeling et al. 2018). A potential area to explore is to identify possible breaches of compliance regulations due to a data-subject’s implicit consent.
   In this work we built our recommender system by training our model on data gathered from LinkedIn. In the post GDPR and CCPA era, service providers of all types are expected to comply with them. However, more often than not it is not feasible to gather sufficient data to build a model for each one of them. To address this issue, transfer learning could be a possible area to look into, assuming the consent requests from the other service have the same flavour of purposes and related attributes.
   Apart from the European Union’s GDPR, many other countries are looking into their own versions of data privacy laws and regulations. For example, the Protection of Personal Information Act, 2013 (POPI Act) of South Africa, the Personal Information Protection and Electronic Documents Act (PIPEDA) from Canada, Singapore Personal Data Protec-
[Plot: Performance of model by varying rank. Four panels (order 1, order 2, order 3, order 4); x-axis: Rank of matrix (5 to 40); y-axis: f1-score (0.4 to 1.0); series: TFFM, TFFM without categorical information.]

Figure 5: Performance of model by varying rank for different orders. Note that order = 1 is similar to linear models where there is no significance of latent factors.
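
To make the roles of order and rank concrete, the sketch below evaluates a second-order factorization machine in two ways: a straightforward double loop over feature pairs, and the reformulated pairwise term from (Rendle 2010), which costs O(kn). It is an illustrative NumPy sketch, not the TFFM implementation used in the experiments; the function names and the parameters `w0`, `w` and `V` (global bias, linear weights, and the n × k latent factor matrix) are chosen here for illustration.

```python
import numpy as np


def fm_predict_naive(x, w0, w, V):
    """Second-order FM score using an explicit loop over all pairs i < j."""
    n = len(x)
    y = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            # The interaction weight of features i and j is <v_i, v_j>.
            y += (V[i] @ V[j]) * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same score via the O(kn) reformulation of the pairwise term:

    sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
    """
    linear_sums = V.T @ x                # shape (k,)
    squared_sums = (V ** 2).T @ (x ** 2)  # shape (k,)
    return w0 + w @ x + 0.5 * np.sum(linear_sums ** 2 - squared_sums)
```

With all latent factors set to zero the pairwise term vanishes and the score reduces to the linear part, in line with the caption's remark that latent factors carry no significance at order 1.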


tion Act, 2012, and Data Protection Act in India. In future we would like to do a user study and analyze the effect of their demographics on the decision making process.
   Limitations. Our findings are based on a study of the privacy settings of a single web application. This prediction model developed for LinkedIn might not be suitable for a dating site or a photograph sharing site. However, there is a possibility of exploring the application of transfer learning and checking the efficacy of our model on other applications.
   We could collect only a limited number of participants’ privacy settings. In order to obtain a more reliable confidence metric, we will carry out experiments with more participants. Also, in this work we have not quantified the degree of fatigue. It will be interesting to see how it affects the recommendation model. A possible way to assess it is to observe a data-subject’s interaction with the application.
   The information we obtained from the self reported responses of the participants may suffer from the ‘Privacy Paradox’ (Norberg, Horne, and Horne 2007). Even though most of the participants were highly concerned about their privacy, their actual behavior towards consent requests may change in real life. Further, we could not analyze whether the participants are going to change the privacy settings later or not.
   We conclude that a lot of factors can affect a data-subject’s consent depending on the purpose of processing data. However, the unavailability of these factors in the real world setting challenged us in our experiments. For example, the time of consent request, benefit to a data-subject in exchange for consent, information about data field sensitivity and its retention period should matter, but it was hard to extract this information from the experimental setup.

                 7    Conclusion
In this work, we explored the issues pertaining to information overload and consent fatigue due to complex privacy policies and new regulations requiring consent for various purposes. We addressed this issue by implementing a consent recommender system for LinkedIn. Furthermore, we demonstrated that the recommendation problem could be modeled as a prediction problem. Our analysis of survey responses and LinkedIn data enabled us to identify some important factors which can influence a data-subject’s decision making process. We hope that our work will be useful in identifying the issues pertaining to consent fatigue and will build interest for further research in this area.

                 References
Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. 2016. Tensorflow: a system for large-scale machine learning. In OSDI, volume 16, 265–283.
Andrade, E. B.; Kaltcheva, V.; and Weitz, B. 2002. Self-disclosure on the web: The impact of privacy policy, reward, and company reputation. ACR North American Advances.
Anthonysamy, P.; Greenwood, P.; and Rashid, A. 2013. Social networking privacy: Understanding the disconnect from policy to controls. Computer 46(6):60–67.
Balebako, R.; Jung, J.; Lu, W.; Cranor, L. F.; and Nguyen, C. 2013. “Little brothers watching you”: Raising awareness of data leaks on smartphones. In Proceedings of the Ninth Symposium on Usable Privacy and Security, SOUPS ’13, 12:1–12:11. New York, NY, USA: ACM.
Barnard-Wills, D.; Chulvi, C. P.; and De Hert, P. 2016. Data protection authority perspectives on the impact of data protection reform on cooperation in the EU. Computer Law & Security Review 32(4):587–598.
Casteren, D. v. 2017. Consent now and then. Ph.D. Dissertation, Queensland University of Technology.
Cranor, L. F.; Idouchi, K.; Leon, P. G.; Sleeper, M.; and Ur, B. 2013. Are they actually any different? Comparing thousands of financial institutions privacy practices. In Proc. WEIS, volume 13.
de la Torre, L. 2018. A guide to the California Consumer Privacy Act of 2018. Available at SSRN.
Degeling, M.; Utz, C.; Lentzsch, C.; Hosseini, H.; Schaub, F.; and Holz, T. 2018. We value your privacy... now take some cookies: Measuring the GDPR’s impact on web privacy. arXiv preprint arXiv:1808.05096.
Dwyer III, S. J.; Weaver, A. C.; and Hughes, K. K. 2004. Health insurance portability and accountability act. Security Issues in the Digital Medical Enterprise 72(2):9–18.
Flavián, C., and Guinalíu, M. 2006. Consumer trust, perceived security and privacy policy: three basic elements of loyalty to a web site. Industrial Management & Data Systems 106(5):601–620.
Gentry, C., and Boneh, D. 2009. A fully homomorphic encryption scheme, volume 20. Stanford University Stanford.
Ghazinour, K.; Matwin, S.; and Sokolova, M. 2016. Yourprivacyprotector, a recommender system for privacy settings in social networks. arXiv preprint arXiv:1602.01937.
Knijnenburg, B. P. 2014. Information disclosure profiles for segmentation and recommendation. In SOUPS2014 Workshop on Privacy Personas and Segmentation.
Kumar, P. 2016. Privacy policies and their lack of clear disclosure regarding the life cycle of user information. In 2016 AAAI Fall Symposium Series.
Liu, B.; Andersen, M. S.; Schaub, F.; Almuhimedi, H.; Zhang, S. A.; Sadeh, N.; Agarwal, Y.; and Acquisti, A. 2016. Follow my recommendations: A personalized privacy assistant for mobile app permissions. In Twelfth Symposium on Usable Privacy and Security (SOUPS 2016), 27–41. Denver, CO: USENIX Association.
Liu, B.; Lin, J.; and Sadeh, N. 2014. Reconciling mobile app privacy and usability on smartphones: Could user privacy profiles help? In Proceedings of the 23rd International Conference on World Wide Web, WWW ’14, 201–212. New York, NY, USA: ACM.
Madden, M. 2012. Privacy management on social media sites. Pew Internet Report 1–20.
Malhotra, N. K.; Kim, S. S.; and Agarwal, J. 2004. Internet users’ information privacy concerns (IUIPC): The construct, the scale, and a causal model. Information Systems Research 15(4):336–355.
Mikhail Trofimov, A. N. 2016. tffm: Tensorflow implementation of an arbitrary order factorization machine. https://github.com/geffy/tffm.
Naeini, P. E.; Bhagavatula, S.; Habib, H.; Degeling, M.; Bauer, L.; Cranor, L.; and Sadeh, N. 2017. Privacy expectations and preferences in an IoT world. In Proceedings of the 13th Symposium on Usable Privacy and Security (SOUPS).
Norberg, P. A.; Horne, D. R.; and Horne, D. A. 2007. The privacy paradox: Personal information disclosure intentions versus behaviors. Journal of Consumer Affairs 41(1):100–126.
Olejnik, K.; Dacosta, I.; Machado, J. S.; Huguenin, K.; Khan, M. E.; and Hubaux, J. 2017. Smarper: Context-aware and automatic runtime-permissions for mobile devices. In 2017 IEEE Symposium on Security and Privacy, SP 2017, San Jose, CA, USA, May 22-26, 2017, 1058–1076.
Ploug, T., and Holm, S. 2013. Informed consent and routinisation. Journal of Medical Ethics 39(4):214–218.
Rendle, S. 2010. Factorization machines. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, 995–1000. IEEE.
Rendle, S. 2012. Factorization machines with libfm. ACM Transactions on Intelligent Systems and Technology (TIST) 3(3):57.
Sadeh, J. L. B. L. N., and Hong, J. I. 2014. Modeling users mobile app privacy preferences: Restoring usability in a sea of permission settings. In Symposium on Usable Privacy and Security (SOUPS). Citeseer.
Sadeh, N.; Hong, J.; Cranor, L.; Fette, I.; Kelley, P.; Prabaker, M.; and Rao, J. 2009. Understanding and capturing peoples privacy policies in a mobile social networking application. Personal and Ubiquitous Computing 13(6):401–412.
Solove, D. J. 2012. Introduction: Privacy self-management and the consent dilemma. Harv. L. Rev. 126:1880.
Voigt, P., and Von dem Bussche, A. 2017. The EU General Data Protection Regulation (GDPR), volume 18. Springer.
Wijesekera, P.; Baokar, A.; Tsai, L.; Reardon, J.; Egelman, S.; Wagner, D.; and Beznosov, K. 2017. The feasibility of dynamically granted permissions: Aligning mobile privacy with user preferences. In Security and Privacy (SP), 2017 IEEE Symposium on, 1077–1093. IEEE.
Zhitomirsky-Geffet, M., and Bratspiess, Y. 2016. Professional information disclosure on social networks: The case of Facebook and LinkedIn in Israel. Journal of the Association for Information Science and Technology 67(3):493–504.