=Paper=
{{Paper
|id=Vol-2758/OHARS-paper2
|storemode=property
|title=A Nudge-based Recommender System Towards Responsible Online Socializing
|pdfUrl=https://ceur-ws.org/Vol-2758/OHARS-paper2.pdf
|volume=Vol-2758
|authors=Rim Ben Salem,Esma Aïmeur,Hicham Hage
|dblpUrl=https://dblp.org/rec/conf/recsys/SalemAH20
}}
==A Nudge-based Recommender System Towards Responsible Online Socializing==
Rim Ben Salem (a), Esma Aïmeur (a) and Hicham Hage (b)

(a) Department of Computer Science & Operations Research, University of Montreal, Montreal, QC, Canada
(b) Science Department, Notre Dame University-Louaize, Zouk Mosbeh, Lebanon

Abstract. In recent years, the popularity of social media has been on the rise. Driven by a multitude of motivations, users have grown accustomed to sharing most aspects of their lives online. This self-disclosure has not only proven to be dangerous for people's privacy and security but also harmful to their personal, professional and intimate relationships. However, in the age of social media, it seems improbable for people to completely discontinue using platforms such as Facebook, Twitter and Instagram. Hence, there is an urgent need to find a balanced compromise between online sociability and privacy. In this work we propose a platform to mitigate the dangers of self-disclosure through the use of a personalized harm-aware recommender system. Specifically, the recommender system balances the requirements for privacy protection with the users' need to share with their social circles. To achieve this, the platform starts by evaluating the risks of disclosing the personal information and then, if necessary, proceeds to recommend to the user how to reduce that risk. While the evaluation of the risk is done in an objective manner, personalization is of the essence since users have different preferences and sharing needs. As such, when performing the recommendation, the system provides personalized nudge-based recommendations, raising the users' awareness of the privacy issues stemming from self-disclosure.

Keywords: harm-aware recommender system, self-disclosure, privacy, personalization, nudge-based recommendations

OHARS'20: Workshop on Online Misinformation- and Harm-Aware Recommender Systems, September 25, 2020, Virtual Event

1. Introduction

Today, Social Networking Sites (SNS) have become very versatile, providing users with a wide range of functionalities: from posting messages, photos and videos, to playing games online, shopping and finding a job. Due to the variety of SNS, the user's information and data can vary greatly from one platform to another. This variety of SNS platforms and services, along with the broad appeal and extensive use of SNS, creates a wealth of information on users. A report by the Pew Research Center indicates that about 75% of the American public uses more than one SNS platform, and that the typical American uses three of these sites [1]. Moreover, the report also indicates that younger adults tend to use a greater variety of social media platforms.

The initial optimism about the positive potential of the Internet and social media has given way to concerns about the constant harvesting of personal information. Indeed, SNS actively encourage the collection and sharing of information, by gradually increasing the variety and types of collected data, as well as by relying on more revelatory default visibility settings [2].
While social media platforms are expected to safeguard the user's data and information, this task becomes much harder when human factors are considered. Indeed, while users generally report a high concern for their privacy, they tend to exhibit privacy-compromising online behavior, and an unprecedented level of self-disclosure (the act of voluntarily sharing and disclosing personal information to others) is taking place. This contradiction between privacy attitudes and actual behavior is generally referred to as the privacy paradox [3]. Essentially, one would logically expect users' privacy concerns to restrict the voluntary disclosure of personal information. However, the reverse effect is observed: users tend to share private information in exchange for retail items and personalized services, or through their social network activities [4].

This behavior existed long before the advent of the internet and social media. In 1973, psychologists Irwin Altman and Dalmas Taylor formulated the theory of social penetration [5]. It posits that the more people disclose about themselves, the closer they get to those with whom they share said information. This applies to friendships [6] and romantic relationships [7], where this reciprocal act is regarded as necessary to build and maintain interpersonal ties. Moreover, there are many other reasons why people feel motivated to divulge their personal and sensitive information, ranging from seeking fame, attracting brand sponsorships, releasing pent-up feelings [8] and seeking social validation [9, 10], to non-lucrative altruistic objectives like benefiting others with life experience [11]. These personal motives lead to different assessments of the value of one's personal data [12]. Users looking to connect with people who reside close to them are likely to disclose their location more readily than another piece of data such as their medical record. In addition, other parameters factor into people's choices, such as their background, thoughts, interests and prior knowledge of the consequences of publishing said piece of information [13]. Hence, self-disclosure is highly dependent on the person's subjective evaluation of their data. As such, approaches to mitigate the dangers of self-disclosure need to be equally user-specific and address the individual rather than the crowd.

The following scenario serves to further explain the issue and the proposed solution: Alice is usually careful and aware of privacy and cybersecurity dangers online. Recently, she was laid off from her job due to the economic struggle following the COVID-19 outbreak. While browsing through her preferred social media platform, she stumbles upon a post for a "work from home" job offer, providing an email address for applicants and requesting information such as a curriculum vitae, social security number and address. This is one of the biggest scams targeting vulnerable people seeking an alternative income. Although Alice is usually mindful of privacy and cybersecurity menaces, her judgement is clouded by her current mental state and she is unable to objectively assess her situation. If she proceeds without being advised otherwise, she might fall victim to fraud. In this case, there is a need for a system which has already identified Alice's privacy preferences, knows that her current conduct is dangerous, and can advise her against this course of action. This system would intervene and nudge Alice, telling her: "BEWARE!
You are about to share information which can easily lead to your identity being stolen. Consider deleting the information 'social security number' to reduce the risk".

This paper proposes a platform for this purpose, to guide users towards aware disclosure. The proposed system considers the user's assessment of their data as well as an objective evaluation, and finds middle ground between the two. In dealing with this issue, this work makes the following contributions:

• Propose an objective threat assessment model for computing the privacy risk based on an input vector representing the user's disclosed data.
• Detail an adaptive nudge-based recommender that balances the objective risk assessment on one side and the user's sharing needs and privacy preferences on the other.

The paper is organized as follows: Section 2 discusses existing work that relates to our proposition. Section 3 details the recommender system and the different submodules of the architecture. Section 4 reports on the evaluation of the platform, and Section 5 concludes this work with a discussion of the contributions and pointers to future work.

2. Related Work

Trepte and Masur [14] define privacy as a need to control who has access to personal information and a form of solitude, intimacy, anonymity, or reserve. It is also considered to be the key to fulfilling basic human needs such as the need for autonomy and protected communication [15]. These needs differ from one person to the other and they are highly behavior-dependent [16]. Petronio and Durham [17] relate the divulgence of private information to people's perception of their ownership over the data. This further corroborates that privacy preservation needs to be personalized, as the act of disclosure itself is user-based.

People evaluate the risks and the perceived gratification [18] and, depending on their personality as well as the situation, one can outweigh the other, resulting in the decision to veil or unveil the data. However, such decisions can be affected by bias and misconceptions [4, 19], as well as by the user's background, such as skills, experience and education [20]. In fact, some users consider clearing cookies to be the highest form of awareness and privacy preservation [21], while for others the most frequent protective decision is antivirus scans [22]. It is for this reason that the need for an objective party to help assess the situation has emerged. The user's decision-making abilities should not be discarded but aided instead. Bandura [23] confirms that a certain behavior can be encouraged or deterred provided that the person receives a prediction of positive incentives or detrimental consequences. This is where our proposition plays a major role, either validating or discouraging the self-disclosing user post and providing personalized alternatives in the latter case.

Other studies aiming to reduce self-disclosure include research on the correlation between changing privacy settings and revealing personal data [24]. That study concludes that simply limiting the audience does not equal more control over sensitive credentials. This further supports the urgent need for a solution that manages the user's input rather than focusing on platform-specific configurations and antimalware. In particular, nudge-based mechanisms are garnering interest in awareness-raising contexts. Nudges propose positive reinforcements and suggestions.
They are used for cybersecurity and privacy preservation in order to encourage users to adopt aware behavior and to reflect on their behavior in a non-obstructive way. They can be one-size-fits-all, where they depend on the scenario rather than the user [25], or they can be tailored to the user [26, 27]. Nudges are used in a multitude of contexts, from helping adolescent SNS users avoid privacy and safety threats [28] to Blockchain-based open banking [29]. Another example is the personalized privacy assistant for mobile app permissions [30], which proved its effectiveness with 78.7% of its recommendations being accepted and implemented by the user. Similarly, another work [31] aiming to address the privacy paradox has shown that users tend to reflect on their behavior after receiving nudges. The authors compare the device's general settings with the permissions granted to a specific app and note the discrepancies. Although this comparison does reveal the user's bias towards one app, if that is the case, the general settings are also subjective. So, that work compares a general level of subjectivity with a specific one, and ideally the two are the same. However, this does not truly reveal the user's deviation from advised practices in an effort to correct it.

We believe that having an objective assessment along with the personal judgement has the potential not only to get closer to the user's preferences but also to mitigate the risk of self-disclosure, which is why we adopt this approach. The following section details our proposition of the harm-aware recommender system.

3. Personalized Nudge-based Recommender System

This section details the proposed approach for a personalized, nudge-based recommender system. However, the recommender system is one component of a larger platform. Consequently, to help the reader better understand the design and function of the recommender system, the next subsection highlights the general architecture of the platform (Figure 1), briefly introducing the various components.

3.1. General Overview of the Platform

The platform relies on user modeling to study the user's tendencies and preferences over time. This is then utilized along with domain knowledge to empower the personalized recommender system and mitigate the risk of self-disclosure. The user model is used to capture characteristics of the individual, including motivation, objectives and cognitive bias. A personalized approach must also consider the user's trust circle (where the user feels comfortable revealing personal data), which includes both the platforms and the human counterparts. Another major component is the domain knowledge which, contrary to the user model, does not focus on individuals but rather on general processes and conclusions. This includes the language model, which serves to process the text input and to "understand" it. Other functions include devising the subjective assessment model as well as studying the privacy tendencies on the platform. The Information Extraction component analyzes the raw text input of the user and determines the exact disclosed data, ultimately generating the threat vector. The threat vector holds the disclosed information as well as its disclosure rate, so each piece of data is represented as (x_i, y_i). Finally, the user model, the domain model and the threat vector are sent to the recommender system, the focal point of this paper.

Figure 1: General overview of the platform.

The system orchestrator serves as the medium through which the different modules communicate.
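To make the (x_i, y_i) representation concrete, the following is a minimal Python sketch of one possible encoding of a threat vector entry. The structure and field names are assumptions made for illustration, not the authors' actual data format.

```python
# Hypothetical representation of a threat vector entry (x_i, y_i); the field
# names are illustrative assumptions, not the authors' data format.
from dataclasses import dataclass

@dataclass
class ThreatVectorEntry:
    data_type: str          # x_i, e.g. "address" or "social_security_number"
    disclosed_part: str     # y_i, the portion actually revealed, e.g. "province"
    disclosure_rate: float  # f(x_i, y_i), between 0 and 1

# Example: a post that reveals only the user's province.
threat_vector = [ThreatVectorEntry("address", "province", 0.25)]
```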
3.2. The Recommender System Mechanism

This subsection focuses on the main contribution of this paper, which stems from the recommender system mechanism. It is worth noting that it can be used as a standalone module (a browser plugin, for example) or integrated within other platforms. Figure 2 is a view of the recommender system's modules, which are detailed next.

Figure 2: The architecture of the recommender system. The recommender mechanism starts with the threat assessment after receiving data from the orchestrator.

3.2.1. Threat assessment module

The purpose of this module is to provide an objective assessment of the risk. The value Risk is user input-specific but not user preference-specific. Specifically, the value of Risk depends solely on the input. As such, if two users with completely different preferences disclose the same pieces of data, they would have the exact same risk value. This is done using the objective threat function:

Risk = Σ_{i=1}^{n} W_{i,obj} · f(x_i, y_i)    (1)

where:
• x_i corresponds to a component of the data vector X. Each component is a piece of personal data, such as location or social security number.
• y_i is a component of the disclosure vector Y. For x_i = legal name, y_i = first name or y_i = last name.
• W_{i,obj} is the objective weight (risk) of disclosing x_i.
• f(x_i, y_i) is the disclosure rate: the value of disclosing the information y_i out of x_i. It varies between 0 and 1, where 0 means that the user did not disclose any portion of that information and 1 refers to a complete disclosure of the datum x_i.
• n is the total number of pieces of personal data that the system considers to be private.

To the best of our knowledge, there is no formal objective measure to weigh the risk of disclosing certain pieces of personal information. As such, we devised a novel approach to determine the importance (weight) W_{i,obj} of personal information, based on its price on the dark web. A piece of data is considered to be most sensitive, and its disclosure most costly, when it has the highest price tag. These values were collected from different reports and studies done by various sources, including Experian, TransUnion, Atlas VPN, Safety Detectives, Keeper Security, Symantec, Statista, and Pace Technical [32, 33, 34, 35, 36, 37, 38, 39]. Table 1 illustrates some of the data and their values. These values are fixed by the system for all users.

Table 1: Prices of personal data on the dark web.
| Personal data | Min value ($) | Max value ($) | Sensitivity |
| Date of birth | 1 | 1 | Least |
| Address | 1 | 1 | Least |
| Social security number | 1 | 4 | Least |
| E-mail and password | 5 | 30 | Low |
| Passport (1) / ID (2) / License (3) | 200 | 500 | High |
| Diploma | 300 | 656 | Highest |

Table 2 shows the result of preprocessing the raw data by binning it into buckets from least to highest sensitivity. Each class is then represented by its normalized interval's mean value, to reduce the noise of the raw data. This newly calculated value corresponds to the objective weight of each piece of information belonging to said class. For example, the weight of a piece of information belonging to the "low sensitivity" bucket is 2.25. In Table 3, scenarios are defined and their objective risk values are calculated using the weights in Table 2 and the estimated values of the disclosure rate f(x_i).
Table 2: Data binning.
| Classes ($) | 1 ≤ cost ≤ 26 | 26 < cost ≤ 101 | 101 < cost ≤ 201 | 201 < cost ≤ 352 | 352 < cost ≤ 453 |
| Normalized classes | 1 ≤ cost ≤ 1.5 | 1.51 ≤ cost ≤ 3 | 3.01 ≤ cost ≤ 5 | 5.01 ≤ cost ≤ 8 | 8.01 ≤ cost ≤ 10 |
| Sensitivity | Least | Low | Medium | High | Highest |
| Representative element | 1.25 | 2.25 | 4 | 6.5 | 9 |

The disclosure rate f(x_i) is estimated as follows. If the data is fully disclosed, f(x_i) = 1; if the data is fully undisclosed, f(x_i) = 0. If only a portion of the data is disclosed, suppose that the complete datum x_i has n pieces y_k ranked from least to most user-specific. For example, for x_i = credit card number, the first 4 digits (y_1), which are bank-specific, are worth less than the rest of the number (y_2), which is more user-specific. To quantify how much of x_i a piece y_k accounts for, we define the function f in general as follows:

f(x_i, y_{k+1}) = 2 · f(x_i, y_k), for all 1 ≤ k < n
f(x_i, y_1) + f(x_i, y_2) + ... + f(x_i, y_n) = f(x_i, x_i) = 1
f(x_i, y_1) + 2 f(x_i, y_1) + ... + 2^{n−1} f(x_i, y_1) = 1
f(x_i, y_1) (1 + 2 + 4 + ... + 2^{n−1}) = 1

which yields:

f(x_i, y_k) = 2^{k−1} / (2^n − 1)    (2)

Example: in Table 3, scenario 2, the user disclosed their year of birth, which is the most specific piece in comparison with the day and month of birth: f(x_i, y_3) = 2^2 / (2^3 − 1) = 0.57.

The exception to this is the address, because disclosing the zip code, for example, even without explicitly stating the city, province and country, is an implicit disclosure of all of these pieces of data. As such, we define the f values for country, province, city, zip code and building consecutively as 0.125, 0.25, 0.5, 0.75 and 1. This is applied to scenario 1 in Table 3.

Going back to the main scenario in the introduction, where Alice is about to disclose her address and social security number: x_1 is the address and x_2 is the SSN, so Risk = W_{1,obj} · f(x_1, y_1) + W_{2,obj} · f(x_2, y_2) = 1.25 · 1 + 1.25 · 1 = 2.5.

Table 3: Experimental scenarios.
| Scenario | Data | Data sensitivity | f(x_i) | Risk value |
| 1 | Posting a Facebook status: "The government just announced that our province of Quebec was hit the worst during the covid19 pandemic." (location) | Least | 0.25 | 0.31 |
| 2 | Posting a tweet: "I could see the accident that was reported on the news at New street from my window at home and it was the most horrific thing I've seen in my 30 years of existence." (location, date of birth) | Least | 0.75, 0.57 | 1.65 |
| 3 | Sending a message to a friend: "I'm expecting an email response to my job interview but I will be visiting my parents in the countryside and the internet service is bad there. Can you please check my email daily for me? Here are my email and password: xxxxxxxx xxxx" (e-mail and password) | Low | 1 | 2.25 |
| 4 | Phishing message on LinkedIn asking for a professional email: the malicious person pretends to request a professional email for future cooperation and the user discloses it. | Medium | 0.33 | 1.32 |
| 5 | Responding to a phishing scam email claiming to be from the Immigration, Refugees and Citizenship department, asking for a full scan of the receiver's passport. The user sends their full passport. | High | 1 | 6.5 |

The aforementioned and recorded values are non-user-specific, as they define the objective values. The following module shows how the user preferences are used along with these objective weights to personalize the nudges.
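To make the threat assessment concrete, the following is a minimal Python sketch of equations (1) and (2). It is not the authors' implementation; the mapping of data types to sensitivity buckets and all names are assumptions loosely based on Tables 1 and 2.

```python
# Sketch of the objective threat assessment (Eqs. (1) and (2)); the mapping of
# data types to sensitivity buckets is an assumption loosely based on Table 1.

# Representative objective weights per sensitivity bucket (Table 2).
BUCKET_WEIGHT = {"least": 1.25, "low": 2.25, "medium": 4.0, "high": 6.5, "highest": 9.0}

OBJECTIVE_WEIGHT = {
    "address": BUCKET_WEIGHT["least"],
    "date_of_birth": BUCKET_WEIGHT["least"],
    "social_security_number": BUCKET_WEIGHT["least"],
    "email_and_password": BUCKET_WEIGHT["low"],
    "passport": BUCKET_WEIGHT["high"],
    "diploma": BUCKET_WEIGHT["highest"],
}

def disclosure_rate(k: int, n: int) -> float:
    """Eq. (2): the share of x_i revealed by its k-th piece y_k (1-indexed),
    assuming n pieces ranked from least to most user-specific."""
    return (2 ** (k - 1)) / (2 ** n - 1)

def objective_risk(disclosures: dict) -> float:
    """Eq. (1): Risk = sum_i W_{i,obj} * f(x_i, y_i), where `disclosures`
    maps each disclosed data type x_i to its disclosure rate f in [0, 1]."""
    return sum(OBJECTIVE_WEIGHT[item] * rate for item, rate in disclosures.items())

# Alice's post from the introduction: full address and full SSN disclosed.
alice = {"address": 1.0, "social_security_number": 1.0}
print(objective_risk(alice))                # 1.25 * 1 + 1.25 * 1 = 2.5
print(round(disclosure_rate(3, 3), 2))      # year of birth in scenario 2: 4/7 = 0.57
```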
3.2.2. Personalized nudge module

This module takes as input the threat vector and the calculated Risk. The latter is then compared with the user's subjective threshold. Specifically, this threshold represents the user's privacy tolerance, that is, how much information they are willing to share. This is an important aspect of personalization, since different users have different sharing/privacy needs. If Risk < Threshold, the disclosure is within the user's tolerance and the process terminates. However, if Risk ≥ Threshold, the risk exceeds the user's tolerance, and the system must notify the user and recommend (nudge) an appropriate course of action to reduce the risk below the threshold.

Another important aspect of personalization is the user's preference for each piece of data, captured by the subjective weights W_{i,user}. Each datum x_i has an objective weight W_{i,obj} and a subjective weight W_{i,user}. To build the recommendation, we start by defining the set Y = {x_i : Risk − Threshold ≤ W_{i,obj} · f(x_i)}. Specifically, Y is the set of candidates (pieces of data x_i) whose elimination alone reduces the value of the objective threat function Risk below the personalized threshold. Two cases arise:

• Y ≠ ∅, meaning that there exists at least one piece of data x_i that, when removed, will reduce the risk below the threshold. If multiple candidates exist, we choose the one to delete, x_nudge, using this user preference-based formula:

x_nudge = argmin_{1 ≤ i ≤ |Y|} (W_{i,obj} · f(x_i) · W_{i,user})    (3)

• Y = ∅, meaning that no single piece of data would reduce the risk below the threshold. In this case, the system recommends that the user delete multiple pieces of disclosed data to reach the risk tolerance level, and we execute the following pseudocode:

    Y ← X
    repeat:
        x_nudge ← argmin_{1 ≤ i ≤ |Y|} (W_{i,obj} · f(x_i) · W_{i,user})
        X_nudge ← X_nudge ∪ {x_nudge}
        Risk ← Risk − W_{nudge,obj} · f(x_nudge)
        Y ← Y \ {x_nudge}
    until (Risk ≤ Threshold)

X_nudge is the final output: the set of data to be recommended to the user for deletion. In particular, Alice, who is highly aware of privacy menaces, has a threshold equal to 2, and the risk of her current action is 2.5, as calculated in the previous section. If her preference for the SSN is 1 and for the location is 2, the nudge would be to delete the SSN. At this point, Alice has received the recommendation, and the process that follows her choice is detailed in the evaluation module section.
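The selection logic above can be sketched in a few lines of Python. This is our reading of Eq. (3) and the pseudocode, not the authors' implementation; the dictionaries and names are illustrative assumptions.

```python
# Sketch of the personalized nudge selection. `contrib` maps each disclosed
# item x_i to its objective contribution W_{i,obj} * f(x_i); `w_user` holds the
# subjective weights W_{i,user}. Names and structure are assumptions.

def select_nudge(contrib: dict, w_user: dict, risk: float, threshold: float) -> list:
    """Return the pieces of data the user is nudged to delete."""
    if risk < threshold:
        return []                                   # within the user's tolerance
    # Candidates whose removal alone brings the risk below the threshold.
    candidates = [x for x, c in contrib.items() if risk - threshold <= c]
    if candidates:
        # Eq. (3): the candidate with the smallest objective * subjective score.
        return [min(candidates, key=lambda x: contrib[x] * w_user[x])]
    # Otherwise remove items one by one until the risk drops below the threshold.
    remaining, to_delete = dict(contrib), []
    while risk >= threshold and remaining:
        x = min(remaining, key=lambda x: remaining[x] * w_user[x])
        to_delete.append(x)
        risk -= remaining.pop(x)
    return to_delete

# Alice: risk 2.5, threshold 2, subjective weights SSN = 1 and address = 2.
contrib = {"social_security_number": 1.25, "address": 1.25}
w_user = {"social_security_number": 1, "address": 2}
print(select_nudge(contrib, w_user, 2.5, 2.0))      # ['social_security_number']
```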
3.2.3. Evaluation module

This process is based on the user's action after being shown the personalized nudge. The user can refuse the nudge or accept it. In the first case, the user shares the content as is, without modifications. In the second case, the user either accepts the recommendation and reduces the risk as proposed by the system, or reduces the risk based on their own discretion. All these cases are considered as a form of implicit feedback, which is used to update the values of the threshold and of W_{i,user}. Both values are user-specific parameters on which the personalization is based.

(i) Begin by reevaluating the Risk after the user's action:
• If the user does not delete any piece of data, Risk remains the same.
• If the user deletes one or more pieces of data, say Deleted_data = {x_k, ..., x_j}, then:

Risk = Risk − Σ_{i=k}^{j} W_{i,obj} · f(x_i)    (4)

(ii) Then we update the threshold as follows:

Threshold = Threshold + softsign(Risk − Threshold) · α, where 0 < α < 1 is the learning rate.

• If Risk − Threshold < 0, either the risk of the user's original input was already below the threshold or the user fully accepted the recommendation to delete the pieces of data whose weight exceeds the threshold. In this case −1 < softsign(Risk − Threshold) < 0 and, as a result, the threshold is reduced.
• If Risk − Threshold ≥ 0, the user either deleted only some of the recommended pieces of data or completely ignored the recommendation and proceeded to post the input as is. In this case 0 ≤ softsign(Risk − Threshold) < 1 and, as a result, the threshold is increased.

(iii) We update the weights using a similar formula, for each x_i ∈ X_nudge:
• If the user deletes x_i: W_{i,user} = W_{i,user} + softsign(W_{i,obj} − W_{i,user}) · α
• If the user does not delete x_i: W_{i,user} = W_{i,user} − softsign(W_{i,obj} − W_{i,user}) · α
where 0 < α < 1 is the learning rate.

In Table 4, this process is applied to Alice's scenario and calculations are made for both cases, when she does and when she does not accept the nudge. To simplify the weight update, we use α = 1, but in the evaluation section we discuss the impact of the value of this parameter on the system's recommendations. Table 4 shows how the threshold is increased when the user ignores the nudge and decreased when the user accepts it.

Table 4: The result of applying this process to the user Alice.
| User | Initial values | Risk | X_nudge | User's decision |
| Alice | Threshold = 2; subjective weights: social security number = 1, address = 2 | 2.5 (Risk > threshold) | Recommend deleting SSN | Accept nudge: new Risk = 2.5 − 1.25 = 1.25, Threshold = 2 + softsign(1.25 − 2) = 1.57. Refuse nudge: Risk = 2.5, Threshold = 2 − softsign(1.25 − 2) = 2.42 |

Finally, the algorithms for recommendation and for updating the user preferences are detailed below.

Algorithm 1: Recommendation algorithm
Input: vector X, weights W_obj, W_user, function f, float Threshold
    Risk ← Σ_{i=1}^{n} W_{i,obj} · f(x_i)
    if Risk ≤ Threshold then
        terminate
    end if
    Y ← ∅
    for all i in [1, size(X)] do
        if Risk − Threshold ≤ W_{i,obj} · f(x_i) then
            Y ← Y ∪ {x_i}
        end if
    end for
    if Y ≠ ∅ then
        x_nudge ← argmin_{1 ≤ i ≤ |Y|} (W_{i,obj} · f(x_i) · W_{i,user})
    else
        Y ← X
        X_nudge ← ∅
        while Risk ≥ Threshold do
            x_nudge ← argmin_{1 ≤ i ≤ |Y|} (W_{i,obj} · f(x_i) · W_{i,user})
            X_nudge ← X_nudge ∪ {x_nudge}
            Risk ← Risk − W_{nudge,obj} · f(x_nudge)
            Y ← Y \ {x_nudge}
        end while
    end if

Algorithm 2: User preference adaptation algorithm
Input: sets Deleted_data, X_nudge, floats Risk, Threshold, α, vectors W_user, W_obj
    for all x_i in Deleted_data do
        Risk ← Risk − W_{i,obj} · f(x_i)
        W_{i,user} ← W_{i,user} + softsign(W_{i,obj} − W_{i,user}) · α
    end for
    for all x_i in X_nudge \ Deleted_data do
        W_{i,user} ← W_{i,user} − softsign(W_{i,obj} − W_{i,user}) · α
    end for
    Threshold ← Threshold + softsign(Risk − Threshold) · α
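A minimal Python sketch of this feedback step follows. It reflects our reading of Algorithm 2 and the softsign updates above; it is not the authors' code, and the dictionary layout and names are illustrative assumptions.

```python
# Sketch of the preference adaptation step (Algorithm 2). `contrib` maps each
# item x_i to W_{i,obj} * f(x_i); `nudged` is X_nudge and `deleted` the data the
# user actually removed. Names are illustrative assumptions.

def softsign(x: float) -> float:
    return x / (1 + abs(x))

def adapt_preferences(risk, threshold, w_user, w_obj, contrib, nudged, deleted, alpha=0.5):
    """Update the risk, the subjective weights and the threshold after the user
    reacts to a nudge."""
    for x in deleted:                        # Eq. (4): subtract deleted contributions
        risk -= contrib[x]
        w_user[x] += softsign(w_obj[x] - w_user[x]) * alpha
    for x in set(nudged) - set(deleted):     # recommendations the user ignored
        w_user[x] -= softsign(w_obj[x] - w_user[x]) * alpha
    threshold += softsign(risk - threshold) * alpha
    return risk, threshold, w_user

# Alice accepts the nudge and deletes her SSN (alpha = 1, as in Table 4).
contrib = {"social_security_number": 1.25, "address": 1.25}
w_obj = dict(contrib)  # here f = 1, so the objective weights equal the contributions
w_user = {"social_security_number": 1.0, "address": 2.0}
risk, thr, w_user = adapt_preferences(2.5, 2.0, w_user, w_obj, contrib,
                                      nudged=["social_security_number"],
                                      deleted=["social_security_number"], alpha=1.0)
print(round(risk, 2), round(thr, 2))         # 1.25 and 2 + softsign(-0.75) ≈ 1.57
```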
4. Evaluation

In order to evaluate the proposed approach, a proof-of-concept experiment was designed. The main goal of the evaluation is to study how the user's choices impact the process of personalization. The criterion for testing the quality of the personalization is how fast the initial threshold converges to the user's actual threshold, that is, to their actual preferences. The simulated data represents all types of users, from low and medium to high awareness. One of the main concerns when using recommender systems is the cold start problem, in which the user has just been introduced to the platform and their preferences are unknown. To overcome this and set initial threshold values to kickstart the recommender system-based awareness-raising process, the user's preferences are estimated through a short survey. Essentially, it consists of a straightforward process where the individual estimates how much their own assets are worth to them on a scale of 1 to 10. These values, as given by the user, are taken as the initial subjective weights, and their average is used to compute the initial threshold, which equals 10 − average(subjective weights). After doing this, the system has the initial threshold and, as mentioned previously, the system is judged on how fast this value converges to the actual threshold.

The next step is to see how these initial thresholds evolve over time. This depends on the scenario that defines the objective risk value, the system's learning rate α and the user's responses to the recommendations. We select 3 scenarios, out of a total of 20 conducted tests, that have significant objective risk values, to demonstrate the different outcomes depending on the user.

Table 5: Updated threshold depending on the user, scenario and learning rate.
| Scenario | Risk value | α | John | Bob | Anna | Sam | Catherine |
| Initial threshold | | | 1 | 4.25 | 6 | 7.5 | 8.75 |
| Actual threshold | | | 2 | 3 | 8 | 5 | 6 |
| 1 | 0.31 | 0.5 | 1.29 | 3.84 | 6.17 | 7.06 | 8.30 |
| 3 | 2.25 | 0.5 | 1.33 | 3.44 | 6.34 | 6.62 | 7.87 |
| 7 | 6.5 | 0.5 | 1.21 | 4.40 | 6.11 | 6.91 | 7.87 |

In Table 5, users' decisions are simulated as follows:
• Users who have a low threshold value are the most concerned about their privacy and the most likely to accept deleting data as recommended by the system. The higher the risk, the more likely they are to accept the nudge. For example, user 3 refused the recommendations with a 25% probability and accepted them with a 75% probability.
• Users with medium thresholds (Bob and Anna) accept recommendations with a probability closer to 50%.
• Users with a high threshold are the least conscious about their privacy; as a result, they refuse the recommendations with a 75% probability and accept them with a 25% probability.

In each iteration (scenario), the risk value is evaluated (values taken from Table 3) and compared to the personal threshold, which is initialized based on the survey filled in by each user. This comparison allows the system to determine which type of nudge it needs to make: recommend a modification, or encourage the post as it is if the risk does not surpass the personal threshold. For example, in scenario 7, Sam, whose threshold is 6.62 after the previous scenario, is encouraged by the system to reduce the disclosure, but upon his refusal the threshold is increased, as it corresponds to his tolerance to risk. The threshold goes from 6.62 to 6.91, which gets the user closer to their actual threshold of 5. For the same scenario, John, who displays the highest aversion to risk, accepts the recommendation, which results in his threshold going down from 1.33 to 1.21, further from his actual threshold of 2. However, this can be explained by the fact that the risk is 6.5, which is high, and a highly aware person is likely to accept to reduce the risk. The threshold converges more quickly with low-risk scenarios.

Another important point is that α is an experimental value that is characteristic of the system and not of the users: once its value is set, the thresholds of all users are computed using the same α. In our evaluation, we tested α values between 0.1 and 0.9 and compared the impact of each on the convergence of the threshold. A good choice of α is imperative. A large value of α would introduce a considerable change, causing the threshold to overshoot and miss the targeted value. A small value of α would introduce very small changes to the threshold, which is not desirable either, since it would take longer to converge to the targeted value; moreover, if the threshold difference from one iteration to the next is negligible, the awareness-raising purpose is defeated.
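The following Python sketch illustrates this update dynamic end to end, from the cold-start initialization to repeated threshold updates under different values of α. The survey answers and the accept/refuse behaviour are assumptions made for illustration, so the numbers do not reproduce Table 5.

```python
# Illustrative only: applies the cold-start initialization and the threshold
# update of Section 3.2.3 to a simulated user; decisions are random and the
# "accept" behaviour is a simplifying assumption.
import random

def softsign(x: float) -> float:
    return x / (1 + abs(x))

def initial_threshold(survey_scores):
    # Cold start: threshold = 10 - average(subjective weights from the survey).
    return 10 - sum(survey_scores) / len(survey_scores)

def simulate(threshold, scenario_risks, accept_prob, alpha, seed=1):
    rng = random.Random(seed)
    for risk in scenario_risks:
        # If a nudge is issued and accepted, assume the user deletes enough
        # data to bring the risk just under the threshold.
        if risk >= threshold and rng.random() < accept_prob:
            risk = max(0.0, threshold - 0.1)
        threshold += softsign(risk - threshold) * alpha
    return threshold

# A privacy-conscious user: high survey scores give a low initial threshold,
# so most scenarios trigger a nudge, and 75% of nudges are accepted.
start = initial_threshold([9, 8, 7, 8])          # 10 - 8 = 2.0
for alpha in (0.1, 0.5, 0.9):
    print(alpha, round(simulate(start, [0.31, 2.25, 6.5], 0.75, alpha), 2))
```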
We tested the outcome with values of α from 0.1 to 0.9, in increments of 0.1. For example, Catherine, whose initial threshold is 8.75, gave the address the highest score during the survey, so she might be inclined to eliminate her province after a second thought. With a value α = 0.9, Catherine's new threshold is 7.94, which is a 0.81 reduction from the earlier value. This seems like an improvement in terms of the user's awareness. However, it is too optimistic: in fact, this user has a minimal awareness of all of her assets. So, the next time we nudge her, unless the risk is address-related, we predict that she would ignore the system. In conclusion, maximizing the threshold reduction can have a negative impact, because large jumps result in frequent unwelcome nudges that do not reflect the user's preferences. Similarly, fixing α = 0.1 results in a threshold of 8.67 for the same user Catherine after she performs the same action. This is not a good choice either, because a user with a high threshold such as Catherine would almost never receive nudges, which counters the privacy-preserving purpose of the platform. After several tests, α was set to 0.5, as it has shown to converge to the user's preferences more accurately and quickly, with Low (0.31), Medium (2.25) and High (6.5) risk levels alike.

Finally, in Table 6, a comparison is drawn between this paper's proposition and existing work on nudges for privacy preservation. In this section, we have evaluated the platform based on users with different initial thresholds, and it has been compared in terms of mechanism with existing approaches.

Table 6: Comparing our approach with existing work.
| | AppOps [25] | Tailored privacy nudges [26] | Nudge Me Right [27] | Our approach |
| Personal data | Location | Scenario-based, where users are given scenarios and choose whether they disclose or not | Password | 12 types of data with varying degrees of sensitivity |
| Nudge mechanism | The nudge offers predefined responses for the user to choose from, with the best choice highlighted | 2 nudges corresponding to disclosure options | The user is nudged to strengthen their password if it is deemed weak; the user is free to enter a new password and have it tested again (no predefined choices) | The user is completely free to delete the data as recommended by the system or another piece of information, or do nothing; the risk is recalculated (no predefined choices) |
| Personalized | No | Yes | Yes | Yes |

5. Conclusion and Future Work

Today, the proliferation of social networking sites, their reachability, ease of use, as well as their integration within many aspects of our daily activities, has given rise to an unprecedented level of self-disclosure activity. Various solutions exist to help the user navigate the maze of online self-disclosure; however, such systems focus mainly either on the data being disclosed (ignoring the user's needs and preferences) or on the user (ignoring the data being disclosed). As such, this paper presents a new platform that balances the risks of self-disclosure and the user's sharing needs and privacy preferences. It is a user-centric proposition based on a personalized risk-aware recommender system. Users are guided towards privacy preservation through nudges tailored according to their perception, to maximize the probability of accepting the recommendation.

The contributions of this work can be summarized into two main aspects. First, this work provides an objective approach to evaluate the risk of various types of user data.
Second, this work provides a novel method of modeling and evaluating the user's privacy preferences at two different levels: one for disclosure in general (the threshold) and a second, more specific one (W_{i,user}), based on the different types of data being disclosed. Finally, an evaluation of the personalization component of the recommender system is performed, to validate how well the system adapts and converges to the user's preferences.

As for future work, various aspects still require further development. First, the recommender system currently considers each disclosure on its own and keeps track only of the user preferences. This can represent a threat for users who share small nuggets of information one at a time: these might not trigger the recommender system and go undetected. As such, the risk evaluation function proposed in this work should also incorporate aspects of the user's previous disclosures. Second, both the risk and the user preferences are, in reality, very dependent on the sharing environment as well as on the people involved. As such, we intend to investigate further how to incorporate these aspects within the objective evaluation of risk and within the personalized recommendations and nudges. Also, the language model needs implementation and further development to ensure both the understanding of the user input and the generation of the nudge. Finally, it would be interesting to explore how this approach can be extended to other domains, such as fake news; specifically, an approach that would combine an objective evaluation of the post with the user's preferences and cognitive biases to nudge them away from fake news.

References

[1] A. Smith, M. Anderson, Social media use in 2018: Pew Research Center; 2018. Available from: http://www.pewinternet.org/2018/03/01/social-media-use-in-2018/ [Last accessed on 2018 May 20] (2019) 1–17.
[2] A. Acquisti, L. Brandimarte, G. Loewenstein, Privacy and human behavior in the age of information, Science 347 (2015) 509–514.
[3] A. Acquisti, Privacy in electronic commerce and the economics of immediate gratification, in: Proceedings of the 5th ACM Conference on Electronic Commerce, EC '04, Association for Computing Machinery, New York, NY, USA, 2004, pp. 21–29.
[4] S. Barth, M. D. de Jong, The privacy paradox – investigating discrepancies between expressed privacy concerns and actual online behavior – a systematic literature review, Telematics and Informatics 34 (2017) 1038–1058.
[5] I. Altman, D. A. Taylor, Social penetration: The development of interpersonal relationships, Holt, Rinehart & Winston, 1973.
[6] S. M. Jourard, The transparent self: Self-disclosure and well-being, volume 17, Van Nostrand, Princeton, NJ, 1964.
[7] J.-P. Laurenceau, L. F. Barrett, P. R. Pietromonaco, Intimacy as an interpersonal process: The importance of self-disclosure, partner disclosure, and perceived partner responsiveness in interpersonal exchanges, Journal of Personality and Social Psychology 74 (1998) 1238–1251.
[8] R. Zhang, The stress-buffering effect of self-disclosure on Facebook: An examination of stressful life events, social support, and mental health among college students, Computers in Human Behavior 75 (2017) 527–537.
[9] N. N. Bazarova, Y. H. Choi, Self-disclosure in social media: Extending the functional approach to disclosure motivations and characteristics on social network sites, Journal of Communication 64 (2014) 635–657.
[10] K. Greene, V. J. Derlega, A. Mathews, Self-disclosure in personal relationships, Cambridge Handbooks in Psychology, Cambridge University Press, 2006, pp. 409–428.
[11] E. Aïmeur, N. Díaz Ferreyra, H. Hage, Manipulation and malicious personalization: Exploring the self-disclosure biases exploited by deceptive attackers on social media, Frontiers in Artificial Intelligence 2 (2019) 26.
[12] V. J. Derlega, J. Grzelak, Appropriateness of self-disclosure, in: G. J. Chelun (Ed.), Self-disclosure: Origins, Patterns, and Implications of Openness in Interpersonal Relationships, Jossey-Bass, 1979, pp. 151–176.
[13] S. M. Jourard, P. Lasakow, Some factors in self-disclosure, The Journal of Abnormal and Social Psychology 56 (1958) 91–98.
[14] S. Trepte, P. Masur, Need for privacy, in: V. Zeigler-Hill, T. Shakelford (Eds.), Encyclopedia of Personality and Individual Differences, Springer, 2020.
[15] A. F. Westin, Special report: Legal safeguards to insure privacy in a computer society, Communications of the ACM 10 (1967) 533–537.
[16] M. H. Millham, D. Atkin, Managing the virtual boundaries: Online social networks, disclosure, and privacy behaviors, New Media & Society 20 (2018) 50–67.
[17] S. Petronio, W. T. Durham, Communication privacy management theory: Significance for interpersonal communication, SAGE Publications, Inc., 2008, pp. 309–322.
[18] T. Dienlin, M. J. Metzger, An extended privacy calculus model for SNSs: Analyzing self-disclosure and self-withdrawal in a representative U.S. sample, Journal of Computer-Mediated Communication 21 (2016) 368–383.
[19] E. L. Spottswood, J. T. Hancock, Should I share that? Prompting social norms that influence privacy behaviors on a social networking site, Journal of Computer-Mediated Communication 22 (2017) 55–70.
[20] L. Baruh, E. Secinti, Z. Cemalcilar, Online privacy concerns and privacy management: A meta-analytical review, Journal of Communication 67 (2017) 26–53.
[21] M. Büchi, N. Just, M. Latzer, Caring is not enough: The importance of internet skills for online privacy protection, Information, Communication & Society 20 (2017) 1261–1278.
[22] E. G. Smit, G. Van Noort, H. A. Voorveld, Understanding online behavioural advertising: User knowledge, privacy concerns and online coping behaviour in Europe, Computers in Human Behavior 32 (2014) 15–22.
[23] A. Bandura, Social Foundations of Thought and Action: A Social Cognitive Theory, Prentice-Hall Series in Social Learning Theory, Prentice-Hall, 1986.
[24] H. Chen, W. Chen, Couldn't or wouldn't? The influence of privacy concerns and self-efficacy in privacy management on privacy protection, Cyberpsychology, Behavior, and Social Networking 18 (2015) 13–19.
[25] H. Almuhimedi, F. Schaub, N. Sadeh, I. Adjerid, A. Acquisti, J. Gluck, L. F. Cranor, Y. Agarwal, Your location has been shared 5,398 times! A field study on mobile app privacy nudging, in: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, CHI '15, ACM, New York, NY, USA, 2015, pp. 787–796.
[26] L. Warberg, A. Acquisti, D. Sicker, Can privacy nudges be tailored to individuals' decision making and personality traits?, in: Proceedings of the 18th ACM Workshop on Privacy in the Electronic Society, WPES '19, ACM, New York, NY, USA, 2019, pp. 175–197.
[27] E. Peer, S. Egelman, M. Harbach, N. Malkin, A. Mathur, A. Frik, Nudge me right: Personalizing online security nudges to people's decision-making styles, Computers in Human Behavior 109 (2020) 106347.
[28] H. Masaki, K. Shibata, S. Hoshino, T. Ishihama, N. Saito, K. Yatani, Exploring nudge designs to help adolescent SNS users avoid privacy and safety threats, in: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI '20, ACM, New York, NY, USA, 2020, pp. 1–11.
[29] H. Wang, S. Ma, H.-N. Dai, M. Imran, T. Wang, Blockchain-based data privacy management with nudge theory in open banking, Future Generation Computer Systems 110 (2020) 812–823.
[30] B. Liu, M. S. Andersen, F. Schaub, H. Almuhimedi, S. Zhang, N. Sadeh, Y. Agarwal, A. Acquisti, Follow my recommendations: A personalized privacy assistant for mobile app permissions, in: SOUPS, 2016.
[31] C. B. Jackson, Y. Wang, Addressing the privacy paradox through personalized privacy notifications, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2 (2018).
[32] [Online] The Dark Web & Your Data: Facts to Know, 2020. Accessed: 2020-07-28. URL: https://www.transunion.com/blog/identity-protection/the-dark-web-your-data-facts-to-know.
[33] [Online] The Dark Web explained - what does it mean for online security?, 2020. Accessed: 2020-07-28. URL: https://www.equifax.co.uk/resources/identity_protection/dark-web-explained.html.
[34] [Online] Your SSN costs less than a Starbucks coffee on the dark web, 2020. Accessed: 2020-07-28. URL: https://atlasvpn.com/blog/your-ssn-costs-less-than-a-starbucks-coffee-on-the-dark-web.
[35] [Online] Dark Web: The Average Cost of Buying a New Identity in 2020, 2020. Accessed: 2020-07-28. URL: https://www.safetydetectives.com/blog/dark-web-the-average-cost-of-buying-a-new-identity/.
[36] [Online] How Cybercriminals Make Money, 2020. Accessed: 2020-07-28. URL: https://www.keepersecurity.com/en_GB/how-much-is-my-information-worth-to-hacker-dark-web.html.
[37] [Online] Internet Security Threat Report, 2020. Accessed: 2020-07-28. URL: https://docs.broadcom.com/doc/istr-24-2019-en.
[38] [Online] Dark web market price for stolen credentials 2019, 2020. Accessed: 2020-07-28. URL: https://www.statista.com/statistics/1007470/stolen-credentials-dark-web-market-price/.
[39] [Online] How Much Is Your Identity Worth on the Black Market?, 2020. Accessed: 2020-07-28. URL: https://www.pacetechnical.com/much-identity-worth-black-market/.