=Paper= {{Paper |id=Vol-1830/Paper17 |storemode=property |title=Data Loss Prevention and Challenges Faced in their Deployments |pdfUrl=https://ceur-ws.org/Vol-1830/Paper17.pdf |volume=Vol-1830 |authors=Victor O. Waziri,Ismaila Idris,John K. Alhassan,Bolaji O. Adedayo }} ==Data Loss Prevention and Challenges Faced in their Deployments== https://ceur-ws.org/Vol-1830/Paper17.pdf
                     International Conference on Information and Communication Technology and Its Applications
                                                            (ICTA 2016)
                                                    Federal University of Technology, Minna, Nigeria
                                                                   November 28 – 30, 2016




              Data Loss Prevention and Challenges Faced in their Deployments


                       Victor O. Waziri, Ismaila Idris, John K. Alhassan, and Bolaji O. Adedayo
                  Department of Cyber Security Science, Federal University of Technology, Minna, Nigeria
                                             victor.waziri@futminna.edu.ng

Abstract-The technology world has evolved greatly over the past three decades, to the point where an average user’s laptop can hold up to a terabyte of data, a tiny SD card can store an organization’s entire database, file transfer has become less complex, and users can easily connect to any wireless network (private or public) within range of their wireless devices to exchange sensitive information. This evolution has led to one of the greatest challenges organizations face: adequately protecting their sensitive information from being lost or leaked. Data Loss Prevention (DLP) techniques were created to prevent these data loss breaches in an organization. DLP systems have gained popularity over the last decade and are now regarded as a mature technology, and with the alarming rate at which digitally stored assets are growing, the need for DLP systems has also increased. This paper discusses some DLP concepts and trends, as well as some of the challenges these DLPs face, and proffers a solution for a successful implementation.

   Keywords-Data loss prevention; Data loss; Data protection; Data security

                      I.   INTRODUCTION
    Data loss can be defined as the unauthorized transfer of sensitive or confidential information about an organization from a workstation or from the organization’s data center to the outside world or to an untrusted environment. This can be achieved through various channels of communication, by using storage devices, or simply by memorizing the information displayed on the screen [1]. This information could be either regular data (debit card data, bank verification numbers and health care data) or organization secrets (financial information, intellectual property and trade secrets) [2].
    Over the last decades there have been major data losses with serious impacts on organizations, and these losses have been increasing in recent years. According to a report by DataLossDB [3], 2015 surpassed the all-time record set in 2012 for the number of reported data loss incidents worldwide. Over 736 million records were exposed in the 3,930 incidents reported in 2015. An estimated 50% of those records were lost in the business sector, 20% in the government sector and the remaining 30% in the education and health sectors. It is also important to mention that private users are also victims of data loss, and it is hard to know the extent or amount of data loss that has occurred.
    There have been some notable data loss incidents in recent years that have cost organizations millions of dollars. A forecast indicates that the average cost of a data loss will exceed $150 million by 2020, with the global annual cost forecast to reach $2.1 trillion [4]. In March 2016, LulzSec Pilipinas uploaded COMELEC’s entire database on Facebook after its website had been hacked [5], while in October 2015, TalkTalk, a British telecommunications provider, suffered a data loss of over 4 million of its customers’ details, causing its stock to fall drastically [6]. In February 2015, over 80 million records were lost in a data loss at Anthem; these records included social security numbers and other very sensitive information [7]. Adobe Systems revealed in October 2013 that over 130 million user records had been lost to a hacker group due to insider assistance. These kinds of incidents have caused organizations major financial losses, damage to their reputation, loss of customer confidence, legal prosecution, reduced employee productivity and morale, and loss of business opportunities [8].
    One of the biggest challenges in mitigating data loss is that there are so many causes of data loss in an organization, and there is no single tool or simple solution that adequately addresses all of them. However, to address the risks faced, a solution must be developed that covers the causes of data loss, which can be classified as people, processes and technology [9].
     • People: Data loss can be caused by people through their lack of awareness of the security issues relating to the sensitive information that is to be secured; most times they are not held accountable for protecting this information.
     • Process: Data loss can be caused by inadequate data usage policies, the absence of a proper data transmission process and a lack of monitoring of data usage.
     • Technology: A lack of flexibility and of a communication platform in the technology deployed to protect data makes it difficult for users, pushing them to look for alternatives.

    Data loss is one of the major problems faced by organizations and, if not properly managed, can cost an organization millions. This problem can be mitigated by using various types of Data Loss Prevention

(DLP) methods and techniques. DLP can be defined as a system designed to detect and prevent any potential data breach, whether intentional or unintentional [10]. Most organizations combine two or more DLPs to effectively control the potential data loss they might face. DLP systems differ from conventional security measures in that they can analyze the content of confidential data and the context surrounding it, and can protect that data in all data states.
    A basic DLP system consists of three stages: discover, monitor and protect [11]. These stages are vital in setting up an effective DLP system. The discovery stage locates where confidential data is stored by taking a detailed inventory of the classified data and then regrouping the sensitive data in order of priority. The monitoring stage tracks how the confidential data is used, by understanding the content and context of the sensitive data and by analyzing when a breach occurs. The last stage, protect, describes the ways of protecting against data loss, either by being proactive in protecting the confidential data or by enforcing the data loss policies created.

     Figure 1. The different Data States and DLPs functionalities

    For a DLP system to be deployed effectively in an organization, the organization’s data life cycle must be considered. A data life cycle is a detailed outline of the phases involved in effectively preserving and managing data so that it can be used and reused. These phases include data at rest (data in storage), data in use (data that is being accessed) and data in transit (data flowing through the internal network). Figure 1 summarizes these phases and how DLPs prevent data loss in each of them. In protecting targeted data, a DLP deployment can take many forms, which are mostly based on the data state [12].

               II.    COMPARISON OF DLP TOOLS
    In evaluating some of the tools designed by various vendors, we compared some of the top DLP tools from security vendors such as CA Technologies (A), Code Green Networks (B), Digital Guardian (C), Forcepoint (D), McAfee (E), Palisade Systems (F), RSA (G), Trend Micro (H), Trustwave (I) and Symantec (J). It was observed that these security vendors offer protection for various data states. Though some of these tools are specialized in their design (they don’t perform other security tasks), they can be integrated to support other technologies such as identity access management or encryption. Table I summarizes the features of each of these DLP vendors, and Figure 2 shows their performance based on these features.

          TABLE I.         DATA LOSS PREVENTION PRODUCT MATRIX

Vendor                                  A  B  C  D  E  F  G  H  I  J
Mobile/tablet                           X  X  ✓  ✓  X  X  ✓  ✓  X  ✓
Laptop/Desktop/Workstation              ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
Local network                           X  ✓  X  X  ✓  X  X  X  X  X
Server                                  ✓  ✓  ✓  X  ✓  ✓  X  ✓  X  X
Cloud/SaaS                              ✓  ✓  ✓  ✓  ✓  X  ✓  ✓  X  ✓
                        Detection technologies
biometric signatures                    X  X  X  X  ✓  X  X  X  X  X
classification                          ✓  ✓  ✓  X  ✓  X  X  X  X  X
context analysis                        ✓  X  ✓  ✓  ✓  X  ✓  X  X  X
data matching                           X  ✓  X  X  X  X  X  X  ✓  ✓
flagging                                X  ✓  X  X  X  X  X  X  X  X
dictionaries/lexicons                   X  ✓  X  X  ✓  X  X  X  X  X
data discovery                          ✓  ✓  ✓  X  X  ✓  ✓  ✓  ✓  ✓
file type detection/classification      X  X  ✓  ✓  ✓  X  X  ✓  X  ✓
machine learning/pattern recognition    X  ✓  X  X  X  X  ✓  ✓  X  ✓
Optical Character Recognition           X  X  X  ✓  ✓  X  X  X  X  X
regular expressions/pattern matching    X  ✓  X  X  ✓  ✓  X  ✓  X  ✓
                        Enforcement technologies
block                                   ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
encrypt                                 ✓  ✓  ✓  X  ✓  X  X  ✓  ✓  ✓
fingerprinting                          ✓  ✓  ✓  X  ✓  X  X  X  X  ✓
move/remove                             ✓  ✓  ✓  X  X  X  X  X  X  X
notify/alert                            ✓  ✓  ✓  X  X  ✓  X  ✓  ✓  X
quarantine                              ✓  ✓  X  X  ✓  X  ✓  ✓  ✓  ✓
                        Software integration
Databases (e.g. SQL Server)             X  ✓  X  X  ✓  X  ✓  X  X  ✓
Email client                            X  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
file sharing                            X  ✓  X  X  ✓  ✓  ✓  X  ✓  ✓
instant messaging                       X  ✓  X  X  ✓  ✓  X  ✓  ✓  ✓
web 2.0                                 X  ✓  X  ✓  ✓  ✓  X  X  ✓  ✓
webmail                                 X  ✓  X  X  X  ✓  ✓  ✓  X  X
                        Hardware integration
CD/DVD                                  X  X  X  X  ✓  ✓  ✓  ✓  X  ✓
external/removable HD                   X  ✓  ✓  X  ✓  X  X  ✓  X  X
Printer                                 X  X  X  X  ✓  X  ✓  ✓  X  X
USB drives                              X  ✓  X  ✓  ✓  ✓  ✓  ✓  X  ✓


wireless devices                        X  ✓  X  X  X  X  X  X  X  X
                        Monitoring
centralized                             X  ✓  ✓  ✓  ✓  ✓  ✓  ✓  X  ✓
offline                                 X  ✓  X  ✓  ✓  X  X  X  X  X
real-time                               X  X  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓

    Some of these channels will require a great number of techniques and monitoring to be adequately secured.
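    The percentages plotted in Figure 2 are consistent with each vendor's share of the 36 feature rows of Table I that are ticked (for example 28 of 36 rows for vendor B, about 77.78%). The paper does not state this formula, so the sketch below is an assumption: it transcribes Table I and recomputes the scores. The names `MATRIX`, `coverage` and `vendors_supporting` are hypothetical helpers introduced only for illustration.

```python
# Table I transcribed as one 10-character string per feature row,
# columns in vendor order A..J; "1" means the cell is ticked, "0" means X.
VENDORS = "ABCDEFGHIJ"

MATRIX = {
    "mobile/tablet":                        "0011001101",
    "laptop/desktop/workstation":           "1111111111",
    "local network":                        "0100100000",
    "server":                               "1110110100",
    "cloud/saas":                           "1111101101",
    "biometric signatures":                 "0000100000",
    "classification":                       "1110100000",
    "context analysis":                     "1011101000",
    "data matching":                        "0100000011",
    "flagging":                             "0100000000",
    "dictionaries/lexicons":                "0100100000",
    "data discovery":                       "1110011111",
    "file type detection/classification":   "0011100101",
    "machine learning/pattern recognition": "0100001101",
    "optical character recognition":        "0001100000",
    "regular expressions/pattern matching": "0100110101",
    "block":                                "1111111111",
    "encrypt":                              "1110100111",
    "fingerprinting":                       "1110100001",
    "move/remove":                          "1110000000",
    "notify/alert":                         "1110010110",
    "quarantine":                           "1100101111",
    "databases":                            "0100101001",
    "email client":                         "0111111111",
    "file sharing":                         "0100111011",
    "instant messaging":                    "0100110111",
    "web 2.0":                              "0101110011",
    "webmail":                              "0100011100",
    "cd/dvd":                               "0000111101",
    "external/removable hd":                "0110100100",
    "printer":                              "0000101100",
    "usb drives":                           "0101111101",
    "wireless devices":                     "0100000000",
    "centralized monitoring":               "0111111101",
    "offline monitoring":                   "0101100000",
    "real-time monitoring":                 "0011111111",
}


def coverage(matrix=MATRIX):
    """Return {vendor: percentage of feature rows ticked}, rounded to 2 places."""
    total = len(matrix)
    return {
        vendor: round(100 * sum(row[i] == "1" for row in matrix.values()) / total, 2)
        for i, vendor in enumerate(VENDORS)
    }


def vendors_supporting(*features, matrix=MATRIX):
    """Return the vendors that are ticked for every one of the named rows."""
    return [
        vendor for i, vendor in enumerate(VENDORS)
        if all(matrix[feature][i] == "1" for feature in features)
    ]


if __name__ == "__main__":
    for vendor, pct in coverage().items():
        print(f"{vendor}: {pct:.2f}%")
```

    A helper such as `vendors_supporting("usb drives", "printer")` illustrates the kind of feature-by-feature check that is more useful than the overall score when an organization has specific requirements.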




[Figure 2 bar chart values: A 33.33%, B 77.78%, C 47.22%, D 36.11%, E 75.00%, F 41.67%, G 47.22%, H 58.33%, I 33.33%, J 58.33%]

                 Figure 2. Overall performance of the DLP vendors

    Although Code Green Networks (B) has the highest overall score in Figure 2, this doesn’t necessarily imply that it has the best DLP system; it simply means it compensates in the areas where it is lacking. When choosing a DLP vendor for your organization, it is ideal to check the features that best suit the implementation, such as ease of installation, scalability, control features and maintenance.

          III.        CHALLENGES ON DLP SYSTEMS DEPLOYMENT
    In protecting sensitive data from loss, DLP systems face many challenges and, as with other security mechanisms, these challenges can render the system ineffective. A review of both industrial and academic DLP systems identified seven common challenges [13]. These are Leaking Channels, The Human Factor, Access Rights, Encryption and Steganography, Data Modification, Scalability and Integration, and Data Classification. For an effective DLP system to be implemented, these challenges must be addressed. In the following sections we discuss these challenges and try to suggest a possible solution for each of them.

A. Leaking Channels
    Every day there is a need to share and access data between different media and users, and this is done with the assistance of intermediate channels. In an ideal scenario these channels are used to legitimately exchange data from one end to another; however, they can also pose a major threat of leaking sensitive data. These channels cannot be totally blocked, as they are an important aspect of data sharing, which requires some or even all of them to be open. As technology keeps growing at a fast pace and more channels become available, it has become hard to keep pace with securing them [14]. Figure 3 shows some of the channels commonly used for data exchange; not all of these channels are easy to secure.

                 Figure 3. Some common data leaking channels

    Sensitive data that is either ‘at rest’ or ‘in use’ can be compromised through these channels, which include USB ports, CD/DVD drives, printed documents and even web services. Though data leakage through CD/DVD drives and USB ports can be mitigated using host DLPs, this isn’t adequate to prevent leakage through other channels such as Instant Messaging (IM) and email, which are always available [15]. Even when access rights to confidential data are restricted, some of this data can still be accessed in a printable format. Channels like file sharing and web services, associated with data ‘in transit’, have been one of the biggest challenges to mitigate, as these channels cannot be blocked and they serve as the backbone of the organization in terms of data exchange. To maintain maximum security in these channels, intensive traffic filtering must be done. The DLP system to be deployed should always try to balance security against the interconnectivity of these channels.

B. The Human Factor
    Humans are generally complex beings: their behaviors and motives are usually hard to predict or determine, as they are influenced by many factors, which could be psychological or sociological. Decisions such as granting access to a set of users, defining the confidentiality level of data and setting a threshold level for DLP systems are all affected by human actions. It must also be noted that, even when an organization’s security policies are in place to mitigate such data loss, they are not guaranteed to tackle the problem. Almost all human interaction with data occurs in the data ‘in use’ state, which simply means the user needs an endpoint terminal to access the confidential data for there to be data leakage [16]. A typical DLP system will tend to restrict data leakage by the user by disabling some aspects of the system, such as CD/DVD drives, USB ports and removable drives. But these user restrictions can easily be bypassed by the sharing of access rights by users, either intentionally (by trust) or

unintentionally (by social engineering), thereby compromising the confidentiality of the data that would be accessed. Users can use mobile gadgets to snap pictures of sensitive information, or even use hidden cameras to record entire classified documents and transmit them remotely. The human factor will always be a major challenge when deploying a DLP system, as long as there is human interaction with the system [17].

C. Access Rights
    Access rights have always been a key feature in the deployment of any security mechanism, including DLP systems. It is therefore of great importance to categorize these access rights properly and to separate each category of users from the others based on their level of permission. DLP systems won’t be able to prevent illegitimate users from accessing confidential information if there is no proper categorization of access rights in place. Access rights are of great importance in preventing data loss in an organization and should be updated regularly [18]. An obsolete access right can have a huge negative impact on the entire system, making it vulnerable to data loss. For instance, when a user is downgraded or dismissed from the organization and the access rights are not updated accordingly, the system is left vulnerable to data leakage, as the DLP system won’t be able to detect any data breach when that user tries to access information he/she is not permitted to access.
    A data leak can also be caused by a legitimate user with the right access rights, either intentionally (due to factors such as financial gain or whistleblowing) or accidentally. For efficiency, a DLP system should be able to maintain and control the organization’s access rights while also protecting data from intentional and unintentional leakage.

D. Encryption and Steganography
    Encryption is another major challenge faced by network-based DLP systems, as these systems use different forms of analytical techniques to identify copies of the sensitive data and compare them with the original data that has been classified as confidential. When the user applies complex encryption to the confidential data, it becomes hard for the DLP system to analyze the data content, thereby creating a major vulnerability in the system. The implication is that a confidential document can bypass the DLP system’s detection mechanism when the user encrypts it, allowing the user to send the confidential document through his/her email as an attachment [19]. Steganography is a challenge similar to encryption, but it is more challenging to mitigate and can even be impossible to detect. The user uses steganography tools to hide classified documents within other media, such as digital photos, audio files and video files. This is a challenge for a DLP system, as it won’t be able to detect the confidential data inside those media [20]. In some instances a document can be compressed or converted to a different format, making the system unable to detect it, as it won’t be able to analyze such documents.

E. Data Modification
    Some DLP systems are designed to compare the original sensitive data with the inspected traffic flowing through the system, using data signatures and patterns to prevent data leakage. In such a system, detection occurs whenever a signature or pattern matches that of the confidential data, or when there is a high percentage of similarity to the confidential data. The major challenge of this design is that confidential data is mostly not sent in a form that enables the system to detect it once it has been modified. Data can be modified using various types of techniques, which are readily available online. Confidential documents can easily be modified by removing some vital lines or adding to them, thereby creating an entirely different document before sending it over the allowed channels. The user can also entirely change the structure or format of the document, rendering it undetectable by the DLP system [21].
    In some other designs, DLP systems use data hashing to analyze the outgoing traffic by comparing hash values (including SHA1 and MD5) with those of the original confidential data. The moment the two values match, a data leak is detected. The problem with the hashing design is that it becomes ineffective the moment the confidential document is modified, as this yields a different hash value [22].

F. Scalability and Integration
    The volume of data processed can affect the performance of any security mechanism deployed to secure an organization’s assets. DLP systems can also be a victim of this challenge, which means that when deployed in a host, network or storage section, a DLP system should perform its function effectively and be smoothly incorporated into the system without affecting or delaying the entire workflow of the organization’s system. Therefore, factors affecting the scalability of a DLP system, such as its computational ability and analysis techniques, should be considered when deploying the system [23].
    There are usually some challenges in integrating a DLP system during its deployment, as similar functions are already handled by other security mechanisms such as firewalls and intrusion detection. Therefore, before deployment, the entire system must be carefully analyzed and implemented to give effective performance. Functions shouldn’t be repeated, as having two similar functions on the system can delay the entire process, thereby reducing the performance of the system.

G. Data Classification
    Data classification is the process of organizing data into categories or levels for effective and efficient use [24]. This definition implies that DLP systems rely entirely on well-defined data classification to differentiate confidential data from normal data. The main purpose of data classification is to determine the baseline of security controls to be used in safeguarding data. There are different ways by which data can be classified based on the organization’s classification, with terms like


confidential, secret and top secret being used by the military to classify their data, while for an institution data can be classified as restricted data, private data and public data [25]. With this classification it becomes easy to identify confidential data, making the system adequately equipped to protect it. However, the problem with data classification is determining the level of secrecy of the sensitive data. For these secrecy levels to be classified properly, the owner of the data to be protected should be the one responsible for the classification process. However, most times the classification process is left to people who don’t have enough knowledge about all the data. This creates a vulnerability in the DLP system, as those who are not permitted to see certain information are now able to access such confidential information. It is therefore important to have a good classification, as without it the DLP system becomes ineffective.

systems, data controlling systems as well as transaction systems. These systems are unique and separated by their functionality. For example, an antivirus system cannot encrypt data, but it works perfectly in monitoring the data source code. For this reason, corporate organizations will require many types of security systems to protect their data. As these security systems have specific functions, the DLP system has an edge over the rest, as it has the ability to perform various functions. This reduces the cost of purchasing many security systems to monitor the various security gaps. Table II summarizes the features of a DLP system compared with similar protection systems (Intrusion Prevention System (IPS) and Firewall System).

        V.    THE WAYS DLP SYSTEMS ANALYZE DATA
    Though there are different ways by which DLP systems
                                                                               analyze their data, these analysis can be grouped into two
   TABLE II.       COMPARISON OF DLP SYSTEM WITH IPS/FIREWALL                  major group. They are context analysis and content analysis.
                            SYSTEMS
                                                                               The context focuses on the surroundings of the data while
            Security Gaps             IPS/Firewall     DLP Systems             the content focuses on the actual data[26].
                                        Systems
                                                                                    Context analysis: This method of analysis actual
    Network partitioning of           No              Yes
    network security                                                                   analyzes the metadata properties with the
    Load Balancer Integration         No              Limited                          confidential data. It does this by examining the
                                                                                       information about data and keeps track of the data
    Accelerated program delivery      No              Yes
                                                                                       using various attributes of the data such as the size of
    TCP connection pooling            No              Yes                              the document, the source, the destination, when the
                                                                                       document was created or modified and other
    SSL offloading                    No              Yes
                                                                                       properties. With this metadata attributes of the
    Built in authentication engine    No              Yes                              confidential data, a pattern and signature can be used
    Validate encrypted sessions       No              Yes                              to form a process in defining how the policies can be
                                                      (MTA Sensor)                     created for the detection of data loss [27].
    Multiple applications single      No              Yes                           Content analysis: In this method, analysis focuses on
    sign on                                                                            the content of the confidential data, which could be
    Injection attack protection       Limited         Yes                              text or any multimedia material. It does this by
    (XSS, SQL)
                                                                                       comparing the transmitted data with the original
    Normalize encoded traffic         No              Yes
                                                                                       confidential data and detects a breach if there is a
    Inspect HTTPS traffic             No              Yes (vary from                   high percentage in similarity [28]. This process can
                                                      different                        be done through basically three techniques: data
                                                      policy)
                                                                                       fingerprinting (identifies patterns with exact or
    Session tampering/ hijacking/     No              Yes
    riding protection                                                                  partial match), regular expression (identifies its
                                                                                       patterns based on words or text) and statistical
    Forceful browsing prevention      No              Yes                              analysis (using prerecorded information) [29].
    Data theft protection, cloaking   No              Yes
                                                                                  DLP systems could be either preventive or detective,
    Brute-force protection            No              Yes                      depending on the type methods been used by the
    Trojan/Warms/Virus/malware        Yes     (Back   Yes     (Block           organization. The preventive methods includes: Policy and
    upload protection                 Door            and      report          Access Rights, Virtualization and Isolation, Cryptographic
                                      Detection)      users‟act)               Approaches, Quantifying and Limiting; while detective
    Rate control protection           No              Yes                      methods includes: Data Identification, Social and Behavioral,
    Request, response rewrite         No              Yes                      Data Mining/Text Clustering, Quantifying and Limiting.
    Application access logging        No              Yes (depends
    and user audit trails                             on policy rule)          A. Policy and Access Rights
                                                                                   This type of method is widely suitable for organizations,
                                                                               as long as there is a proper classification of their data and a
   IV.    COMPARISON WITH OTHER SECURITY TECHNIQUES
                                                                               well-defined access rights system in place. This becomes
    In the aspect of security, there are a lot of security                     easy to manage as the procedures are clearly stated and
systems and security vendors in the market. These security                     makes it ideal for data ‘at rest’ and data ‘in use’. This
systems can be classified or grouped as network security                       method is constrained by basically improper classification of
systems, antivirus systems, monitoring systems, scanning                       data and not using the effective access controls. As it is a

preventive method, it cannot detect when a breach has occurred [18].

B. Virtualization and Isolation
    This method is based on isolating the activities of the user virtually and allowing only
trusted functions or data to pass through the system. It usually requires hardware for its
implementation and reduces the amount of administrative work, as it makes use of the existing data
classification on the system. However, it is not cost effective and does not detect when a data
leakage occurs [30].

G. Data Mining and Text Clustering
    This method predicts when a data leakage will occur by learning about the data processes and
data leakage patterns over time. It is effective in detecting unstructured documents and is less
dependent on administrative intervention, which makes it easy to integrate. The method has a very
high false-positive rate, as it requires a learning phase to work, and it therefore needs a large
amount of processing power [35].
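The learning phase mentioned under Data Mining and Text Clustering can be illustrated with a small sketch. It assumes a multinomial naive Bayes text classifier with add-one smoothing, which is one common text classification technique and not necessarily the one used by the systems cited in this paper. The class name, the labels and the training sentences are all hypothetical and serve only as an illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesLeakClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocabulary = set()      # every token seen during training

    def train(self, labelled_documents):
        """Learning phase: count tokens per label from (text, label) pairs."""
        for text, label in labelled_documents:
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocabulary.update(tokens)

    def log_score(self, text, label):
        """Return log P(label) plus the sum of log P(token | label)."""
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denominator = sum(counts.values()) + len(self.vocabulary)
        for token in tokenize(text):
            # Add-one smoothing keeps unseen tokens from zeroing the score.
            score += math.log((counts[token] + 1) / denominator)
        return score

    def classify(self, text):
        """Return the label with the highest log score."""
        return max(self.doc_counts, key=lambda label: self.log_score(text, label))


# Hypothetical training data; a real deployment would need far more documents.
TRAINING_DATA = [
    ("patient record with diagnosis and insurance number", "sensitive"),
    ("salary payroll account number for employee", "sensitive"),
    ("customer credit card number and billing address", "sensitive"),
    ("lunch menu for the staff canteen this week", "public"),
    ("meeting agenda and room booking for monday", "public"),
    ("press release about the new office opening", "public"),
]

classifier = NaiveBayesLeakClassifier()
classifier.train(TRAINING_DATA)

outgoing = "please find the employee payroll and account number attached"
print(classifier.classify(outgoing))  # prints: sensitive
```

With so little training data the classifier will mislabel many real documents, which is consistent with the high false-positive rate noted for this method. The cost of collecting and processing a large training set is the learning-phase overhead.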
C. Cryptographic Approaches
    This approach involves encrypting confidential information with strong encryption tools to
provide a maximum level of security. It is used in almost all DLP systems, as it offers various
options for encrypting files, and it is effective for data ‘at rest’. The major challenge is that
encryption does not hide the existence of confidential documents, even though their contents are
encrypted. It is not a detective method, which leaves it vulnerable when a data leakage occurs [31].

D. Quantifying and Limiting
    This method has an added advantage, as it also monitors the channels through which data travel
and blocks any sensitive data from passing through them. It can be implemented effectively for data
‘in transit’, ‘in use’ and ‘at rest’, which makes it easy to deploy against a specific attack on
the organization's system. As with the other preventive methods, it makes data leakages hard to
detect, and if not properly deployed it can disrupt the workflow of the entire system. It is also
limited to specific leakage scenarios, which leaves it vulnerable to other forms of data
leakage [32].

E. Social and Behavior Analysis
    This method involves analyzing and measuring the level of interaction between people, in this
case the users of the organization, and creating adequate guidelines for the protection of
sensitive data. When adequately implemented, it prevents leakages by detecting any relationship
with malicious intent, and it is effective in all data states. However, such human behavior is
difficult to predict, which leads to a high percentage of false positives and requires the
administrator to interact regularly with the DLP system. The method also requires a great deal of
time to profile the various users and index each of their behavioral patterns [33].

F. Data Identification
    This method uses a mechanism that compares the data traffic flowing through the system with the
original confidential documents and tries to prevent the data from being leaked when there is a
match. It produces very few false positives when fingerprinting is used in the analysis. However,
it can easily be bypassed by heavily modifying the data, which makes the leak impossible to
detect [34].

VI.    SOLUTION FOR A SUCCESSFUL DLP IMPLEMENTATION

    For any DLP system to be implemented properly, we have considered ten key steps which, if
followed, would help an organization to adequately implement a DLP system for the protection of its
confidential data. These steps are as follows:

Step 1: Implement a universal technique and value proposal for DLP centered on a risk assessment
Step 2: Involve the right people with the right organization model
Step 3: Identify sensitive data and understand how they are handled
Step 4: Provide a phased implementation based on progress
Step 5: Minimize the impact to system performance and business operations
Step 6: Create meaningful DLP policies and policy management processes
Step 7: Implement effective event review and investigation mechanisms
Step 8: Provide analysis and meaningful reporting
Step 9: Implement security and compliance measures
Step 10: Implement an organizational data flow and oversight mechanism

VII.    CONCLUSION

    Many organizations have given a great deal of attention to protecting their sensitive data from
being lost accidentally or intentionally. DLP systems cannot function effectively in isolation; for
a DLP system to function effectively, it must be linked with other security information processes.
However, before implementing any DLP system, an organization needs to understand what confidential
data it wants to hold, where those confidential data are to be stored (since the storage location
is vital to their protection), and the destinations and channels through which the information will
pass.
    There are several challenges associated with DLP systems. Before such a system is deployed, it
is necessary to have a deep understanding of these challenges and to be able to analyze them. It is
also important to make the system easy to use and manage, so as to avoid any form of complexity, as
the more complex a DLP system is, the more likely it is to be compromised by the user.



    As new technologies are developed and the ways in which these technologies communicate change
as well, it is of great importance that organizations keep pace with these technological
advancements by identifying new and better ways of protecting data from being lost to unauthorized
users.

                             REFERENCES
[1]  N. Kumaresan, "Key consideration in protecting sensitive data leakage using Data Loss
     Prevention Tools," ISACA Journal, vol. 1, pp. 1-5, 2014.
[2]  E. Bergstrom and R. M. Ahlfedt, "Information Classification Issues," Springer International
     Publishing, pp. 27-41, 2014.
[3]  DataLossDB. (2016). 2015 Reported data breaches surpasses all previous years. Available:
     http://blog.datalossdb.org
[4]  IBM and Ponemon Institute LLC, "2015 Cost of Data Breach Study: Global Analysis," Ponemon
     Institute LLC Research Department, 2308 US 31 North, Traverse City, Michigan 49686, USA, 2015.
[5]  Trend Micro. (2016). Data Protection Mishap Leaves 55M Philippine Voters at Risk. Available:
     http://blog.trendmicro.com/treandlabs-security-intelligence/55m-registered-voters-risk-philippine-commission-elections-hacked
[6]  BBC NEWS. (2015). TalkTalk hack 'affected 157,000 customers'. Available:
     http://www.bbc.com/news/business-34743185
[7]  C. Osborne. (2015). Health insurer Anthem hit by hackers, up to 80 million records exposed.
     Available:
     http://www.zdnet.com/article/health-insurer-anthem-hit-by-hackers-up-to-80-million-records-exposed
[8]  T. Seals. (2016). Data Breach Trends to Evolve in 2016. Available:
     http://infosecurity-magazine.com/news/data]breache-trends-to-evolve-in
[9]  EYGM Limited. (2011). Data Loss Prevention: Keeping your sensitive data out of the public
     domain. Available: http://www.ey.com
[10] R. R. Tahboub and Y. Saleh, "Data Leakage/Loss Prevention Systems (DLP)," ResearchGate, 2014.
[11] Jonathan Jesse and ITS Partners. (2015). Symantec DLP Overview. Available:
     http://www.symantec.com/en/uk/business/theme.jsp?th
[12] Price Waterhouse Coopers, "Data Loss Prevention: Keeping sensitive data out of the wrong
     hands*," pp. 1-16, 2008.
[13] N. Lord, "Experts on the Data Loss Prevention (DLP) Market in 2016 & Beyond," 2016.
[14] V. Shaj and K. P. Kaliyamurthie, "A review of Data Leakage Detection," IJCSMC Journal, vol. 2,
     pp. 577-581, 2013.
[15] T. T. T. Huong and J. Corner, "The impact of communication channels on mobile banking
     adoption," International Journal of Banking Marketing, vol. 34, pp. 78-109, 2014.
[16] I. Ponemon, "The Human Factor in Data Protection," Trend Micro, pp. 1-27, 2012.
[17] T. Pepper. (2016). The people problem: How to manage the human factor to shore up security.
     Available:
     http://www.scmagazineuk.com/the-people-problem-how-to-manage-the-human-factor-to-shore-up-security/article/494638
[18] D. Gibson, "What's missing from Data Loss Prevention," Data Center Journal, 2012.
[19] S. R. Raj, A. Cherian, and A. Abraham, "A Survey on Data Loss Prevention Techniques,"
     International Journal of Science and Research, vol. 2, pp. 240-241, 2013.
[20] N. B. Pamula, M. S. Naga, and P. K. Deepthi, "Preventing Data Leakage in Distributive
     Strategies by Steganography Technique," International Journal of Computer Science and
     Information Technologies, vol. 4, pp. 220-223, 2013.
[21] S. W. Ahmad and G. R. Bamnote, "Data Leakage Detection and Data Prevention using Algorithm,"
     International Journal of Computer Science and Application, vol. 6, pp. 394-399, 2013.
[22] M. Hart, P. Manadhata, and R. Johnson, "Text Classification for Data Loss Prevention," in
     Privacy Enhancing Technologies. Waterloo, ON, Canada: Springer Berlin Heidelberg, 2011,
     pp. 18-37.
[23] J. Thorkelson. (2010). Data Loss Prevention: Simplified. Available:
     http://www.codegreennetworks.com
[24] M. Rouse. (2015). Data Classification. Available:
     http://searchdatamanagement.techtarget.com/data-classification
[25] R. Bragg, "Data Classification," in CISSP Training Guide, 1st ed. 800 East 96th Street,
     Indianapolis, Indiana: Pearson IT Certification, 2002, pp. 48-51.
[26] A. Bryman, Social Research Methods, 2nd ed. Great Clarendon Street, Oxford, United Kingdom:
     Oxford University Press, 2004.
[27] S. A. Kale and S. V. Kulkari, "Data Leakage Detection," International Journal of Advanced
     Research in Computer and Communication Engineering, vol. 1, pp. 668-678, 2012.
[28] K. A. Neuendorf, The Content Analysis Guidebook. Thousand Oaks, Ca.: Sage Publication Inc.,
     2002.
[29] K. Krippendorf, Content Analysis: An introduction to its methodology. Thousand Oaks, Ca.: Sage
     Publication Inc., 2004.
[30] J. N. Mathews, W. Hu, M. Hapuarachchi, and T. Deshane, "Quantifying the performance of
     Isolation Properties of Virtualization Systems," ACM, pp. 1-9, 2007.
[31] K. Scarfone. (2013). How to help DLP and Encryption Coexist. Available:
     http://www.statetechmagazine.com/article/2013/11/how-help-dlp-and-encryption-coexist-state
[32] S. Vavilis, M. Petkovic, and N. Zannone, "Data Leakage Quantification," presented at the Data
     Applications Security and Privacy XXVIII: 28th Annual IFIP WG 11.3, Vienna, Austria, 2014.
[33] J. M. Kizza, Computer Network Security and Cyber Ethics, 4th ed. Jefferson, North Carolina:
     McFarland & Company, Inc., 2014.
[34] M. Tu, K. Spoa-Harty, and L. Xiao, "Data Loss Prevention Management and Control: Inside
     Activity Incident Monitoring, Identification, and Tracking in Healthcare Enterprise
     Environments," The Journal of Digital Forensics, Security and Law, vol. 10, pp. 27-44, 2015.
[35] I. H. Witten and E. Frank, "Classification rule," in Data Mining: Practical Machine Learning
     Tools and Techniques, 2nd ed. 500 Sansome Street, Suite 400, San Francisco, CA 94111: Morgan
     Kaufmann Publisher, 2005, pp. 200-213.



