International Conference on Information and Communication Technology and Its Applications (ICTA 2016)
Federal University of Technology, Minna, Nigeria
November 28 – 30, 2016

Data Loss Prevention and Challenges Faced in their Deployments

Victor O. Waziri, Ismaila Idris, John K. Alhassan, and Bolaji O. Adedayo
Department of Cyber Security Science, Federal University of Technology, Minna, Nigeria
victor.waziri@futminna.edu.ng

Abstract: The technology world has evolved greatly over the past three decades. It has reached a point where an average user's laptop can hold up to a terabyte of data, a tiny SD card can store an organization's entire database, file transfer has become far less complex, and users can easily connect to any wireless network (private or public) within range of their devices to exchange sensitive information. This evolution has produced one of the greatest challenges organizations face: adequately protecting their sensitive information from being lost or leaked. Data Loss Prevention (DLP) techniques were created to prevent such data loss breaches in an organization. DLP systems have gained popularity over the last decade and are now regarded as a mature technology, and with the alarming rate at which digitally stored assets are growing, the need for DLP systems has also increased. This paper discusses some DLP concepts and trends, as well as some of the challenges that DLP systems face, and proffers a solution for successful implementation.

Keywords: Data loss prevention; Data loss; Data protection; Data security

I. INTRODUCTION

Data loss can be defined as the unauthorized transfer of sensitive or confidential information about an organization from a workstation or from the organization's data center to the outside world or to an untrusted environment. This can happen through various communication channels, through storage devices, or simply by memorizing information displayed on a screen [1]. The information involved may be regulated data (debit card data, bank verification numbers and health care data) or organizational secrets (financial information, intellectual property and trade secrets) [2].

Over the last decades there have been major data losses with serious impacts on organizations, and these losses have been increasing in recent years. According to a DataLossDB report [3], 2015 surpassed the all-time record set in 2012 for the number of reported data loss incidents worldwide. Over 736 million records were exposed in the 3,930 incidents reported in 2015. An estimated 50% of those records were lost in the business sector, 20% in the government sector, and the remaining 30% in the education and health sectors. It is also important to mention that private users are victims of data loss as well, and that it is hard to know the full extent of the data loss that has occurred.

There have been some notable data loss incidents in recent years that have cost organizations millions of dollars. Forecasts indicate that the average cost of a data loss incident will exceed $150 million by 2020, with the global annual cost forecast to reach $2.1 trillion [4]. In March 2016, LulzSec Pilipinas uploaded COMELEC's entire database to Facebook after the COMELEC website had been hacked [5], while in October 2015 TalkTalk, a British telecommunications provider, suffered a loss of the details of over 4 million of its customers, causing its stock to fall drastically [6]. In February 2015, over 80 million records, including social security numbers and other very sensitive information, were lost in a breach at Anthem [7]. Adobe Systems revealed in October 2013 that over 130 million user records had been lost to a hacking group with insider assistance. Incidents of this kind have caused organizations major financial losses, damage to their reputation, loss of customer confidence, legal prosecution, reduced employee productivity and morale, and loss of business opportunities [8].

One of the biggest challenges in mitigating data loss is that it has many different causes within an organization, and no single tool or simple solution adequately addresses all of them. To address the risks, a solution must take into account the causes of data loss, which can be classified as people, processes and technology [9].
• People: Data loss can be caused by people who lack awareness of the security issues surrounding the sensitive information that is to be secured, and who most of the time are not held accountable for protecting this information.
• Process: Data loss can result from weaknesses in the process of securing sensitive information, such as inadequate data usage policies, the absence of a proper data transmission process, and a lack of data usage monitoring.
• Technology: A lack of flexibility and of a communication platform in the technology deployed to protect data makes that technology difficult to use, which pushes users to look for alternatives.

Data loss is one of the major problems organizations face and, if not properly managed, it can cost an organization millions. This problem can be mitigated by using various types of Data Loss Prevention (DLP) methods and techniques.
DLP can be defined as a system designed to detect and prevent potential data breaches, whether intentional or unintentional [10]. Most organizations combine two or more DLP systems to effectively control the potential data loss they might face. DLP systems differ from conventional security mechanisms in that they can analyze both the content of confidential data and the context surrounding it, and they can protect confidential data in all data states.

A basic DLP system consists of three stages: discover, monitor and protect [11]. These stages are vital in setting up an effective DLP system. The discovery stage locates where confidential data is stored by taking a detailed inventory of the classified data and then regrouping the sensitive data in order of priority. The monitoring stage observes how the confidential data is used, by understanding its content and context and by analyzing when a breach occurs. The last stage, the protect stage, covers the ways of protecting against data loss, either by being proactive in protecting the confidential data or by enforcing the data loss policies that have been created.

For a DLP system to be deployed effectively, the organization's data life cycle must be considered. A data life cycle is a detailed outline of the phases involved in preserving and managing data so that it can be used and reused. These phases include data at rest (data in storage), data in use (data being accessed) and data in transit (data flowing through the network). Figure 1 summarizes these phases and how DLP systems prevent data loss in each of them. In protecting targeted data, DLP systems can take many forms during deployment, mostly depending on the data state [12].

Figure 1. The different Data States and DLPs functionalities

II. COMPARISON OF DLP TOOLS

In evaluating tools designed by various vendors, we compared some of the top DLP tools from security vendors such as CA Technologies (A), Code Green Networks (B), Digital Guardian (C), Forcepoint (D), McAfee (E), Palisade Systems (F), RSA (G), Trend Micro (H), Trustwave (I) and Symantec (J). It was observed that these vendors offer protection for various data states. Although some of these tools are specialized in their design (they do not perform other security tasks), they can be integrated to support other technologies such as identity and access management or encryption. Table I summarizes the features offered by each DLP vendor, and Figure 2 shows the overall performance based on these features.

TABLE I. DATA LOSS PREVENTION PRODUCT MATRIX

Vendor                                   A  B  C  D  E  F  G  H  I  J
Mobile/tablet                            X  X  ✓  ✓  X  X  ✓  ✓  X  ✓
Laptop/Desktop/Workstation               ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
Local network                            X  ✓  X  X  ✓  X  X  X  X  X
Server                                   ✓  ✓  ✓  X  ✓  ✓  X  ✓  X  X
Cloud/SaaS                               ✓  ✓  ✓  ✓  ✓  X  ✓  ✓  X  ✓

Detection technologies
biometric signatures                     X  X  X  X  ✓  X  X  X  X  X
classification                           ✓  ✓  ✓  X  ✓  X  X  X  X  X
context analysis                         ✓  X  ✓  ✓  ✓  X  ✓  X  X  X
data matching                            X  ✓  X  X  X  X  X  X  ✓  ✓
flagging                                 X  ✓  X  X  X  X  X  X  X  X
dictionaries/lexicons                    X  ✓  X  X  ✓  X  X  X  X  X
data discovery                           ✓  ✓  ✓  X  X  ✓  ✓  ✓  ✓  ✓
file type detection/classification       X  X  ✓  ✓  ✓  X  X  ✓  X  ✓
machine learning/pattern recognition     X  ✓  X  X  X  X  ✓  ✓  X  ✓
Optical Character Recognition            X  X  X  ✓  ✓  X  X  X  X  X
regular expressions/pattern matching     X  ✓  X  X  ✓  ✓  X  ✓  X  ✓

Enforcement technologies
block                                    ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
encrypt                                  ✓  ✓  ✓  X  ✓  X  X  ✓  ✓  ✓
fingerprinting                           ✓  ✓  ✓  X  ✓  X  X  X  X  ✓
move/remove                              ✓  ✓  ✓  X  X  X  X  X  X  X
notify/alert                             ✓  ✓  ✓  X  X  ✓  X  ✓  ✓  X
quarantine                               ✓  ✓  X  X  ✓  X  ✓  ✓  ✓  ✓

Software integration
Databases (e.g. SQL Server)              X  ✓  X  X  ✓  X  ✓  X  X  ✓
Email client                             X  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓
file sharing                             X  ✓  X  X  ✓  ✓  ✓  X  ✓  ✓
instant messaging                        X  ✓  X  X  ✓  ✓  X  ✓  ✓  ✓
web 2.0                                  X  ✓  X  ✓  ✓  ✓  X  X  ✓  ✓
webmail                                  X  ✓  X  X  X  ✓  ✓  ✓  X  X

Hardware integration
CD/DVD                                   X  X  X  X  ✓  ✓  ✓  ✓  X  ✓
external/removable HD                    X  ✓  ✓  X  ✓  X  X  ✓  X  X
Printer                                  X  X  X  X  ✓  X  ✓  ✓  X  X
USB drives                               X  ✓  X  ✓  ✓  ✓  ✓  ✓  X  ✓
wireless devices                         X  ✓  X  X  X  X  X  X  X  X

Monitoring
centralized                              X  ✓  ✓  ✓  ✓  ✓  ✓  ✓  X  ✓
offline                                  X  ✓  X  ✓  ✓  X  X  X  X  X
real-time                                X  X  ✓  ✓  ✓  ✓  ✓  ✓  ✓  ✓

Figure 2. Overall performance of the DLP vendors
[Bar chart of the share of the 36 Table I features offered by each vendor: B 77.78%, E 75.00%, H 58.33%, J 58.33%, C 47.22%, G 47.22%, F 41.67%, D 36.11%, A 33.33%, I 33.33%.]

Although Code Green Networks scores highest in Figure 2, this does not necessarily imply that it has the best DLP system; it simply means that it compensates in the areas where it is lacking. When choosing a DLP vendor for an organization, it is best to check the features that suit the implementation, such as ease of installation, scalability, control features and maintenance.

III. CHALLENGES ON DLP SYSTEMS DEPLOYMENT

In protecting sensitive data from loss, DLP systems face many challenges and, as with other security mechanisms, these challenges can render the system ineffective. A review of both industrial and academic DLP systems identified seven common challenges [13]: Leaking Channels, The Human Factor, Access Rights, Encryption and Steganography, Data Modification, Scalability and Integration, and Data Classification. For an effective DLP system to be implemented, these challenges must be addressed. The following sections discuss each challenge and suggest possible ways of addressing it.

A. Leaking Channels

Every day there is a need to share and access data between different media and users, and this is done with the help of intermediate channels. Ideally these channels are used to exchange data legitimately from one end to another; however, they can also pose a major threat of leaking sensitive data. The channels cannot be blocked entirely, because data sharing is essential and requires some or even all of them to remain open. As technology grows at a fast pace and more channels become available, it has become hard to keep up with securing them [14]. Figure 3 shows some of the channels commonly used for data exchange. Not all of these channels are easy to secure; some require a great number of techniques and a great deal of monitoring to secure adequately.

Figure 3. Some common data leaking channels

Sensitive data that is either 'at rest' or 'in use' can be compromised through these channels, which include USB ports, CD/DVD drives, printed documents and even web services. Although leakage through CD/DVD drives and USB ports can be mitigated using host DLP systems, this is not enough to prevent leakage through other channels that are always available, such as Instant Messaging (IM) and email [15]. Even when access rights to confidential data are restricted, some of that data can still be accessed in printable form. Channels such as file sharing and web services, which are associated with data 'in transit', have been among the biggest challenges to mitigate, because they cannot be blocked and they serve as the backbone of the organization's data exchange. To maintain maximum security in these channels, intensive traffic filtering is required. The DLP system to be deployed should always try to balance security against the interconnectivity these channels provide.

B. The Human Factor

Humans are complex beings whose behaviors and motives are usually hard to predict or determine, because they are influenced by many factors, both psychological and sociological. Decisions such as granting access to a set of users, defining the confidentiality level of data, and setting threshold levels for DLP systems all depend on human actions. It must also be noted that even when an organization's security policies are in place to mitigate data loss, they are not guaranteed to solve the problem. Almost all human interaction with data occurs in the 'in use' state, which simply means that a user needs an endpoint terminal to access confidential data for a leak to occur [16]. A typical DLP system restricts leakage by disabling parts of the endpoint, such as CD/DVD drives, USB ports and removable drives, but these restrictions can easily be bypassed when users share access rights, either intentionally (out of trust) or unintentionally (through social engineering), thereby compromising the confidentiality of the data accessed. Users can also use mobile gadgets to take pictures of sensitive information, or use hidden cameras to record entire classified documents and transmit them remotely. The human factor will remain a major challenge in deploying DLP systems for as long as humans interact with the system [17].

C. Access Rights

Access rights have always been a key feature in the deployment of any security mechanism, including DLP systems. It is therefore of great importance to categorize access rights properly and to separate each category of users from the others based on their level of permission. DLP systems cannot prevent illegitimate users from accessing confidential information if there is no proper categorization of access rights in place. Access rights are essential to preventing data loss in an organization and should be updated regularly [18], because obsolete access rights can have a serious negative impact on the entire system and leave it vulnerable to data loss. For instance, when a user is downgraded or dismissed from the organization and the access rights are not updated accordingly, the system is left vulnerable to data leakage, because the DLP system will not detect a breach when that user accesses information he or she is no longer permitted to access.

A data leak can also be caused by a legitimate user with the correct access rights, either intentionally (for reasons such as financial gain or whistleblowing) or accidentally. To be effective, a DLP system should be able to maintain and control the organization's access rights while also protecting data from intentional and unintentional leakage.

D. Encryption and Steganography

Encryption is another major challenge for network-based DLP systems, since these systems use various analytical techniques to identify copies of sensitive data and compare them with the original data that has been classified as confidential. When a user applies strong encryption to confidential data, it becomes hard for the DLP system to analyze the content, which creates a major vulnerability. The implication is that a confidential document can bypass the DLP detection mechanism once the user encrypts it, allowing the user to send it as an email attachment [19]. Steganography is a challenge similar to encryption, but it is harder to mitigate and can be nearly impossible to detect. The user applies steganography tools to hide classified documents within other media, such as digital photos, audio files and video files. This is a challenge because the DLP system cannot detect the confidential data inside those media [20]. In some instances a document can also be compressed or converted to a different format, leaving the system unable to analyze it and therefore unable to detect it.

E. Data Modification

Some DLP systems are designed to compare the original sensitive data with the inspected traffic flowing through the system, using data signatures and patterns to prevent leakage. In such a system, detection occurs whenever a signature or pattern matches that of the confidential data, or when there is a high percentage of similarity to the confidential data. The major weakness of this design is that confidential data is often not sent in its original form, and the system cannot detect the modified version. Data can be modified using various techniques that are readily available online. A confidential document can easily be altered by removing or adding some vital lines, creating an entirely different document before it is sent over the allowed channels. The user can also change the structure or format of the document entirely, rendering it undetectable by the DLP system [21].

In other designs, DLP systems use data hashing to analyze outgoing traffic, comparing hash values (such as SHA1 and MD5) of the traffic with those of the original confidential data. When the two values match, a data leak is detected. The problem with the hashing design is that it becomes ineffective as soon as the confidential document is modified, because even a small change produces a different hash value [22].

F. Scalability and Integration

The volume of data processed can affect the performance of any security mechanism deployed to secure an organization's assets. DLP systems are subject to the same challenge: whether deployed on a host, in the network or in storage, a DLP system should perform its function effectively and be incorporated smoothly, without disrupting or delaying the organization's workflow. Factors that affect the scalability of a DLP system, such as its computational capacity and analysis techniques, should therefore be considered at deployment [23].

Integration is usually also a challenge, because similar functions are already handled by other security mechanisms such as firewalls and intrusion detection. Before deployment, the entire system must therefore be carefully analyzed so that the DLP system is implemented to perform effectively. Functions should not be duplicated, since having two similar functions in the system can delay overall processing and reduce performance.

G. Data Classification

Data classification is the process of organizing data into categories or levels for effective and efficient use [24]. This definition implies that DLP systems rely entirely on well-defined data classification to differentiate confidential data from normal data. The main purpose of classifying data is to determine the baseline of security controls to be used in safeguarding it. Data can be classified in different ways depending on the organization: the military uses terms such as confidential, secret and top secret, while an institution may classify data as restricted, private and public [25]. With such a classification it becomes easy to identify confidential data, which equips the system to protect it adequately. The problem with data classification, however, lies in determining the level of secrecy of sensitive data. For secrecy levels to be assigned properly, the owner of the data should be responsible for the classification process. Most of the time, however, classification is left to people who do not have enough knowledge about all the data. This creates a vulnerability in the DLP system, because people who are not permitted to see certain information may end up with access to it. A good classification is therefore essential, as without it the DLP system becomes ineffective.

IV. COMPARISON WITH OTHER SECURITY TECHNIQUES

There are many security systems and security vendors in the market. These security systems can be grouped as network security systems, antivirus systems, monitoring systems, scanning systems, data controlling systems and transaction systems. They are distinguished by their functionality. For example, an antivirus system cannot encrypt data, but it works very well at monitoring the source code of data. For this reason, corporate organizations require many types of security systems to protect their data. Because each of these systems has a specific function, the DLP system has an edge over the rest, as it can perform various functions. This reduces the cost of purchasing many security systems to monitor the various security gaps. Table II summarizes the features of a DLP system compared with similar protection systems (Intrusion Prevention System (IPS) and Firewall System).

TABLE II. COMPARISON OF DLP SYSTEM WITH IPS/FIREWALL SYSTEMS

Security Gaps                                       IPS/Firewall Systems        DLP Systems
Network partitioning of network security            No                          Yes
Load Balancer Integration                           No                          Limited
Accelerated program delivery                        No                          Yes
TCP connection pooling                              No                          Yes
SSL offloading                                      No                          Yes
Built in authentication engine                      No                          Yes
Validate encrypted sessions (MTA Sensor)            No                          Yes
Multiple applications single sign on                No                          Yes
Injection attack protection (XSS, SQL)              Limited                     Yes
Normalize encoded traffic                           No                          Yes
Inspect HTTPS traffic                               No                          Yes (vary from different policy)
Session tampering/hijacking/riding protection       No                          Yes
Forceful browsing prevention                        No                          Yes
Data theft protection, cloaking                     No                          Yes
Brute-force protection                              No                          Yes
Trojan/Worms/Virus/malware upload protection        Yes (Back Door Detection)   Yes (Block and report users' act)
Rate control protection                             No                          Yes
Request, response rewrite                           No                          Yes
Application access logging and user audit trails    No                          Yes (depends on policy rule)

V. THE WAYS DLP SYSTEMS ANALYZES DATA

Although DLP systems analyze data in different ways, these analyses can be grouped into two major categories: context analysis and content analysis. Context analysis focuses on what surrounds the data, while content analysis focuses on the data itself [26].
• Context analysis: This method analyzes the metadata properties associated with the confidential data. It examines information about the data and keeps track of it using attributes such as the size of the document, its source, its destination, when it was created or modified, and other properties. From these metadata attributes, patterns and signatures can be derived and used to define the policies for detecting data loss [27].
• Content analysis: This method focuses on the content of the confidential data, which may be text or any multimedia material. It compares the transmitted data with the original confidential data and detects a breach if there is a high percentage of similarity [28]. This can be done through three basic techniques: data fingerprinting (identifies patterns by exact or partial match), regular expressions (identify patterns based on words or text) and statistical analysis (uses prerecorded information) [29].

DLP systems can be either preventive or detective, depending on the methods the organization uses. The preventive methods include Policy and Access Rights, Virtualization and Isolation, Cryptographic Approaches, and Quantifying and Limiting, while the detective methods include Data Identification, Social and Behavioral Analysis, Data Mining/Text Clustering, and Quantifying and Limiting.

A. Policy and Access Rights

This method is widely suitable for organizations, provided that their data is properly classified and a well-defined access rights system is in place. It is easy to manage because the procedures are clearly stated, which makes it ideal for data 'at rest' and data 'in use'. The method is limited mainly by improper data classification and ineffective access controls. As a preventive method, it cannot detect when a breach has occurred [18].

B. Virtualization and Isolation

This method is based on isolating the user's activities virtually and allowing only trusted functions or data to be processed by, or pass through, the system. It usually requires hardware for its implementation, and it reduces the amount of administrative work because it makes use of the data classification already present on the system. However, it is not cost effective and does not detect when a data leak occurs [30].

C. Cryptographic Approaches

This approach involves encrypting confidential information with strong encryption tools to achieve a maximum level of security. It is used in almost all DLP systems, as there are various options for encrypting files, and it is effective for data 'at rest'. The major challenge is that encryption does not hide the existence of confidential documents, even though their contents are encrypted. It is not a detective method, which leaves it vulnerable when a data leak occurs [31].

D. Quantifying and Limiting

This method has the added advantage of monitoring the channels through which data travels and blocking any sensitive data from passing through them. It can be implemented effectively for data 'in transit', 'in use' and 'at rest', which makes it easy to deploy against a specific attack on the organization's system. As with the other preventive methods, it makes data leakage hard to detect, and if not properly deployed it can disrupt the workflow of the entire system. It is also limited to specific leakage scenarios, which leaves it vulnerable to other forms of data leakage [32].

E. Social and Behavior Analysis

This method involves analyzing and measuring the level of interaction between people, in this case the users in the organization, and creating adequate guidelines for the protection of sensitive data. When adequately implemented, it prevents leakage by detecting relationships that indicate malicious intent, and it is effective in all data states. Because human behavior is difficult to predict, the method produces a high percentage of false positives and requires the administrator to interact regularly with the DLP system. It also requires a large amount of time to profile the users and index each of their behavioral patterns [33].

F. Data Identification

This method uses a mechanism that compares data traffic flowing through the system with the original confidential documents and tries to prevent the data from being leaked when there is a match. It produces a very low false positive rate when fingerprinting is used in the analysis. However, it can easily be bypassed by heavily modifying the data, which makes the leak impossible to detect [34].

G. Data Mining and Text Clustering

This method involves predicting when a data leak will occur by learning about data processes and leakage patterns over time. It is effective at detecting unstructured documents and is less dependent on administrative intervention, which makes it easy to integrate. The method suffers from a very high false positive rate and, because it requires a learning phase, it needs a large amount of processing power [35].

VI. SOLUTION FOR A SUCCESSFUL DLP IMPLEMENTATION

For any DLP system to be implemented properly, we have identified ten key steps which, if followed, would help an organization adequately implement DLP systems to protect its confidential data. These steps are as follows:

Step 1: Implementation of a universal technique and value proposal for DLP centered on a risk assessment
Step 2: Involve the right people with the right organization model
Step 3: Identify sensitive data and understand how they are handled
Step 4: Provide a phased implementation based on progress
Step 5: Minimize the impact to system performance and business operations
Step 6: Create meaningful DLP policies and policy management processes
Step 7: Implement effective event review and investigation mechanisms
Step 8: Provide analysis and meaningful reporting
Step 9: Implement security and compliance measures
Step 10: Implement an organizational data flow and oversight mechanism

VII. CONCLUSION

Many organizations have paid a great deal of attention to protecting their sensitive data from being lost accidentally or intentionally. DLP systems cannot function effectively in isolation; for a DLP system to function effectively, it must be linked with other security information processes. Before implementing any DLP system, however, an organization needs to understand what confidential data it wants to hold, where that data is to be stored (since the storage location is vital to its protection), and the destinations and channels through which the information will pass.

There are several challenges associated with DLP systems, and before a system is deployed it is necessary to understand and analyze these challenges in depth. It is also important to make the system easy to use and manage so as to avoid unnecessary complexity, because the more complex a DLP system is, the more likely it is to be compromised by the user.
As new technologies are developed and the ways in which they communicate change, it is of great importance that organizations keep pace with these advancements by identifying new and better ways of protecting data from being lost to unauthorized users.

REFERENCES

[1] N. Kumaresan, "Key consideration in protecting sensitive data leakage using Data Loss Prevention Tools," ISACA Journal, vol. 1, pp. 1-5, 2014.
[2] E. Bergstrom and R. M. Ahlfedt, "Information Classification Issues," Springer International Publishing, pp. 27-41, 2014.
[3] DataLossDB. (2016). 2015 Reported data breaches surpasses all previous years. Available: http://blog.datalossdb.org
[4] IBM and Ponemon Institute LLC, "2015 Cost of Data Breach Study: Global Analysis," Ponemon Institute LLC Research Department, 2308 US 31 North, Traverse City, Michigan 49686 USA, 2015.
[5] Trend Micro. (2016). Data Protection Mishap leaves 55M Philippine Voters at Risk. Available: http://blog.trendmicro.com/treandlabs-security-intelligence/55m-registered-voters-risk-philippine-commission-elections-hacked
[6] BBC NEWS. (2015). TalkTalk hack 'affected 157,000 customers'. Available: http://www.bbc.com/news/business-34743185
[7] C. Osborne. (2015). Health insurer Anthem hit by hackers, up to 80 million records exposed. Available: http://www.zdnet.com/article/health-insurer-anthem-hit-by-hackers-up-to-80-million-records-exposed
[8] T. Seals. (2016). Data Breach Trends to Evolve in 2016. Available: http://infosecurity-magazine.com/news/data]breache-trends-to-evolve-in
[9] EYGM Limited. (2011). Data Loss Prevention: Keeping your sensitive data out of the public domain. Available: http://www.ey.com
[10] R. R. Tahboub and Y. Saleh, "Data Leakage/Loss Prevention Systems (DLP)," ResearchGate, 2014.
[11] Jonathan Jesse and ITS Partners. (2015). Symantec DLP Overview. Available: http://www.symantec.com/en/uk/business/theme.jsp?th
[12] Price Waterhouse Coopers, "Data Loss Prevention: Keeping sensitive data out of the wrong hands*," pp. 1-16, 2008.
[13] N. Lord, "Experts on the Data Loss Prevention (DLP) Market in 2016 & Beyond," ed, 2016.
[14] V. Shaj and K. P. Kaliyamurthie, "A review of Data Leakage Detection," IJCSMC Journal, vol. 2, pp. 577-581, 2013.
[15] T. T. T. Huong and J. Corner, "The impact of communication channels on mobile banking adoption," International Journal of Banking Marketing, vol. 34, pp. 78-109, 2014.
[16] I. Ponemon, "The Human Factor in Data Protection," Trend Micro, pp. 1-27, 2012.
[17] T. Pepper. (2016). The people problem: How to manage the human factor to shore up security. Available: http://www.scmagazineuk.com/the-people-problem-how-to-manage-the-human-factor-to-shore-up-security/article/494638
[18] D. Gibson, "What's missing from Data Loss Prevention," Data Center Journal, 2012.
[19] S. R. Raj, A. Cherian, and A. Abraham, "A Survey on Data Loss Prevention Techniques," International Journal of Science and Research, vol. 2, pp. 240-241, 2013.
[20] N. B. Pamula, M. S. Naga, and P. K. Deepthi, "Preventing Data Leakage in Distributive Strategies by Steganography Technique," International Journal of Computer Science and Information Technologies, vol. 4, pp. 220-223, 2013.
[21] S. W. Ahmad and G. R. Bamnote, "Data Leakage Detection and Data Prevention using Algorithm," International Journal of Computer Science and Application, vol. 6, pp. 394-399, 2013.
[22] M. Hart, P. Manadhata, and R. Johnson, "Text Classification for Data Loss Prevention," in Privacy Enhancing Technologies, ed Waterloo, ON, Canada: Springer Berlin Heidelberg, 2011, pp. 18-37.
[23] J. Thorkelson. (2010). Data Loss Prevention: Simplified. Available: http://www.codegreennetworks.com
[24] M. Rouse. (2015). Data Classification. Available: http://searchdatamanagement.techtarget.com/data-classification
[25] R. Bragg, "Data Classification," in CISSP Training Guide, 1st ed. 800 East 96th Street, Indianapolis, Indiana: Pearson IT Certification, 2002, pp. 48-51.
[26] A. Bryman, Social Research Methods, 2nd ed. Great Clarendon Street, Oxford, United Kingdom: Oxford University Press, 2004.
[27] S. A. Kale and S. V. Kulkari, "Data Leakage Detection," International Journal of Advanced Research in Computer and Communication Engineering, vol. 1, pp. 668-678, 2012.
[28] K. A. Neuendorf, The Content Analysis Guidebook. Thousand Oaks, Ca.: Sage Publication Inc., 2002.
[29] K. Krippendorf, Content Analysis: An introduction to its methodology. Thousand Oaks, Ca.: Sage Publication Inc., 2004.
[30] J. N. Mathews, W. Hu, M. Hapuarachchi, and T. Deshane, "Quantifying the performance of Isolation Properties of Virtualization Systems," ACM, pp. 1-9, 2007.
[31] K. Scarfone. (2013). How to help DLP and Encryption Coexist. Available: http://www.statetechmagazine.com/article/2013/11/how-help-dlp-and-encryption-coexist-state
[32] S. Vavilis, M. Petkovic, and N. Zannone, "Data Leakage Quantification," presented at the Data Applications Security and Privacy XXVIII: 28th Annual IFIP WG 11.3, Vienna, Austria, 2014.
[33] J. M. Kizza, Computer Network Security and Cyber Ethic, 4th ed. Jefferson, North Carolina: McFarland & Company, Inc., 2014.
[34] M. Tu, K. Spoa-Harty, and L. Xiao, "Data Loss Prevention Management and Control: Inside Activity Incident Monitoring, Identification, and Tracking in Healthcare Enterprise Environments," The Journal of Digital Forensics, Security and Law, vol. 10, pp. 27-44, 2015.
[35] I. H. Witten and E. Frank, "Classification rule," in Data Mining Practical Machine Learning Tools and Techniques, 2nd ed. 500 Sansome Street, Suite 400, San Francisco, CA 94111: Morgan Kaufmann Publisher, 2005, pp. 200-213.