=Paper=
{{Paper
|id=Vol-2175/paper12
|storemode=property
|title=Auditing DBMSes through Forensic Analysis
|pdfUrl=https://ceur-ws.org/Vol-2175/paper12.pdf
|volume=Vol-2175
|authors=James Wagner
|dblpUrl=https://dblp.org/rec/conf/vldb/Wagner18
}}
==Auditing DBMSes through Forensic Analysis==
James Wagner. Supervised by Dr. Alexander Rasin. School of Computing, DePaul University, Chicago, IL, USA. jwagne32@depaul.edu

Proceedings of the VLDB 2018 Ph.D. Workshop, August 27, 2018. Rio de Janeiro, Brazil. Copyright (C) 2018 for this paper by its authors. Copying permitted for private and academic purposes.

===ABSTRACT===
The pervasive use of databases for the storage of critical and sensitive information in many organizations has led to an increase in the rate at which databases are exploited in computer crimes. While there are several techniques and tools available for database forensics, they usually assume a priori database preparation, such as relying on tamper-detection software to already be in place or on the use of detailed logging. However, investigators need forensic tools and techniques that work on poorly configured databases and make no assumptions about the extent of damage in a database.

In this paper, we present our database forensics methods, which are capable of examining database content from a storage image (disk or RAM) without using logs or any system metadata. We describe how these methods can be used to detect security breaches in a compromised environment where the security threat arose from a privileged user (or someone who has obtained such privileges).

===1. INTRODUCTION===
Cyber-crime (e.g., data exfiltration or computer fraud) is a significant concern in today's society. A well-known fact from security research and practice is that unbreakable security measures are virtually impossible to create. For example, 1) incomplete access control restrictions allow users to execute commands beyond their intended roles, and 2) users may illegally obtain privileges by exploiting security holes in a Database Management System (DBMS) or OS code, or through other means (e.g., social engineering). Thus, in addition to deploying preventive measures (e.g., access control), it is necessary to 1) detect security breaches in a timely fashion, and 2) collect evidence about attacks to devise counter-measures and assess the extent of the damage (e.g., what data was leaked or perturbed). This evidence can provide preparation for legal action or valuable information to prevent future attacks.

DBMSes are targeted by criminals because they serve as repositories of data. Therefore, investigators must have the capacity to examine and forensically interpret the contents of a DBMS. Currently, an audit log with SQL query history is a critical (and perhaps the only) source of evidence for investigators [5] when a malicious operation is suspected. In field conditions, a DBMS may not provide the necessary logging granularity (unavailable or disabled). Moreover, the storage itself might be corrupt or contain multiple DBMSes.

Digital forensics provides tools for independent analysis with minimal assumptions about the environment. A particularly important and well-recognized technique is file carving [9], which extracts files (but not DBMS files) from a disk image, including deleted or corrupted files. Traditional file carving techniques interpret files (e.g., JPEG, PDF) individually and rely on file headers. DBMS files, on the other hand, do not maintain a file header and are never independent (e.g., table contents are stored separately from table name and logical structure information). Even if DBMS files could be carved, they cannot be meaningfully imported into a different DBMS and must be parsed to retrieve their content. To accomplish that task, DBMSes need their own set of digital forensics rules and tools.

Even in an environment with ideal log settings, a DBMS cannot necessarily guarantee log accuracy or immunity from tampering. For example, log tampering is a concern when a breach originated from a privileged user such as an administrator (a DBA or an attacker who obtained DBA privileges). Tamper-proof logging mechanisms were proposed in related work [7, 10], but these only protect logs from modification and do not account for attacks that skirt logging (e.g., logging was disabled). Because even privileged users have almost no control over how the lowest level of storage behaves, an analysis of forensic artifacts provides a unique approach to identify tampering in an untrusted environment.

The goal of this work is to 1) develop DBMS forensic methods, and 2) use these methods to detect and describe security breaches in compromised environments. Table 1 summarizes the remainder of this paper; future work is bolded.

{| class="wikitable"
|+ Table 1: Summary of the remaining paper.
! § !! Summary
|-
| 2
|
We describe our page-level DB forensics methods:
* Page carving is our DB forensic method [12, 13].
* DBCarver [15] is our page carving implementation.
* '''A framework to generalize DBCarver output that supports application development.'''
* '''DB anti-forensics protects against data theft.'''
|-
| 3
|
Forensic-based attack detection:
* DBDetective [14] detects activity that occurred when logging was disabled by a DBA.
* DBStorageAuditor [16] detects DBMS direct file tampering (without SQL) by a SysAdmin.
* '''We will address DBMS log backdating.'''
* '''We will quantify the accuracy of our systems. A reproducible analysis will support our evidence.'''
|}

===2. DATABASE FORENSICS===
Unlike traditional files (e.g., PDF), DBMS files do not contain headers that allow for file identification. At the same time, all row-store DBMSes use fixed-size pages to store user data, auxiliary data (e.g., indexes and materialized views), and the system catalog. DBMS data is accessed and cached in page units. Pages maintain a consistent structure, whereas individual record structure varies throughout DBMS storage, which is why we approach database forensics at the page level. In this section, we describe page carving including our implementation (DBCarver), future work to support application development from DBCarver output, and anti-forensics techniques that can sanitize and hide data in DBMS storage.

====2.1 Page Carving====
Database page carving is a method we previously introduced for the reconstruction of relational DBMSes without relying on the file system or the DBMS. Page carving is similar to traditional file carving [9] in that data, including deleted data, can be reconstructed from disk images or RAM snapshots without the use of a live system. Forensic tools, such as Sleuth Kit [1] and EnCASE Forensic [2], are commonly used by investigators to reconstruct file system data but are incapable of parsing DBMS files. None of the third-party recovery tools (e.g., [6, 8]) are helpful for independent audit purposes because (at best) they only recover "active" data from current tables. A database forensic tool (just like a forensic file system tool) should also reconstruct unallocated pieces of data including deleted rows, auxiliary structures (indexes, MVs), or buffer cache space.

While each DBMS uses its own page layout, a great deal of overlap between page layouts allowed us to generalize storage for most row-store DBMSes. In [12] we presented a comparative page structure study for IBM DB2, Oracle, MS SQL Server, PostgreSQL, MySQL, SQLite, Firebird, and Apache Derby. In that work, we also described a parameter set to define page layout for the purpose of reconstruction.

'''Deleted Data.''' When data is deleted, the DBMS initially marks it as deleted, rather than explicitly overwriting it. This data becomes unallocated (free listed) storage; in [13] we described the expected lifetime of forensic evidence within database storage following deletion and defragmentation. We described three categories of deleted data: records, pages, and values. A record is the minimum deletion unit and can be attributed to a DELETE, an old version of an UPDATE, or an aborted transaction. A deleted record is identified by its delete marking during page reconstruction. Dropped or rebuilt objects create deleted pages, which are identified by carving system catalog tables. Values from deleted records are found in auxiliary objects (e.g., indexes); they are identified by mapping pointers back to records (only records, but not index values, are deleted). We presented generalized pointer deconstruction and pointer-record mapping in [16].

Figure 1 visualizes an example of deleted records for several DBMSes. In all three pages, Row2-(Customer2, Jane) is deleted while Row1-(Customer1, Joe) and Row3-(Customer3, Jim) are active. Page#1 shows a case when the row delimiter is marked, such as in MySQL or Oracle. Page#2 shows when the raw data delimiter is marked in PostgreSQL. Page#3 shows when the row identifier is marked in SQLite. Figure 1 omits DB2 and SQL Server examples because they only alter the row directory on deletion.

Figure 1: Deleted row examples: 1-MySQL/Oracle, 2-PostgreSQL and 3-SQLite. In the figure, the deleted Row2 carries a different marker than the active rows: row delimiter 60 instead of 44 (page 1); raw data delimiter "2, 14, 24" instead of "2, 9, 24" (page 2); row identifier "0, 226, 0, 57" instead of "0, 0, 0, 1" and "0, 0, 0, 3" (page 3).

'''Column-Store and NoSQL DBMSes.''' Currently, our page carving only supports row-store DBMSes. Column-store and NoSQL DBMSes do not use the same page structure as row-store DBMSes. Future work will expand our database forensic methods to column-store and NoSQL DBMSes.

====2.2 DBCarver====
We previously presented our implementation of page carving called DBCarver [15]. Figure 2 provides an overview of the DBCarver architecture, which consists of two main components: the parameter collector (A) and the carver (F).

Figure 2: DBCarver architecture. Labeled components: (A) iteratively load synthetic data, (B) DBMS, (C) capture DB storage, (D) parameter detector, (E) generate DB config. file / DB config. files, (F) DBCarver, (G) disk images and RAM images, (H) reconstructed storage: data pages (e.g., table, index), deleted data, catalogs, logs.

The parameter detector loads synthetic data into a DBMS (B), captures storage (C), finds pages in storage, and captures page layout parameters in a configuration file (E), a text file describing the page-level layout for that particular DBMS. Parameters include those described in [12], and have since been expanded to support other metadata. DBCarver automatically generates parameter values for new DBMSes or new DBMS versions. While most DBMSes retain the same page layout across versions, we observed different parameter values between PostgreSQL versions 7.3 and 8.4.

The carver (F) uses the configuration files to reconstruct any database content from disk images, RAM snapshots, or any other input file (G). The carver returns storage artifacts (H), such as user records, metadata describing user data, deleted data, and system catalogs.

====2.3 Database Forensic Querying====
Even though DBCarver provides a transparent view of DBMS storage, the output lacks the composability needed for application development. Applications that use DBCarver output include the work in [15, 16, 17]. Currently, specialized output is generated for each application that uses DBCarver output. Furthermore, analyzing DBCarver output often requires an in-depth understanding of DBMS storage internals, which is unreasonable to expect from most users.

To introduce composability for application development with database forensic output, we propose a framework with two goals: 1) define a standard set of fields that describe forensically extracted user data, system catalogs, auxiliary objects, and other metadata, and 2) provide a user-friendly Python library that interprets this set of fields. This module will remove the need for specialized knowledge of database storage in the development of applications based on DBCarver output.

We propose building a unified and standard output that automates the initial forensic analysis. To do this, we will store metadata from the DBCarver output in a series of JSON objects that maintain a consistent structure for all row-store relational DBMSes, while the original DBMS snapshot returned by DBCarver will be loaded back into our internal DBMS. The JSON objects will contain categories based on the work of Garfinkel et al. [4], and will be designed to include all available information from DBCarver output.

Working with the forensic output may require users to have an in-depth understanding of DBMS storage, which is unreasonable because each DBMS uses a highly customized storage engine. Such a requirement may prevent users in other domains from developing applications. We propose building a Python module to ease the interaction with the JSON objects and the reconstructed data stored in a DBMS. This module will contain methods to access individual properties from the JSON files. Furthermore, this module will allow connections to be made between the metadata stored in the JSON files and the database snapshot stored in a relational DBMS.

====2.4 Anti-Forensics====
Anti-forensics (AF) is the field of interfering with forensic techniques [3]. We note that digital forensic tools can be used by either investigators or criminals, both to protect data and to interfere with a criminal investigation. In this section, we discuss future work that uses AF to protect data.

Two of the most representative AF techniques we consider are data wiping and steganography. A corporation can use data wiping to erase already-deleted customer information to prevent data theft. Steganography is a data hiding technique (e.g., a means to discreetly blow a whistle on a company's wrongdoing). Most prior work in database AF has been highly DBMS-specific; e.g., Stahlberg et al. erased deleted MySQL data by modifying the purge thread in source code [11].

We propose a more generalized sanitization method for all (including closed-source) DBMSes. We distinguish four categories of deleted DBMS data to wipe in order to prevent unintended data exposure: records, auxiliary data (e.g., indexes), system catalog, and unallocated pages. To effectively erase this data, the data itself must be overwritten and page metadata (e.g., checksums and pointers) must be updated accordingly. We further propose a steganography strategy that additively alters the database state through database file modification. This approach bypasses all constraints and logging mechanisms since the operation is performed without the DBMS. For example, domain constraints can be violated, NULL can be added to a primary key column, and foreign key constraints can be violated, making it unlikely that the hidden row is found through regular queries.

===3. DATABASE SECURITY===
Privileged users (e.g., DBA), by definition, have the ability to control and modify access permissions. Therefore, audit logs alone are fundamentally unsuitable for the detection of malicious, privileged users. DBMSes do not provide many tools to defend against insider threats. Interestingly, DBAs have little to no control over how data is stored at the lowest level. Thus, malicious activity will still create inconsistencies within storage artifacts. In this section, we consider attack vectors that are detectable using the database forensics methods from Section 2. All of these solutions assume that some level of logging was enabled and is available.

====3.1 DBDetective====
Audit logs are a critical piece of evidence for investigators, and existing research has explored tamper-proof logs. However, DBAs can disable logging for legitimate operations (e.g., bulk loads). Therefore, we consider an attack where logging was disabled, malicious activity was performed, and logging was re-enabled. We proposed DBDetective in our previous work [14] to detect activity missing from the logs.

To detect unlogged activity, DBDetective compares the DBCarver output for disk images and/or RAM snapshots against the audit logs. We classify two categories of hidden activity: record modifications and read-only queries (i.e., SQL SELECT). When a record is inserted or modified, the record itself changes, page metadata may be updated (e.g., a delete mark is set), and index page(s) are likely to change. We flag any artifacts that cannot be explained by a log entry as suspicious, as shown in Figure 3.

Figure 3: Detecting unattributed deleted records. The log file lists T1 (DELETE FROM Customer WHERE City = 'Chicago';) and T2 (DELETE FROM Customer WHERE Name LIKE 'Chris%';). The carved output (labeled "DICE Output" in the figure; page type: table, structure: Customer) lists the rows (1, Christine, Chicago), (2, George, New York), (3, Christopher, Seattle), (4, Thomas, Austin), and (5, Mary, Boston) with a delete flag column; row 4 is labeled as an unattributed delete.

Figure 3 is an example of unaccounted, deleted row detection. DBCarver reconstructed 3 deleted rows from Customer: (1,Christine,Chicago), (3,Christopher,Seattle), and (4,Thomas,Austin). The log file contains two operations: DELETE FROM Customer WHERE City = 'Chicago' (T1) and DELETE FROM Customer WHERE Name LIKE 'Chris%' (T2). After comparing the deleted records to the log file operations, DBDetective returned (4,Thomas,Austin), indicating a deleted record that could not be attributed to any of the logged deletes. Here, we cannot conclude whether T1 or T2 caused the deletion of (1,Christine,Chicago), since both predicates match it, but that is not necessary to identify record #4 as an unattributed delete.

When a SELECT query reads a table or a materialized view from disk, it ultimately uses one of two access patterns: a full table scan or an index access. Both of these query access types produce a consistent, repeatable caching pattern. Using metadata from the pages in the buffer cache, we identify caching patterns and match them to the logged commands.

====3.2 DBStorageAuditor====
Privileged OS users commonly have access to database files. Consider a SysAdmin who, acting as root, maliciously edits a DBMS file in a hex editor or through Python. The DBMS is unaware of external file write activity taking place outside its own programmatic access and thus cannot log it. Such an attack is a "black-hat" application of the anti-forensics discussed in Section 2.4. In our previous work [16], we proposed DBStorageAuditor to detect database file tampering.

To detect database file tampering, DBStorageAuditor [16] uses indexes to verify the integrity of table data. We first verify the integrity of the indexes by checking for tampering-based inconsistencies within the B-Tree structure. Once the index integrity is verified, we deconstruct the index pointers and match them to table records using the table page metadata; we generalized the deconstruction of index pointers for all major DBMSes. We organize the index pointers based on physical location to keep our matching approach scalable. Finally, any extraneous data or erased data found through index and table comparison is flagged as suspicious.

====3.3 Event Timeline Analysis====
Privileged users with access to the DBMS server have the capability to change server information, specifically the global clock. This quietly affects the veracity of DBMS audit logs. Consider a system administrator who changes the server global clock to an earlier date, performs malicious activity, and resets the global clock. Such an attack backdates activity without altering the log files, and disguises the actual execution time of the malicious activity. As future work, we will detect such attempts to backdate log entries.

In such an environment, no global or logical clock can be assumed to be reliable. Therefore, to create a timeline of events, we believe it is necessary to use storage metadata, which even a privileged user cannot modify. The internal RowID pseudo-column is of particular interest for constructing a timeline. RowID is used by indexes and reflects the physical location of a record including its PageID. Whenever a page is modified, we can store the PageID to know when data was modified. Thus, the order of the PageIDs must be consistent with the order of the log events. We will propose tamper-proof techniques to store the PageID.

====3.4 Quantitative Analysis and Reproducibility====
As future work, we will determine the detection accuracy for each attack described in this section. For each detection type, we will compute a confidence rating based on a variety of environment variables (e.g., buffer cache size, volume of operations, and DBMS storage engine). For example, given a low volume of DELETE operations in Oracle, DBDetective would detect attacks with higher accuracy because Oracle controls storage with a percent page utilization. This engine setting prevents deleted records from being overwritten until a page contains a significant quantity of deleted data.

To verify the presence of malicious operations, a repeatable analysis must be guaranteed. We will develop algorithms to collect the minimal subset of storage artifacts needed to reproduce our results. These collected storage artifacts must be sufficient to verify the security breach independent of our analysis. For example, such functionality is needed to present evidence in court.

===4. CONCLUSION===
In this work, we presented page carving and our page carving implementation, DBCarver. Future work will expand this method to include support for column-store and NoSQL DBMSes, offer meta-querying functionality, and incorporate anti-forensic methods to further protect data. We also presented methods that use page carving to detect security breaches in untrusted environments. DBDetective considered an attack where logging was disabled, DBStorageAuditor addressed DBMS file tampering, and future work will address tampering of the system global clock to backdate logs.

===5. ACKNOWLEDGMENTS===
This work was partially funded by the US National Science Foundation Grant CNF-1656268.

===6. REFERENCES===
* [1] B. Carrier. The Sleuth Kit (TSK). http://www.sleuthkit.org/sleuthkit, 2011.
* [2] L. Garber. EnCase: A case study in computer-forensic technology. IEEE Computer Magazine, January 2001.
* [3] S. Garfinkel. Anti-forensics: Techniques, detection and countermeasures. In 2nd International Conference on i-Warfare and Security, volume 20087, pages 77–84. Citeseer, 2007.
* [4] S. L. Garfinkel. Automating disk forensic processing with SleuthKit, XML, and Python. SADFE, 2009.
* [5] R. T. Mercuri. On auditing audit trails. Communications of the ACM, 46(1):17–20, 2003.
* [6] OfficeRecovery. Recovery for MySQL. http://www.officerecovery.com/.
* [7] J. M. Peha. Electronic commerce with verifiable audit trails. In Proceedings of ISOC. Citeseer, 1999.
* [8] Percona. Percona data recovery tool for InnoDB. https://launchpad.net/percona-data-recovery-tool-for-innodb.
* [9] G. G. Richard III and V. Roussev. Scalpel: A frugal, high performance file carver. In DFRWS, 2005.
* [10] R. T. Snodgrass et al. Tamper detection in audit logs. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, Volume 30, pages 504–515. VLDB Endowment, 2004.
* [11] P. Stahlberg, G. Miklau, and B. N. Levine. Threats to privacy in the forensic analysis of database systems. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 91–102. ACM, Citeseer, 2007.
* [12] J. Wagner et al. Database forensic analysis through internal structure carving. In DFRWS, 2015.
* [13] J. Wagner et al. Database image content explorer: Carving data that does not officially exist. In DFRWS, 2016.
* [14] J. Wagner et al. Carving database storage to detect and trace security breaches. In DFRWS, 2017.
* [15] J. Wagner et al. Database forensic analysis with DBCarver. In CIDR, 2017.
* [16] J. Wagner et al. Detecting database file tampering through page carving. In EDBT, 2018.
* [17] J. Wagner, A. Rasin, D. H. T. That, and T. Malik. PLI: Augmenting live databases with custom clustered indexes. In SSDBM, page 36. ACM, 2017.
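
As an illustration of the unattributed-delete check described in Section 3.1 (the Figure 3 example), the comparison can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions and not DBDetective's actual implementation: carved records are modeled as plain dictionaries, each logged DELETE is modeled as a hand-written predicate standing in for its WHERE clause, and the name `unattributed_deletes` is hypothetical.

```python
from typing import Callable, Dict, List

# A carved record is modeled as a plain dict (hypothetical shape, not DBCarver's format).
Record = Dict[str, object]
# Each logged DELETE is modeled as a predicate over a record. Assumption: the
# WHERE clause was translated by hand; no SQL parsing is attempted here.
Predicate = Callable[[Record], bool]


def unattributed_deletes(deleted: List[Record],
                         logged_deletes: Dict[str, Predicate]) -> List[Record]:
    """Return the deleted records that no logged DELETE predicate can explain."""
    return [
        record for record in deleted
        if not any(matches(record) for matches in logged_deletes.values())
    ]


# Deleted rows reconstructed from the Customer table in the Figure 3 example.
deleted_rows: List[Record] = [
    {"id": 1, "name": "Christine", "city": "Chicago"},
    {"id": 3, "name": "Christopher", "city": "Seattle"},
    {"id": 4, "name": "Thomas", "city": "Austin"},
]

# T1: DELETE FROM Customer WHERE City = 'Chicago'
# T2: DELETE FROM Customer WHERE Name LIKE 'Chris%'
logged: Dict[str, Predicate] = {
    "T1": lambda r: r["city"] == "Chicago",
    "T2": lambda r: str(r["name"]).startswith("Chris"),
}

suspicious = unattributed_deletes(deleted_rows, logged)
print(suspicious)
```

With the Figure 3 data, only record 4 (Thomas, Austin) remains in `suspicious`. Record 1 matches both T1 and T2, which mirrors the observation that the check does not need to decide which logged operation deleted a row, only whether at least one could have.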