=Paper=
{{Paper
|id=Vol-2175/paper12
|storemode=property
|title=Auditing DBMSes through Forensic Analysis
|pdfUrl=https://ceur-ws.org/Vol-2175/paper12.pdf
|volume=Vol-2175
|authors=James Wagner
|dblpUrl=https://dblp.org/rec/conf/vldb/Wagner18
}}
==Auditing DBMSes through Forensic Analysis==
<pdf width="1500px">https://ceur-ws.org/Vol-2175/paper12.pdf</pdf>
<pre>
                  Auditing DBMSes through Forensic Analysis

                                                              James Wagner
                                                  Supervised by Dr. Alexander Rasin
                                                School of Computing, DePaul University
                                                           Chicago, IL USA
                                                        jwagne32@depaul.edu


ABSTRACT                                                                      granularity (unavailable or disabled). Moreover, the storage
The pervasive use of databases for the storage of critical and                itself might be corrupt or contain multiple DBMSes.
sensitive information in many organizations has led to an in-                    Digital forensics provides tools for independent analysis
crease in the rate at which databases are exploited in com-                   with minimal assumptions about the environment. A partic-
puter crimes. While there are several techniques and tools                    ularly important and well-recognized technique is file carv-
available for database forensics, they usually assume a priori                ing [9], which extracts files (but not DBMS files) from a disk
database preparation, such as relying on tamper-detection                     image, including deleted or corrupted files. Traditional file
software to already be in place or use of detailed logging.                   carving techniques interpret files (e.g., JPEG, PDF) indi-
However, investigators need forensic tools and techniques                     vidually and rely on file headers. DBMS files, on the other
that work on poorly-configured databases and make no as-                      hand, do not maintain a file header and are never indepen-
sumptions about the extent of damage in a database.                           dent (e.g., table contents are stored separately from table
  In this paper, we present our database forensics meth-                      name and logical structure information). Even if DBMS
ods, which are capable of examining database content from                     files could be carved, they cannot be meaningfully imported
a storage image (disk or RAM) without using logs or any                       into a different DBMS and must be parsed to retrieve their
system metadata. We describe how these methods can be                         content. To accomplish that task, DBMSes need their own
used to detect security breaches in a compromised environ-                    set of digital forensics rules and tools.
ment where the security threat arose from a privileged user                      Even in an environment with ideal log settings, a DBMS
(or someone who has obtained such privileges).                                cannot necessarily guarantee log accuracy or immunity from
                                                                              tampering. For example, log tampering is a concern when a
                                                                              breach originated from a privileged user such as an adminis-
1.    INTRODUCTION                                                            trator (DBA or an attacker who obtained DBA privileges).
   Cyber-crime (e.g., data exfiltration or computer fraud) is                 Tamper-proof logging mechanisms were proposed in related
a significant concern in today’s society. A well-known fact                   work [7, 10], but these only prevent logs from modifications
from security research and practice is that unbreakable secu-                 and do not account for attacks that skirt logging (e.g., log-
rity measures are virtually impossible to create. For exam-                   ging was disabled). Knowing that even privileged users have
ple, 1) incomplete access control restrictions allow users to                 almost no control over how lowest-level storage behaves, an
execute commands beyond their intended roles, and 2) users                    analysis of forensic artifacts provides a unique approach to
may illegally obtain privileges by exploiting security holes                  identify tampering in an untrusted environment.
in a Database Management System (DBMS) or OS code                                The goal of this work is to 1) develop DBMS forensic
or through other means (e.g., social engineering). Thus,                      methods, and 2) use these methods to detect and describe se-
in addition to deploying preventive measures (e.g., access                    curity breaches in compromised environments. Table 1 sum-
control), it is necessary to 1) detect security breaches in a                 marizes the remainder of this paper; future work is marked.
timely fashion, and 2) collect evidence about attacks to de-
vise counter-measures and assess the extent of the damage                     §   Summary
(e.g., what data was leaked or perturbed). This evidence can                  2   We describe our page-level DB forensics methods:
provide preparation for legal action or valuable information                      • Page carving is our DB forensic method [12, 13].
to prevent future attacks.                                                        • DBCarver [15] is our page carving implementation.
   DBMSes are targeted by criminals because they serve as                         • (future work) A framework to generalize DBCarver
repositories of data. Therefore, investigators must have the                        output that supports application development.
capacity to examine and forensically interpret contents of a                      • (future work) DB anti-forensics protects against
DBMS. Currently, an audit log with SQL query history is                             data theft.
a critical (and perhaps only) source of evidence for investi-                 3   Forensic-based attack detection:
gators [5] when a malicious operation is suspected. In field                      • DBDetective [14] detects activity that occurred
conditions, a DBMS may not provide the necessary logging                            when logging was disabled by a DBA.
                                                                                  • DBStorageAuditor [16] detects DBMS direct file
                                                                                    tampering (without SQL) by a SysAdmin.
                                                                                  • (future work) We will address DBMS log backdating.
                                                                                  • (future work) We will quantify the accuracy of our
Proceedings of the VLDB 2018 Ph.D. Workshop, August 27, 2018. Rio de                systems. A reproducible analysis will support our
Janeiro, Brazil.                                                                    evidence.
Copyright (C) 2018 for this paper by its authors. Copying permitted for
private and academic purposes.                                                Table 1: Summary of the remainder of this paper.

2.    DATABASE FORENSICS
   Unlike traditional files (e.g., PDF), DBMS files do not
contain headers that allow for file identification. At the
same time, all row-store DBMSes use fixed-size pages to
store user data, auxiliary data (e.g., indexes and material-
ized views), and the system catalog. DBMS data is accessed
and cached in page units. Pages maintain a consistent struc-                               1: row      2: raw data  3: row
ture, whereas individual record structure varies throughout           Row                  delimiter   delimiter    identifier
DBMS storage, which is why we approach database forensics             Row1 Customer1 Joe   44          2, 9, 24     0, 0, 0, 1
at the page level. In this section, we describe page carving          Row2 Customer2 Jane  60 *        2, 14, 24 *  0, 226, 0, 57 *
including our implementation (DBCarver), future work to               Row3 Customer3 Jim   44          2, 9, 24     0, 0, 0, 3
support application development from DBCarver output, and             * value that marks the row as deleted
anti-forensics techniques that can sanitize and hide data in          Figure 1: Deleted row examples: 1-MySQL/Oracle,
DBMS storage.                                                         2-PostgreSQL and 3-SQLite

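The first layout in Figure 1 can be illustrated with a toy parser. This is a minimal sketch,
not DBCarver: the page format (one delimiter byte, one length byte, then an ASCII payload) and
the helper names make_row and carve_rows are assumptions made for illustration. Only the
delimiter values 44 (active) and 60 (deleted) are taken from the figure.

```python
ACTIVE, DELETED = 44, 60  # row delimiter values shown in Figure 1, layout 1


def make_row(text, deleted=False):
    """Build one row in the hypothetical toy format: delimiter, length, payload."""
    data = text.encode("ascii")
    return bytes([DELETED if deleted else ACTIVE, len(data)]) + data


def carve_rows(page):
    """Walk the page and return (payload, is_deleted) for every row found."""
    rows, pos = [], 0
    while pos + 2 <= len(page) and page[pos] in (ACTIVE, DELETED):
        marker, length = page[pos], page[pos + 1]
        payload = page[pos + 2 : pos + 2 + length]
        rows.append((payload.decode("ascii"), marker == DELETED))
        pos += 2 + length
    return rows


page = (
    make_row("Customer1,Joe")
    + make_row("Customer2,Jane", deleted=True)
    + make_row("Customer3,Jim")
)
for payload, is_deleted in carve_rows(page):
    print(payload, "DELETED" if is_deleted else "active")
```

Because the deleted row is only marked and not overwritten, its payload is recovered together
with the active rows.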
2.1   Page Carving                                                      Column-Store and NoSQL DBMSes. Currently, our page
   Database page carving is a method we previously intro-             carving only supports row-store DBMSes. Column-store
duced for the reconstruction of relational DBMSes without             and NoSQL DBMSes do not use the same page structure as
relying on the file system or the DBMS. Page carving is similar       row-store DBMSes. Future work will expand our database
to traditional file carving [9] in that data, including deleted       forensic methods to column-store and NoSQL DBMSes.
data, can be reconstructed from images or RAM snapshots
without the use of a live system. Forensic tools, such as             2.2        DBCarver
Sleuth Kit [1] and EnCase Forensic [2], are commonly used               We previously presented our implementation of page carv-
by investigators to reconstruct file system data but are inca-        ing called DBCarver [15]. Figure 2 provides an overview of
pable of parsing DBMS files. None of the third-party recov-           DBCarver architecture, which consists of two main compo-
ery tools (e.g., [6, 8]) are helpful for independent audit pur-       nents: the parameter detector (A) and the carver (F).
poses because (at best) they only recover “active” data from
current tables. A database forensic tool (just like a forensic        Components shown in Figure 2:
file system tool) should also reconstruct unallocated pieces          A  Parameter detector
of data including deleted rows, auxiliary structures (indexes,        B  Iteratively load synthetic data
MVs), or buffer cache space.                                          C  Capture DB storage
   While each DBMS uses its own page layout, a great deal of          D  DBMS
overlap between page layouts allowed us to generalize stor-           E  DB config. files (generated by A)
age for most row-store DBMSes. In [12] we presented a                 F  DBCarver
comparative page structure study for IBM DB2, Oracle, MS              G  Disk images, RAM images
SQL Server, PostgreSQL, MySQL, SQLite, Firebird, and                  H  Reconstructed storage: data pages (e.g., table,
Apache Derby. In that work, we also described a parameter                index), deleted data, catalogs, logs
set to define page layout for the purpose of reconstruction.
   Deleted Data. When data is deleted, the DBMS initially
marks it as deleted, rather than explicitly overwriting it.           Figure 2: DBCarver architecture.
This data becomes unallocated (free listed) storage – in [13]
we described the expected lifetime of forensic evidence within           The parameter detector loads synthetic data into a DBMS
database storage following deletion and defragmentation.              (B), captures storage (C), finds pages in storage, and cap-
We described three categories of deleted data: records, pages,        tures page layout parameters in a configuration file (E) –
and values. A record is the minimum deletion unit and can             a text file describing page-level layout for that particular
be attributed to a DELETE, an old version of an UPDATE, or            DBMS. Parameters include those described in [12], and have
an aborted transaction. A deleted record is identified by             since been expanded to support other metadata. DBCarver
its delete marking during page reconstruction. Dropped or             automatically generates parameter values for new DBM-
rebuilt objects create deleted pages, which are identified by         Ses, or new DBMS versions. While most DBMSes retain
carving system catalog tables. Values from deleted records            the same page layout across versions, we observed different
are found in auxiliary objects – e.g., indexes; they are iden-        parameter values between PostgreSQL versions 7.3 and 8.4.
tified by mapping pointers back to records (only records                 The carver (F) uses the configuration files to reconstruct
but not index values are deleted). We presented generalized           any database content from disk images, RAM snapshots, or
pointer deconstruction and pointer-record mapping in [16].            any other input file (G). The carver returns storage artifacts
   Figure 1 visualizes an example of deleted records for sev-         (H), such as user records, metadata describing user data,
eral DBMSes. In all three pages, Row2-(Customer2, Jane) is            deleted data, and system catalogs.
deleted while Row1-(Customer1, Joe) and Row3-(Customer3,
Jim) are active. Page#1 shows a case when the row de-                 2.3        Database Forensic Querying
limiter is marked, such as in MySQL or Oracle. Page#2                    Even though DBCarver provides a transparent view of
shows when the raw data delimiter is marked in PostgreSQL.            DBMS storage, the output lacks composability needed for
Page#3 shows when the row identifier is marked in SQLite.             application development. Applications that use DBCarver
Figure 1 omits DB2 and SQL Server examples because they               output include the work in [15, 16, 17]. Currently, special-
only alter the row directory on deletion.                             ized output is generated for applications that use DBCarver


output. Furthermore, analyzing DBCarver output often re-               3.    DATABASE SECURITY
quires an in-depth understanding of DBMS storage inter-                   Privileged users (e.g., DBA), by definition, have the abil-
nals, which is unreasonable to expect from most users.                 ity to control and modify access permissions. Therefore,
   To introduce composability for application development              audit logs alone are fundamentally unsuitable for the detec-
with database forensic output, we propose a framework that             tion of malicious, privileged users. DBMSes do not provide
has two goals: 1) define a standard set of fields that describe        many tools to defend against insider threats. Interestingly,
forensically extracted user data, system catalogs, auxiliary           DBAs have little to no control over how data is stored at
objects, and other metadata, and 2) build a user-friendly Python       the lowest level. Thus, malicious activity will still create
library that interprets this set of fields. This module will re-       inconsistencies within storage artifacts. In this section, we
move the need for specialized knowledge of database storage            consider attack vectors that are detectable using database
in development of applications based on DBCarver output.               forensics methods from Section 2. All of these solutions as-
   We propose building a unified and standard output that              sume that some level of logging was enabled and is available.
automates the initial forensic analysis. To do this, we will
store metadata from the DBCarver output in a series of                 3.1    DBDetective
JSON objects that maintain a consistent structure for all                 Audit logs are a critical piece of evidence for investiga-
row-store relational DBMSes, while the original DBMS snap-             tors – and existing research has explored tamper-proof logs.
shot returned by DBCarver will be loaded back into our inter-          However, DBAs can disable logging for legitimate operations
nal DBMS. The JSON objects will contain categories based               (e.g., bulk loads). Therefore, we consider an attack where
on the work of Garfinkel et al. [4], and designed to include           logging was disabled, malicious activity was performed, and
all available information from DBCarver output.                        logging was re-enabled. We proposed DBDetective in our
   Working with the forensic output may require users to               previous work [14] to detect activity missing from the logs.
have an in-depth understanding of DBMS storage, which is                  To detect unlogged activity, DBDetective compares the
unreasonable because each DBMS uses a highly customized                disk images and/or RAM snapshots output from DBCarver
storage engine. Such a requirement may prevent users in                against the audit logs. We classify two categories of hidden
other domains from developing applications. We propose                 activity: record modifications and read-only queries (i.e.,
building a Python module to ease the interaction with the              SQL SELECT). When a record is inserted or modified the
JSON objects and the reconstructed data stored in a DBMS.              record itself changes, page metadata may be updated (e.g.,
This module will contain methods to access individual prop-            a delete mark is set) and index page(s) are likely to change.
erties from the JSON files. Furthermore, this module will al-          We flag any artifacts that cannot be explained by a log entry
low for connections to be made between the metadata stored             as suspicious, as shown in Figure 3.
in the JSON files and the database snapshot stored in a re-
lational DBMS.
                                                                       Log file:
                                                                        T1  DELETE FROM Customer WHERE City = ‘Chicago’;
                                                                        T2  DELETE FROM Customer WHERE Name LIKE ‘Chris%’;
2.4    Anti-Forensics
   Anti-forensics (AF) is the field of interfering with forensic       DICE output (page type: Table, structure: Customer):
techniques [3]. We note that digital forensic tools can be              Row                      Del. flag
used by either investigators or criminals, to both protect              1, Christine, Chicago    set
data and to interfere with a criminal investigation. In this            2, George, New York
section, we discuss future work that uses AF to protect data.           3, Christopher, Seattle  set
   Two of the most representative AF techniques we consider             4, Thomas, Austin        set   <- UNATTRIBUTED DELETE
are data wiping and steganography. A corporation can use                5, Mary, Boston
data wiping to erase the already-deleted customer informa-
tion to prevent data theft. Steganography is a data hiding             Figure 3: Detecting unattributed deleted records.
technique – e.g., a means to discreetly blow the whistle on a com-
pany’s wrongdoing. Most prior work in database AF has                     Figure 3 is an example of unattributed deleted-row de-
been highly DBMS-specific; e.g., Stahlberg et al. erased deleted       tection. DBCarver reconstructed 3 deleted rows from Cus-
MySQL data by modifying the purge thread in source code [11].          tomer: (1,Christine,Chicago), (3,Christopher,Seattle), and
We propose a more generalized sanitization method for all              (4,Thomas,Austin). The log file contains two operations:
(including closed-source) DBMSes. We distinguish four cat-             DELETE FROM Customer WHERE City = ‘Chicago’ (T1 ) and
egories of deleted DBMS data to wipe in order to prevent               DELETE FROM Customer WHERE Name LIKE ‘Chris%’ (T2 ). Af-
unintended data exposure: records, auxiliary data (e.g., in-           ter comparing the deleted records to the log file operations,
dexes), system catalog, and unallocated pages. To effec-               DBDetective returned (4,Thomas,Austin), indicating a
tively erase this data, the data itself must be overwritten            deleted record that could not be attributed to any of the
and page metadata (e.g., checksums and pointers) must be               logged deletes. Here, we cannot conclude whether T1 or T2
updated accordingly. We further propose a steganography                caused the deletion of (1,Christine,Chicago), but that is not
strategy that additively alters the database state through             necessary to identify record #4 as an unattributed delete.
database file modification. This approach bypasses all con-               When a SELECT query reads a table or a materialized view
straints and logging mechanisms since the operation is per-            from disk, it ultimately uses one of two access patterns: a
formed without the DBMS. For example, domain constraints               full table scan or an index access. Both of these query access
can be violated, NULL can be added to a primary key column,            types produce a consistent, repeatable caching pattern. Us-
and foreign key constraints can be violated – making it un-            ing metadata from the pages in the buffer cache, we identify
likely that the hidden row is found through regular queries.           caching patterns and match them to the logged commands.

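The standardized output proposed in Section 2.3 could look roughly like the following. This is
a minimal sketch under stated assumptions: the JSON field names (dbms, page_id, object_type,
object_name, records, row_id, deleted, values) and the CarvedPage class are hypothetical and do
not describe the actual DBCarver output or the planned Python module.

```python
import json
from dataclasses import dataclass, field

# Hypothetical JSON document for one carved page; every field name is an assumption.
SAMPLE = """
{
  "dbms": "PostgreSQL",
  "page_id": 17,
  "object_type": "table",
  "object_name": "Customer",
  "records": [
    {"row_id": 1, "deleted": false, "values": [2, "George", "New York"]},
    {"row_id": 2, "deleted": true,  "values": [4, "Thomas", "Austin"]}
  ]
}
"""


@dataclass
class CarvedPage:
    """Thin accessor over one page-level JSON object."""
    dbms: str
    page_id: int
    object_type: str
    object_name: str
    records: list = field(default_factory=list)

    @classmethod
    def from_json(cls, text):
        return cls(**json.loads(text))

    def deleted_records(self):
        """Return the values of every record carrying a delete mark."""
        return [r["values"] for r in self.records if r["deleted"]]


page = CarvedPage.from_json(SAMPLE)
print(page.object_name, page.deleted_records())
```

An application written against such accessors would not need to know how a particular DBMS
encodes its delete marks.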

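The comparison in Figure 3 can be sketched in a few lines. This is an illustration, not
DBDetective: the logged DELETE predicates are translated into Python callables by hand (a real
tool would have to parse the SQL), and the names logged_deletes and unattributed are
hypothetical. The row values and the two log entries come from Figure 3.

```python
# Deleted rows carved from storage (values from Figure 3): (id, name, city).
deleted_rows = [
    (1, "Christine", "Chicago"),
    (3, "Christopher", "Seattle"),
    (4, "Thomas", "Austin"),
]

# Logged DELETE predicates, hand-translated; SQL parsing is out of scope here.
logged_deletes = {
    "T1": lambda row: row[2] == "Chicago",           # WHERE City = 'Chicago'
    "T2": lambda row: row[1].startswith("Chris"),    # WHERE Name LIKE 'Chris%'
}


def unattributed(rows, predicates):
    """Return deleted rows that no logged DELETE could have removed."""
    return [row for row in rows if not any(p(row) for p in predicates.values())]


print(unattributed(deleted_rows, logged_deletes))
```

Row 1 matches both T1 and T2, so the sketch cannot say which one deleted it, but that is not
needed to single out row 4.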
3.2   DBStorageAuditor                                               4.   CONCLUSION
   Privileged OS users commonly have access to database                In this work, we presented page carving and our page
files. Consider a SysAdmin who, acting as the root, mali-            carving implementation, DBCarver. Future work will expand
ciously edits a DBMS file in a Hex editor or through Python.         this method to include support for column-store and NoSQL
The DBMS is unaware of external file write activity taking           DBMSes, offer meta-querying functionality, and incorporate
place outside its own programmatic access and thus cannot            anti-forensic methods to further protect data. We also pre-
log it. Such an attack is a ‘black-hat’ application of anti-         sented methods that use page carving to detect security
forensics discussed in Section 2.4. In our previous work [16],       breaches in untrusted environments. DBDetective consid-
we proposed DBStorageAuditor to detect database file tam-            ered an attack where logging was disabled, DBStorageAuditor
pering.                                                              addressed DBMS file tampering, and future work will ad-
   To detect database file tampering, DBStorageAuditor [16]          dress tampering of the system global clock to backdate logs.
uses indexes to verify the integrity of table data. We first
verify the integrity of the indexes by checking for tampering-       5.   ACKNOWLEDGMENTS
based inconsistencies within the B-Tree structure. Once the            This work was partially funded by the US National Sci-
index integrity is verified, we deconstruct the index point-         ence Foundation Grant CNF-1656268.
ers and match them to table records using the table page
metadata; we generalized the deconstruction of index point-
ers for all major DBMSes. We organize the index pointers
                                                                     6.   REFERENCES
                                                                      [1] B. Carrier. The sleuth kit. TSK.
based on physical location to keep our matching approach
                                                                          http://www.sleuthkit.org/sleuthkit, 2011.
scalable. Finally, any extraneous data or erased data found
                                                                      [2] L. Garber. Encase: A case study in computer-forensic
through index and table comparison is flagged as suspicious.
                                                                          technology. IEEE Computer Magazine January, 2001.
3.3   Event Timeline Analysis                                         [3] S. Garfinkel. Anti-forensics: Techniques, detection and
                                                                          countermeasures. In 2nd International Conference on
   Privileged users with access to the DBMS server have
                                                                          i-Warfare and Security, volume 20087, pages 77–84.
the capability to change server information, specifically the
                                                                          Citeseer, 2007.
global clock. This quietly affects the veracity of DBMS au-
dit logs. Consider a system administrator who changes the             [4] S. L. Garfinkel. Automating disk forensic processing
server global clock to an earlier date, performs malicious                with sleuthkit, xml, and python. SADFE, 2009.
activity, and resets the global clock. Such an attack back-           [5] R. T. Mercuri. On auditing audit trails.
dates activity without altering the log files, and disguises              Communications of the ACM, 46(1):17–20, 2003.
the actual execution time of the malicious activity.                  [6] OfficeRecovery. Recovery for mysql.
As future work, we will detect such attempts to backdate                  http://www.officerecovery.com/.
log entries.                                                          [7] J. M. Peha. Electronic commerce with verifiable audit
   In such an environment, any global or logical clock cannot             trails. In Proceedings of ISOC. Citeseer, 1999.
be assumed to be reliable. Therefore, to create a timeline of         [8] Percona. Percona data recovery tool for innodb.
events, we believe it is necessary to use storage metadata,               https://launchpad.net/percona-data-recovery-tool-for-
which even a privileged user cannot modify. The internal                  innodb.
RowID pseudo-column is of particular interest to construct a          [9] G. G. Richard III and V. Roussev. Scalpel: A frugal,
timeline. RowID is used by indexes and reflects the physical              high performance file carver. In DFRWS, 2005.
location of a record including its PageID. Whenever a page           [10] R. T. Snodgrass et al. Tamper detection in audit logs.
is modified, we can store the PageID to know when data was                In Proceedings of the Thirtieth international
modified. Thus, the order of the PageIDs must be consistent               conference on Very large data bases-Volume 30, pages
with the order of the log events. We will propose tamper-                 504–515. VLDB Endowment, 2004.
proof techniques to store the PageID.                                [11] P. Stahlberg, G. Miklau, and B. N. Levine. Threats to
                                                                          privacy in the forensic analysis of database systems. In
3.4   Quantitative Analysis and Reproducibility                           Proceedings of the 2007 ACM SIGMOD international
   As future work, we will determine the detection accuracy               conference on Management of data, pages 91–102.
for each attack described in this section. For each detection             ACM, Citeseer, 2007.
type, we will compute a confidence rating based on a variety         [12] J. Wagner et al. Database forensic analysis through
of environment variables (e.g., buffer cache size, volume of              internal structure carving. In DFRWS, 2015.
operations, and DBMS storage engine). For example, given             [13] J. Wagner et al. Database image content explorer:
a low volume of DELETE operations in Oracle, DBDetective                  Carving data that does not officially exist. In DFRWS,
would detect attacks with higher accuracy because Oracle                  2016.
controls storage with a percent page utilization. This engine
                                                                     [14] J. Wagner et al. Carving database storage to detect
setting prevents deleted records from being overwritten until
                                                                          and trace security breaches. In DFRWS, 2017.
a page contains a significant quantity of deleted data.
                                                                     [15] J. Wagner et al. Database forensic analysis with
   To verify the presence of malicious operations, a repeat-
                                                                          dbcarver. In CIDR, 2017.
able analysis must be guaranteed. We will develop
algorithms to collect the minimal subset of storage artifacts        [16] J. Wagner et al. Detecting database file tampering
needed to reproduce our results. These collected storage                  through page carving. In EDBT, 2018.
artifacts must be sufficient to verify the security breach in-       [17] J. Wagner, A. Rasin, D. H. T. That, and T. Malik.
dependent of our analysis. For example, such functionality                Pli: Augmenting live databases with custom clustered
is needed to present evidence in court.                                   indexes. In SSDBM, page 36. ACM, 2017.

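The index-to-table comparison of Section 3.2 can be sketched as follows. This is a simplified
illustration, not DBStorageAuditor: it skips the B-Tree integrity check, represents an index
pointer as a (key, page_id, slot) tuple, and adds a "modified" label for mismatched values. The
data layout and the function name audit are assumptions.

```python
# Hypothetical carved artifacts: index entries are (key, page_id, slot) and
# table records map a physical location (page_id, slot) to the indexed value.
index_entries = [(30, 2, 0), (10, 1, 0), (20, 1, 1)]
table_records = {(1, 0): 10, (1, 1): 25, (2, 1): 40}


def audit(index_entries, table_records):
    """Flag records and pointers that the index and the table disagree on."""
    suspicious, seen = [], set()
    # Visit pointers in physical order so each table page is examined once.
    for key, page_id, slot in sorted(index_entries, key=lambda e: (e[1], e[2])):
        loc = (page_id, slot)
        seen.add(loc)
        if loc not in table_records:
            suspicious.append(("erased", loc, key))       # pointer without record
        elif table_records[loc] != key:
            suspicious.append(("modified", loc, key))     # value differs from index
    for loc in sorted(set(table_records) - seen):
        suspicious.append(("extraneous", loc, table_records[loc]))  # record without pointer
    return suspicious


for finding in audit(index_entries, table_records):
    print(finding)
```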

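The ordering argument of Section 3.3 can be illustrated with a toy check. This sketch rests on
strong assumptions that the paper leaves to future work: each logged event is paired with the
PageID it wrote to, PageIDs are assumed to grow as new data is appended, and timestamps are ISO
strings that sort chronologically. The function name order_violations is hypothetical.

```python
def order_violations(events):
    """Return adjacent event pairs whose PageID order contradicts the log order.

    events: list of (log_time, page_id) tuples.
    """
    ordered = sorted(events, key=lambda e: e[0])
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[1] < a[1]]


events = [
    ("2018-03-01T09:00", 100),
    ("2018-03-02T10:00", 250),  # logged early, yet stored on a much later page
    ("2018-03-03T11:00", 101),
    ("2018-03-04T12:00", 102),
]
print(order_violations(events))
```

A violation only shows that the log order and the storage order disagree at that point; deciding
which of the two events was backdated would need further evidence.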

</pre>