=Paper= {{Paper |id=None |storemode=property |title=MVAL: Addressing the Insider Threat by Valuation-based Query Processing |pdfUrl=https://ceur-ws.org/Vol-1020/paper_10.pdf |volume=Vol-1020 |dblpUrl=https://dblp.org/rec/conf/gvd/BarthelS13 }} ==MVAL: Addressing the Insider Threat by Valuation-based Query Processing== https://ceur-ws.org/Vol-1020/paper_10.pdf
                      MVAL: Addressing the Insider Threat by
                        Valuation-based Query Processing

                            Stefan Barthel                                            Eike Schallehn
         Institute of Technical and Business Information              Institute of Technical and Business Information
                             Systems                                                      Systems
            Otto-von-Guericke-University Magdeburg                       Otto-von-Guericke-University Magdeburg
                       Magdeburg, Germany                                           Magdeburg, Germany
                    stefan.barthel@ovgu.de                                     eike.schallehn@ovgu.de

ABSTRACT                                                              by considering relational and SQL operations and describing
The research presented in this paper is inspired by problems          possible valuation derivations for them.
of conventional database security mechanisms to address the
insider threat, i.e. authorized users abusing granted privi-          2.   PRINCIPLES OF DATA VALUATION
leges for illegal or disadvantageous accesses. The basic idea           In [1] we outlined our approach of a leakage-resistant data
is to restrict the data one user can access by a valuation            valuation which computes a monetary value (mval) for each
of data, e.g. a monetary value of data items, and, based              query. This is based on the following basic principles: Every
on that, introducing limits for accesses. The specific topic          attribute Ai ∈ R of a base relation schema R is valuated by
of the present paper is the conceptual background, how the            a certain monetary value (mval(Ai ) ∈ R). The attribute
process of querying valuated data leads to valuated query             valuation for base tables are part of the data dictionary and
results. For this, by analyzing operations of the relational          can for instance be specified as an extension of the SQL
algebra and SQL, derivation functions are added.                      DDL:
                                                                      CREATE TABLE table_1
1.   INTRODUCTION                                                     (
   An acknowledged main threat to data security are fraud-               attribute_1 INT PRIMARY KEY MVAL 0.1,
ulent accesses by authorized users, often referred to as the             attribute_2 UNIQUE COMMENT ’important’ MVAL 10,
insider threat [2]. To address this problem, in [1] we pro-              attribute_3 DATE
posed a novel approach of detecting authorization misuse              );
based on a valuation of data, i.e. of an assigned descrip-
                                                                        With these attribute valuations, we derive a monetary
tion of the worth of data management in a system, which
                                                                      value for one tuple t ∈ r(R) given by Equation (1), as well
could for instance be interpreted as monetary values. Ac-
                                                                      as the total monetary value of the relation r(R) given by
cordingly, possible security leaks exist if users access more
                                                                      Equation (2), if data is extracted by a query.
valuable data than they are allowed to within a query or
cumulated over a given time period. E.g., a bank account                                                 X
manager accessing a single customer record does not repre-                         mval(t ∈ r(R)) =             mval(Ai )        (1)
sent a problem, while dumping all data in an unrestricted                                               Ai ∈R
query should be prohibited. Here, common approaches like                               X
role-based security mechanisms typically fail.                         mval(r(R)) =            mval(t) = |r(R)| ∗ mval(t ∈ r(R)) (2)
   According to our proposal, the data valuation is first of                          t∈r(R)
all based on the relation definitions, i.e. as part of the data
dictionary information about the value of data items such as             To be able to consider the mval for a query as well as sev-
attribute values and, derived from that, entire tuples and re-        eral queries of one user over a certain period of time, we log
lations. Then, a key question is how the valuation of a query         all mvals in an alert log and compare the current cumulated
result can be derived from the input valuations, because per-         mval per user to two thresholds. If a user exceeds the first
forming operations on the base data causes transformations            threshold – suspicious threshold – she will be categorized as
that have an impact on the data’s significance.                       suspect. After additionally exceeding the truncation thresh-
   This problem is addressed in the research presented here           old her query output will be limited by hiding tuples and
                                                                      presenting a user notification. We embedded our approach
                                                                      in an additional layer in the security defense-in-depth model
                                                                      for raw data, which we have enhanced by a backup entity
                                                                      (see Fig. 1). Furthermore, encryption has to be established
                                                                      to prevent data theft via unauthorized, physical reads as
                                                                      well as backup theft. In this paper we are going into detail
                                                                      about how to handle operations like joins, aggregate func-
                                                                      tions, stored procedures as well as common functions.
25th GI-Workshop on Foundations of Databases (Grundlagen von Daten-
banken), 28.05.2013 - 31.05.2013, Ilmenau, Germany.                      Most of the data stored in a database can be easily iden-
Copyright is held by the author/owner(s).                             tified as directly interpretable. One example would be an
                  Views                                             by uncertainty has to be less valuable than properly set at-
                                                                    tribute values. Therefore, the monetary value should be
             Access Control                                         only a percentage of the respective monetary value of an at-
                                                                    tribute value. If several source attribute values are involved,
             Data Valuation                                         we recommend to value the computed attribute value as a
               Encryption                          DBMS Level       percentage of the monetary value of all participating source
                  Raw
                                                                    attribute values. In general, we suggest a maximum of 50%
                  Data                                              for both valuations. Furthermore, we need to consider the
                                                                    overall purpose of our leakage-resistant data valuation which
                                                  Physical Level    shall prevent extractions of large amounts of data. There-
                                                                    fore, the percentage needs to be increased with the amount
                                                                    of data, but not in a way that an unset or unknown attribute
                                                                    value becomes equivalent valuable than a properly set one.
                                                                    For that reason, exponential growth is not a suitable option.
                 Backup                                             Additionally, we have to focus a certain area of application,
                  Data                                              because a trillion attributes (1012 ) are conceivable whereas a
               Encryption                                           septillion attributes (1024 ) are currently not realistic. From
                            Backup                                  the overall view on our data valuation, we assume depending
                                                                    on the application, that the extraction of sensitive data be-
Figure 1: Security defense model on DBMS and                        comes critical when 103 up to 109 attribute values will be ex-
physical level                                                      tracted. Therefore, the growth of our uncertainty factor UF
                                                                    increases much more until 109 attribute values than after-
                                                                    wards, which predominantly points to a logarithmic growth.
employee-table, where each tuple has a value for attributes         We also do not need to have a huge difference of the factor if
"first name", "surname" and "gender". In this case, it is           theoretically much more attribute values shall be extracted
also quite easy to calculate the monetary value for a query         (e.g., 1014 and more), because with respect to an extraction
(r(Remp )) by simply summarizing all mval per attribute and         limiting approach, it is way too much data to return. This
multiply those with the number of involved rows (see Eq.            assumption does also refer to a logarithmic increase. We
(3)).                                                               conclude that the most promising formula that was adapted
                                                                    to fit our needs is shown in Eq. (4).
                           X
    mval(r(Remp )) =                mval(Ai ) ∗ |r(Remp )|    (3)
                         Ai ∈Remp
                                                                                          1
                                                                                  UF =      log10 (|valAi ,...,Ak | + 1)       (4)
                                                                                         30
However, it becomes more challenging if an additional at-
tribute "license plate number" is added, which does have
some unset or unknown attribute values - in most cases
NULL values. By knowing there is a NULL value for a                 3.    DERIVING VALUATIONS FOR
certain record, this could be interpreted as either simply un-
known whether there is any car or unset because this person
                                                                          DATABASE OPERATIONS
has no car. So there is an uncertainty that could lead to an          In this chapter we will describe valuation derivation for
information gain which would be uncovered if no adequate            main database operations by first discussing core relational
valuation exists. Some other potentially implicit informa-          operations. Furthermore, we address specifics of join oper-
tion gains are originated from joins and aggregate functions        ations and finally functions (aggregate, user-defined, stored
which we do mention in the regarding section.                       procedures) which are defined in SQL.
   Because the terms information gain and information loss
are widely used and do not have a uniform definition, we do
define them for further use. We call a situation where an           3.1    Core Operations of Relational Algebra
attacker received new data (resp. information) information             The relational algebra [4] consists of six basic operators,
gain and the same situation in the view of the data owner           where selection, projection, and rename are unary opera-
an information loss.                                                tions and union, set difference, and Cartesian product are
                                                                    operators that take two relations as input (binary opera-
Uncertainty Factor                                                  tion). Due to the fact that applying rename to a relation or
Some operators used for query processing obviously reduce           attribute will not change the monetary value, we will only
the information content of the result set (e.g. selection, ag-      consider the rest.
gregations, semi joins, joins with resulting NULL values),
but there is still an uncertain, implicit information gain.
Since, the information gain by uncertainty is blurry, mean-         Projection
ing in some cases more indicative than in others, we have           The projection πattr_list (r(R)) is a unary operation and
to distinguish uncertainty of one attribute value generated         eliminates all attributes (columns) of an input relation r(R)
out of one source attribute value (e.g., generated NULL val-        except those mentioned in the attribute list. For computa-
ues) and attribute values which are derived from informa-           tion of the monetary value of such a projection, only mval
tion of several source attribute values (e.g., aggregations).       for chosen attributes of the input relation are considered
In case of one source attribute value, an information gain          while taking into account that a projection may eliminate
duplicates (shown in Eq. (5)).                                        fully aware that by a user mistake, e.g. using cross join
                                                                      instead of natural join, thresholds will be exceeded and the
                             k
                             X                                        user will be classified as potentially suspicious. However, we
 mval(πAj ,..,Ak (r(R))) =         mval(Ai ) ∗ |πAj ,..,Ak (r(R))|    recommend a multiplication of the monetary value of both
                             i=j                                      source relations instead of a summation due to the fact that
                                                                (5)   the calculation of the monetary value needs to be consistent
                                                                      also by combining different operators. For that reason, by
Selection                                                             following our recommendation, we ensure that an inner join
According to the relational algebra, a selection of a certain         is valuated with the same monetary value as the respective
relation σpred r(R) reduces tuples to a subset which satisfy          combination of a cross join (Cartesian product) and selection
specified predicates. Because the selection reduces the num-          on the join condition.
ber of tuples, the calculation of the monetary value does not
have to consider those filtered tuples and only the number              mval(r(R1 × R2 )) =
of present tuples are relevant (shown in Eq. (6)).                          mval(t ∈ r(R1 )) ∗ |r(R1 )| + mval(t ∈ r(R2 )) ∗ |r(R2 )|
                                                                                                                                    (9)
  mval(σpred (r(R))) = mval(t ∈ r(R)) ∗ |σpred (r(R))|          (6)

Set Union
                                                                      3.2     Join Operations
                                                                         In the context of relational databases, a join is a binary
A relation of all distinct elements (resp. tuples) of any two         operation of two tables (resp. data sources). The result set
relations is called the union (denoted by ∪) of those re-             of a join is an association of tuples from one table with tuples
lations. For performing set union, the two involved rela-             from another table by concatenating concerned attributes.
tions must be union-compatible – they must have the same              Joining is an important operation and most often perfor-
set of attributes. In symbols, the union is represented as            mance critical to certain queries that target tables whose
R1 ∪ R2 = {x : x ∈ R1 ∨ x ∈ R2 }. However, if two re-                 relationships to each other cannot be followed directly. Be-
lations contain identical tuples, within a resulting relation         cause the type of join affects the number of resulting tuples
these tuples do only exist once, meaning duplicates are elim-         and their attributes, the monetary value of each join needs
inated. Accordingly, the mval of a union of two relations is          to be calculated independently.
computed by adding mval of both relations, subtracted with
mval of duplicates (shown in Eq. (7)).                                Inner Join
        mval(R1 ∪ R2 ) = mval(r(R1 ))+                                An inner join produces a result table containing composite
                                                                      rows of involved tables that match some pre-defined, or ex-
                                                                (7)
                             X
            mval(r(R2 )) −         mval(ti ∈ r(R1 ∩ R2 )              plicitly specified, join condition. This join condition can be
                             i                                        any simple or compound search condition, but does not have
                                                                      to contain a subquery reference. The valuation of an inner
Set Difference                                                        join is computed by the sum of the monetary values of all
The difference of relations R1 and R2 is the relation that            attributes of a composite row multiplied by the number of
contains all the tuples that are in R1 , but do not belong to         rows within the result set. Because the join attribute Ajoin
R2 . The set difference is denoted by R1 − R2 or R1 \R2 and           of two joined tables has to be counted only once, we need
defined by R1 \R2 = {x : x ∈ R1 ∧ x ∈    / R2 }. Also, the set        to subtract it (shown in Eq. (10)).
difference is union-compatible, meaning the relations must
                                                                         mval(r(R1 ./ R2 ) = |r(R1 ./ R2 )| ∗
have the same number of attributes and the domain of each
attribute is the same in both R1 and R2 . The mval of a set              (mval(t ∈ r(R1 )) + (mval(t ∈ r(R2 )) − mval(Ajoin ))
difference of two relations is computed by subtracting the                                                                   (10)
mval of tuples that have both relations in common from the
monetary value of R1 given by Equation (8).                           Outer Join
                                     X                                An outer join does not require matching records for each
 mval(R1 \R2 ) = mval(r(R1 ) −             mval(ti ∈ r(R1 ∩ R2 )      tuple of concerned tables. The joined result table retains all
                                       i                              rows from at least one of the tables mentioned in the FROM
                                                                (8)   clause, as long as those rows are consistent with the search
                                                                      condition. Outer joins are subdivided further into left, right,
Cartesian Product                                                     and full outer joins. The result set of a left outer join (or left
The Cartesian product, also known as cross product, is an             join) includes all rows of the first mentioned table (left of
operator which works on two relations, just as set union              the join keyword) merged with attribute values of the right
and set difference. However, the Cartesian product is the             table where the join attribute matches. In case there is no
costliest operator to evaluate [9], because it combines the           match, attributes of the right table are set to NULL. The
tuples of one relation with all the tuples of the other relation      right outer join (or right join) will return rows that have data
– it pairs rows from both tables. Therefore, if the input             in the right table, even if there’s no matching rows in the left
relations R1 and R2 have n and m rows, respectively, the              table enhanced by atteributes (with NULL values) of the left
result set will contain n ∗ m rows and consist of columns of          table. A full outer join is used to retain the non-matching
R1 and the columns of R2 . Because, the number of tuples              information of all affected tables by including non-matching
of the outgoing relations are known, the monetary value is a          rows in the result set. To cumulate the monetary value
summation of all attribute valuations multiplied by number            for a query that contains a left or right outer join, we only
of rows of both relations given by Equation (9). We are               need to compute the monetary value of an inner join of both
tables and add the mval of an antijoin r(R1 . R2 ) ⊆ r(R1 )       of the ISO (1987) and ANSI (1986) standard for the SQL
which includes only tuples of R1 that do not have a join          database query language.
partner in R2 (shown in Eq. (11)). For the monetary value of        To be able to compute the monetary value of a derived,
a full outer join, we additionally would consider an antijoin     aggregated attribute, we need to consider two more factors.
r(R2 . R1 ) ⊆ r(R2 ) which includes tuples of R2 that do not      First of all, we divided aggregate function into two groups:
have a join partner given by Equation (12)).                      informative and conservative.
         mval(r(R1 1 R2 )) = mval(r(R1 ./ R2 ))+                    1. Informative are those aggregate functions where the
                                                          (11)         aggregated value of a certain aggregate function leads
          mval(r(R1 . R2 ))
                                                                       to an information gain of the entire input of all at-
                                                                       tribute values. This means that every single attribute
         mval(r(R1 1 R2 )) = mval(r(R1 ./ R2 ))+                       value participates in the computation of the aggre-
                                                          (12)
            mval(r(R1 . R2 )) + mval(r(R2 . R1 ))                      gated attribute value. Representatives for informative
                                                                       aggregate functions are COUNT, AVG and SUM.
Semi Join
                                                                    2. Conservative, on the contrary, are those functions where
A semi join is similar to the inner join, but with the addition        the aggregated value is represented by only one at-
that only attributes of one relation are represented in the            tribute value, but in consideration of all other attribute
result set. Semi joins are subdivided further into left and            values. So if the aggregated value are again separated
right semi joins. The left semi join operator returns each row         from the input set, all other attribute values will re-
from the first input relation (left of the join keyword) when          main. Conservative aggregate functions are MAX and
there is a matching row in the second input relation (right            MIN.
of the join keyword). The right semi join is computed vice
versa. The monetary value for a query that uses semi joins          The second factor that needs to be considered is the num-
can be easily cumulated by multiplying the sum of monetary        ber of attributes that are used to compute the aggregated
values for included attributes with number of matching rows       values. In case of a conservative aggregate function, it is
of the outgoing relation (shown in Eq. (13)).                     simple, because only one attribute value is part of the out-
                         X                                        put. For that reason we recommend to leave the mval of
  mval(r(R1 n R2 )) =            mval(Ai ) ∗ |r(R1 n R2 )| (13)   the source attribute unchanged (shown in Eq. (14)).
                        Ai ∈R1                                      mval(Ai ) = mval(M AX(Ai )) = mval(M IN (Ai ))          (14)
Nevertheless, we do have an information gain by knowing              For the informative aggregate functions the computation
join attributes of R1 have some join partners within R2           is more challenging due to several participating attribute
which are not considered. But adding our uncertainty factor       values. Because several input attribute values are concerned,
UF in this equation would lead to inconsistency by cumu-          we recommend the usage of our uncertainty factor which
lating the mval of a semi join compared to the mval of a          we already mentioned in a prior section. With the uncer-
combination of a natural join and a projection. In future         tainty factor it is possible to integrate the number of at-
work, we will solve this issue by presenting a calculation        tribute values in a way that a higher number of concerned
that is based on a combination of projections and joins to        attributes leads to an increase in percentage terms of the
cover such an implicit information gain.                          monetary value of the aggregated attribute value given by
                                                                  Equation (15).
3.3    Aggregate Functions
   In computer science, an aggregate function is a function         mval(COU N T (Ai )) = mval(SU M (Ai )) =
where the values of multiple rows are grouped together as                                 1                                 (15)
                                                                     mval(AV G(Ai )) =      log10 (|Ai | + 1) ∗ mval(Ai )
input on certain criteria to form a single value of more sig-                            30
nificant meaning. The SQL aggregate functions are useful
when mathematical operations must be performed on all or
                                                                  3.4   Scalar Functions
on a group of values. For that reason, they are frequently          Besides the SQL aggregate functions, which return a sin-
used with the GROUP BY clause within a SELECT state-              gle value, calculated from values in a column, there are also
ment. According to the SQL standard, the following aggre-         scalar functions defined in SQL, that return a single value
gate function are implemented in most DBMS and the ones           based on the input value. The possibly most commonly used
used most often: COUNT, AVG, SUM, MAX, and MIN.                   and well known scalar functions are:
   All aggregate functions are deterministic, i.e. they return       • UCASE() - Converts a field to upper case
the same value any time they are called by using the same
set of input values. SQL aggregate functions return a sin-           • LCASE() - Converts a field to lower case
gle value, calculated from values within one column of a             • LEN() - Returns the length of a text field
arbitrary relation [10]. However, it should be noted that ex-
cept for COUNT, these functions return a NULL value when             • ROUND() - Rounds a number to a specified degree
no rows are selected. For example, the function SUM per-
                                                                     • FORMAT() - Formats how a field is to be displayed
formed on no rows returns NULL, not zero as one might ex-
pect. Furthermore, except for COUNT, aggregate functions             Returned values of this scalar functions are always derived
ignore NULL values at all during computation. All aggre-          from one source attribute value, and some of them do not
gate function are defined in SQL:2011 standard or ISO/IEC         even change the main content of the attribute value. There-
9075:2011 (under the general title "Information technology        fore, we recommend that the monetary value of the source
- Database languages - SQL") which is the seventh revision        attribute stays untouched.
3.5     User-Defined Functions                                         Furthermore, by summing all partial result, we make sure
   User-defined functions (UDF ) are subroutines made up               that the worst case of information loss is considered, entirely
of one or several SQL or programming extension statements              in line with our general idea of a leakage resistant data val-
that can be used to encapsulate code for reuse. Most database          uation that should prevent a massive data extraction. How-
management systems (DBMS) allow users to create their                  ever, since SPs represent a completed unit, by reaching the
own user-defined functions and do not limit them to the                truncate threshold the whole SP will be blocked and rolled
built-in functions of their SQL programming language (e.g.,            back. For that reason, we recommend smaller SPs resp.
TSQL, PL/SQL, etc.). User-defined functions in most sys-               split existing SPs in DBS with an enabled leakage resistant
tems are created by using the CREATE FUNCTION state-                   data valuation.
ment and other users than the owner must be granted ap-
propriate permissions on a function before they can use it.            4.   RELATED WORK
Furthermore, UDFs can be either deterministic or nondeter-                Conventional database management systems mostly use
ministic. A deterministic function always returns the same             access control models to face unauthorized access on data.
results if the input is the equal and a nondeterministic func-         However, these are insufficient when an authorized individ-
tion returns different results every time it is called.                ual extracts data regardless whether she is the owner or
   On the basis of the multiple possibilities offered by most          has stolen that account. Several methods were conceived to
DBMS, it is impossible to estimate all feasible results of a           eliminate those weaknesses. We refer to Park and Giordano
UDF. Also, due to several features like shrinking, concate-            [14], who give an overview of requirements needed to address
nating, and encrypting of return values, a valuation of a              the insider threat.
single or an array of output values is practically impossible.            Authorization views partially achieve those crucial goals of
For this reason we decided not to calculate the monetary               an extended access control and have been proposed several
value depending on the output of a UDF, much more we                   times. For example, Rizvi et al. [15] as well as Rosenthal
do consider the attribute values that are passed to an UDF             et al. [16] use authorization-transparent views. In detail,
(shown in Eq. (16)). This assumption is also the most re-              incoming user queries are only admitted, if they can be an-
liable, because it does not matter what happens inside an              swered using information contained in authorization views.
UDF – like a black box – the information loss after inserting          Contrary to this, we do not prohibit a query in its entirety.
cannot get worse.                                                      Another approach based on views was introduced by Motro
                                                                       [12]. Motro handles only conjunctive queries and answers
      mval(U DFoutput (Aa , .., Ag )) =                                a query only with a part of the result set, but without any
                                           p
                                           X                    (16)   indication why it is partial. We do handle information en-
        mval(U DFinput (Ak , .., Ap )) =            mval(Ai )          hancing (e.g., joins), as well as coarsening operations (e.g.,
                                              i=k                      aggregation) and we do display a user notification. All au-
                                                                       thorization view approaches require an explicit definition of
3.6     Stored Procedures                                              a view for each possible access need, which also imposes
   Stored procedures (SP) are stored similar to user-defined           the burden of knowing and directly querying these views.
functions (UDF ) within a database system. The major dif-              In contrast, the monetary values of attributes are set while
ference is that stored procedures have to be called and the            defining the tables and the user can query the tables or views
return values of UDFs are used in other SQL statements in              she is used to. Moreover, the equivalence test of general re-
the same way pre-installed functions are used (e.g., LEN,              lational queries is undecidable and equivalence for conjunc-
ROUND, etc.). A stored procedure, which is depending on                tive queries is known to be NP complete [3]. Therefore, the
the DBMS also called proc, sproc, StoredProc or SP, is a               leakage-resistant data valuation is more applicable, because
group of SQL statements compiled into a single execution               it does not have to face those challenges.
plan [13] and mostly developed for applications that need                 However, none of these methods does consider the sensi-
to access easily a relational database system. Furthermore,            tivity level of data that is extracted by an authorized user.
SPs combine and provide logic also for extensive or complex            In the field of privacy-preserving data publishing (PPDP),
processing that requires execution of several SQL statement,           on the contrary, several methods are provided for publishing
which had to be implemented in an application before. Also             useful information while preserving data privacy. In detail,
a nesting of SPs is feasible by executing one stored procedure         multiple security-related measures (e.g., k-anonymity [17],
from within another. A typical use for SPs refers to data              l-Diversity [11]) have been proposed, which aggregate infor-
validation (integrated into the database) or access control            mation within a data extract in a way that they can not lead
mechanisms [13].                                                       to an identification of a single individual. We refer to Fung et
   Because stored procedures have such a complex structure,            al. [5], who give a detailed overview of recent developments
nesting is also legitimate and SPs are "only" a group of               in methods and tools of PPDP. However, these mechanisms
SQL statements, we recommend to value each single state-               are mainly used for privacy-preserving tasks and are not in
ment within a SP and sum up all partial results (shown in              use when an insider accesses data. They are not applica-
Eq. (17). With this assumption we do follow the principal              ble for our scenario, because they do not consider a line by
that single SQL statements are moved into stored proce-                line extraction over time as well as the information loss by
dures to provide a simple access for applications which only           aggregating attributes.
need to call the procedures.                                              To the best of our knowledge, there is only the approach of
                                                                       Harel et al. ([6], [7], [8]) that is comparable to our data val-
                                        k
                                        X                              uation to prevent suspicious, authorized data extractions.
      mval(SP (r(Rj ), .., r(Rk ))) =         mval(r(Ri ))      (17)   Harel et al. introduce the Misuseability Weight (M-score)
                                        i=j                            that desribes the sensitivity level of the data exposed to
the user. Hence, Harel et al. focus on the protection of the       [4] E. F. Codd. A Relational Model of Data for Large
quality of information, whereas our approach predominantly             Shared Data Banks. ACM Communication,
preserves the extraction of a collection of data (quantity of          13(6):377–387, June 1970.
information). Harel et al. also do not consider extractions        [5] B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu.
over time, logging of malicious requester and the backup pro-          Privacy-Preserving Data Publishing: A Survey of
cess. In addition, mapping attributes to a certain monetary            Recent Developments. ACM Comput. Surv.,
value is much more applicable and intuitive, than mapping              42(4):14:1–14:53, June 2010.
to a artificial M-score.                                           [6] A. Harel, A. Shabtai, L. Rokach, and Y. Elovici.
  Our extended authorization control does not limit the sys-           M-score: Estimating the Potential Damage of Data
tem to a simple query-authorization control without any                Leakage Incident by Assigning Misuseability Weight.
protection against the insider threat, rather we allow a query         In Proc. of the 2010 ACM Workshop on Insider
to be executed whenever the information carried by the                 Threats, Insider Threats’10, pages 13–20. ACM, 2010.
query is legitimate according to the specified authorizations      [7] A. Harel, A. Shabtai, L. Rokach, and Y. Elovici.
and thresholds.                                                        Eliciting Domain Expert Misuseability Conceptions.
                                                                       In Proc. of the 6th Int’l Conference on Knowledge
5.   CONCLUSIONS AND FUTURE WORK                                       Capture, K-CAP’11, pages 193–194. ACM, 2011.
                                                                   [8] A. Harel, A. Shabtai, L. Rokach, and Y. Elovici.
   In this paper we described conceptual background details
                                                                       M-Score: A Misuseability Weight Measure. IEEE
for a novel approach for database security. The key contri-
                                                                       Trans. Dependable Secur. Comput., 9(3):414–428, May
bution is to derive valuations for query results by considering
                                                                       2012.
the most important operations of the relational algebra as
                                                                   [9] T. Helleseth and T. Klove. The Number of Cross-Join
well as SQL and providing specific mval functions for each
                                                                       Pairs in Maximum Length Linear Sequences. IEEE
of them. While some of these rules are straight forward, e.g.
                                                                       Transactions on Information Theory, 37(6):1731–1733,
for core operations like selection and projection, other oper-
                                                                       Nov. 1991.
ations like specific join operations require some more thor-
ough considerations. Further operations, e.g. grouping and        [10] P. A. Laplante. Dictionary of Computer Science,
aggregation or user-defined function, would actually require           Engineering and Technology. CRC Pressl,
application specific valuations. To minimize the overhead              London,England, 1st edition, 2000.
for using valuation-based security, we discuss and recom-         [11] A. Machanavajjhala, D. Kifer, J. Gehrke, and
mend some reasonable valuation functions for these cases,              M. Venkitasubramaniam. L-Diversity: Privacy Beyond
too.                                                                   k-Anonymity. ACM Trans. Knowl. Discov. Data,
   As the results presented here merely are of conceptual              1(1):1–50, Mar. 2007.
nature, our current and future research includes considering      [12] A. Motro. An Access Authorization Model for
implementation alternatives, e.g. integrated with a given              Relational Databases Based on Algebraic
DBMS or as part of a middleware or driver as well as eval-             Manipulation of View Definitions. In Proc. of the 5th
uating the overhead and the effectiveness of the approach.             Int’l Conference on Data Engineering, pages 339–347.
We will also come up with a detailed recommendation of                 IEEE Computer Society, 1989.
how to set monetary values appropriate to different environ-      [13] J. Natarajan, S. Shaw, R. Bruchez, and M. Coles. Pro
ments and situations. Furthermore, we plan to investigate              T-SQL 2012 Programmer’s Guide. Apress,
further possible use cases for data valuation, such as billing         Berlin-Heidelberg, Germany, 3rd edition, 2012.
of data-providing services on a fine-grained level and con-       [14] J. S. Park and J. Giordano. Access Control
trolling benefit/cost trade-offs for data security and safety.         Requirements for Preventing Insider Threats. In Proc.
                                                                       of the 4th IEEE Int’l Conference on Intelligence and
                                                                       Security Informatics, ISI’06, pages 529–534. Springer,
6.   ACKNOWLEDGMENTS                                                   2006.
  This research has been funded in part by the German             [15] S. Rizvi, A. Mendelzon, S. Sudarshan, and P. Roy.
Federal Ministry of Education and Science (BMBF) through               Extending Query Rewriting Techniques for
the Research Program under Contract FKZ: 13N10818.                     Fine-Grained Access Control. In Proc. of the 2004
                                                                       ACM SIGMOD Int’l Conference on Management of
                                                                       Data, SIGMOD’04, pages 551–562. ACM, 2004.
7.   REFERENCES                                                   [16] A. Rosenthal and E. Sciore. View Security as the Basis
 [1] S. Barthel and E. Schallehn. The Monetary Value of                for Data Warehouse Security. In CAiSE Workshop on
     Information: A Leakage-Resistant Data Valuation. In               Design and Management of Data Warehouses,
     BTW Workshops, BTW’2013, pages 131–138. Köln                      DMDW’2000, pages 5–6. CEUR-WS, 2000.
     Verlag, 2013.                                                [17] L. Sweeney. K-Anonymity: A Model For Protecting
 [2] E. Bertino and R. Sandhu. Database Security -                     Privacy. Int. J. Uncertain. Fuzziness Knowl.-Based
     Concepts, Approaches, and Challenges. IEEE                        Syst., 10(5):557–570, Oct. 2002.
     Dependable and Secure Comp., 2(1):2 –19, Mar. 2005.
 [3] A. K. Chandra and P. M. Merlin. Optimal
     Implementation of Conjunctive Queries in Relational
     Data Bases. In Proc. of the 9th Annual ACM
     Symposium on Theory of Computing, STOC’77, pages
     77–90. ACM, 1977.