=Paper= {{Paper |id=Vol-28/paper-8 |storemode=property |title=View security as the basis for data warehouse security |pdfUrl=https://ceur-ws.org/Vol-28/paper8.pdf |volume=Vol-28 |dblpUrl=https://dblp.org/rec/conf/dmdw/RosenthalS00 }} ==View security as the basis for data warehouse security== https://ceur-ws.org/Vol-28/paper8.pdf
                            View Security as the Basis for Data Warehouse Security
                                            Arnon Rosenthal                                    Edward Sciore
                                         The MITRE Corporation                           Boston College and MITRE
                                           Bedford, MA, USA                               Chestnut Hill, MA, USA
                                             arnie@mitre.org                                   sciore@bc.edu

                                                                                   infer access rights across subsystems, without infringing
                                         Abstract                                  on their local autonomy. This problem has not been
                                                                                   addressed in systems to date. As a result, the warehouse
        Access. permissions in a data warehouse are                                DBA (W-DBA) not only has to manually specify access
        currently managed in a separate world from                                 rights on all warehouse data, the W-DBA must also be
        the sources’ policies. The consequences are                                trusted by all sources to specify these rights
        inconsistencies, slow response to change, and                              appropriately.
        wasted administrative work. We present a
        different approach, which treats the sources’                                 The goal of this paper is to provide a theory that
        exported tables and the warehouse as part of                               permits automated inference of many permissions for
        the same distributed database. Our main result                             the warehouse, in a way that minimizes both the
        is a way to control derived products by                                    learning curve for administrators and the amount of new
        extending SQL grants rather than creating                                  software that vendors would implement. We do not
        entirely new mechanisms. We provide a                                      propose a separate theory that requires vendors to
        powerful, sound inference theory that derives                              implement major new mechanisms, or administrators to
        permissions on warehouse tables (both                                      learn and execute new tasks. Instead, our theory is a
        materialized and virtual), making the system                               natural extension of the standard SQL grant/revoke
        easier to administer and its applications more                             model, to systems with redundant and derived data.
        robust. We also propose a new permission                                   Nearly all the capability comes from adapting general
        construct suitable for views that filter data                              mechanisms for view security, and by exploiting
        from mutually-suspicious parties.                                          available technology for generating equivalent queries.
1         Introduction                                                             1.1    Overview of Our Results
A key challenge for data warehouse security is how to                              Our theory extends SQL in three ways.
manage the entire system coherently – from sources
and their export tables, to warehouse stored tables                                   First, we split the notion of “access permission on a
(conventional and cubes) and views defined over the                                table” into two separately-administered issues: who is
warehouse. Permissions on the warehouse must satisfy                               allowed to access what information (information
the restrictions of the data owners, and be updated                                permissions), and who is allowed to access which
quickly as those local concerns evolve. Yet the system                             physical tables (physical permissions). An information
cannot demand extensive administrator time, since                                  permission is the result of an enterprise-wide decision,
there are too few people with both technical skills to                             and should be consistently applied to all the views,
understand derivation logic, and business skills to                                replicas and derivatives in the system.1 In contrast,
balance security versus accessibility.                                             physical permissions are local, and need not be
                                                                                   consistent.
   Thus the critical problem is how to automatically
coordinate the access rights of the warehouse with                                    For example, the decision that “Employee salary
those of the sources. To do so, one must be able to                                information is releasable to payroll clerks and cost
                                                                                   analysts” is an information permission, whereas the
The copyright of this paper belongs to the paper’s authors. Permission to copy
                                                                                   decision that “Cost analysts are allowed to run queries
without fee all or part of this material is granted provided that the copies are   on the warehouse” is a physical permission. The effect
not made or distributed for direct commercial advantage.                           of separating these concerns is that each set of
Proceedings of the International Workshop on Design                                permissions may evolve independently without
and Management of Data Warehouses (DMDW'2000)
Stockholm, Sweden, June 5-6, 2000
(M. Jeusfeld, H. Shu, M. Staudt, G. Vossen, eds.)                                     1
                                                                                          Consistency means having an unambiguous
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-28/              statement for each table or tuple. We do support policies
                                                                                   that are conditional based on data value.


A. Rosenthal, E. Sciore                                                                                                                 8-1
invalidating the other, and each in isolation has very      models include authentication and authorization within
natural semantics.                                          their models (e.g., [Jo94, Cas97]). However, recent
                                                            industrial trends tend to pull these capabilities toward a
   Our second extension provides a powerful inference       middleware credentials infrastructure, and hence beyond
mechanism. In SQL, a user is allowed to execute a           the control of database researchers.
query Q if the user has permission on all tables
mentioned in Q. We extend SQL so that a user can               Because warehouses emphasize read operations, we
also execute Q if there is an equivalent query Q’ for       focus on the operations Read and Grant-Read. As
which the user has permission. That is, releasability       remarked in [Cas97], a similar inference theory works
depends on the result, not on the computation. We say       for update, but is outside the scope of this paper.
that Q’ is a witness for Q. This extension relies on the
query rewriting capability of the database system to        2    Basic Definitions
determine equivalence. It is not necessary for the
system to have any particular level of rewrite capability   A permission is a 4-tuple (subject, operation, object,
– we can exploit whatever degree of rewriting a vendor      mode). A subject is an abstract user, representing an
can afford to provide. We therefore dare to hope that       individual, a group, role, process, etc. We say subject s
the results would be practical and attract vendors.         has been granted a permission if it is granted either
                                                            explicitly to s, or to an ancestor in a group or role
   Third, we propose a new construct that broadens the      hierarchy. In principle, each subject is uniquely named
creation of views over mutually suspicious                  and is valid in all systems (though explicit grants to
organizations. Conventionally, a source can reduce risk     individuals may be rare outside their home system).2
by exporting a view that filters out sensitive data. In
SQL, a view can be created (with Grant authority) only         An object (also called a table) belongs to some
if there is a user that has Grant authority on all          schema, in some system, either a source or a warehouse.
mentioned tables. We motivate and define an extension       It can be a relation or OLAP cube (including, trivially,
that allows the desensitizing view to be over several       an individual cell), and can be either materialized or
mutually-suspicious sources, where no person is trusted     virtual. We refer to a table exported by a source as a
to have Grant permission over all of them.                  source table.

   Implementing our approach would improve existing            Our set of allowable operations is the same as in
systems in several ways. We meet the goals of notable       SQL. In this paper we focus on the operations Read and
previous proposals [Cap97, Cas97, Jo94], while using        Grant-Read. If subject s is granted a Read permission
fewer new mechanisms and providing greater                  on table T, then s is allowed to access all of T. If s is
flexibility. For example, our model allows authority to     granted a Grant-Read permission on T, then s is allowed
pass across system boundaries, and supports multiple        to grant a Read or s permission on T to any subject.
layers (e.g. of warehouses and data marts) without
requiring new constructs. Our model also aids the             The mode of a permission is either information or
system administrator, by automatically inferring many       physical, and will be discussed in Section 3.
Read and Grant-Read privileges from the warehouse
view definition, and by providing rules that determine         We treat each warehouse table as a view
the allowable queries on data cubes (and rollups). The      (materialized or virtual) over the tables exported by
authority to grant onward is provided by normal view        sources or in the warehouse. A view is defined by an
inference of Grant-Read. The model also helps with          SQL query, and may include user-defined functions. If
views defined over warehouse tables – it emphasizes         V is a view, we write its defining query as QV. The
layers of views rather than system boundaries.              inputs to a query Q are the objects mentioned in it; to
                                                            show the inputs, we denote the query as Q(T1, …, Tn). Q
1.2    Scope Limitations                                    can be a black box (either opaque code or secret), but
                                                            we assume that every query’s input set is known.
Warehouse security entails many issues that are not
addressed here. We believe that industrial progress is      3    Information and Physical Permissions
most likely if we design a robust, useful module for
security issues of derived data, rather than partial        The mode of a permission specifies that it is either an
solutions for a broad range of requirements. The            information permission or a physical permission. A
derived data facility would be complemented by              subject s is allowed to access a table only if it has both
modules that handle issues such as query rewrite, role      information and physical permissions on that table.
management, separation of duty, and inference attacks.
                                                            2
   We offload the problem of managing user sets to a          This paper addresses what permissions should exist.
                                                            Partitioning by physical system is permitted in the
role/group management module. Several previous
                                                            physical implementation, which we do not address.


A. Rosenthal, E. Sciore                                                                                           8-2
   A permission (s, op, T, “information”) indicates that   permissions (s, operation, object, “information”) and
the information in T should be accessible to s, for        (s, operation, object, “physical”). Each Grant or
operation op. It concerns releasability of knowledge,      Revoke would (conceptually) cause all the permissions
not access to the physical container T. Information        to be recomputed. It will be challenging to implement
policies apply globally, spanning source systems and       these semantics efficiently and reliably among
warehouse tables. They are unaffected by creation of       distributed heterogeneous DBMSs. The alternative is to
redundant copies or new interfaces – the policy on a       implement them by human administrators.
table would be unaffected by a decision to create a
mirror table or an extract at another site.                4     Permission Inference
   For example, a medical system may contain several       A permission is explicit if it is granted directly by an
information policy assertions concerning when              authorized grantor. When the information for a table T
TreatmentCost-values can be releasable to certain          is derivable as Q(T1, … Tn), we may be able to infer
subjects. One assertion might be “to auditors”. Others     additional permissions. This section presents a theory
might be: “In New York, to the role BillingClerk”; and     for determining the implied permissions in effect at a
“in Florida, to the role InsuranceAdjuster”. These         particular schema (typically the warehouse).
assertions correspond to information permissions, and
refer to all copies of the information.                       We want the rationale for permission inference to be
                                                           understandable (even by nontechnical managers) and to
   A physical permission authorizes an execution           be easy to automate. In earlier papers we relied on
strategy to use a single physical resource (usually a      inference   rules,   whose      justification   required
materialized table). Whereas information permissions       considerable argument. In this section we propose a
are global, physical permissions allow local autonomy.     simpler “witness” semantics; inference rules are
By withholding physical permissions, a resource owner      implementation aids derived to track witnesses.
can limit workload, allow only users who have paid a
fee, or allow only corporate employees. We suspect         4.1     A Theory of Witnesses
that physical permissions will typically be administered   Informally, a subject should have permission to execute
in coarse granules, e.g., to give a user physical access   a query if and only if the query can be expressed in
to all tables on a particular platform. It seems best to   terms of tables (base or view) for which the user has
administer in terms of database schemas, so as to allow    explicit permission.
finer granularity (e.g., to exclude blobs), and to use
groups and roles defined in the DBMS.                         Formally, let T be a table (base or view). We have
                                                           the following definitions:
   Our long-term vision is to allow a security
administrator to see a giant distributed database that     •     A query Q is equivalent to T if the output of Q
spans many platforms. This database would be                     always contains the same tuples as T.
managed using Grant/Revoke, with implied
permissions as described below. (Grant and Revoke          •     A permission (s, op, T, mode) is implied if there
would be extended by a parameter that distinguishes              exists an equivalent query Q(T1, …, Tn) such that
 mode).                                  each permission (s, op, Ti, mode) has been granted
                                                                 explicitly. Query Q is called the witness for the
   It will be difficult technically and politically to           implied permission.
extend DBMS capabilities to cope with new modes,
and to notify each other of the need to Grant and             The inference mechanism for permissions is the same
Revoke derived permissions. We suspect progress will       for both physical and information permissions.
be faster if security management is part of a systems      However, the interpretation of the implied permission is
manager (e.g., Tivoli) that communicates with each         different, depending on the mode.
relevant DBMS. Permission changes would be
replicated to the metadata manager, and it would              An implied information permission (s, read, T)
replicate derived changes to the other systems.            means that the information in T is releasable to s. In
                                                           effect, a subject need not care whether an information
   In such an architecture, granting an ordinary SQL       permission is explicit or implied, nor whether T is a
permission in a DBMS would create both information         materialized view, a view computed over several
and physical permissions in the metadata manager. The      sources, or a base table.
metadata manager would determine derived
permissions. The SQL permission (s, operation, object)        An implied physical permission on T asserts that
would be granted on a derived table (typically in the      there exists at least one way to compute T for which the
warehouse) if and only if the subject has both             physical permissions are available. If T were a



A. Rosenthal, E. Sciore                                                                                        8-3
materialized table, however, the subject would not have     query rewrite is to improve metadata (e.g., access
physical access to T – the subject (or the database         permissions), they are unlikely to spend the substantial
query processor) would still need to use the tables from    sums needed to do it well (e.g., to exploit constraints, to
the witness query.                                          extend it to new operators, or to employ materialized
                                                            views). However, they have been willing to spend the
4.2    Query Rewrite Techniques                             money in pursuit of performance gains. Therefore if we
Our definition of implied permission requires that the      want to get the most from query equivalence, we need
system be able to find a witness query equivalent to T.     to hook into the query processing module.
The theory of query equivalence is mature. There have
been numerous research articles, and it is heavily             Second, our benchmark is to do better than others
exploited in query optimizers. Exploring the details        have, and to allow only justifiable permissions. Given
would be a distraction from the security issues, and        that current systems perform no inference, even the
from creating a derived data security module. Instead,      implementation of the simple view substitution strategy
we explore the benefits of exploiting three particular      would be a significant improvement.            The more
rewriting strategies. A larger set of motivating            strategies implemented, the more powerful the system.
examples and a discussion of the issues involved in         However, “completeness” is a minor goal. In fact, a
using the query processor’s rewriter appear in [Ro99b].     complete set of equivalents is impossible due to
                                                            incomplete declarations of constraints and operator
 • View Substitution: View substitution replaces a          rewrite rules, and because the general rewrite problem is
   mention of a view in a query by the view’s               undecidable. Moreover, even a complete set of
   definition. We can infer that if a subject s has the     equivalents may give suboptimal permissions; only a
   necessary permissions on the source tables, then s       health care expert would know that “Patients where
   also has permission on the view. In this case, the       Age>21” deserves more permissions than Patients.
   view is simply an alternate interface to the same
                                                            4.3      Administering a Warehouse
   data. (In contrast, the current SQL standard
   requires that all view users obtain explicit grants      Our witness theory specifies which information
   from the view owner.)                                    permissions are to be automatically inferred and
                                                            declared on warehouse tables. This section examines
 • Semantic Query Optimization: If the user queries a       the consequent benefits for warehouse administration
   view V, some source data that underlies V may be         that arise from it.
   irrelevant to the query result. Query processors
   routinely exploit integrity constraints and relational   4.3.1    Modes of Administration
   algebra to rewrite queries in a simpler form. For        Standard SQL infers view privileges only when a view
   example, suppose V is the join of two tables; then       is defined – the view definer receives the intersection of
   typically a query on V is possible only if the user      her privileges on the input tables. However, the view
   has permission on both tables. However, if the join      definer must then explicitly specify all other
   is on the key of one table and referential integrity     permissions on the view. Our model infers information
   holds, then a query that accesses only fields from       permissions for other users, so the W-DBA need not do
   the foreign-key table need not perform the join.         so. Beyond this, normal “grant option” semantics apply.
   Thus, the subject issuing the query would not need       That is, if a source administrator owns all the tables that
   permission on V, but just the one table.                 underlie a warehouse view, she may administer the
                                                            view’s permissions directly, grant “grant option” to a
 • Rewrite in terms of other views: Subjects are often      warehouse administrator, or combine the modes.
   given access to information through views when
   they do not have permissions on the base tables. A          [Cap97, Cas97] propose a rich security model for
   query mentioning base tables may therefore be            tightly coupled federated databases. Many of the
   equivalent to one mentioning only views. In such a       facilities apply also to warehouses. Derived data
   case, the system can infer that the subject issuing      security entwines with models of data integration, entity
   the query has permission to execute it.                  and object data models, and some aspects of group
                                                            management with negative permissions. They require a
   Our model does not require any particular degree of      new sort of specification, to tell whether federation
query rewrite, nor do we propose algorithms in this         administrators can delete or supplement permissions
paper. Below, we elaborate on two issues: why push          inferred from the sources. They also proposed features
rewrite to another module, and why accept incomplete        in areas that we do not address, such as top-down
enumeration of rewrites.                                    administration, and determining where authorizations
                                                            may be checked.
  First, DBMS vendors have historically given
metadata management a low priority. If the purpose of



A. Rosenthal, E. Sciore                                                                                            8-4
   These papers motivated many of our requirements,          source databases. That means that the system needs to
but the complexity seems likely to deter major               populate permission tables on the warehouse.3 Our
commercial implementation. In areas where the models         theory of witnesses allows us to determine whether a
overlap, ours seems to have fewer new constructs. Yet        subject is entitled to access a table T. However, the
we can simulate the modes of local autonomy proposed         warehouse needs to know the set of all subjects
in [Cap97, Cas97], and additional ones besides.              authorized to access T, i.e., the set of users for which
                                                             witnesses can be found. This section shows how it is
   One such mode has source administrators make all          possible to compute that set.
decisions; Read or Update permissions are then
inferred to the view. This resembles our inference (and         The basic idea is that the set of users authorized to
includes Update). However, it does not appear to infer       execute a query (not considering equivalence) is the
privileges     to     views      defined      over     the   intersection of the user sets of its inputs. The user set of
federation/warehouse base tables, and does not allow         a view can thus be determined by taking the union of
an upper-tier view (e.g., a statistical rollup) to receive   these individual user sets for each equivalent query
additional permissions from the underlying source            found.
administrator(s). Our model allows all this. The W-
DBA gives physical permission in advance to All-                Given T, the system first computes all queries
Employees, but receives no Grant-Read privileges, and        equivalent to T, structured as a directed graph in which
hence does nothing more. Going beyond the previous           each subexpression appears only once. One can traverse
work, in our model, an administrator s who possesses         the graph, computing permissions for each table in that
Grant-Read information permissions on all of T1, …,          graph from the permissions on its predecessors. If the
Tn, possesses it on V(T1, …, Tn). Even though T is in the    graph is acyclic, this can be done once, bottom up.
warehouse and s is associated with a source system,
she automatically has the right to grant additional             It is possible to have cycles in derivability (e.g., a
information permissions on T.                                whole table from horizontal or vertical partitions). We
                                                             have developed algorithms that appear to handle the
   Another proposed mode is to have joint signoff,           general case with similar order of complexity, but a
where the W-DBA must approve each permission on a            larger constant factor. The general case can also be
warehouse table. This can be achieved in our model by        handled by translating the above recursions into Datalog
having the W-DBA grant the physical permissions              (one clause per view definition), and using semi-Naïve
selectively. Finally, there is a mode where the W-DBA        evaluation [Ul89].
actually grants information permissions; our model
handles this by asking each source administrator to          5    Within-View Permissions
grant Grant-Read on exported tables to the W-DBA.
                                                             A warehouse’s derivation is likely to draw from many
4.3.2    Deriving OLAP Permission Rules                      parts of an organization, or even from multiple
                                                             organizations. For example, a drug-effects warehouse
Witness theory works to guide the proper access              might draw information from many hospitals. With the
permissions for OLAP [Pr00]. Consider a cube having          current SQL model, each organization must grant a
several hierarchical dimensions. Suppose the source          warehouse administrator Read (and Grant-Read)
declares that subject has permissions on a view defined      permissions on their exported data – only then can
by some rollup level on each dimension. Then in our          someone define and administer a warehouse table view
model, a query is permissible only when a witness can        over the combined information. But what happens if no
be found, i.e., if it can be derived from views at the       person enjoys such universal trust? And how would 500
permitted rollup level. It will not be permissible to        hospitals or 50000 doctors’ offices negotiate to choose
access the warehouse data at a finer granularity than        such a paragon?
the rollups, even if the filtered data is later rolled up.

   This is in contrast with some warehouse systems              Our approach is to exploit multi-table4 views that
[UPa00], which allow the query to execute, but               combine information in a way that makes it less
surreptitiously replace the user’s query by a query to a     sensitive. A source can stipulate that its exported data
subcube for which the user has access. We consider           can be used only for computing the view. The SQL
this unsatisfactory, too likely to lead to bad decisions.
It would be far preferable to suggest to the user one or
more queries for which they do have sufficient access.       3
                                                                To maximize rewrite possibilities, the warehouse
4.4     Computing Local Permissions                          could also consider rewrites in terms of source tables,
                                                             while using its own copy of source permissions.
When a query is issued to a warehouse, one wants to          4
                                                               If V is computed only from table T, the definition is
test permissions there, rather than refer the issue to the   uninteresting -- one could just grant select to s on V.


A. Rosenthal, E. Sciore                                                                                              8-5
syntax from the administrator of one of the view’s             the two tables and decides whether the entering person
input tables might become:                                     fits the description of a person who is being sought.
               grant select to s on T within V
   That is, subject s can access T, but only from within          Each airport’s authorities want access to view V,
view V.                                                        defined by:
                                                                     select * from ENTRANT, WANTED
   This model still requires trust that the warehouse                where match(ENTRANT, WANTED)
computes the view correctly, and makes no other use of               and ENTRANT.Airport = $MY_Airport
the information. One must also trust the other sites not
to subvert the filtering effects of the view, as illustrated      The border authorities should not have access to all
in Example 1. But it will require malice and skill rather      arrest warrants, and the police searchers should not
than carelessness to subvert the software.                     know about all citizens’ travels. So neither will give the
                                                               other’s representative Read authority on their entire
5.1    Examples                                                table. However, they both are willing to give read
We give three examples where the warehouse                     access within V to the head agent at each airport.
computes a view that is less sensitive, in a way that
could not be achieved by single-source views.                  Example 3 – Intersection of child and parent
                                                               Suppose that a hospital has two tables:
Example 1 – Totals over a large set                                   PATIENT[P#, Age,…] and
The warehouse supports financial studies of hospitals,                SURGERY[P#, ProcedureDone, Date, …]
by providing COST data about hospital patients, in a           PATIENT is the parent table (every surgery has a valid
cube with dimensions Treatment, Age, and State (i.e.,          patient), but most patients have no surgery records.
where each entry is a statewide average). Some
hospitals might be willing to release data only if it is          A researcher s studying surgery on the old might
hidden within Statewide totals. The warehouse defines          create the view
a view V to compute the cube, and each hospital issues:               select * from PATIENT, SURGERY
     grant read on table MyHospital.COST within V                     where PATIENT.Age > 80
                                                                      and PATIENT.P# = SURGERY.P#
   Elaborating this example, the warehouse might               In current systems, s would need access on both tables
contain other tables with more detailed information,           in order to create (and access) the view. In our model, it
from hospitals that are willing to provide it.                 is only necessary for the administrator of each table to
Permissions on that more detailed information allows           give u permission on the table within the view.
that information to be used for computing V. Hospitals
in New York City might feel there is sufficient                   This example can also be used to illustrate the
anonymity in being included in citywide totals; they           benefits of automatic inference. Suppose a research
could define V’ that totals by city. V is derivable from       user s with access to PATIENT and SURGERY
V’. A few hospitals that particularly trust the                periodically runs a reporting application that executes
researchers, or whose cost information is public, may          the above view. Now suppose new regulations are
grant access to the raw COST data.                             enacted that forbid u from seeing children’s data, i.e., s
                                                               is given access only to the view V’, defined by:
   In this example, the hospitals trust that the                         select * From PATIENT where Age>21
warehouse software will compute V (or V’) and discard          In current systems, the reporting application would fail.
the raw inputs. They also trust that the other hospitals       With query rewrite, the query could instead execute
will not spoof or reveal information in ways that enable       over the (non-materialized) view V’. Details appear in
individual hospital results to be detected.5                   [Ros99b].

                                                               5.2     Semantics for Within-View Permissions
Example 2 – Peer-to-Peer Intersection
Consider tables that track people entering the country         Intuitively, a subject s is able to access view V if s has
on international flights (denoted ENTRANT), and                access to each input table, within V. Formally, we have:
people who are wanted by the police (WANTED).
There is a procedure “match” which examines rows in            •     A within-view permission is a 5-tuple (subject,
                                                                     operation, object, mode, view).

5                                                                 The meaning of within-view permission P = (s, read,
 Confidentiality would be lost if all but one hospital in
                                                               T, m, V) is as follows. If P is an information permission,
a state filled its COST table with $0 for each entry.
                                                               then s has clearance to view the values of T that
would presumably have serious operational effects.             contribute to V. If P is a physical permission, then s is
Another attack, also unlikely, would be for all the
others to reveal their true costs.


A. Rosenthal, E. Sciore                                                                                              8-6
allowed to execute a query that physically accesses T,          access to the warehouse table. The same
but only for the purpose of computing V.                        mechanism is helpful for views defined above the
                                                                warehouse base tables.
  One can straightforwardly extend the witness            •     The theory of witnesses for individual users
semantics of Section 4.1 as follows.                            underpins a recursion that can efficiently determine
                                                                the permissions to be granted on each stored
   A permission (s, op, T, mode) is an implied                  warehouse table. When source permissions change,
permission if there exists a query Q and views {Vi}             the inference can be redone, completely
equivalent to T, such that for each object Ti mentioned         automatically. An efficient “delta” inference
in Q, either                                                    process appears feasible for Grants.
• the permission (s, op, Ti, mode) has been explicitly    •     There is no need to customize the security inference
    granted, or                                                 rules to handle OLAP or extract/transform/load
• the within-view permission (s, op, Ti, mode, Vi)              operators. Instead, one can use algebraic and
    has been explicitly granted, where Q is equivalent          constraint-based rewrite rules provided by the
    to Vi.                                                      query optimizer.
   Query Q is called the witness for the implied          •     We showed that local autonomy can be preserved
permission.                                                     (using physical permissions) without sacrificing the
                                                                inference of information permissions. Capabilities
  This definition allows us to model each of the                proposed in other research [Cap97, Cas97] were
examples in Section 3.1.                                        subsumed.
                                                          •     “Within-view” permissions enable additional
   A consequence of the above definition is that a user         collaboration    among       mutually     suspicious
who receives within-view permissions on the inputs of           organizations.
view V gets (by inference) the permission on V. Thus,
users are able to access a view even when nobody is       6.2     Open Problems
trusted to grant on all underlying tables. A user can     This section describes several open problems in easing
even be a view administrator with the ability to grant    the administration of warehouse access permissions.
access to V, if the within-view permissions are for the   First we describe pragmatic issues, and then theoretical
Grant-Read operation.                                     ones.
We have studied examples that require more complex           Perhaps the biggest pragmatic issue is creation of a
within-view permissions, in which multiple witnesses      tool based on the theory, so user experience could be
need to be coordinated. However, the extra complexity     observed. An html mockup of an environment for
of the permission mode makes them less useful in          managing metadata across layers in a multi-tier system
practical situations, and we therefore relegate their     is available at [Ro99a]. The mockup was useful in
development to a future paper.                            identifying requirements, and holes in the theory. But
                                                          we lack the resources (especially grad student labor) to
6     Summary and Future Work                             create and test a real tool. Also, our tool explored only
                                                          the semantics. It will be necessary to embed it in a
6.1    Contributions                                      systems management environment that coordinates
                                                          permissions with each relevant DBMS.
The greatest strengths of our approach are simplicity,
architecture and location independence, and                  A second pragmatic issue is how to produce a query
compatibility with relational technology. Our strongest   rewriter. It is infeasible to keep up with new query
result is the discovery that some pleasantly simple       operators and constraint constructs. Fortunately,
concepts – separation of physical permissions, and the    DBMSs already have a component that does such
“witness”-based theory for permissions on derived data    inference rather well, and in which big money is
– can turn a difficult, messy problem into one that can   invested – the query optimizer. The key issue, partly
be addressed with few technical complications. We         technical and partly political, is whether query
obtained a consistent, defensible theory for inferring                                                        6
                                                          processors will export their rewrite functionality.
new permissions from those which were explicitly
asserted. At the same time, we built upon rather than
replaced existing relational technology.
                                                          6
   More specifically, our approach provides the              In fact, warehouse query processors increasingly
following benefits:                                       rewrite queries to exploit materialized views. However,
• We can determine whether permissions on                 some modifications will be needed, to avoid exploring
     sources’ exported tables justify granting a user     multiple witnesses over the same input set (e.g., for
                                                          different join orders), and to avoid cost-based pruning.


A. Rosenthal, E. Sciore                                                                                         8-7
   Further inference capabilities would be available if    would seem appropriate to distinguish physical and
the extract/transform/load operations were describable     information permissions, and to grant access based on
as SQL expressions (with new SQL operators such as         whatever rewrites are easy to implement. More
coalesce columns, and user-defined functions).             sophisticated rewrites seem quite useful to employ
Currently, the E/T/L views tend to be described in         predefined views or eliminate unneeded tables, but will
code, though some products (such as the Microsoft          be costly to implement.
Repository [Be99]) will identify the input objects.
                                                              Finally, a message to builders of query processors:
   Next, we believe that sources and warehouse users       Make the query rewrite facilities available for other
need to negotiate permissions in terms of roles they       purposes. Rewrites are invaluable at inferring access
both understand. Without this, inference is of limited     permissions and other metadata, and yet are too costly
use. Role and group management is a difficult task. We     to implement just for metadata purposes.
hope to see rapid progress in this area, perhaps driven
by task and workflow models. One also needs to see if      7    References
knowledge of data derivation can help in defining such
roles.                                                     [Be99] P. Bernstein, et. al., “Microsoft Repository
                                                           Version 2 and the Open Information Model,”
   Finally, we need to understand the benefits of          Information Systems 24(2), 1999.
implementing each feature. For example, how many
users would want views that filter across several          [Cap97] S. De Capitani di Vimercati, P. Samarati,
sources, or cycles in view derivability? Which kinds of    Authorization Specification and Enforcement in
rewrites are most effective in giving users additional     Federated Database Systems, Journal of Computer
justified permissions?                                     Security, vol. 5, n. 2, 1997, pp. 155-188.

   Several theoretical problems remain open. We need       [Cas97] S. Castano, S. De Capitani di Vimercati, M.G.
efficient algorithms to infer permissions. We want to      Fugini, Automated Derivation of Global Authorizations
get a calculus of access predicates, to compute in         for Database Federations, Journal of Computer Security,
advance what warehouse permissions should be               vol. 5, n. 4, 1997, pp. 271-301.
granted. (Currently, for each request, one seeks a
witness). Also, [Ro00] proposed to attach limitations to   [Jo94] D. Jonscher, K. Dittrich, “An Approach for
grant-option permissions (e.g., to allow onward grants     Building Secure Database Federations”, VLDB,
only within the recipient’s department). Such limitation   Santiago, Chile, 1994.
predicates deserve to be added to the inference theory.
Next, to extend the theory to federations and ordinary     [Pr00] T. Priebe, G, Pernul, “Data Warehouse Security
views, one needs to handle permissions for update          in GOAL”, submitted for publication.
operations, and user-defined operations (e.g., Hire-
Employee).                                                 [Ro99a] A. Rosenthal, E. Sciore, G. Gengo,
                                                           “Demonstration of Multi-Tier Metadata Management”,
   Finally, research is needed to support permission       at www.mitre.org/resources/centers/it/staffpages/arnie/
requests that originate at the top (as in “make my view
have the following inferred permissions”) [Cas97]. If      [Ro99b] A. Rosenthal, E. Sciore, V. Doshi, “Security
permission inference is seen as defining a view, top-      Administration for Federations, Warehouses, and other
down asks to update the view. As with conventional         Derived Data”, IFIP 11.3 Working Conference on
view updates, there are often multiple ways to meet the    Database Security, Seattle 1999. (Kluwer, 2000).
user’s intent. Generation of those alternatives needs to
be guided by a theory, and automated by tools.             [Ro00] A. Rosenthal, E. Sciore, “Extending SQL’s
                                                           Grant and Revoke Operations”, IFIP Workshop on
                                                           Database Security, Amsterdam, August 2000
6.3    Conclusion
                                                           [Ul89] J. Ullman, Principles of Database and
The proposed facilities can provide significant benefits   Knowledge-Base Systems, Vol. 2, Computer Science
for administering distributed databases and views. At      Press, Rockville MD, 1989.
the same time, we minimized complexity for DBMS
implementers, by building over rather than replacing       [UPa00] Instructions from U. Pennsylvania data
SQL facilities.                                            warehouse
                                                           http://www.upenn.edu/computing/da/dw/security.html
   We contend that a “derived data access permissions”
module deserves implementation today with RDBMSs,
especially distributed RDBMSs. For the short term, it



A. Rosenthal, E. Sciore                                                                                       8-8