=Paper=
{{Paper
|id=Vol-28/paper-8
|storemode=property
|title=View security as the basis for data warehouse security
|pdfUrl=https://ceur-ws.org/Vol-28/paper8.pdf
|volume=Vol-28
|dblpUrl=https://dblp.org/rec/conf/dmdw/RosenthalS00
}}
==View security as the basis for data warehouse security==
View Security as the Basis for Data Warehouse Security
Arnon Rosenthal Edward Sciore
The MITRE Corporation Boston College and MITRE
Bedford, MA, USA Chestnut Hill, MA, USA
arnie@mitre.org sciore@bc.edu
infer access rights across subsystems, without infringing
Abstract on their local autonomy. This problem has not been
addressed in systems to date. As a result, the warehouse
Access. permissions in a data warehouse are DBA (W-DBA) not only has to manually specify access
currently managed in a separate world from rights on all warehouse data, the W-DBA must also be
the sources’ policies. The consequences are trusted by all sources to specify these rights
inconsistencies, slow response to change, and appropriately.
wasted administrative work. We present a
different approach, which treats the sources’ The goal of this paper is to provide a theory that
exported tables and the warehouse as part of permits automated inference of many permissions for
the same distributed database. Our main result the warehouse, in a way that minimizes both the
is a way to control derived products by learning curve for administrators and the amount of new
extending SQL grants rather than creating software that vendors would implement. We do not
entirely new mechanisms. We provide a propose a separate theory that requires vendors to
powerful, sound inference theory that derives implement major new mechanisms, or administrators to
permissions on warehouse tables (both learn and execute new tasks. Instead, our theory is a
materialized and virtual), making the system natural extension of the standard SQL grant/revoke
easier to administer and its applications more model, to systems with redundant and derived data.
robust. We also propose a new permission Nearly all the capability comes from adapting general
construct suitable for views that filter data mechanisms for view security, and by exploiting
from mutually-suspicious parties. available technology for generating equivalent queries.
1 Introduction 1.1 Overview of Our Results
A key challenge for data warehouse security is how to Our theory extends SQL in three ways.
manage the entire system coherently – from sources
and their export tables, to warehouse stored tables First, we split the notion of “access permission on a
(conventional and cubes) and views defined over the table” into two separately-administered issues: who is
warehouse. Permissions on the warehouse must satisfy allowed to access what information (information
the restrictions of the data owners, and be updated permissions), and who is allowed to access which
quickly as those local concerns evolve. Yet the system physical tables (physical permissions). An information
cannot demand extensive administrator time, since permission is the result of an enterprise-wide decision,
there are too few people with both technical skills to and should be consistently applied to all the views,
understand derivation logic, and business skills to replicas and derivatives in the system.1 In contrast,
balance security versus accessibility. physical permissions are local, and need not be
consistent.
Thus the critical problem is how to automatically
coordinate the access rights of the warehouse with For example, the decision that “Employee salary
those of the sources. To do so, one must be able to information is releasable to payroll clerks and cost
analysts” is an information permission, whereas the
The copyright of this paper belongs to the paper’s authors. Permission to copy
decision that “Cost analysts are allowed to run queries
without fee all or part of this material is granted provided that the copies are on the warehouse” is a physical permission. The effect
not made or distributed for direct commercial advantage. of separating these concerns is that each set of
Proceedings of the International Workshop on Design permissions may evolve independently without
and Management of Data Warehouses (DMDW'2000)
Stockholm, Sweden, June 5-6, 2000
(M. Jeusfeld, H. Shu, M. Staudt, G. Vossen, eds.) 1
Consistency means having an unambiguous
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-28/ statement for each table or tuple. We do support policies
that are conditional based on data value.
A. Rosenthal, E. Sciore 8-1
invalidating the other, and each in isolation has very models include authentication and authorization within
natural semantics. their models (e.g., [Jo94, Cas97]). However, recent
industrial trends tend to pull these capabilities toward a
Our second extension provides a powerful inference middleware credentials infrastructure, and hence beyond
mechanism. In SQL, a user is allowed to execute a the control of database researchers.
query Q if the user has permission on all tables
mentioned in Q. We extend SQL so that a user can Because warehouses emphasize read operations, we
also execute Q if there is an equivalent query Q’ for focus on the operations Read and Grant-Read. As
which the user has permission. That is, releasability remarked in [Cas97], a similar inference theory works
depends on the result, not on the computation. We say for update, but is outside the scope of this paper.
that Q’ is a witness for Q. This extension relies on the
query rewriting capability of the database system to 2 Basic Definitions
determine equivalence. It is not necessary for the
system to have any particular level of rewrite capability A permission is a 4-tuple (subject, operation, object,
– we can exploit whatever degree of rewriting a vendor mode). A subject is an abstract user, representing an
can afford to provide. We therefore dare to hope that individual, a group, role, process, etc. We say subject s
the results would be practical and attract vendors. has been granted a permission if it is granted either
explicitly to s, or to an ancestor in a group or role
Third, we propose a new construct that broadens the hierarchy. In principle, each subject is uniquely named
creation of views over mutually suspicious and is valid in all systems (though explicit grants to
organizations. Conventionally, a source can reduce risk individuals may be rare outside their home system).2
by exporting a view that filters out sensitive data. In
SQL, a view can be created (with Grant authority) only An object (also called a table) belongs to some
if there is a user that has Grant authority on all schema, in some system, either a source or a warehouse.
mentioned tables. We motivate and define an extension It can be a relation or OLAP cube (including, trivially,
that allows the desensitizing view to be over several an individual cell), and can be either materialized or
mutually-suspicious sources, where no person is trusted virtual. We refer to a table exported by a source as a
to have Grant permission over all of them. source table.
Implementing our approach would improve existing Our set of allowable operations is the same as in
systems in several ways. We meet the goals of notable SQL. In this paper we focus on the operations Read and
previous proposals [Cap97, Cas97, Jo94], while using Grant-Read. If subject s is granted a Read permission
fewer new mechanisms and providing greater on table T, then s is allowed to access all of T. If s is
flexibility. For example, our model allows authority to granted a Grant-Read permission on T, then s is allowed
pass across system boundaries, and supports multiple to grant a Read or s permission on T to any subject.
layers (e.g. of warehouses and data marts) without
requiring new constructs. Our model also aids the The mode of a permission is either information or
system administrator, by automatically inferring many physical, and will be discussed in Section 3.
Read and Grant-Read privileges from the warehouse
view definition, and by providing rules that determine We treat each warehouse table as a view
the allowable queries on data cubes (and rollups). The (materialized or virtual) over the tables exported by
authority to grant onward is provided by normal view sources or in the warehouse. A view is defined by an
inference of Grant-Read. The model also helps with SQL query, and may include user-defined functions. If
views defined over warehouse tables – it emphasizes V is a view, we write its defining query as QV. The
layers of views rather than system boundaries. inputs to a query Q are the objects mentioned in it; to
show the inputs, we denote the query as Q(T1, …, Tn). Q
1.2 Scope Limitations can be a black box (either opaque code or secret), but
we assume that every query’s input set is known.
Warehouse security entails many issues that are not
addressed here. We believe that industrial progress is 3 Information and Physical Permissions
most likely if we design a robust, useful module for
security issues of derived data, rather than partial The mode of a permission specifies that it is either an
solutions for a broad range of requirements. The information permission or a physical permission. A
derived data facility would be complemented by subject s is allowed to access a table only if it has both
modules that handle issues such as query rewrite, role information and physical permissions on that table.
management, separation of duty, and inference attacks.
2
We offload the problem of managing user sets to a This paper addresses what permissions should exist.
Partitioning by physical system is permitted in the
role/group management module. Several previous
physical implementation, which we do not address.
A. Rosenthal, E. Sciore 8-2
A permission (s, op, T, “information”) indicates that permissions (s, operation, object, “information”) and
the information in T should be accessible to s, for (s, operation, object, “physical”). Each Grant or
operation op. It concerns releasability of knowledge, Revoke would (conceptually) cause all the permissions
not access to the physical container T. Information to be recomputed. It will be challenging to implement
policies apply globally, spanning source systems and these semantics efficiently and reliably among
warehouse tables. They are unaffected by creation of distributed heterogeneous DBMSs. The alternative is to
redundant copies or new interfaces – the policy on a implement them by human administrators.
table would be unaffected by a decision to create a
mirror table or an extract at another site. 4 Permission Inference
For example, a medical system may contain several A permission is explicit if it is granted directly by an
information policy assertions concerning when authorized grantor. When the information for a table T
TreatmentCost-values can be releasable to certain is derivable as Q(T1, … Tn), we may be able to infer
subjects. One assertion might be “to auditors”. Others additional permissions. This section presents a theory
might be: “In New York, to the role BillingClerk”; and for determining the implied permissions in effect at a
“in Florida, to the role InsuranceAdjuster”. These particular schema (typically the warehouse).
assertions correspond to information permissions, and
refer to all copies of the information. We want the rationale for permission inference to be
understandable (even by nontechnical managers) and to
A physical permission authorizes an execution be easy to automate. In earlier papers we relied on
strategy to use a single physical resource (usually a inference rules, whose justification required
materialized table). Whereas information permissions considerable argument. In this section we propose a
are global, physical permissions allow local autonomy. simpler “witness” semantics; inference rules are
By withholding physical permissions, a resource owner implementation aids derived to track witnesses.
can limit workload, allow only users who have paid a
fee, or allow only corporate employees. We suspect 4.1 A Theory of Witnesses
that physical permissions will typically be administered Informally, a subject should have permission to execute
in coarse granules, e.g., to give a user physical access a query if and only if the query can be expressed in
to all tables on a particular platform. It seems best to terms of tables (base or view) for which the user has
administer in terms of database schemas, so as to allow explicit permission.
finer granularity (e.g., to exclude blobs), and to use
groups and roles defined in the DBMS. Formally, let T be a table (base or view). We have
the following definitions:
Our long-term vision is to allow a security
administrator to see a giant distributed database that • A query Q is equivalent to T if the output of Q
spans many platforms. This database would be always contains the same tuples as T.
managed using Grant/Revoke, with implied
permissions as described below. (Grant and Revoke • A permission (s, op, T, mode) is implied if there
would be extended by a parameter that distinguishes exists an equivalent query Q(T1, …, Tn) such that
mode). each permission (s, op, Ti, mode) has been granted
explicitly. Query Q is called the witness for the
It will be difficult technically and politically to implied permission.
extend DBMS capabilities to cope with new modes,
and to notify each other of the need to Grant and The inference mechanism for permissions is the same
Revoke derived permissions. We suspect progress will for both physical and information permissions.
be faster if security management is part of a systems However, the interpretation of the implied permission is
manager (e.g., Tivoli) that communicates with each different, depending on the mode.
relevant DBMS. Permission changes would be
replicated to the metadata manager, and it would An implied information permission (s, read, T)
replicate derived changes to the other systems. means that the information in T is releasable to s. In
effect, a subject need not care whether an information
In such an architecture, granting an ordinary SQL permission is explicit or implied, nor whether T is a
permission in a DBMS would create both information materialized view, a view computed over several
and physical permissions in the metadata manager. The sources, or a base table.
metadata manager would determine derived
permissions. The SQL permission (s, operation, object) An implied physical permission on T asserts that
would be granted on a derived table (typically in the there exists at least one way to compute T for which the
warehouse) if and only if the subject has both physical permissions are available. If T were a
A. Rosenthal, E. Sciore 8-3
materialized table, however, the subject would not have query rewrite is to improve metadata (e.g., access
physical access to T – the subject (or the database permissions), they are unlikely to spend the substantial
query processor) would still need to use the tables from sums needed to do it well (e.g., to exploit constraints, to
the witness query. extend it to new operators, or to employ materialized
views). However, they have been willing to spend the
4.2 Query Rewrite Techniques money in pursuit of performance gains. Therefore if we
Our definition of implied permission requires that the want to get the most from query equivalence, we need
system be able to find a witness query equivalent to T. to hook into the query processing module.
The theory of query equivalence is mature. There have
been numerous research articles, and it is heavily Second, our benchmark is to do better than others
exploited in query optimizers. Exploring the details have, and to allow only justifiable permissions. Given
would be a distraction from the security issues, and that current systems perform no inference, even the
from creating a derived data security module. Instead, implementation of the simple view substitution strategy
we explore the benefits of exploiting three particular would be a significant improvement. The more
rewriting strategies. A larger set of motivating strategies implemented, the more powerful the system.
examples and a discussion of the issues involved in However, “completeness” is a minor goal. In fact, a
using the query processor’s rewriter appear in [Ro99b]. complete set of equivalents is impossible due to
incomplete declarations of constraints and operator
• View Substitution: View substitution replaces a rewrite rules, and because the general rewrite problem is
mention of a view in a query by the view’s undecidable. Moreover, even a complete set of
definition. We can infer that if a subject s has the equivalents may give suboptimal permissions; only a
necessary permissions on the source tables, then s health care expert would know that “Patients where
also has permission on the view. In this case, the Age>21” deserves more permissions than Patients.
view is simply an alternate interface to the same
4.3 Administering a Warehouse
data. (In contrast, the current SQL standard
requires that all view users obtain explicit grants Our witness theory specifies which information
from the view owner.) permissions are to be automatically inferred and
declared on warehouse tables. This section examines
• Semantic Query Optimization: If the user queries a the consequent benefits for warehouse administration
view V, some source data that underlies V may be that arise from it.
irrelevant to the query result. Query processors
routinely exploit integrity constraints and relational 4.3.1 Modes of Administration
algebra to rewrite queries in a simpler form. For Standard SQL infers view privileges only when a view
example, suppose V is the join of two tables; then is defined – the view definer receives the intersection of
typically a query on V is possible only if the user her privileges on the input tables. However, the view
has permission on both tables. However, if the join definer must then explicitly specify all other
is on the key of one table and referential integrity permissions on the view. Our model infers information
holds, then a query that accesses only fields from permissions for other users, so the W-DBA need not do
the foreign-key table need not perform the join. so. Beyond this, normal “grant option” semantics apply.
Thus, the subject issuing the query would not need That is, if a source administrator owns all the tables that
permission on V, but just the one table. underlie a warehouse view, she may administer the
view’s permissions directly, grant “grant option” to a
• Rewrite in terms of other views: Subjects are often warehouse administrator, or combine the modes.
given access to information through views when
they do not have permissions on the base tables. A [Cap97, Cas97] propose a rich security model for
query mentioning base tables may therefore be tightly coupled federated databases. Many of the
equivalent to one mentioning only views. In such a facilities apply also to warehouses. Derived data
case, the system can infer that the subject issuing security entwines with models of data integration, entity
the query has permission to execute it. and object data models, and some aspects of group
management with negative permissions. They require a
Our model does not require any particular degree of new sort of specification, to tell whether federation
query rewrite, nor do we propose algorithms in this administrators can delete or supplement permissions
paper. Below, we elaborate on two issues: why push inferred from the sources. They also proposed features
rewrite to another module, and why accept incomplete in areas that we do not address, such as top-down
enumeration of rewrites. administration, and determining where authorizations
may be checked.
First, DBMS vendors have historically given
metadata management a low priority. If the purpose of
A. Rosenthal, E. Sciore 8-4
These papers motivated many of our requirements, source databases. That means that the system needs to
but the complexity seems likely to deter major populate permission tables on the warehouse.3 Our
commercial implementation. In areas where the models theory of witnesses allows us to determine whether a
overlap, ours seems to have fewer new constructs. Yet subject is entitled to access a table T. However, the
we can simulate the modes of local autonomy proposed warehouse needs to know the set of all subjects
in [Cap97, Cas97], and additional ones besides. authorized to access T, i.e., the set of users for which
witnesses can be found. This section shows how it is
One such mode has source administrators make all possible to compute that set.
decisions; Read or Update permissions are then
inferred to the view. This resembles our inference (and The basic idea is that the set of users authorized to
includes Update). However, it does not appear to infer execute a query (not considering equivalence) is the
privileges to views defined over the intersection of the user sets of its inputs. The user set of
federation/warehouse base tables, and does not allow a view can thus be determined by taking the union of
an upper-tier view (e.g., a statistical rollup) to receive these individual user sets for each equivalent query
additional permissions from the underlying source found.
administrator(s). Our model allows all this. The W-
DBA gives physical permission in advance to All- Given T, the system first computes all queries
Employees, but receives no Grant-Read privileges, and equivalent to T, structured as a directed graph in which
hence does nothing more. Going beyond the previous each subexpression appears only once. One can traverse
work, in our model, an administrator s who possesses the graph, computing permissions for each table in that
Grant-Read information permissions on all of T1, …, graph from the permissions on its predecessors. If the
Tn, possesses it on V(T1, …, Tn). Even though T is in the graph is acyclic, this can be done once, bottom up.
warehouse and s is associated with a source system,
she automatically has the right to grant additional It is possible to have cycles in derivability (e.g., a
information permissions on T. whole table from horizontal or vertical partitions). We
have developed algorithms that appear to handle the
Another proposed mode is to have joint signoff, general case with similar order of complexity, but a
where the W-DBA must approve each permission on a larger constant factor. The general case can also be
warehouse table. This can be achieved in our model by handled by translating the above recursions into Datalog
having the W-DBA grant the physical permissions (one clause per view definition), and using semi-Naïve
selectively. Finally, there is a mode where the W-DBA evaluation [Ul89].
actually grants information permissions; our model
handles this by asking each source administrator to 5 Within-View Permissions
grant Grant-Read on exported tables to the W-DBA.
A warehouse’s derivation is likely to draw from many
4.3.2 Deriving OLAP Permission Rules parts of an organization, or even from multiple
organizations. For example, a drug-effects warehouse
Witness theory works to guide the proper access might draw information from many hospitals. With the
permissions for OLAP [Pr00]. Consider a cube having current SQL model, each organization must grant a
several hierarchical dimensions. Suppose the source warehouse administrator Read (and Grant-Read)
declares that subject has permissions on a view defined permissions on their exported data – only then can
by some rollup level on each dimension. Then in our someone define and administer a warehouse table view
model, a query is permissible only when a witness can over the combined information. But what happens if no
be found, i.e., if it can be derived from views at the person enjoys such universal trust? And how would 500
permitted rollup level. It will not be permissible to hospitals or 50000 doctors’ offices negotiate to choose
access the warehouse data at a finer granularity than such a paragon?
the rollups, even if the filtered data is later rolled up.
This is in contrast with some warehouse systems Our approach is to exploit multi-table4 views that
[UPa00], which allow the query to execute, but combine information in a way that makes it less
surreptitiously replace the user’s query by a query to a sensitive. A source can stipulate that its exported data
subcube for which the user has access. We consider can be used only for computing the view. The SQL
this unsatisfactory, too likely to lead to bad decisions.
It would be far preferable to suggest to the user one or
more queries for which they do have sufficient access. 3
To maximize rewrite possibilities, the warehouse
4.4 Computing Local Permissions could also consider rewrites in terms of source tables,
while using its own copy of source permissions.
When a query is issued to a warehouse, one wants to 4
If V is computed only from table T, the definition is
test permissions there, rather than refer the issue to the uninteresting -- one could just grant select to s on V.
A. Rosenthal, E. Sciore 8-5
syntax from the administrator of one of the view’s the two tables and decides whether the entering person
input tables might become: fits the description of a person who is being sought.
grant select to s on T within V
That is, subject s can access T, but only from within Each airport’s authorities want access to view V,
view V. defined by:
select * from ENTRANT, WANTED
This model still requires trust that the warehouse where match(ENTRANT, WANTED)
computes the view correctly, and makes no other use of and ENTRANT.Airport = $MY_Airport
the information. One must also trust the other sites not
to subvert the filtering effects of the view, as illustrated The border authorities should not have access to all
in Example 1. But it will require malice and skill rather arrest warrants, and the police searchers should not
than carelessness to subvert the software. know about all citizens’ travels. So neither will give the
other’s representative Read authority on their entire
5.1 Examples table. However, they both are willing to give read
We give three examples where the warehouse access within V to the head agent at each airport.
computes a view that is less sensitive, in a way that
could not be achieved by single-source views. Example 3 – Intersection of child and parent
Suppose that a hospital has two tables:
Example 1 – Totals over a large set PATIENT[P#, Age,…] and
The warehouse supports financial studies of hospitals, SURGERY[P#, ProcedureDone, Date, …]
by providing COST data about hospital patients, in a PATIENT is the parent table (every surgery has a valid
cube with dimensions Treatment, Age, and State (i.e., patient), but most patients have no surgery records.
where each entry is a statewide average). Some
hospitals might be willing to release data only if it is A researcher s studying surgery on the old might
hidden within Statewide totals. The warehouse defines create the view
a view V to compute the cube, and each hospital issues: select * from PATIENT, SURGERY
grant read on table MyHospital.COST within V where PATIENT.Age > 80
and PATIENT.P# = SURGERY.P#
Elaborating this example, the warehouse might In current systems, s would need access on both tables
contain other tables with more detailed information, in order to create (and access) the view. In our model, it
from hospitals that are willing to provide it. is only necessary for the administrator of each table to
Permissions on that more detailed information allows give u permission on the table within the view.
that information to be used for computing V. Hospitals
in New York City might feel there is sufficient This example can also be used to illustrate the
anonymity in being included in citywide totals; they benefits of automatic inference. Suppose a research
could define V’ that totals by city. V is derivable from user s with access to PATIENT and SURGERY
V’. A few hospitals that particularly trust the periodically runs a reporting application that executes
researchers, or whose cost information is public, may the above view. Now suppose new regulations are
grant access to the raw COST data. enacted that forbid u from seeing children’s data, i.e., s
is given access only to the view V’, defined by:
In this example, the hospitals trust that the select * From PATIENT where Age>21
warehouse software will compute V (or V’) and discard In current systems, the reporting application would fail.
the raw inputs. They also trust that the other hospitals With query rewrite, the query could instead execute
will not spoof or reveal information in ways that enable over the (non-materialized) view V’. Details appear in
individual hospital results to be detected.5 [Ros99b].
5.2 Semantics for Within-View Permissions
Example 2 – Peer-to-Peer Intersection
Consider tables that track people entering the country Intuitively, a subject s is able to access view V if s has
on international flights (denoted ENTRANT), and access to each input table, within V. Formally, we have:
people who are wanted by the police (WANTED).
There is a procedure “match” which examines rows in • A within-view permission is a 5-tuple (subject,
operation, object, mode, view).
5 The meaning of within-view permission P = (s, read,
Confidentiality would be lost if all but one hospital in
T, m, V) is as follows. If P is an information permission,
a state filled its COST table with $0 for each entry.
then s has clearance to view the values of T that
would presumably have serious operational effects. contribute to V. If P is a physical permission, then s is
Another attack, also unlikely, would be for all the
others to reveal their true costs.
A. Rosenthal, E. Sciore 8-6
allowed to execute a query that physically accesses T, access to the warehouse table. The same
but only for the purpose of computing V. mechanism is helpful for views defined above the
warehouse base tables.
One can straightforwardly extend the witness • The theory of witnesses for individual users
semantics of Section 4.1 as follows. underpins a recursion that can efficiently determine
the permissions to be granted on each stored
A permission (s, op, T, mode) is an implied warehouse table. When source permissions change,
permission if there exists a query Q and views {Vi} the inference can be redone, completely
equivalent to T, such that for each object Ti mentioned automatically. An efficient “delta” inference
in Q, either process appears feasible for Grants.
• the permission (s, op, Ti, mode) has been explicitly • There is no need to customize the security inference
granted, or rules to handle OLAP or extract/transform/load
• the within-view permission (s, op, Ti, mode, Vi) operators. Instead, one can use algebraic and
has been explicitly granted, where Q is equivalent constraint-based rewrite rules provided by the
to Vi. query optimizer.
Query Q is called the witness for the implied • We showed that local autonomy can be preserved
permission. (using physical permissions) without sacrificing the
inference of information permissions. Capabilities
This definition allows us to model each of the proposed in other research [Cap97, Cas97] were
examples in Section 3.1. subsumed.
• “Within-view” permissions enable additional
A consequence of the above definition is that a user collaboration among mutually suspicious
who receives within-view permissions on the inputs of organizations.
view V gets (by inference) the permission on V. Thus,
users are able to access a view even when nobody is 6.2 Open Problems
trusted to grant on all underlying tables. A user can This section describes several open problems in easing
even be a view administrator with the ability to grant the administration of warehouse access permissions.
access to V, if the within-view permissions are for the First we describe pragmatic issues, and then theoretical
Grant-Read operation. ones.
We have studied examples that require more complex Perhaps the biggest pragmatic issue is creation of a
within-view permissions, in which multiple witnesses tool based on the theory, so user experience could be
need to be coordinated. However, the extra complexity observed. An html mockup of an environment for
of the permission mode makes them less useful in managing metadata across layers in a multi-tier system
practical situations, and we therefore relegate their is available at [Ro99a]. The mockup was useful in
development to a future paper. identifying requirements, and holes in the theory. But
we lack the resources (especially grad student labor) to
6 Summary and Future Work create and test a real tool. Also, our tool explored only
the semantics. It will be necessary to embed it in a
6.1 Contributions systems management environment that coordinates
permissions with each relevant DBMS.
The greatest strengths of our approach are simplicity,
architecture and location independence, and A second pragmatic issue is how to produce a query
compatibility with relational technology. Our strongest rewriter. It is infeasible to keep up with new query
result is the discovery that some pleasantly simple operators and constraint constructs. Fortunately,
concepts – separation of physical permissions, and the DBMSs already have a component that does such
“witness”-based theory for permissions on derived data inference rather well, and in which big money is
– can turn a difficult, messy problem into one that can invested – the query optimizer. The key issue, partly
be addressed with few technical complications. We technical and partly political, is whether query
obtained a consistent, defensible theory for inferring 6
processors will export their rewrite functionality.
new permissions from those which were explicitly
asserted. At the same time, we built upon rather than
replaced existing relational technology.
6
More specifically, our approach provides the In fact, warehouse query processors increasingly
following benefits: rewrite queries to exploit materialized views. However,
• We can determine whether permissions on some modifications will be needed, to avoid exploring
sources’ exported tables justify granting a user multiple witnesses over the same input set (e.g., for
different join orders), and to avoid cost-based pruning.
A. Rosenthal, E. Sciore 8-7
Further inference capabilities would be available if would seem appropriate to distinguish physical and
the extract/transform/load operations were describable information permissions, and to grant access based on
as SQL expressions (with new SQL operators such as whatever rewrites are easy to implement. More
coalesce columns, and user-defined functions). sophisticated rewrites seem quite useful to employ
Currently, the E/T/L views tend to be described in predefined views or eliminate unneeded tables, but will
code, though some products (such as the Microsoft be costly to implement.
Repository [Be99]) will identify the input objects.
Finally, a message to builders of query processors:
Next, we believe that sources and warehouse users Make the query rewrite facilities available for other
need to negotiate permissions in terms of roles they purposes. Rewrites are invaluable at inferring access
both understand. Without this, inference is of limited permissions and other metadata, and yet are too costly
use. Role and group management is a difficult task. We to implement just for metadata purposes.
hope to see rapid progress in this area, perhaps driven
by task and workflow models. One also needs to see if 7 References
knowledge of data derivation can help in defining such
roles. [Be99] P. Bernstein, et. al., “Microsoft Repository
Version 2 and the Open Information Model,”
Finally, we need to understand the benefits of Information Systems 24(2), 1999.
implementing each feature. For example, how many
users would want views that filter across several [Cap97] S. De Capitani di Vimercati, P. Samarati,
sources, or cycles in view derivability? Which kinds of Authorization Specification and Enforcement in
rewrites are most effective in giving users additional Federated Database Systems, Journal of Computer
justified permissions? Security, vol. 5, n. 2, 1997, pp. 155-188.
Several theoretical problems remain open. We need [Cas97] S. Castano, S. De Capitani di Vimercati, M.G.
efficient algorithms to infer permissions. We want to Fugini, Automated Derivation of Global Authorizations
get a calculus of access predicates, to compute in for Database Federations, Journal of Computer Security,
advance what warehouse permissions should be vol. 5, n. 4, 1997, pp. 271-301.
granted. (Currently, for each request, one seeks a
witness). Also, [Ro00] proposed to attach limitations to [Jo94] D. Jonscher, K. Dittrich, “An Approach for
grant-option permissions (e.g., to allow onward grants Building Secure Database Federations”, VLDB,
only within the recipient’s department). Such limitation Santiago, Chile, 1994.
predicates deserve to be added to the inference theory.
Next, to extend the theory to federations and ordinary [Pr00] T. Priebe, G, Pernul, “Data Warehouse Security
views, one needs to handle permissions for update in GOAL”, submitted for publication.
operations, and user-defined operations (e.g., Hire-
Employee). [Ro99a] A. Rosenthal, E. Sciore, G. Gengo,
“Demonstration of Multi-Tier Metadata Management”,
Finally, research is needed to support permission at www.mitre.org/resources/centers/it/staffpages/arnie/
requests that originate at the top (as in “make my view
have the following inferred permissions”) [Cas97]. If [Ro99b] A. Rosenthal, E. Sciore, V. Doshi, “Security
permission inference is seen as defining a view, top- Administration for Federations, Warehouses, and other
down asks to update the view. As with conventional Derived Data”, IFIP 11.3 Working Conference on
view updates, there are often multiple ways to meet the Database Security, Seattle 1999. (Kluwer, 2000).
user’s intent. Generation of those alternatives needs to
be guided by a theory, and automated by tools. [Ro00] A. Rosenthal, E. Sciore, “Extending SQL’s
Grant and Revoke Operations”, IFIP Workshop on
Database Security, Amsterdam, August 2000
6.3 Conclusion
[Ul89] J. Ullman, Principles of Database and
The proposed facilities can provide significant benefits Knowledge-Base Systems, Vol. 2, Computer Science
for administering distributed databases and views. At Press, Rockville MD, 1989.
the same time, we minimized complexity for DBMS
implementers, by building over rather than replacing [UPa00] Instructions from U. Pennsylvania data
SQL facilities. warehouse
http://www.upenn.edu/computing/da/dw/security.html
We contend that a “derived data access permissions”
module deserves implementation today with RDBMSs,
especially distributed RDBMSs. For the short term, it
A. Rosenthal, E. Sciore 8-8