<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>View Security as the Basis for Data Warehouse Security</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Arnon Rosenthal</string-name>
          <email>arnie@mitre.org</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Edward Sciore</string-name>
          <email>sciore@bc.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Boston College and MITRE</institution>
          ,
          <addr-line>Chestnut Hill, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The MITRE Corporation</institution>
          ,
          <addr-line>Bedford, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>8</fpage>
      <lpage>8</lpage>
      <abstract>
        <p>Access. permissions in a data warehouse are currently managed in a separate world from the sources' policies. The consequences are inconsistencies, slow response to change, and wasted administrative work. We present a different approach, which treats the sources' exported tables and the warehouse as part of the same distributed database. Our main result is a way to control derived products by extending SQL grants rather than creating entirely new mechanisms. We provide a powerful, sound inference theory that derives permissions on warehouse tables (both materialized and virtual), making the system easier to administer and its applications more robust. We also propose a new permission construct suitable for views that filter data from mutually-suspicious parties.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>A key challenge for data warehouse security is how to
manage the entire system coherently – from sources
and their export tables, to warehouse stored tables
(conventional and cubes) and views defined over the
warehouse. Permissions on the warehouse must satisfy
the restrictions of the data owners, and be updated
quickly as those local concerns evolve. Yet the system
cannot demand extensive administrator time, since
there are too few people with both technical skills to
understand derivation logic, and business skills to
balance security versus accessibility.</p>
      <p>Thus the critical problem is how to automatically
coordinate the access rights of the warehouse with
those of the sources. To do so, one must be able to
The copyright of this paper belongs to the paper’s authors. Permission to copy
without fee all or part of this material is granted provided that the copies are
not made or distributed for direct commercial advantage.</p>
      <p>Proceedings of the International Workshop on Design
and Management of Data Warehouses (DMDW'2000)
Stockholm, Sweden, June 5-6, 2000
(M. Jeusfeld, H. Shu, M. Staudt, G. Vossen, eds.)
http://sunsite.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-28/
infer access rights across subsystems, without infringing
on their local autonomy. This problem has not been
addressed in systems to date. As a result, the warehouse
DBA (W-DBA) not only has to manually specify access
rights on all warehouse data, the W-DBA must also be
trusted by all sources to specify these rights
appropriately.</p>
      <p>The goal of this paper is to provide a theory that
permits automated inference of many permissions for
the warehouse, in a way that minimizes both the
learning curve for administrators and the amount of new
software that vendors would implement. We do not
propose a separate theory that requires vendors to
implement major new mechanisms, or administrators to
learn and execute new tasks. Instead, our theory is a
natural extension of the standard SQL grant/revoke
model, to systems with redundant and derived data.
Nearly all the capability comes from adapting general
mechanisms for view security, and by exploiting
available technology for generating equivalent queries.
1.1</p>
    </sec>
    <sec id="sec-2">
      <title>Overview of Our Results</title>
      <sec id="sec-2-1">
        <title>Our theory extends SQL in three ways.</title>
        <p>First, we split the notion of “access permission on a
table” into two separately-administered issues: who is
allowed to access what information (information
permissions), and who is allowed to access which
physical tables (physical permissions). An information
permission is the result of an enterprise-wide decision,
and should be consistently applied to all the views,
replicas and derivatives in the system.1 In contrast,
physical permissions are local, and need not be
consistent.</p>
        <p>For example, the decision that “Employee salary
information is releasable to payroll clerks and cost
analysts” is an information permission, whereas the
decision that “Cost analysts are allowed to run queries
on the warehouse” is a physical permission. The effect
of separating these concerns is that each set of
permissions may evolve independently without</p>
        <p>Consistency means having an unambiguous
statement for each table or tuple. We do support policies
that are conditional based on data value.
invalidating the other, and each in isolation has very
natural semantics.</p>
        <p>Our second extension provides a powerful inference
mechanism. In SQL, a user is allowed to execute a
query Q if the user has permission on all tables
mentioned in Q. We extend SQL so that a user can
also execute Q if there is an equivalent query Q’ for
which the user has permission. That is, releasability
depends on the result, not on the computation. We say
that Q’ is a witness for Q. This extension relies on the
query rewriting capability of the database system to
determine equivalence. It is not necessary for the
system to have any particular level of rewrite capability
– we can exploit whatever degree of rewriting a vendor
can afford to provide. We therefore dare to hope that
the results would be practical and attract vendors.</p>
        <p>Third, we propose a new construct that broadens the
creation of views over mutually suspicious
organizations. Conventionally, a source can reduce risk
by exporting a view that filters out sensitive data. In
SQL, a view can be created (with Grant authority) only
if there is a user that has Grant authority on all
mentioned tables. We motivate and define an extension
that allows the desensitizing view to be over several
mutually-suspicious sources, where no person is trusted
to have Grant permission over all of them.</p>
        <p>Implementing our approach would improve existing
systems in several ways. We meet the goals of notable
previous proposals [Cap97, Cas97, Jo94], while using
fewer new mechanisms and providing greater
flexibility. For example, our model allows authority to
pass across system boundaries, and supports multiple
layers (e.g. of warehouses and data marts) without
requiring new constructs. Our model also aids the
system administrator, by automatically inferring many
Read and Grant-Read privileges from the warehouse
view definition, and by providing rules that determine
the allowable queries on data cubes (and rollups). The
authority to grant onward is provided by normal view
inference of Grant-Read. The model also helps with
views defined over warehouse tables – it emphasizes
layers of views rather than system boundaries.
1.2</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Scope Limitations</title>
      <p>Warehouse security entails many issues that are not
addressed here. We believe that industrial progress is
most likely if we design a robust, useful module for
security issues of derived data, rather than partial
solutions for a broad range of requirements. The
derived data facility would be complemented by
modules that handle issues such as query rewrite, role
management, separation of duty, and inference attacks.</p>
      <p>We offload the problem of managing user sets to a
role/group management module. Several previous
models include authentication and authorization within
their models (e.g., [Jo94, Cas97]). However, recent
industrial trends tend to pull these capabilities toward a
middleware credentials infrastructure, and hence beyond
the control of database researchers.</p>
      <p>Because warehouses emphasize read operations, we
focus on the operations Read and Grant-Read. As
remarked in [Cas97], a similar inference theory works
for update, but is outside the scope of this paper.
2</p>
      <sec id="sec-3-1">
        <title>Basic Definitions</title>
        <p>A permission is a 4-tuple (subject, operation, object,
mode). A subject is an abstract user, representing an
individual, a group, role, process, etc. We say subject s
has been granted a permission if it is granted either
explicitly to s, or to an ancestor in a group or role
hierarchy. In principle, each subject is uniquely named
and is valid in all systems (though explicit grants to
individuals may be rare outside their home system).2</p>
        <p>An object (also called a table) belongs to some
schema, in some system, either a source or a warehouse.
It can be a relation or OLAP cube (including, trivially,
an individual cell), and can be either materialized or
virtual. We refer to a table exported by a source as a
source table.</p>
        <p>Our set of allowable operations is the same as in
SQL. In this paper we focus on the operations Read and
Grant-Read. If subject s is granted a Read permission
on table T, then s is allowed to access all of T. If s is
granted a Grant-Read permission on T, then s is allowed
to grant a Read or s permission on T to any subject.</p>
        <p>The mode of a permission is either information or
physical, and will be discussed in Section 3.</p>
        <p>We treat each warehouse table as a view
(materialized or virtual) over the tables exported by
sources or in the warehouse. A view is defined by an
SQL query, and may include user-defined functions. If
V is a view, we write its defining query as QV. The
inputs to a query Q are the objects mentioned in it; to
show the inputs, we denote the query as Q(T1, …, Tn). Q
can be a black box (either opaque code or secret), but
we assume that every query’s input set is known.
3</p>
      </sec>
      <sec id="sec-3-2">
        <title>Information and Physical Permissions</title>
        <p>The mode of a permission specifies that it is either an
information permission or a physical permission. A
subject s is allowed to access a table only if it has both
information and physical permissions on that table.
2 This paper addresses what permissions should exist.
Partitioning by physical system is permitted in the
physical implementation, which we do not address.</p>
        <p>A permission (s, op, T, “information”) indicates that
the information in T should be accessible to s, for
operation op. It concerns releasability of knowledge,
not access to the physical container T. Information
policies apply globally, spanning source systems and
warehouse tables. They are unaffected by creation of
redundant copies or new interfaces – the policy on a
table would be unaffected by a decision to create a
mirror table or an extract at another site.</p>
        <p>For example, a medical system may contain several
information policy assertions concerning when
TreatmentCost-values can be releasable to certain
subjects. One assertion might be “to auditors”. Others
might be: “In New York, to the role BillingClerk”; and
“in Florida, to the role InsuranceAdjuster”. These
assertions correspond to information permissions, and
refer to all copies of the information.</p>
        <p>A physical permission authorizes an execution
strategy to use a single physical resource (usually a
materialized table). Whereas information permissions
are global, physical permissions allow local autonomy.
By withholding physical permissions, a resource owner
can limit workload, allow only users who have paid a
fee, or allow only corporate employees. We suspect
that physical permissions will typically be administered
in coarse granules, e.g., to give a user physical access
to all tables on a particular platform. It seems best to
administer in terms of database schemas, so as to allow
finer granularity (e.g., to exclude blobs), and to use
groups and roles defined in the DBMS.</p>
        <p>Our long-term vision is to allow a security
administrator to see a giant distributed database that
spans many platforms. This database would be
managed using Grant/Revoke, with implied
permissions as described below. (Grant and Revoke
would be extended by a parameter that distinguishes
&lt;information | physical&gt; mode).</p>
        <p>It will be difficult technically and politically to
extend DBMS capabilities to cope with new modes,
and to notify each other of the need to Grant and
Revoke derived permissions. We suspect progress will
be faster if security management is part of a systems
manager (e.g., Tivoli) that communicates with each
relevant DBMS. Permission changes would be
replicated to the metadata manager, and it would
replicate derived changes to the other systems.</p>
        <p>In such an architecture, granting an ordinary SQL
permission in a DBMS would create both information
and physical permissions in the metadata manager. The
metadata manager would determine derived
permissions. The SQL permission (s, operation, object)
would be granted on a derived table (typically in the
warehouse) if and only if the subject has both
permissions (s, operation, object, “information”) and
(s, operation, object, “physical”). Each Grant or
Revoke would (conceptually) cause all the permissions
to be recomputed. It will be challenging to implement
these semantics efficiently and reliably among
distributed heterogeneous DBMSs. The alternative is to
implement them by human administrators.
4</p>
      </sec>
      <sec id="sec-3-3">
        <title>Permission Inference</title>
        <p>A permission is explicit if it is granted directly by an
authorized grantor. When the information for a table T
is derivable as Q(T1, … Tn), we may be able to infer
additional permissions. This section presents a theory
for determining the implied permissions in effect at a
particular schema (typically the warehouse).</p>
        <p>We want the rationale for permission inference to be
understandable (even by nontechnical managers) and to
be easy to automate. In earlier papers we relied on
inference rules, whose justification required
considerable argument. In this section we propose a
simpler “witness” semantics; inference rules are
implementation aids derived to track witnesses.
4.1</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>A Theory of Witnesses</title>
      <p>Informally, a subject should have permission to execute
a query if and only if the query can be expressed in
terms of tables (base or view) for which the user has
explicit permission.</p>
      <sec id="sec-4-1">
        <title>Formally, let T be a table (base or view). the following definitions:</title>
      </sec>
      <sec id="sec-4-2">
        <title>We have</title>
        <p>•
•</p>
        <p>A query Q is equivalent to T if the output of Q
always contains the same tuples as T.</p>
        <p>A permission (s, op, T, mode) is implied if there
exists an equivalent query Q(T1, …, Tn) such that
each permission (s, op, Ti, mode) has been granted
explicitly. Query Q is called the witness for the
implied permission.</p>
        <p>The inference mechanism for permissions is the same
for both physical and information permissions.
However, the interpretation of the implied permission is
different, depending on the mode.</p>
        <p>An implied information permission (s, read, T)
means that the information in T is releasable to s. In
effect, a subject need not care whether an information
permission is explicit or implied, nor whether T is a
materialized view, a view computed over several
sources, or a base table.</p>
        <p>An implied physical permission on T asserts that
there exists at least one way to compute T for which the
physical permissions are available. If T were a
materialized table, however, the subject would not have
physical access to T – the subject (or the database
query processor) would still need to use the tables from
the witness query.
4.2</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Query Rewrite Techniques</title>
      <p>Our definition of implied permission requires that the
system be able to find a witness query equivalent to T.
The theory of query equivalence is mature. There have
been numerous research articles, and it is heavily
exploited in query optimizers. Exploring the details
would be a distraction from the security issues, and
from creating a derived data security module. Instead,
we explore the benefits of exploiting three particular
rewriting strategies. A larger set of motivating
examples and a discussion of the issues involved in
using the query processor’s rewriter appear in [Ro99b].
• View Substitution: View substitution replaces a
mention of a view in a query by the view’s
definition. We can infer that if a subject s has the
necessary permissions on the source tables, then s
also has permission on the view. In this case, the
view is simply an alternate interface to the same
data. (In contrast, the current SQL standard
requires that all view users obtain explicit grants
from the view owner.)
• Semantic Query Optimization: If the user queries a
view V, some source data that underlies V may be
irrelevant to the query result. Query processors
routinely exploit integrity constraints and relational
algebra to rewrite queries in a simpler form. For
example, suppose V is the join of two tables; then
typically a query on V is possible only if the user
has permission on both tables. However, if the join
is on the key of one table and referential integrity
holds, then a query that accesses only fields from
the foreign-key table need not perform the join.
Thus, the subject issuing the query would not need
permission on V, but just the one table.
• Rewrite in terms of other views: Subjects are often
given access to information through views when
they do not have permissions on the base tables. A
query mentioning base tables may therefore be
equivalent to one mentioning only views. In such a
case, the system can infer that the subject issuing
the query has permission to execute it.</p>
      <p>Our model does not require any particular degree of
query rewrite, nor do we propose algorithms in this
paper. Below, we elaborate on two issues: why push
rewrite to another module, and why accept incomplete
enumeration of rewrites.</p>
      <p>First, DBMS vendors have historically given
metadata management a low priority. If the purpose of
query rewrite is to improve metadata (e.g., access
permissions), they are unlikely to spend the substantial
sums needed to do it well (e.g., to exploit constraints, to
extend it to new operators, or to employ materialized
views). However, they have been willing to spend the
money in pursuit of performance gains. Therefore if we
want to get the most from query equivalence, we need
to hook into the query processing module.</p>
      <p>Second, our benchmark is to do better than others
have, and to allow only justifiable permissions. Given
that current systems perform no inference, even the
implementation of the simple view substitution strategy
would be a significant improvement. The more
strategies implemented, the more powerful the system.
However, “completeness” is a minor goal. In fact, a
complete set of equivalents is impossible due to
incomplete declarations of constraints and operator
rewrite rules, and because the general rewrite problem is
undecidable. Moreover, even a complete set of
equivalents may give suboptimal permissions; only a
health care expert would know that “Patients where
Age&gt;21” deserves more permissions than Patients.
4.3</p>
    </sec>
    <sec id="sec-6">
      <title>Administering a Warehouse</title>
      <p>Our witness theory specifies which information
permissions are to be automatically inferred and
declared on warehouse tables. This section examines
the consequent benefits for warehouse administration
that arise from it.
4.3.1</p>
    </sec>
    <sec id="sec-7">
      <title>Modes of Administration</title>
      <p>Standard SQL infers view privileges only when a view
is defined – the view definer receives the intersection of
her privileges on the input tables. However, the view
definer must then explicitly specify all other
permissions on the view. Our model infers information
permissions for other users, so the W-DBA need not do
so. Beyond this, normal “grant option” semantics apply.
That is, if a source administrator owns all the tables that
underlie a warehouse view, she may administer the
view’s permissions directly, grant “grant option” to a
warehouse administrator, or combine the modes.</p>
      <p>[Cap97, Cas97] propose a rich security model for
tightly coupled federated databases. Many of the
facilities apply also to warehouses. Derived data
security entwines with models of data integration, entity
and object data models, and some aspects of group
management with negative permissions. They require a
new sort of specification, to tell whether federation
administrators can delete or supplement permissions
inferred from the sources. They also proposed features
in areas that we do not address, such as top-down
administration, and determining where authorizations
may be checked.</p>
      <p>These papers motivated many of our requirements,
but the complexity seems likely to deter major
commercial implementation. In areas where the models
overlap, ours seems to have fewer new constructs. Yet
we can simulate the modes of local autonomy proposed
in [Cap97, Cas97], and additional ones besides.</p>
      <p>One such mode has source administrators make all
decisions; Read or Update permissions are then
inferred to the view. This resembles our inference (and
includes Update). However, it does not appear to infer
privileges to views defined over the
federation/warehouse base tables, and does not allow
an upper-tier view (e.g., a statistical rollup) to receive
additional permissions from the underlying source
administrator(s). Our model allows all this. The
WDBA gives physical permission in advance to
AllEmployees, but receives no Grant-Read privileges, and
hence does nothing more. Going beyond the previous
work, in our model, an administrator s who possesses
Grant-Read information permissions on all of T1, …,
Tn, possesses it on V(T1, …, Tn). Even though T is in the
warehouse and s is associated with a source system,
she automatically has the right to grant additional
information permissions on T.</p>
      <p>Another proposed mode is to have joint signoff,
where the W-DBA must approve each permission on a
warehouse table. This can be achieved in our model by
having the W-DBA grant the physical permissions
selectively. Finally, there is a mode where the W-DBA
actually grants information permissions; our model
handles this by asking each source administrator to
grant Grant-Read on exported tables to the W-DBA.
4.3.2</p>
    </sec>
    <sec id="sec-8">
      <title>Deriving OLAP Permission Rules</title>
      <p>Witness theory works to guide the proper access
permissions for OLAP [Pr00]. Consider a cube having
several hierarchical dimensions. Suppose the source
declares that subject has permissions on a view defined
by some rollup level on each dimension. Then in our
model, a query is permissible only when a witness can
be found, i.e., if it can be derived from views at the
permitted rollup level. It will not be permissible to
access the warehouse data at a finer granularity than
the rollups, even if the filtered data is later rolled up.</p>
      <p>This is in contrast with some warehouse systems
[UPa00], which allow the query to execute, but
surreptitiously replace the user’s query by a query to a
subcube for which the user has access. We consider
this unsatisfactory, too likely to lead to bad decisions.
It would be far preferable to suggest to the user one or
more queries for which they do have sufficient access.
4.4</p>
    </sec>
    <sec id="sec-9">
      <title>Computing Local Permissions</title>
      <p>When a query is issued to a warehouse, one wants to
test permissions there, rather than refer the issue to the
source databases. That means that the system needs to
populate permission tables on the warehouse.3 Our
theory of witnesses allows us to determine whether a
subject is entitled to access a table T. However, the
warehouse needs to know the set of all subjects
authorized to access T, i.e., the set of users for which
witnesses can be found. This section shows how it is
possible to compute that set.</p>
      <p>The basic idea is that the set of users authorized to
execute a query (not considering equivalence) is the
intersection of the user sets of its inputs. The user set of
a view can thus be determined by taking the union of
these individual user sets for each equivalent query
found.</p>
      <p>Given T, the system first computes all queries
equivalent to T, structured as a directed graph in which
each subexpression appears only once. One can traverse
the graph, computing permissions for each table in that
graph from the permissions on its predecessors. If the
graph is acyclic, this can be done once, bottom up.</p>
      <p>It is possible to have cycles in derivability (e.g., a
whole table from horizontal or vertical partitions). We
have developed algorithms that appear to handle the
general case with similar order of complexity, but a
larger constant factor. The general case can also be
handled by translating the above recursions into Datalog
(one clause per view definition), and using semi-Naïve
evaluation [Ul89].
5</p>
      <sec id="sec-9-1">
        <title>Within-View Permissions</title>
        <p>A warehouse’s derivation is likely to draw from many
parts of an organization, or even from multiple
organizations. For example, a drug-effects warehouse
might draw information from many hospitals. With the
current SQL model, each organization must grant a
warehouse administrator Read (and Grant-Read)
permissions on their exported data – only then can
someone define and administer a warehouse table view
over the combined information. But what happens if no
person enjoys such universal trust? And how would 500
hospitals or 50000 doctors’ offices negotiate to choose
such a paragon?</p>
        <p>Our approach is to exploit multi-table4 views that
combine information in a way that makes it less
sensitive. A source can stipulate that its exported data
can be used only for computing the view. The SQL
3 To maximize rewrite possibilities, the warehouse
could also consider rewrites in terms of source tables,
while using its own copy of source permissions.
4 If V is computed only from table T, the definition is
uninteresting -- one could just grant select to s on V.
syntax from the administrator of one of the view’s
input tables might become:</p>
        <p>grant select to s on T within V</p>
        <p>That is, subject s can access T, but only from within
view V.</p>
        <p>This model still requires trust that the warehouse
computes the view correctly, and makes no other use of
the information. One must also trust the other sites not
to subvert the filtering effects of the view, as illustrated
in Example 1. But it will require malice and skill rather
than carelessness to subvert the software.
5.1</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>Examples</title>
      <p>We give three examples where the warehouse
computes a view that is less sensitive, in a way that
could not be achieved by single-source views.</p>
    </sec>
    <sec id="sec-11">
      <title>Example 1 – Totals over a large set</title>
      <p>The warehouse supports financial studies of hospitals,
by providing COST data about hospital patients, in a
cube with dimensions Treatment, Age, and State (i.e.,
where each entry is a statewide average). Some
hospitals might be willing to release data only if it is
hidden within Statewide totals. The warehouse defines
a view V to compute the cube, and each hospital issues:
grant read on table MyHospital.COST within V
Elaborating this example, the warehouse might
contain other tables with more detailed information,
from hospitals that are willing to provide it.
Permissions on that more detailed information allows
that information to be used for computing V. Hospitals
in New York City might feel there is sufficient
anonymity in being included in citywide totals; they
could define V’ that totals by city. V is derivable from
V’. A few hospitals that particularly trust the
researchers, or whose cost information is public, may
grant access to the raw COST data.</p>
      <p>In this example, the hospitals trust that the
warehouse software will compute V (or V’) and discard
the raw inputs. They also trust that the other hospitals
will not spoof or reveal information in ways that enable
individual hospital results to be detected.5</p>
    </sec>
    <sec id="sec-12">
      <title>Example 2 – Peer-to-Peer Intersection</title>
      <p>Consider tables that track people entering the country
on international flights (denoted ENTRANT), and
people who are wanted by the police (WANTED).
There is a procedure “match” which examines rows in
5 Confidentiality would be lost if all but one hospital in
a state filled its COST table with $0 for each entry.
would presumably have serious operational effects.
Another attack, also unlikely, would be for all the
others to reveal their true costs.
the two tables and decides whether the entering person
fits the description of a person who is being sought.</p>
      <p>Each airport’s authorities want access to view V,
defined by:
select * from ENTRANT, WANTED
where match(ENTRANT, WANTED)
and ENTRANT.Airport = $MY_Airport</p>
      <p>The border authorities should not have access to all
arrest warrants, and the police searchers should not
know about all citizens’ travels. So neither will give the
other’s representative Read authority on their entire
table. However, they both are willing to give read
access within V to the head agent at each airport.</p>
    </sec>
    <sec id="sec-13">
      <title>Example 3 – Intersection of child and parent</title>
      <p>Suppose that a hospital has two tables:</p>
      <p>PATIENT[P#, Age,…] and</p>
      <p>SURGERY[P#, ProcedureDone, Date, …]
PATIENT is the parent table (every surgery has a valid
patient), but most patients have no surgery records.</p>
      <p>A researcher s studying surgery on the old might
create the view
select * from PATIENT, SURGERY
where PATIENT.Age &gt; 80
and PATIENT.P# = SURGERY.P#
In current systems, s would need access on both tables
in order to create (and access) the view. In our model, it
is only necessary for the administrator of each table to
give u permission on the table within the view.</p>
      <p>This example can also be used to illustrate the
benefits of automatic inference. Suppose a research
user s with access to PATIENT and SURGERY
periodically runs a reporting application that executes
the above view. Now suppose new regulations are
enacted that forbid u from seeing children’s data, i.e., s
is given access only to the view V’, defined by:
select * From PATIENT where Age&gt;21
In current systems, the reporting application would fail.
With query rewrite, the query could instead execute
over the (non-materialized) view V’. Details appear in
[Ros99b].
5.2</p>
    </sec>
    <sec id="sec-14">
      <title>Semantics for Within-View Permissions</title>
      <p>Intuitively, a subject s is able to access view V if s has
access to each input table, within V. Formally, we have:
•</p>
      <p>A within-view permission is a 5-tuple (subject,
operation, object, mode, view).</p>
      <p>The meaning of within-view permission P = (s, read,
T, m, V) is as follows. If P is an information permission,
then s has clearance to view the values of T that
contribute to V. If P is a physical permission, then s is
allowed to execute a query that physically accesses T,
but only for the purpose of computing V.</p>
      <p>One can straightforwardly extend the witness
semantics of Section 4.1 as follows.</p>
      <p>A permission (s, op, T, mode) is an implied
permission if there exists a query Q and views {Vi}
equivalent to T, such that for each object Ti mentioned
in Q, either
• the permission (s, op, Ti, mode) has been explicitly
granted, or
• the within-view permission (s, op, Ti, mode, Vi)
has been explicitly granted, where Q is equivalent
to Vi.</p>
      <p>Query Q is called the witness for the implied
permission.</p>
      <p>This definition allows us to model each of the
examples in Section 3.1.</p>
      <p>A consequence of the above definition is that a user
who receives within-view permissions on the inputs of
view V gets (by inference) the permission on V. Thus,
users are able to access a view even when nobody is
trusted to grant on all underlying tables. A user can
even be a view administrator with the ability to grant
access to V, if the within-view permissions are for the
Grant-Read operation.</p>
      <p>We have studied examples that require more complex
within-view permissions, in which multiple witnesses
need to be coordinated. However, the extra complexity
of the permission mode makes them less useful in
practical situations, and we therefore relegate their
development to a future paper.
6
6.1</p>
      <sec id="sec-14-1">
        <title>Summary and Future Work</title>
      </sec>
    </sec>
    <sec id="sec-15">
      <title>Contributions</title>
      <p>The greatest strengths of our approach are simplicity,
architecture and location independence, and
compatibility with relational technology. Our strongest
result is the discovery that some pleasantly simple
concepts – separation of physical permissions, and the
“witness”-based theory for permissions on derived data
– can turn a difficult, messy problem into one that can
be addressed with few technical complications. We
obtained a consistent, defensible theory for inferring
new permissions from those which were explicitly
asserted. At the same time, we built upon rather than
replaced existing relational technology.</p>
      <p>More specifically, our approach provides the
following benefits:
• We can determine whether permissions on
sources’ exported tables justify granting a user
•
•
•
access to the warehouse table. The same
mechanism is helpful for views defined above the
warehouse base tables.</p>
      <p>The theory of witnesses for individual users
underpins a recursion that can efficiently determine
the permissions to be granted on each stored
warehouse table. When source permissions change,
the inference can be redone, completely
automatically. An efficient “delta” inference
process appears feasible for Grants.</p>
      <p>There is no need to customize the security inference
rules to handle OLAP or extract/transform/load
operators. Instead, one can use algebraic and
constraint-based rewrite rules provided by the
query optimizer.</p>
      <p>We showed that local autonomy can be preserved
(using physical permissions) without sacrificing the
inference of information permissions. Capabilities
proposed in other research [Cap97, Cas97] were
subsumed.
“Within-view” permissions enable additional
collaboration among mutually suspicious
organizations.</p>
      <p>Perhaps the biggest pragmatic issue is creation of a
tool based on the theory, so user experience could be
observed. An html mockup of an environment for
managing metadata across layers in a multi-tier system
is available at [Ro99a]. The mockup was useful in
identifying requirements, and holes in the theory. But
we lack the resources (especially grad student labor) to
create and test a real tool. Also, our tool explored only
the semantics. It will be necessary to embed it in a
systems management environment that coordinates
permissions with each relevant DBMS.</p>
      <p>A second pragmatic issue is how to produce a query
rewriter. It is infeasible to keep up with new query
operators and constraint constructs. Fortunately,
DBMSs already have a component that does such
inference rather well, and in which big money is
invested – the query optimizer. The key issue, partly
technical and partly political, is whether query
processors will export their rewrite functionality.6
6 In fact, warehouse query processors increasingly
rewrite queries to exploit materialized views. However,
some modifications will be needed, to avoid exploring
multiple witnesses over the same input set (e.g., for
different join orders), and to avoid cost-based pruning.</p>
      <p>Further inference capabilities would be available if
the extract/transform/load operations were describable
as SQL expressions (with new SQL operators such as
coalesce columns, and user-defined functions).
Currently, the E/T/L views tend to be described in
code, though some products (such as the Microsoft
Repository [Be99]) will identify the input objects.</p>
      <p>Next, we believe that sources and warehouse users
need to negotiate permissions in terms of roles they
both understand. Without this, inference is of limited
use. Role and group management is a difficult task. We
hope to see rapid progress in this area, perhaps driven
by task and workflow models. One also needs to see if
knowledge of data derivation can help in defining such
roles.</p>
      <p>Finally, we need to understand the benefits of
implementing each feature. For example, how many
users would want views that filter across several
sources, or cycles in view derivability? Which kinds of
rewrites are most effective in giving users additional
justified permissions?</p>
      <p>Several theoretical problems remain open. We need
efficient algorithms to infer permissions. We want to
get a calculus of access predicates, to compute in
advance what warehouse permissions should be
granted. (Currently, for each request, one seeks a
witness). Also, [Ro00] proposed to attach limitations to
grant-option permissions (e.g., to allow onward grants
only within the recipient’s department). Such limitation
predicates deserve to be added to the inference theory.
Next, to extend the theory to federations and ordinary
views, one needs to handle permissions for update
operations, and user-defined operations (e.g.,
HireEmployee).</p>
      <p>Finally, research is needed to support permission
requests that originate at the top (as in “make my view
have the following inferred permissions”) [Cas97]. If
permission inference is seen as defining a view,
topdown asks to update the view. As with conventional
view updates, there are often multiple ways to meet the
user’s intent. Generation of those alternatives needs to
be guided by a theory, and automated by tools.
6.3</p>
    </sec>
    <sec id="sec-16">
      <title>Conclusion</title>
      <p>The proposed facilities can provide significant benefits
for administering distributed databases and views. At
the same time, we minimized complexity for DBMS
implementers, by building over rather than replacing
SQL facilities.</p>
      <p>We contend that a “derived data access permissions”
module deserves implementation today with RDBMSs,
especially distributed RDBMSs. For the short term, it
would seem appropriate to distinguish physical and
information permissions, and to grant access based on
whatever rewrites are easy to implement. More
sophisticated rewrites seem quite useful to employ
predefined views or eliminate unneeded tables, but will
be costly to implement.</p>
      <p>Finally, a message to builders of query processors:
Make the query rewrite facilities available for other
purposes. Rewrites are invaluable at inferring access
permissions and other metadata, and yet are too costly
to implement just for metadata purposes.
7</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Be99]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , et. al.,
          <source>“Microsoft Repository Version 2 and the Open Information Model,” Information Systems</source>
          <volume>24</volume>
          (
          <issue>2</issue>
          ),
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>[Cap97] S. De Capitani di Vimercati</surname>
          </string-name>
          , P. Samarati,
          <article-title>Authorization Specification and Enforcement in Federated Database Systems</article-title>
          ,
          <source>Journal of Computer Security</source>
          , vol.
          <volume>5</volume>
          , n. 2,
          <issue>1997</issue>
          , pp.
          <fpage>155</fpage>
          -
          <lpage>188</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Cas97]
          <string-name>
            <given-names>S.</given-names>
            <surname>Castano</surname>
          </string-name>
          , S. De Capitani di Vimercati, M.G.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Fugini</surname>
          </string-name>
          ,
          <source>Automated Derivation of Global Authorizations for Database Federations, Journal of Computer Security</source>
          , vol.
          <volume>5</volume>
          , n. 4,
          <issue>1997</issue>
          , pp.
          <fpage>271</fpage>
          -
          <lpage>301</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Jo94]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jonscher</surname>
          </string-name>
          , K. Dittrich, “
          <article-title>An Approach for Building Secure Database Federations”</article-title>
          , VLDB, Santiago, Chile,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Pr00]
          <string-name>
            <given-names>T.</given-names>
            <surname>Priebe</surname>
          </string-name>
          ,
          <string-name>
            <surname>G</surname>
          </string-name>
          , Pernul, “
          <article-title>Data Warehouse Security in GOAL”, submitted for publication.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Ro99a]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          , E. Sciore, G. Gengo, “
          <article-title>Demonstration of Multi-Tier Metadata Management”, at www</article-title>
          .mitre.org/resources/centers/it/staffpages/arnie/ [Ro99b]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Sciore</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Doshi</surname>
          </string-name>
          , “
          <article-title>Security Administration for Federations, Warehouses, and other Derived Data”</article-title>
          ,
          <source>IFIP 11.3 Working Conference on Database Security</source>
          ,
          <year>Seattle 1999</year>
          . (Kluwer,
          <year>2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Ro00]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rosenthal</surname>
          </string-name>
          , E. Sciore, “
          <article-title>Extending SQL's Grant and Revoke Operations”</article-title>
          , IFIP Workshop on Database Security, Amsterdam,
          <year>August 2000</year>
          [Ul89]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , Principles of Database and
          <string-name>
            <surname>Knowledge-Base</surname>
            <given-names>Systems</given-names>
          </string-name>
          , Vol.
          <volume>2</volume>
          , Computer Science Press,
          <string-name>
            <surname>Rockville</surname>
            <given-names>MD</given-names>
          </string-name>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>