=Paper= {{Paper |id=Vol-1184/paper10 |storemode=property |title=Publishing L2TAP Logs to Facilitate Transparency and Accountability |pdfUrl=https://ceur-ws.org/Vol-1184/ldow2014_paper_10.pdf |volume=Vol-1184 |dblpUrl=https://dblp.org/rec/conf/www/SamaviC14 }} ==Publishing L2TAP Logs to Facilitate Transparency and Accountability== https://ceur-ws.org/Vol-1184/ldow2014_paper_10.pdf
     Publishing L2TAP Logs to Facilitate Transparency and
                       Accountability

                            Reza Samavi                                                Mariano P. Consens
                         University of Toronto                                          University of Toronto
                   samavi@mie.utoronto.ca                                       consens@mie.utoronto.ca


ABSTRACT
We propose publishing L2TAP privacy logs to facilitate privacy auditing tasks that involve multiple auditors, an increasingly common requirement in the context of social computing and big data driven science. Our proposal utilizes two ontologies, L2TAP and SCIP, designed for deployment in a Linked Data environment. L2TAP provides provenance enabled logging of events. SCIP synthesizes contextual integrity concepts to express key privacy-related semantics associated with log events. We describe SPARQL query-based solutions for privacy log construction, obligation derivation, and compliance checking. The solutions facilitate accountability and transparency among participants (privacy auditors in particular).

[Figure 1: Privacy Auditing Scenario. Diagram labels: Data Provider (PhysioNet); Dataset (MIMIC II); Research Team RT1 with L2TAP Audit Log RT1; Research Team RT2 with L2TAP Audit Log RT2; RT1 Auditor; RT2 Auditor; Data Provider Auditor; External Auditor.]

Copyright is held by the author/owner(s). LDOW2014, April 8, 2014, Seoul, Korea.

1.   INTRODUCTION
The protection of individuals' privacy is becoming increasingly challenging in the era of social computing and data driven science. While privacy protection has implications in many application areas, it is clearly challenging when health related data is involved. Big data enabled biological and biomedical research involves massive datasets of human genome, biological imaging, and clinical information collected and aggregated from individual health records. Protecting data subjects' privacy in clinical research is a concern addressed by multiple laws and regulations. For example, the U.S. Department of Health and Human Services (HHS) [27] obliges investigators to protect the privacy of data subjects and to maintain the confidentiality of data. HHS also requires investigators to establish oversight mechanisms and monitoring plans for research projects involving human subjects, and to remain accountable to the subjects' privacy rights. Auditing is essential to the enforcement of accountability, and many scenarios involve auditors from multiple institutions that monitor the fulfillment of privacy obligations.

To illustrate the need for an audit mechanism that facilitates accountability and transparency among multiple participants, consider a research study that analyzes the primary reasons for intensive care unit (ICU) hospitalization, examining the effectiveness of different types of medications across patient demographics (sex, age, and ethnicity). The scenario is depicted in Fig. 1. The dataset used for the study is MIMIC II, a clinical database provided by PhysioNet [10], where records contain information about the ICU admission of patients [16]. Although the MIMIC II database is a de-identified public dataset, access is available only under the terms of a data use agreement (DUA¹). This DUA defines a list of obligations that a researcher agrees to fulfill. Some of these obligations involve purposes and roles. For example, the dataset should be used only for academic research purposes and by a researcher (ob1). Others are pre-obligations, which are actions that need to be performed prior to access. For instance, the DUA states that a researcher should complete a training program in human research subjects protections prior to access (ob2). There are also post-obligations, which are actions that need to be performed after access has been granted, such as: if the researcher finds information within restricted data that she believes might permit identification of any individual, she will report the location of this information promptly by email (ob3).

¹ http://physionet.org/works/mimic2cdb/access.shtml

Two research teams (RT1 and RT2) are collaborating on this study. The teams could be based at related or unrelated research institutions (e.g., RT1 is based at a hospital, while RT2 is based at a university department, and the hospital could be part of the university, or not). The privacy policies mentioned above govern access to the MIMIC II dataset. The policies are designed by PhysioNet according to HIPAA [26] and other privacy regulations in order to protect privacy
of individuals whose data are used in research studies. In order to check whether the research teams comply with these policies, multiple potential auditors must be able to audit the log of access and fulfilment of obligations. One of the potential auditors is the Data Provider itself, who oversees the contract to ensure that the usage of data is in accordance with the agreement. The auditors from the two teams may also want to audit the process with respect to their internal data protection policies. In addition, an external auditor should be able to ensure that research involving individuals' health data is fully HIPAA compliant. The challenge that needs to be addressed is that, while multiple participants are involved in generating privacy logs (e.g. the data provider, the research teams, and the researchers), all potential auditors should be able to check the log to see if the researchers are respecting the privacy of data subjects.

In the past few years, we have observed multiple practical proposals with a focus on privacy of big datasets and linked data ([22, 18, 7, 8]). These studies have focused on access control frameworks that define who can access which resources. They achieve privacy through safeguarding of data before access is granted and provide no solutions for privacy support after the access. There is also solid theoretical and practical work to support privacy auditing (data usage control after access is granted). However, these approaches either exploit complex logic (e.g. [2, 9, 3, 5]) that jeopardizes their practical benefits, or use system level logging standards (e.g. [15, 6]) to generate privacy audit logs on an application by application basis, thus generating privacy logs not exploitable by multiple participants and auditors in a heterogeneous environment.

In [23] we proposed the L2TAP ontology (Linked Data Log to Transparency, Accountability and Privacy) that allows participants to log in RDF [28] the provenance assertions of privacy related events. We also proposed a second pluggable ontology, SCIP (Simple Contextual Integrity Privacy), to capture the privacy semantics of log events and enable SPARQL query-based implementations of auditing and compliance checking in a personalized health workflow². The scalability of the framework for compliance checking has been evaluated by a set of queries described in [23].

² L2TAP and SCIP are documented at http://l2tap.org.

Using L2TAP+SCIP, this paper proposes a standard way of privacy auditing in the big data research context. We propose multiple SPARQL query-based solutions (with limited RDFS reasoning support) to facilitate the tasks of constructing L2TAP privacy logs (when privacy policies are applicable to the classes of individuals and data items), deriving obligations from privacy policies, and compliance checking. If research teams agree on the semantics of publishing the log based on L2TAP+SCIP, they can show their compliance to the auditors in a single effort, and auditors can oversee the compliance of the parties involved without additional effort. In other words, after the log has been created and all obligations and their fulfillments are captured, the research team can check the log and provide it as evidence of accountability [21]. Using the same log and the SPARQL solutions, all other auditors, including the data provider's auditor, the institution auditors (described in [26] as the Institutional Review Board (IRB) for single-site research and the Data and Safety Monitoring Board (DSMB) for multi-site research), and the external auditor, can check compliance.

The paper structure and contributions are as follows. Section 2 provides an overview of L2TAP and SCIP and shows how the ontologies can be used to capture the log events and their privacy semantics. Section 3 describes our SPARQL query-based solutions for constructing the log, obligation derivation, and compliance checking. Section 4 describes the related research. We conclude in Section 5.

2.   L2TAP LINKED DATA LOG
In this section we first motivate the need for two ontologies to generate privacy audit logs. Then, using our motivating scenario, we describe L2TAP, an ontology for specifying the header of privacy log events, and SCIP, an ontology that provides the necessary specifications to encode the privacy semantics of the body of log events.

The goal of L2TAP is to provide a set of classes and properties that can be used to represent and publish a log of privacy events as Linked Data. In the motivating scenario, expressing access policies and obligations, requesting access to the dataset, and fulfilling obligations are some of the typical privacy events that we expect L2TAP to be able to capture.

L2TAP follows the principles of Linked Data [11] to publish logs. Everything in the l2tap:Log is expressed in terms of some l2tap:LogEvents. URIs are used as names for logs, log events, participants, and processes. Thus, the log events can be published as web dereferenceable URIs by participants. Participants who want to dereference a published log are authenticated and communicate via a secure https channel. After we describe the L2TAP and SCIP ontologies, at the end of Subsection 2.2 we will justify why these ontologies rely on dereferenceable URIs and how the Linked Data infrastructure makes it possible to integrate log data when multiple parties contribute their privacy events to the log at different points in time.

The L2TAP ontology describes the header of a log event and is scoped to answer the provenance queries about log events, such as who has contributed an event to the log and when. The when in L2TAP can be expressed as simple xsd time using two L2TAP properties, l2tap:eventTimestamp and l2tap:publishingTimestamp. There is some subtlety in capturing the who in L2TAP. Following the second principle of Linked Data, using http:// URIs as names for participants amounts to a data publisher choosing part of an http:// namespace that the publisher controls, by virtue of owning the domain name [11]. In L2TAP, the publisher of the log events is the logger, who owns the domain of the log and can talk about the events and their assertions (e.g. https://logRT.org). If an L2TAP logger wishes to identify a participant as the who in an event header, the logger must register the participant, i.e. mint the URI of the participant with the namespace in its domain. Registered participants will be considered accountable for the assertions that they make in the log.

The privacy semantics of privacy events (e.g. what is an obligation fulfilment) are contained in the body of a log event and expressed using the SCIP ontology. In designing SCIP,
we are inspired by the contextual integrity (CI) perspective [20]. The SCIP ontology provides mapping targets for basic notions of participants in an information flow, privacy contexts, and privacy norms as described in CI. The goal of the SCIP ontology is to define a minimum set of classes, properties and constraints that allows the basic compliance queries (e.g. which access request is non-compliant?) to be answered using SPARQL queries.

Having two namespaces is the basis for the framework's flexibility and extensibility. The proposed SCIP ontology is just one instance of a class of pluggable ontologies to express privacy semantics and can be substituted with an ontology with more or less expressive power without impacting the semantics of the log header.

2.1     Log Event Types
L2TAP specifies three types of log events: log initialization events, participant registration events, and privacy events.

Log Initialization Events. This type of event defines which l2tap:Log is being initialized using the l2tap:initializesLog property. It also records assertions on the log characteristics such as who the logger is (l2tap:logger), and how the event timestamps are captured (l2tap:logClock). Fig. 2 provides an example of using the L2TAP ontology to encode a log event that initializes a log with https://logRT.org URI (line 1). This privacy log has a logger with https://RT.org/logger URI (line 4). The logger is a foaf:Agent³ (line 7). The physical timeline for this log is a constant in the timeline ontology⁴ (line 9). Lines 10 and 11 encode the log's reference clock and the syncing frequency.

 1  <https://logRT.org> a l2tap:Log.
 2  <...> a l2tap:LogInitializationEvent;
 3   l2tap:initializesLog <https://logRT.org>;
 4   l2tap:logger <https://RT.org/logger>;
 5   l2tap:publicationTimestamp "2014-01-26T12:00:00Z"^^xsd:dateTime;
 6   l2tap:timeline <...>.
 7  <https://RT.org/logger> a foaf:Agent.
 8  <...> a l2tap:Timeline;
 9   l2tap:physicalTimeline tl:universaltimeline;
10   l2tap:clock "wwp.greenwichmeantime.com/"^^xsd:string;
11   l2tap:clockSyncFreq [tl:duration "P7DT"^^xsd:duration].

Figure 2: Log initialization event

Participant Registration Events. This event type is used to register a foaf:Agent as an L2TAP log participant who can then submit future log events. The l2tap:registersAgent property links a log event to a foaf:Agent who will be recognized by the logger as the registered agent. As described above, the participant registration event marks the time instant at which a participant's URI has been minted in the logger's domain. The registered participants will therefore be kept accountable with respect to the log events that they contribute to the log in the future. In our scenario, research teams as receivers of data and PhysioNet as the data provider are participants in the log. If the data were not anonymized, each individual data subject or a class of data subjects could also have been registered as participants.

 1  <https://logRT.org/logevent/e2> a l2tap:ParticipantRegistrationEvent;
 2   l2tap:memebrOf <...>;
 3   l2tap:eventTimestamp "2014-01-27T12:00:00Z"^^xsd:dateTime;
 4   l2tap:publicationTimestamp "2014-01-27T12:00:01Z"^^xsd:dateTime;
 5   l2tap:registersAgent <...>;
 6   l2tap:eventParticipant <...>;
 7   l2tap:eventData <...>.
 8  <...> a l2tap:Participant;
 9   l2tap:registeredAgent <...>.
10  <...> a foaf:Agent.
11  <...> = {
12  <...> a <...>.
13  <...> a <...>.
14  <...> a <...>.
15  <...> a <...>.}

Figure 3: Participant registration event

Fig. 3 shows an example of using the L2TAP ontology to register the class of research teams as a foaf:Agent (line 5). Note that the registered URI is the URI of a class of researchers. In lines 8-9 the l2tap:Participant class and the l2tap:registeredAgent property are used to capture the fact that the URI of the class of researchers is minted in the logger's domain. It is optional for a participant registration event to use the l2tap:participantData property and add a named graph [4] as the event data (payload) to the event. Suppose in our motivating scenario RT1 and RT2 are two classes of researchers. RT1 uses the dataset to study patients under 18 and RT2 studies patients 18 years and older. The optional named graph can be used to capture these classifications and additional information about the members of each class. For example, we used the named graph (lines 11-15) to encode the research team hierarchy and memberships. Accountability can therefore be cascaded to a specific individual.

Privacy Events. A privacy event is used to encode privacy processes such as expressing privacy policies, access requests, and obligation fulfilment. Fig. 4 shows how the L2TAP ontology is used to log provenance assertions of privacy policies applicable to our scenario. The quads in this privacy event are grouped in two sets. The quads in lines 1-6 are the header of the event and the quads from line 8 onward are the body of the event. The data provider (PhysioNet) is the one who submits the quads of the policies to the log (line 3). The body of this log event (wrapped in the https://logRT.org/logng/ng1 named graph) describes privacy policies and preferences as the payload of the event. The SCIP ontology is used to express the semantics of a privacy event's body.

2.2     Log Event Privacy Semantics
The L2TAP ontology described so far encodes a privacy event and its accountable participant regardless of the privacy semantics of the event. The SCIP ontology provides the necessary vocabulary to capture the privacy semantics. We categorize the semantics in four groups: privacy preferences (policies), access requests and responses, obligation fulfillments, and access activities.
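The header/body split of a privacy event described in Section 2.1 can be illustrated with a minimal Python sketch. It represents quads as plain tuples rather than using an RDF library. The function name `build_privacy_event`, the event URI, the participant URI, and the preference URI are hypothetical and are not taken from the paper; only the property names and the named graph URI appear in the text and figures.

```python
from datetime import datetime, timezone


def build_privacy_event(event_uri, participant_uri, graph_uri,
                        event_time, publication_time, body_triples):
    """Return (header_quads, body_quads) for one privacy event.

    Header quads carry no graph name (None); body quads are wrapped in
    the named graph that l2tap:eventData points to.
    """
    def stamp(dt):
        # Serialize as a UTC xsd:dateTime lexical form.
        return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    header = [
        (event_uri, "rdf:type", "l2tap:PrivacyEvent", None),
        (event_uri, "l2tap:eventParticipant", participant_uri, None),
        (event_uri, "l2tap:eventTimestamp", stamp(event_time), None),
        (event_uri, "l2tap:publicationTimestamp", stamp(publication_time), None),
        (event_uri, "l2tap:eventData", graph_uri, None),
    ]
    body = [(s, p, o, graph_uri) for (s, p, o) in body_triples]
    return header, body


# Hypothetical URIs, except the named graph URI quoted in the text.
header, body = build_privacy_event(
    "https://logRT.org/logevent/e3",
    "https://logRT.org/participant/physionet",
    "https://logRT.org/logng/ng1",
    datetime(2014, 1, 28, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2014, 1, 28, 12, 1, 0, tzinfo=timezone.utc),
    [("https://logRT.org/pref/p1", "rdf:type", "scip:PrivacyPreference")],
)
```

Keeping the header outside the named graph mirrors the idea that the provenance of an event (who, when) can be queried independently of whichever ontology is plugged in for the body.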
³ http://xmlns.com/foaf/spec/
⁴ http://purl.org/NET/c4dm/timeline.owl#

Privacy Preferences. The scip:PrivacyPreference class is used to encode a context and the norms applicable to the context. The context in SCIP is characterized using multiple
classes: scip:DataItem, scip:Purpose, and scip:PrivacyPrivilege. Use, collect, and disclosure are different types of privacy privileges. Participants in a context interact with each other in certain capacities or roles. In SCIP, the roles of the three main participants in an information flow are encoded using the scip:dataSubjectRole, scip:dataRequestorRole, and scip:dataSenderRole properties. The scip:Role class is used to capture the abstract and concrete roles. In SCIP, roles, purposes, data items and privacy privileges are represented as lattices using rdfs:subClassOf.

 1  <...> a l2tap:PrivacyEvent;
 2   l2tap:memebrOf <...>;
 3   l2tap:eventParticipant <...>;
 4   l2tap:eventTimestamp "2014-01-28T12:00:00Z"^^xsd:dateTime;
 5   l2tap:publicationTimestamp "2014-01-28T12:01:00Z"^^xsd:dateTime;
 6   l2tap:eventData <https://logRT.org/logng/ng1>.
 7  <https://logRT.org/logng/ng1> = {
 8  <...> a scip:PrivacyPreference; ...}

Figure 4: The header of a privacy event (for policies)

 9   scip:expressedBy <...>;
10   scip:hasValidity [time:hasBegining "2014-01-01T00:00:00Z";
11                     time:hasEnd "2015-01-01T00:00:00Z"];
12   scip:dataItem <...>;
13   scip:requestorRole <...>;
14   scip:purpose <...>;
15   scip:privacyPrivilege <...>;
16   scip:obligation <...>;
17   scip:obligation <...>;
18   scip:propositionalExpression <...>.
19  <...> a scip:ObligationTemplate;
20   scip:performAction <...>;
21   scip:occurrenceGap "-1"^^xsd:integer;
22   scip:performanceDuration "1"^^xsd:integer.}

Figure 5: The body of a privacy event (for policies)

Fig. 5 shows how the SCIP ontology is used to encode the obligations described in our scenario. Note that the quads in this figure are the continuation of the quads in Fig. 4. Line 9 describes by whom the privacy preferences are expressed using the scip:expressedBy property. Note that the minted PhysioNet URI is the participant who submits the privacy policies. The quads in lines 10 and 11 describe the validity time interval of the policies. The first obligation in our scenario (ob1) is expressed as the legitimate purpose (line 14) for using the dataset (line 12), the acceptable roles of participants (line 13), and the privilege that will be granted if the obligations are fulfilled (line 15).

There are also norms associated with a context that describe obligations or actions that need to be performed before (pre-obligation) or after (post-obligation) the dataset is accessed [19]. The scip:ObligationTemplate is a subclass of scip:Obligation that captures these actions. Obligations, expressed in privacy preferences, are templates for future instantiation of executable obligations. The scip:Obligation is rdfs:subClassOf scip:ObligationTemplate. Obligations have properties to express temporal constraints associated with an obligation. For example, the second obligation requires taking the training course (obtain_training_certificate) prior to access. This obligation is encoded using scip:performAction (line 20). The scip:occurrenceGap property in line 21 encodes the relative time interval for performing the obligation (a positive integer indicates occurrence after the access activity, and a negative integer before it). The scip:performanceDuration property in line 22 encodes the time required to perform the obligation. The third obligation is a post-obligation and is required to be fulfilled after access has been granted and when a record is deemed to be identifiable.

 1  <...> a l2tap:PrivacyEvent;
 2   l2tap:memebrOf <...>;
 3   l2tap:eventParticipant <...>;
 4   l2tap:eventTimestamp "2014-01-29T12:00:00Z"^^xsd:dateTime;
 5   l2tap:publicationTimestamp "2014-01-29T12:01:00Z"^^xsd:dateTime;
 6   l2tap:eventData <...>.
 7  <...> = {
 8  <...> a scip:AccessRequest;
 9   scip:dataRequestor <...>;
10   scip:dataSender <...>;
11   scip:dataSubject <...>;
12   scip:dataItem <...>;
13   scip:purpose <...>;
14   scip:requestorRole <...>;
15   scip:requestedPrivilege <...>.}

Figure 6: Log event for access request

Access Requests and Responses. The scip:AccessRequest class is used to encode a request by a researcher to access a dataset. A number of classes that we used to express privacy policies (such as scip:DataItem, scip:Role, scip:Purpose, and scip:DataRequestor) will also be used to express access requests. An access request can be initiated by a class of participants or an individual participant. In our motivating scenario, we assume one of the researchers in the team (Mark) uses the framework to log his access request (cf. Fig. 6). Note that the who in the header of this log event is Mark's URI (line 3), and Mark is a member of the clinical researcher class. Line 8 encodes Mark's access request as an instance of scip:AccessRequest. The scip:dataRequestor property in line 9 captures the URI of the data requestor (Mark), scip:dataSender (line 10) captures who should send the data (PhysioNet), while scip:dataSubject (line 11) captures whose data has been requested (the Patients class in the MIMIC II dataset). Similar to the privacy policies, we encode in line 12 the URI of the requested data items (MEDITEMS: the class of all medications taken by patients), the purpose for accessing the data (line 13), and the roles of the participants requesting access (line 14). The privacy privilege that has been requested is encoded by scip:requestedPrivilege in line 15.

The scip:AccessResponse class encodes the boolean response to an access request as well as the applicable obligations. The log event shown in Fig. 7 records the access response to Mark's request by dereferencing the corresponding access request URI (line 9). Line 10 encodes the access decision. Associated with each access response there could be a set of applicable obligations. The quads in lines 11-17 encode one of the obligations derived from privacy policies applicable to the study. Line 11 in this listing refers to the URI of the corresponding obligations using scip:contextObligation. When multiple obligations arise from an access request, a propositional formula ϕ describes how the satisfaction of these obligations relates to the overall compliance of the access request. In our example scenario ϕ ≡ ob1 ∧ ob2 ∧ ob3, i.e.
all three obligations must be fulfilled for the access to be compliant. The scip:propositionalExpression property in line 13 encodes this formula. The rest of the quads in Fig. 7 link each of the performable obligations to the corresponding obligation templates in the privacy policy. So the character-

 1  <...> a l2tap:PrivacyEvent;
 2   l2tap:memebrOf <...>;
 3   l2tap:eventParticipant <...>;
 4   l2tap:eventTimestamp "2014-01-30T19:01:00Z"^^xsd:dateTime;
 5   l2tap:publicationTimestamp "2014-01-30T19:01:01Z"^^xsd:dateTime;
 6   l2tap:eventData <...>.
 7  <...> = {
 8  <...> a scip:AccessResponse;
 9   scip:responseTo <...>;
10   scip:accessDecision "True"^^xsd:boolean;
11   scip:contextObligation <...>;
12   scip:contextObligation <...>;
13   scip:propositionalExpression <...>.
14  <...> a scip:Obligation;
15   scip:createdFrom <...>.
16  <...> a scip:Obligation;
17   scip:createdFrom <...>.}

Figure 7: Log event for access response

 1  <...> a l2tap:PrivacyEvent;
 2   l2tap:memebrOf <...>;
 3   l2tap:eventParticipant <...>;
 4   l2tap:eventTimestamp "2014-01-31T12:01:00Z"^^xsd:dateTime;
 5   l2tap:publicationTimestamp "2014-01-31T12:01:01Z"^^xsd:dateTime;
 6   l2tap:eventData <...>.
 7  <...> = {
 8  <...> a scip:ObligationAcceptance;
 9   scip:accepts <...>.}

Figure 8: Log event for obligation acceptance

 1  <...> a l2tap:PrivacyEvent;
 2   l2tap:memebrOf <...>;
 3   l2tap:eventParticipant <...>;
 4   l2tap:eventTimestamp "2014-02-01T12:01:00Z"^^xsd:dateTime;
 5   l2tap:publicationTimestamp "2014-02-01T12:01:01Z"^^xsd:dateTime;
 6   l2tap:eventData <...>.
 7  <...> = {
 8  <...> a scip:PerformedObligation;
 9   scip:performedFor <...>;
10   scip:performedBy <...>;
11   scip:occurredIn "2014-02-01T11:00:00Z"^^xsd:dateTime .}

Figure 9: Log event for performing an obligation
istics of each obligation such as the action and the temporal
constraints associated with the obligation become resolvable.
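
The header/body structure of a log event such as the access response in Fig. 7 can be
illustrated with a small in-memory sketch. This is only an illustration under stated
assumptions: every URI below (the http://example.org/log/ prefix, event3, response1, ob1,
template1, and so on) is hypothetical, the Quad type and the helper functions are not part
of L2TAP or SCIP, and only the property names are taken from the listings. The header
quads are placed in the default graph and the body quads in a named graph, and the
obligations are resolved to their templates by following the logged URIs.

```python
from typing import Dict, List, NamedTuple, Optional


class Quad(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object
    g: str  # graph name; "" stands for the default graph (log event header)


# Hypothetical URIs, chosen only for this illustration.
LOG = "http://example.org/log/"
EVENT, BODY = LOG + "event3", LOG + "event3/body"
RESPONSE, REQUEST = LOG + "response1", LOG + "request1"
OB1, OB2 = LOG + "ob1", LOG + "ob2"

quads = [
    # Header: provenance assertions about the event (default graph).
    Quad(EVENT, "rdf:type", "l2tap:PrivacyEvent", ""),
    Quad(EVENT, "l2tap:eventParticipant", LOG + "participants/PN_ACLAgent", ""),
    Quad(EVENT, "l2tap:eventData", BODY, ""),
    # Body: the access response and its obligations (named graph BODY).
    Quad(RESPONSE, "rdf:type", "scip:AccessResponse", BODY),
    Quad(RESPONSE, "scip:responseTo", REQUEST, BODY),
    Quad(RESPONSE, "scip:accessDecision", "true", BODY),
    Quad(RESPONSE, "scip:contextObligation", OB1, BODY),
    Quad(RESPONSE, "scip:contextObligation", OB2, BODY),
    Quad(OB1, "scip:createdFrom", LOG + "policy/template1", BODY),
    Quad(OB2, "scip:createdFrom", LOG + "policy/template2", BODY),
]


def objects(store: List[Quad], subject: str, predicate: str,
            graph: Optional[str] = None) -> List[str]:
    """Return all objects of (subject, predicate), optionally within one graph."""
    return [q.o for q in store
            if q.s == subject and q.p == predicate
            and (graph is None or q.g == graph)]


def obligation_templates(store: List[Quad], event: str) -> Dict[str, List[str]]:
    """Follow event -> body graph -> obligations -> obligation templates."""
    result: Dict[str, List[str]] = {}
    for body in objects(store, event, "l2tap:eventData", graph=""):
        for q in store:
            if q.g == body and q.p == "scip:contextObligation":
                result[q.o] = objects(store, q.o, "scip:createdFrom", graph=body)
    return result


templates = obligation_templates(quads, EVENT)
```

In a deployed log the same lookups would be done by dereferencing URIs or by SPARQL
queries over the named graphs; the list of tuples only stands in for the quad store.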
The who in this log event (line 3) is https://logRT.org/participants/PN_ACLAgent,
indicating that the participant who has logged the response is an ACL agent of PhysioNet,
implementing access control and obligation derivation. These mechanisms are usually
domain-dependent. In [23] we described how we can derive obligations from privacy
preferences using SPARQL queries. Obligations can also be derived using more-or-less
complex mechanisms. From the logging perspective what is necessary is to have a mechanism
in place to log the access decisions and obligations, regardless of which mechanism is
used to control access or derive obligations.

When the obligations are derived from the privacy preferences (obligation templates) and
logged, the obligation performer, who can be the same participant as the data requestor
(Mark, the researcher) or a different participant, must fulfill the obligation in an
acceptable time interval and log its fulfillment. SCIP has a number of properties to
capture the participant who should perform an obligation (scip:obligationPerformer), the
participant who actually performs the obligation (scip:performedBy), the one who can
witness the violation of an obligation (scip:obligationWitness) and the one who actually
witnesses it (scip:attestsViolation).

Obligation Acceptance. When the access response has been logged, the research team (as
the obligation performer) agrees to perform the obligations. This event captures the
researcher's commitment as a performative act. The performative act is the utterance of a
self-describing act which is performed by declaring that one is doing it [1]. Fig. 8
shows the log event for obligation acceptance. The event's participant (line 3) is Mark,
one of the registered researchers. This event refers to the URI of the access response
(line 9). By virtue of logging this event, the researcher not only acknowledges the
existence of the obligations but also, as a performative act, commits himself to perform
the obligations as conditions of access.

Performing Obligation. In the scenario, one of the research team members (Mark) is the
participant who must perform the obligations as conditions of access to the dataset. The
first obligation (obtain_training_certificate) is a pre-obligation, meaning that the
research team must obtain the certificate and log this action as evidence prior to
access. Fig. 9 shows a log event that captures the fact that Mark has performed the first
obligation. Line 8 defines the performed obligation as an instance of the
scip:PerformedObligation class. Line 9 refers to the URI of the corresponding obligation
logged in the access response. The participant who has performed the obligation and the
time instant of performing the obligation are encoded using scip:performedBy (line 10)
and scip:occurredIn (line 11) respectively. Note that Mark is the who that submitted
these quads to the log (line 3).

Access activity. Finally, SCIP has a class scip:AccessActivity to record the occurrence
of an access activity. Fig. 10 shows the log event of an access activity when the research
team (including all its members) has accessed the dataset. Line 9 refers to the URI of
the corresponding obligation acceptance event using scip:forObligationAcceptance. Line 10
captures the time instant at which the access activity occurred. The provenance
assertions for this log event (line 3) show that the researcher is the participant who
logs the access activities. We assume that the data provider (PhysioNet) also has a
mechanism in place to log all accesses to its dataset. Therefore, if the researcher fails
to log an access activity, the discrepancy between the provider's access log and the
L2TAP audit log will trigger a non-compliance incident.

The justification for leveraging the Linked Data infrastructure and dereferenceable URIs
becomes evident as we walk through the log events for the motivating scenario described
above. We summarized registration of the log events in Fig. 11. Participants make
statements about the events in the log. Therefore, they need to access the events data to
dereference the past events URIs that may have been
logged by other participants at different points in time. For example, the privacy
policies are registered by PhysioNet on Jan 01, 2014, then the access request has been
logged by the research teams on Jan 29, 2014. The access response event has been logged
by the PhysioNet access control agent on Jan 30th, referring to the URI of the access
request (scip:responseTo). The access response also refers to the URIs of obligations
registered by PhysioNet as part of the privacy policies (e.g. scip:createdFrom).
Analogously, the log events encoding the acceptance of obligations, the fulfilment of an
obligation by one of the researchers, and the access activity logged by the research team
refer to the URIs of the other past log events. The statements that each of these
participants wants to make depend on the URIs of statements that have been previously
logged.

 1  a l2tap:PrivacyEvent;
 2   l2tap:memebrOf ;
 3   l2tap:eventParticipant;
 4   l2tap:eventTimestamp "2014-02-02T00:01:00Z"^^xsd:dateTime;
 5   l2tap:publicationTimestamp "2014-02-02T00:01:01Z"^^xsd:dateTime;
 6   l2tap:eventData  .
 7  = {
 8  a scip:AccessActivity;
 9   scip:forObligationAcceptance ;
10   scip:occurredIn "2014-02-02T00:00:01Z"^^xsd:dateTime .}

         Figure 10: Log event for access activity

[Diagram: lanes for Research Team RT1, L2TAP Audit Log, and Data Provider PhysioNet, with
the registered events Privacy Policies (Fig. 4 & 5), Access Request (Fig. 6), Access
Response (Fig. 7), Obligation Acceptance (Fig. 8), Performed Obligation (Fig. 9), and
Access Activity (Fig. 10).]

         Figure 11: Registering log events into the log and required URI dereferencing

The events in Fig. 11 do not necessarily occur in the sequence shown. Consider a scenario
in which the researcher logs an access to the dataset referencing an obligation
acceptance's URI. However, the researcher happens to not log an obligation fulfilment
event corresponding to the access response. So the access response's URI would not be
referred to by a performed obligation event, and in turn the corresponding access request
would also not be referred to. This results in a non-compliant access request, and the
researcher would become accountable for not logging the obligation fulfilment event.
Therefore, the L2TAP+SCIP ontology relies on URI dereferencing to make the actions of
each participant transparent to the other participants involved in the process (of
course, to the participants who have been authenticated) and to provide support for
accountability and privacy.

3.   QUERY-BASED AUDITING
The fundamental aspect of leveraging RDFS and Linked Data to generate L2TAP logs is to
facilitate privacy audit tasks by queries over the created logs. In this section we will
first discuss how standard RDFS and the computation of transitive closures for the
rdfs:subClassOf relationship can be exploited to support query-based audit tasks. Then we
describe three major audit tasks (constructing the log with data usage policies,
obligation derivation and fulfilment, and compliance checking) that can all be supported
by SPARQL queries with limited RDFS reasoning support. These tasks involve several
classes of participants including data provider, data receiver (research teams), and
auditors.

RDFS Reasoning Support. By leveraging Linked Data for privacy audit logs we can achieve a
flexible way to deal with data item granularity, participant granularity, and applicable
privacy policies. An individual's personal information can span from a very specific data
item (e.g. the glucose level in a blood work) to a very general data item (e.g. the
personal health record of an individual). Privacy policies and regulations (e.g. HIPAA)
are not only applicable to an entire dataset but may also apply to a specific class of
data items (e.g. mental health data) or a specific class of individuals (e.g. children
under the age of 12). Individuals (e.g. data subjects in our scenario) may have options
to express their personal privacy preferences applicable to the instances of their data.
Expressing everything in the log (including data items, participants, etc.) using
dereferenceable URIs provides the most flexible and generic way of representing resources
involved in privacy processes. Furthermore, the RDF representation of audit logs using
the L2TAP and SCIP ontologies allows both the URI of a class of resources and the URI of
an instance of a resource (participants or data items) to be dereferenced and reasoned
about using RDFS. As shown in Fig. 3, members of the class of researchers are defined
using a named graph as a log event payload. Exploiting rdfs:subClassOf allows reasoning
about the entire class of researchers or a specific individual in the class when
evaluating an obligation derivation query or a compliance query as described below. By
the same token, applicable privacy policies and preferences can be determined for a class
of data subjects, a class of data items, or for one instance of the same classes.

Log Construction. In our motivating scenario, the research institute is the one who needs
access to the datasets for its researchers and also wants to keep its researchers
accountable with respect to the dataset usage policy. Therefore, the research institute
initializes the log and registers the participants. The institute can then use the log in
the future to show interested auditors that its researchers are compliant with the
policies. On the other hand, the data provider wants to be able to express the norms and
policies that govern the data usage. So the provider wants to contribute these policies
to the log and record all accesses to datasets. We illustrated throughout Fig. 1-4 the
set of quads that need to be stored in an L2TAP log for these tasks.
All quads in these figures can be appended to an L2TAP log using SPARQL 1.1 [29] commands
in three steps: first, a named graph is created for a log event using CREATE GRAPH;
second, the quads of the log header are inserted into the log default graph; and third,
the quads of the log event body are inserted into the named graph using
INSERT DATA {GRAPH  { }}.

Obligation Derivation. After the log is constructed, the research teams (or an individual
researcher) want to be able to derive the obligations applicable to the class of data
items or data subjects that they want to access. This task can be accomplished through
computation of transitive closures for the rdfs:subClassOf relationship. Norms in the
SCIP ontology are defined in terms of data items, roles of participants who want to use
data items, purpose of usage, and requested access privilege. All these concepts are
expressed in SCIP by a lattice using rdfs:subClassOf. For example, children under 12 are
rdfs:subClassOf data subjects. Therefore, a SPARQL query with RDFS reasoning support
makes it possible to match the context of a set of privacy policies with the context of
an access request. The query conditions check that all instances of data items, data
subjects, roles, and privacy privileges asked for by the research teams in the access
request graph can be subsumed by the corresponding items in the privacy policies graph.
The output of the query is then the set of obligations applicable to that access request.
The method has been described in more detail in our earlier publication ([23], Section 3).

ASK
  WHERE {
    ?obAcc scip:accepts ?response.
    ?response scip:responseTo ?request.
    ?response scip:contextObligation @ob.
    @ob rdf:type scip:Obligation.
    @ob scip:occurrenceGap ?occGap.
    @ob scip:performanceDuration ?pD.
    OPTIONAL {?accessActivity scip:forObligationAcceptance ?obAcc}.
    OPTIONAL {?accessActivity scip:accessedTime ?accessTime}.
    OPTIONAL {?performedOb scip:performedFor @ob}.
    OPTIONAL {?performedOb scip:performedBy ?performAgent}.
    OPTIONAL {?performedOb scip:occurredIn ?obligationTime}.
    OPTIONAL {?witness scip:attestsViolation @ob}.
    FILTER (((!bound(?performAgent) && !bound(?accessTime))
    || (bound(?accessTime) && (xsd:integer(@currentTime) <=
    fn:max((xsd:integer(?accessTime) + xsd:integer(?occGap) + xsd:integer(?pD)),
    (xsd:integer(?accessTime) + xsd:integer(?occGap)))))) &&
    (!bound(?witness))) }

         Figure 12: Evaluating the fulfilment of an individual obligation

  Data: Access request: rq, currentTime: t
  Result: Boolean compliance value for rq
  OB ← set of derived obligations for rq;
  ϕ ← propositional formula for rq;
  foreach ob_i ∈ OB do
      o_i ← answer of (SPARQL ASK obligation query (Fig. 12));
      Substitute ob_i in ϕ with o_i;
  end
  Substitute ϕ in compliance ASK query;
  C ← answer of (SPARQL ASK compliance query (Fig. 13));
  return (C)
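
The procedure of Algorithm 1 can be sketched in Python as follows. This is a minimal
illustration under stated assumptions, not the implementation used in the paper: the two
SPARQL ASK queries (Fig. 12 and Fig. 13) are replaced by plain Python stand-ins, the names
check_compliance, ask_obligation and phi as well as the obligation identifiers are
hypothetical, and ϕ is passed as a function over truth values instead of being read from
the logged scip:propositionalExpression.

```python
from typing import Callable, Dict, Iterable

# A propositional formula over obligation identifiers, e.g. ob1 ∧ ob2 ∧ ob3.
Formula = Callable[[Dict[str, bool]], bool]


def check_compliance(obligations: Iterable[str],
                     phi: Formula,
                     ask_obligation: Callable[[str], bool],
                     access_decision: bool) -> bool:
    """Return the compliance value of one access request.

    obligations     -- OB, the obligations derived for the request
    phi             -- the propositional formula logged with the access response
    ask_obligation  -- stand-in for the per-obligation ASK query (Fig. 12)
    access_decision -- the logged access decision of the access response
    """
    # Loop of the algorithm: evaluate every obligation and record its truth value.
    truth = {ob: ask_obligation(ob) for ob in obligations}
    # Final step: the compliance query is the conjunction of phi and the decision.
    return bool(phi(truth)) and access_decision


def phi(t: Dict[str, bool]) -> bool:
    """Example formula from the scenario: all three obligations must hold."""
    return t["ob1"] and t["ob2"] and t["ob3"]


OB = ["ob1", "ob2", "ob3"]
all_fulfilled = {"ob1": True, "ob2": True, "ob3": True}
one_missing = {"ob1": True, "ob2": True, "ob3": False}

compliant = check_compliance(OB, phi, all_fulfilled.__getitem__, True)
non_compliant = check_compliance(OB, phi, one_missing.__getitem__, True)
denied = check_compliance(OB, phi, all_fulfilled.__getitem__, False)
```

Passing the obligation check as a callable keeps the sketch independent of any particular
SPARQL endpoint; in a real deployment it would issue the template query with @ob and
@currentTime substituted.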

         Algorithm 1: An algorithm for compliance checking

Compliance Checking. An important audit task is to identify, at any given point in time,
if an access request is in compliance with the applicable privacy policies. Compliance of
an access request is decided based on the status of its corresponding obligations.
Therefore, a typical compliance checking task is performed in three steps as illustrated
in Algorithm 1. First, multiple SPARQL ASK queries evaluate the status of each individual
obligation and return true for an obligation if it is fulfilled and false otherwise. The
template query shown in Fig. 12 can be used for evaluating the fulfilment of an
obligation after the parameter @ob is substituted with the URI of an obligation. A
similar template query can be used to evaluate a pending obligation (an obligation whose
conditions for fulfillment are not yet settled).

For each access response a propositional formula is also logged, indicating how the
fulfilment of an individual obligation contributes to the overall compliance of an access
request. In our scenario the formula is ϕ ≡ ob1 ∧ ob2 ∧ ob3, i.e. all three obligations
must be fulfilled for the access request to be compliant. The second step in the
algorithm is to substitute the propositional variables in ϕ with the truth values
representing the state of every derived obligation. Each ob_i in this formula is
substituted with o_i, which can be true or false depending on the evaluation of the query
in Fig. 12.

The third step in the algorithm is to substitute ϕ as a propositional variable and
evaluate the template query in Fig. 13 to check the overall compliance of the
corresponding access request. Note that in line 3 of the query in Fig. 13, we include the
graph encoding the access decision of the access request. The ?accessDecision variable is
a propositional variable that is used in the expression in line 4 to evaluate the access
request compliance queries. The FILTER statement is the conjunction of ϕ and
?accessDecision, meaning that if the access decision logged by the access control
mechanism is false, the access request would be non-compliant even if all obligations are
fulfilled.

 1 ASK
 2 WHERE { ?response scip:responseTo @rq .
 3   ?response scip:accessDecision ?accessDecision .
 4   FILTER (@phi && xsd:boolean(?accessDecision)) }

         Figure 13: Evaluating an access request compliance

A number of other compliance queries (e.g. which obligation is pending or which access
request is not compliant at time t), the experimental validation of the scalability of
our solution, and the practical benefits of our approach are described in [23].

4.   RELATED WORK
Our research study is inspired by the concept of information accountability as described
by Weitzner et al. [30], that is, ensuring that the policies and configured preferences
that govern the flow of personal information are respected by the parties that collect,
use, and share users' data. In an early work on the management of policies and the
semantic web [14], Kolovski et al. emphasize the need for declarative access policies to
support scalable information
sharing among parties. The authors then propose a rule-based discretionary access control
language for the web. In [13], Kagal et al. propose Rein, a policy framework grounded in
semantic web technologies. The authors acknowledge and respect the diversity and
heterogeneity of policy languages on the web and propose Rein as an ontological framework
for policy interoperability. The ontology proposed in this paper supports information
accountability via privacy audit logs and complements the Rein proposal [13] by providing
a SPARQL query-based solution for the basic compliance checking queries.

There are solid theoretical foundations for policy auditing over logs [2, 9, 3, 5]. Barth
et al. use Alternating-time Temporal Logic to build a logical privacy model and design a
privacy language (LPU) to express norms [2]. The concept of norms in this work has been
adapted from the Contextual Integrity perspective [20]. The LPU language allows all
communications between agents to be recorded in a logical trace. Norms are expressed as
logical constraints and privacy compliance is related to the logical concepts of
satisfiability and entailment. Datta et al. [9] extended the LPU language with reasoning
about information accountability over incomplete logs. Basin et al. use metric first
order temporal logic (MFOTL) to express policies, which are then monitored to verify
whether the trace of actions satisfies desired temporal properties [3]. Cederquist et al.
describe a framework that uses audit logs to enforce compliance with discretionary access
control policies [5]. While this body of work proposes highly expressive privacy logics,
the lack of support by a scalable semantic technology prevents the approaches from being
applied outside of research labs.

An important related work is the recently proposed RDF provenance model (PROV-DM) [17].
The focus of PROV-DM is on providing a domain independent ontology for asserting
provenance of a resource on the web. While the provenance assertions of the L2TAP+SCIP
log events (log event header) can be expressed using the PROV-DM ontology, the ontology
cannot support the structure needed to encode the semantics of the body of privacy events
(e.g. privacy preferences, obligations, and purpose of usage). A simple mapping between
L2TAP and PROV-DM allows a log event (regardless of its content) to be expressed by the
PROV-DM ontology. The mapping requires adding a prov:Activity (i.e. defining a URI for
the act of generating the l2tap:LogEvent as a prov:Entity). Then the assertion of the
who, l2tap:eventParticipant, is mapped to the prov:wasAssociatedWith property. The two
L2TAP properties capturing the when assertions are mapped to prov:startedAtTime and
prov:endedAtTime respectively.

In recent years, we have seen several proposals addressing privacy in the Linked Data
context ([22, 18, 7, 8]). This body of research mainly proposes access control frameworks
based on access control lists (ACLs). The authors in [22] propose a privacy preferences
vocabulary that can be utilized to express fine-grained access policies in a Linked Data
environment. Mühleisen et al. propose an access control mechanism for social web
applications [18]. This framework uses SWRL to express access rules. The authors in
[12, 7, 8] leverage the Linked Data architecture for providing authorizations and access
restrictions at the document level [12]. The authorization mechanism in [12] is based on
WebID [25]. To address the privacy concerns in the emerging domains of linked data
applications, Speiser et al. [24] propose a privacy framework for policy specification
and access control enforcement. While access control is a necessary mechanism to protect
individuals' privacy, it is not sufficient to express and control data usage policies.
The work introduced in this paper addresses privacy concepts such as usage purposes and
obligations after access.

5.   CONCLUSIONS
While compliance auditing is mandated in different privacy legislation (e.g. [26, 21]),
it has received less attention from the research community. In this paper we continued
our work in [23] and showed that, regardless of what logic is used to express privacy
policies, there is a standard way for privacy logging that allows basic privacy events to
be logged and provides a scalable query-based solution for answering compliance queries.
We also demonstrated that the L2TAP Linked Data log is capable of facilitating basic
privacy auditing tasks such as constructing the log, obligation derivation, and
compliance checking in the big data and linked data research context. In our approach,
the convenience of Linked Data and RDFS has been sought for privacy log interoperability
and for facilitating accountability and transparency among participants.

6.   ACKNOWLEDGMENTS
Financial support from NSERC Canada and Privacy Awards from IBM and the Information and
Privacy Commissioner of Ontario are gratefully acknowledged. We thank the anonymous
reviewers for their comments.

7.   REFERENCES
 [1] J. Austin. How to do things with words, volume 88.
     Harvard University Press, 1975.
 [2] A. Barth, A. Datta, J. C. Mitchell, and H. Nissenbaum.
     Privacy and contextual integrity: Framework and
     applications. In Proc. SP, pages 184–198, 2006.
 [3] D. Basin, F. Klaedtke, and S. Müller. Policy
     monitoring in first-order temporal logic. In Proc. CAV,
     pages 1–18, 2010.
 [4] J. Carroll, C. Bizer, P. Hayes, and P. Stickler. Named
     graphs. Web Semantics: Science, Services and Agents
     on the World Wide Web, 3(4), 2011.
 [5] J. Cederquist, R. Corin, M. Dekker, S. Etalle, J. den
     Hartog, and G. Lenzini. Audit-based compliance
     control. Int. J. of Info. Security, 6:133–151, 2007.
 [6] A. Chuvakin, E. Fitzgerald, R. Marty, R. Gula,
     W. Heinbockel, and R. McQuaid. Common event
     expression, 2008.
 [7] L. Costabello, S. Villata, N. Delaforge, F. Gandon,
     et al. Linked data access goes mobile: Context-aware
     authorization for graph stores. In LDOW, WWW, 2012.
 [8] L. Costabello, S. Villata, O. R. Rocha, and F. Gandon.
     Access control for http operations on linked data. In
     ESWC, pages 185–199, 2013.
 [9] A. Datta, J. Blocki, N. Christin, H. DeYoung, D. Garg,
     L. Jia, D. Kaynar, and A. Sinha. Understanding and
     protecting privacy: formal semantics and principled
     audit mechanisms. In Proc. ICISS, pages 1–27, 2011.
[10] A. L. Goldberger, L. A. Amaral, L. Glass, J. M.
     Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus,
     G. B. Moody, C.-K. Peng, and H. E. Stanley.
     Physiobank, physiotoolkit, and physionet components
     of a new research resource for complex physiologic
     signals. Circulation, 101(23):e215–e220, 2000.
[11] T. Heath and C. Bizer. Linked data: Evolving the web
     into a global data space. Synthesis Lectures on the
     Semantic Web: Theory and Tech., 1(1):1–136, 2011.
[12] J. Hollenbach, J. Presbrey, and T. Berners-Lee. Using
     RDF metadata to enable access control on the social
     Semantic Web. In Proc. CCMLSK WS at CK, 2009.
[13] L. Kagal, T. Berners-Lee, D. Connolly, and
     D. Weitzner. Using semantic web technologies for
     policy management on the web. In Proc. of the National
     Conference on Artificial Intelligence, 2006.
[14] V. Kolovski, Y. Katz, J. Hendler, D. Weitzner, and
     T. Berners-Lee. Towards a policy-aware web. In
     Semantic Web and Policy Workshop at the ISWC,
     2005.
[15] S. Loosemore, R. Stallman, R. McGrath, A. Oram, and
     U. Drepper. The GNU C library reference manual. Free
     Software Foundation, 2001.
[16] G. B. Moody and L. Lehman. Predicting acute
     hypotensive episodes: The 10th annual
     physionet/computers in cardiology challenge. In
     Computers in Cardiology, pages 541–544. IEEE, 2009.
[17] L. Moreau and P. Missier. PROV-DM: The PROV data
     model. W3C Recomm., W3C, June 2012.
[18] H. Mühleisen, M. Kost, and J.-C. Freytag.
     SWRL-based Access Policies for Linked Data. In Proc.
     SPOT Workshop at SSW, 2010.
[19] Q. Ni, E. Bertino, and J. Lobo. An obligation model
     bridging access control policies and privacy policies. In
     Proc. SACMAT, pages 133–142, 2008.
[20] H. Nissenbaum. Privacy in Context: Technology,
     Policy, and the Integrity of Social Life. Stanford Law
     Books, 2009.
[21] Official Journal of the EC. EU directive 95/46/EC on
     the protection of individuals rights with regard to the
     processing of personal data, 1995.
[22] O. Sacco and A. Passant. A privacy preference ontology
     (PPO) for Linked Data. In Proc. LDOW, WWW, 2011.
[23] R. Samavi and M. P. Consens. L2TAP+SCIP: An
     audit-based privacy framework leveraging Linked Data.
     In CollaborateCom (TrustCol), pages 719–726, 2012.
[24] S. Speiser. Policy of composition ≠ composition of
     policies. In Proc. POLICY, pages 121–124, 2011.
[25] H. Story, B. Harbulot, I. Jacobi, and M. Jones.
     FOAF+SSL: RESTful Authentication for the Social
     Web. In Proc. SPOT, 2009.
[26] US Congress. Health Insurance Portability and
     Accountability Act of 1996, Privacy Rule. 45 CFR 164,
     Aug. 2002.
[27] US Department of Health and Human Services. Code of
     Federal Regulations, Title 45 - Part 46 - Protection of
     Human Subject, Revised January 15, 2009.
[28] W3C. RDF Vocabulary Description Language 1.0: RDF
     Schema. W3C, April 2002.
[29] W3C. SPARQL 1.1 Query Language, W3C Proposed
     Recommendation. W3C, November 2012.
[30] D. J. Weitzner, H. Abelson, T. Berners-Lee,
     J. Feigenbaum, J. Hendler, and G. J. Sussman.
     Information accountability. Commun. ACM,
     51(6):82–87, 2008.