=Paper=
{{Paper
|id=Vol-2022/paper15
|storemode=property
|title=
Data Policy as Activity Network
|pdfUrl=https://ceur-ws.org/Vol-2022/paper15.pdf
|volume=Vol-2022
|authors=Vasily Bunakov
|dblpUrl=https://dblp.org/rec/conf/rcdl/Bunakov17
}}
==
Data Policy as Activity Network
==
© Vasily Bunakov
Science and Technology Facilities Council, Harwell Campus, United Kingdom
vasily.bunakov@stfc.ac.uk
Abstract. The work suggests using a network of semantically clear, interconnected activities for a formal yet flexible definition of policies in data archives and data infrastructures. The work is inspired by the needs of the EUDAT Collaborative Data Infrastructure and by the case of long-term digital preservation, but the suggested policy modelling technique is universal and can be considered for all sorts of data management that require clearly defined policies linked to machine-executable policy implementations.
Keywords: data management, long-term digital preservation, data policy, semantic modelling.
Proceedings of the XIX International Conference "Data Analytics and Management in Data Intensive Domains" (DAMDID/RCDL'2017), Moscow, Russia, October 10–13, 2017

1 Introduction

The problematics of advanced long-term digital preservation [1] has been in the focus of many collaborative projects and popular recommendations. However, it has received relatively little attention in domain-specific projects that rely on data archiving, or in projects that develop scalable e-infrastructures aggregating data that comes from different user communities.

One of the problems that long-term digital preservation aims to address is having clear policies for the entire data lifecycle: from data ingestion by an archive or by an e-infrastructure service, through years-long data management with sensible data checks, transformations and moves, to data access and data dissemination to the end users.

One can argue that without clear data policies and means of their validation there is no such thing as long-term digital preservation, even in cases when the technology foundation used for an archive or an e-infrastructure is sound and well supported. At the end of the day, every technology evolves, and at a brisk pace compared to the relatively long time during which many data assets are going to be useful; so data policies and the means of their expression should be semantically clear and, in a way, more permanent than the technology that underpins data management. A strong case for policy-driven digital preservation, with extensive references to prominent projects and popular methodologies, was made in [2].

In practice, quite a few data archives and e-infrastructures end up in a situation when they have got a sound technology for managing data bits and have acquired a decent number of users (a popular measure used by funders for their judgement of e-infrastructure success) but do not have a reasonable data policy, let alone any machine-assisted reasoning over it. The users' trust in the archive or the e-infrastructure may be enough for their daily use, but there can be a substantial conceptual and technological gap in regard to data policy formulation, expression and execution.

Some larger projects and e-infrastructures are aware of this gap and do make efforts to close it by working on data policy implementation. An example of such an e-infrastructure is EUDAT [3], which has developed a number of operational services [4] and data pilots with user communities, and is now trying to express and apply policies to these services.

The prime candidate for applying data policies in EUDAT is the B2SAFE service [5] based on the iRODS platform [6]. B2SAFE developers are doing a very good job of building geographically and organizationally distributed data storage with data replication, integrity checks and other routine tasks of data management guided by iRODS machine-executable rules. B2SAFE have made their own effort on policies with the development of the Data Policy Manager [7], a software module with policies expressed via XML templates. There is a perceived need, though, for a more universal solution for policy management across all EUDAT services. The possible policy modelling approaches under consideration are using RuleML [8], SWRL [9] or the ProvOne ontology [10], which seems suitable not only for capturing data provenance after the execution of certain actions but also for the forward-looking design of data processing workflows, which can then potentially serve as a means of data policy modelling.

This work presents an alternative approach to those mentioned, based on the Research Activity Model [11], which is in fact quite universal and suitable for the expression of all sorts of activities, not necessarily related to research. The Research Activity Model is slightly extended and applied to the case of data policy modelling.

The main advantage of this alternative approach is its high modularity, which allows modelling policy elements and using them as building blocks for the semantically clear representation of a whole policy. The modularity of policy design is especially important in data infrastructures that commonly aggregate data coming from different user communities, often having their own business models, technical requirements, data formats and data lifecycles, which makes it difficult to design and adequately express the crosswalks between community-specific data policies and those for the data infrastructure. Another advantage of the suggested approach is its ability to address the conceptual gap between policy formulation and policy implementation, as it may not be easy to
translate a high-level policy (often in a textual form) into a machine-executable policy.

The modularity should allow high levels of inheritance and reuse of policy elements; it also helps to solve specific problems of policy formulation and validation when the textually same policy can be executed in different ways, leading to different states of the data archive; we provide an example of such a situation below. The conceptual gap between policy formulation and policy implementation is addressed by the possibility to define policy-related Activities as "black boxes" with (initially) only interfaces defined; this can hopefully be done by policy makers themselves, without entirely delegating this policy design phase to policy implementers (software developers).

Implementation of a sensible data policy is a challenging task even within the boundaries of a particular organization. In a situation when the organization is using a collaborative data infrastructure along with its own organization-specific IT services, the implementation of a data policy is going to be even more intricate and is likely to rely on loosely coupled services. The approach to data policy modelling suggested in this work is going to address this challenge, along with alleviating the earlier mentioned problems of policy element reusability and policy application result predictability.

The work is inspired by the needs of the EUDAT Collaborative Data Infrastructure [3] and refers to it for the illustration of certain ideas; the main incentive for the work was also modelling policies for the case of long-term digital preservation. However, the suggested modelling technique is universal and can be considered for all archives or e-infrastructures that are interested in all sorts of data management (not only long-term digital preservation) that require a clearly defined policy linked to machine-executable policy implementations.

Conceptual challenges of data policy modelling are discussed first, specifically the problem of policy decomposition into policy elements; then an example is given of how the Activity Model can be used for policy modelling. This is followed by suggestions on what IT architecture for data policy management will be required to support the suggested modelling techniques.

2 Data policy and a problem of its decomposition

2.1 Insufficiency of granular policy definition

Data policy is often created as a conventional textual document that contains certain statements about what should or should not be done with data, with implied or sometimes explicit logical "ANDs" and "ORs" that glue statements together in an aggregated policy. This composite nature of policies is why it seems natural to break down the policy document into granular statements, model each statement using some formalism and then execute the statements using some IT solution.

One of the most advanced efforts on data policy decomposition was performed by the SCAPE project [12] that created an extensive catalogue of preservation policy elements [13], which are in fact granular textual statements. These granular statements, which can be converted in a pretty straightforward way into machine-executable statements, are called control policies in SCAPE. Examples of control policies are: "information on preservation events should use the PREMIS metadata schema" or "original object creation date must be captured". The granular control policies relate to a higher-level procedural policy (a procedural policy on Provenance for the current example), which in turn relates to an even higher-level and most abstract guidance policy (a policy on Authenticity for the current example). This three-level structure of guidance policies, procedural policies and control policies constitutes a very well developed SCAPE digital preservation policy framework.

SCAPE stopped short of the actual implementation of control policies, so when EUDAT [3] decided to use the SCAPE framework for policy considerations, it was also decided to supplement this framework with the catalogue of practical data policies [14] developed by the RDA (Research Data Alliance) Practical Policy Working Group. The practical data policies in this catalogue are expressed as iRODS [6] functions, specifically suitable for implementation in the EUDAT B2SAFE service [5] based on the iRODS platform.

Having well-defined control policies or practical policies is not enough, though, for semantically clear modelling of a data policy as a whole, as the application (execution) of a policy composed of granular machine-executable statements may lead to quite different outcomes depending on the order in which the granular policies are applied.

The problem of policy decomposition is in fact interrelated with the problem of policy validation. To illustrate this, let us consider a simple case when there is a couple of easily identifiable policy statements contained in the same policy document, which we want to decompose and validate through the execution of two granular policies. Let the statements in a composite policy (perhaps, but not necessarily, added one to another through some policy update by different policy managers) be:

[1] Image files having a size of more than X gigabytes should be stored in file storage A; otherwise they should be stored in file storage B.
[2] Image files of type RAW should be converted into JPG format.

If a certain file of type RAW is more than X gigabytes in size but becomes less than X when converted into JPG then, depending on the higher-level guiding policy and on the order in which these granular policies are applied in the actual service implementation, the result of the combined application of the two granular policies can be any of the following:

1. File is moved as RAW to storage A and remains stored in A as RAW.
2. File is moved as RAW to storage A, then converted into JPG and remains stored in A.
3. File is converted into JPG and stored in B.
4. File is moved as RAW to storage A and remains stored in A as RAW; also a copy of it converted into JPG is stored in B.

This is to illustrate that validation of the data policy
implementation is hard, as any of the listed outcomes may be considered right or wrong depending on the validator's point of view.

Let us also take into account that policy validation can be based on some statistical selection of samples (so that problematic boundary cases of RAW data sized only slightly over the X gigabytes threshold may not be selected in a sample and hence go unnoticed), or that a policy validation procedure allows some tolerance towards a small amount of failed policy checks (so that even if a few files have ended up somewhere that a particular policy interpretation considers to be a wrong place, this does not trigger a policy violation alert).

So even if the data policy can be, seemingly successfully, decomposed into granular policies that are easy to define and validate as machine-executable statements, the actual result of the policy implementation does not necessarily match the intentions of policy designers or policy managers, as the backwards process of policy composition – assembling it from the granular policies (policy elements) – can be performed with substantial variations.

2.2 Possible responses to the challenge of granular policies insufficiency

One possible response to the outlined challenge could be setting up an elaborate policy governance framework, i.e. well-defined business processes that allow human agents (policy managers) to look after the policy implementation, i.e. accumulate and analyse feedback from the environment where the policy is applied and supply the result of this analysis as updated requirements to the software developers who work on the actual software implementation of the policy. This approach requires a good organizational culture and a substantial human resource involved in data policy management and in policy implementation; documented requirements will serve as an interface between policy managers and policy implementers. Some "magic" should happen in between so that high-level policy definitions translate into an actual policy implementation in software code; this is why policy validation is likely to demand extensive software testing with specific policy-related test cases.

Another possible response is having an elaborate means of expression for the entire data policy (a sophisticated policy modelling language): both for the definition of granular policies and for the definition of the logic that binds the granular policies into the whole. An example of this approach is RuleML [8], which is considered a candidate for a detailed expression of data policy in the EUDAT e-infrastructure [3]. This approach requires a skilled human resource for policy modelling; the modeller and the sophisticated model produced by her then become an interface between policy managers and policy implementers (the role of the latter is less prominent than in the first approach, in the sense that software developers should not interpret requirements but just implement – or adopt – a certain engine that executes the formal rules defined by the savvy policy modeller).

The third possible response is that a certain formalism is used for the expression and, where necessary, recomposition of granular policies (policy elements) and for their assembling into the whole, with that formalism being reasonably friendly to machines as well as to humans. The humans – policy managers themselves or a not-so-skilled modeller – can use the formalism for a flexible policy definition that can be fairly easily modified depending on the true policy intentions and on the feedback received from the archive or e-infrastructure where the policy is implemented. The role of software developers is then to implement an engine for the formalism (quite similarly to the second approach). The machine just executes the policy expressed using that formalism.

The differences amongst the approaches are presented in Table 1; in essence, they are different "weights" (different levels of demand) for the skills of policy managers, policy modellers and policy implementers.

Table 1 Differences amongst policy modelling approaches

| Policy modelling approach | Demands for policy manager skills | Demands for policy modeller skills | Demands for policy implementer skills |
|---|---|---|---|
| Policy governance framework + requirements management + specific software testing | High | None (policy modeller can be replaced by business analyst or/and software tester) | High |
| Policy modelling language | Low | High | Medium |
| Formalism for granular policy elements definition and composition | Medium | Medium | Medium |

The preferable approach could easily be the third one, as it empowers policy modellers themselves with reasonable means of policy expression and therefore can reduce the overheads and risks of communicating a policy from policy managers through modellers to implementers. A remote analogy of the third approach could be the proliferation of the SQL language which, despite its sophistication, has become a lingua franca not only of software engineers but is also widely used by logistics and even sales departments in all sorts of business.
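Recalling the file example from Section 2.1, the order-dependence that motivates all three responses can be made concrete with a small illustrative sketch; the code, the names and the assumed JPG compression ratio are hypothetical and not part of any EUDAT implementation:

```python
# Two granular policies from Section 2.1, modelled as functions over a
# simple file record; "X gigabytes" is taken as 2 for illustration.
SIZE_LIMIT_GB = 2

def placement_policy(f):
    # Statement [1]: large image files go to storage A, smaller ones to B.
    f["storage"] = "A" if f["size_gb"] > SIZE_LIMIT_GB else "B"

def conversion_policy(f):
    # Statement [2]: RAW image files are converted into JPG
    # (an assumed ~10x size reduction).
    if f["format"] == "RAW":
        f["format"] = "JPG"
        f["size_gb"] /= 10

def apply_policies(policies, f):
    f = dict(f)  # work on a copy so both orders start from the same state
    for policy in policies:
        policy(f)
    return f

raw_file = {"format": "RAW", "size_gb": 5}

# Placement first, conversion second: the file ends up in A (outcome 2).
print(apply_policies([placement_policy, conversion_policy], raw_file))
# Conversion first, placement second: the file ends up in B (outcome 3).
print(apply_policies([conversion_policy, placement_policy], raw_file))
```

The same two textually identical statements thus yield different archive states, which is exactly why a policy model needs to make the composition order explicit.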
Such a formalism should not be something as developed as SQL, though; neither should it be purely textual: it can be based on the idea of "building blocks" with a possible graphical representation of them, hence providing an easy-to-operate semantic wrapper for machine-executable statements. On the other hand (unlike SQL, which allows the actual data manipulation), these "building blocks" for data policy definition are likely to remain only a wrapper to the actual machine-executable implementations of granular policies, which will inevitably be specific to a particular service even within the same archive or e-infrastructure. As an example, for EUDAT B2SAFE [5], which is based on the iRODS platform [6], these granular implementations can be iRODS functions, while for other EUDAT services based on other software platforms the policy implementations can be something else. A common semantic wrapper will then be a reasonable means of clear policy modelling and of a clear definition of interfaces between policy "building blocks" across a variety of different IT services.

This work strongly prefers the third approach and suggests considering the Activity Model [11] for semantically clear modelling of data policies in all IT services within the same data archive or e-infrastructure, as well as for policy interoperability across different data archives and e-infrastructures.

3 Activity Model as a semantic wrapper for machine-executable policies

3.1 Activity Model in a nutshell

The Activity Model [11] was initially suggested for modelling granular research activities and combining them in networks so that, as an example, the output of one Activity can be the input of another one; e.g. these combined Activities may represent certain phases in research data analysis. It has been clear, though, that the Activity Model can suit all sorts of activities as it is pretty generic; as an example, it may well suit modelling data provenance across different IT services within an e-infrastructure.

The main "building block" of the Activity Model is an "activity cell" represented by Figure 1, with its aspects (which can be thought of as incoming and outgoing relations) explained in Table 2.

Figure 1 Research activity "cell"; it can be used for semantic definition of any activity

Activity "cells" can be combined in chains or networks, and not necessarily in a way that the Output of one Activity is the Input to another. As an example, a data management policy can be the Output of one Activity (policy design) and the Condition that affects another Activity, e.g. data replication in the archive.

The model flexibility, when any aspect of one Activity can be matched with any aspect of another Activity, is supported by the fact that aspects do not have to have types associated with them.

The full RDF serialization of the Activity Model is published in [11]; it is really simple and requires only RDF Schema and an "inverseOf" OWL statement for its expression, i.e. what is often referred to as RDFS Plus.

Table 2 Activity Model aspects explained

| Aspect | Description | Examples |
|---|---|---|
| Activity | | Research per se; research data analysis |
| Input | Something that is taken in or operated on by Activity | Previous research; raw data |
| Output | Something that is intentionally produced by Activity | Raw data (analyzed); derived data |
| Scope | Something that Activity is aimed at or deals with | Sample properties; one or more experiments |
| Condition | Something that affects or supports Activity, or gives it a specific context | Scientific instrument; IT environment |
| Actor | Something or somebody who participates in Activity | Investigator; data analyst |
| Effect | Something that is a consequence of Activity | Environment pollution; new software module |

3.2 Proposed extensions of the Activity Model

In order to use the Activity Model for data policy modelling, we will need to make a profile of the model by specifying certain types of Activity as subclasses (in
case of an RDF serialization of the model – RDFS subclasses). The suggested extensions are presented in Table 3. Conceptually, Generic Data Management Activities should cover the needs of data engineering that are related to machine-interpretable policy implementations, Logical Switch Activities should cover the needs of data analysis and machine-assisted reasoning, and Control Activities should cover the needs of IT service deployment and operation.

Table 3 Additions to the core Activity Model required for data policy modelling

| Type to add | Comment / Description |
|---|---|
| Generic Data Management Activity | Subclass of Activity for data policy definition. It can be considered a semantic wrapper for a variety of data handling Activities, e.g. Activities for data characterization or data transformation. |
| Logical Switch Activity | Subclass of Activity for logical switches of all sorts. |
| Control Activity | Subclass of Activity for an interface with a particular software platform where policies are executed. This is a semantic wrapper for the actual call to a platform-specific script or function. |

Compared to modelling data policies with workflows, the suggested approach based on the definition of policy-related Activities should allow more loosely coupled implementations of policy management IT solutions. As an example, the "data engineering" part of policy implementation, represented by a Generic Data Management Activity, can be performed on a software platform fully controlled by a specific user community or organization (e.g. a research institution); the operation (the actual execution of control statements), represented by a Control Activity, can be performed by a collaborative data infrastructure (e.g. by the EUDAT CDI [3]); and the logic of combining policy elements, represented by a Logical Switch Activity, can be performed by either the organization or the data infrastructure, or by a third-party service.

If the policy was modelled by an executable workflow, it would require the presence of all three aspects – data engineering, reasoning and execution – in the same workflow, likely operated by a single universal workflow engine. This would mean not only an operational limitation but a conceptual / modelling limitation, too, as all the participants (stakeholders) of the policy implementation would have to adhere to the conceptual framework and the format required by the workflow engine. Modelling with interconnected Activities as semantic wrappers to particular implementations leaves more freedom to conceptualize and to operate data policies that are going to be executed by loosely coupled IT services.

Depending on the particular operational environment (software platform where policies are executed), other parts of the Activity Model, e.g. its Inputs, Outputs, or Conditions, may require additional semantically clear extensions. However, it is unclear at the moment whether these potentially required extensions should be a part of the universal Activity Model profile for data policies, or whether it is better to introduce them as necessary, as parts of policy execution engine implementations on particular software platforms.

3.3 Examples of the Activity Model data policies profile application

The role of the suggested model extensions will be clearer by giving an example of their application to the modelling of a particular policy. The example will be a policy with the two granular statements about data movements depending on data size and data format that were considered in Section 2.1.

We will need to define first a File Characterization Activity:

@prefix am: .
@prefix ampp: .
GDMA_FileChar a ampp:GenericDataPolicyActivity
GDMA_FileChar am:hasInput File
GDMA_FileChar am:hasOutput FileSize
GDMA_FileChar am:hasOutput FileFormat
GDMA_FileChar am:hasOutput File
GDMA_FileChar am:hasScope ImageFiles
GDMA_FileChar am:hasCondition ServiceInstance
GDMA_FileChar am:hasActor CertainScript
GDMA_FileChar am:hasEffect FileCharLog

In short, the GDMA_FileChar activity takes a file as an input and produces values for the file size and file format (which can be semantically clearly defined as necessary – e.g. with measurement units and format IDs in a file type registry) as outputs; the initial file is passed over as another output. To derive the file size and format, the activity uses CertainScript (which again can be semantically clearly defined as necessary – e.g. with references to a software repository). As an additional outcome (better defined not as an Output but as an Effect) of the file characterization activity, we get the FileCharLog log file. The scope of the activity is defined as ImageFiles (so that other kinds of files can be handled by differently defined Characterization Activities; what "ImageFiles" actually means can be clearly defined with e.g. a reference to a certain taxonomy entry). The Condition is defined as ServiceInstance (which means that Actor:CertainScript operates in some particular IT service environment).

Mapping of an Activity to a particular software implementation can be performed using the Activity ID and a reference to a repository with a clear software identity, e.g. a software versioning repository.

The graphic representation of this Characterization Activity (which, in an ideal world, can be designed in a certain authoring tool with a graphical user interface producing the above RDF as a serialization) is illustrated by Figure 2.
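As an illustration of what an implementation behind such a characterization activity could look like, here is a hypothetical sketch; the function name, the log format and the use of the file extension as the file format are all assumptions made for the example, not part of the model:

```python
# Hypothetical characterization script: takes a file (Input), yields file
# size and format (Outputs) plus a log entry (Effect); the file itself is
# passed through as another Output.
import os
import pathlib

def characterize(path):
    p = pathlib.Path(path)
    outputs = {
        "File": str(p),
        "FileSize": p.stat().st_size,  # bytes; a real policy would declare units
        "FileFormat": p.suffix.lstrip(".").upper() or "UNKNOWN",
    }
    effect_log = f"characterized {p.name}: {outputs['FileSize']} B, {outputs['FileFormat']}"
    return outputs, effect_log

# Usage with a throwaway 16-byte RAW file.
pathlib.Path("sample.raw").write_bytes(b"\x00" * 16)
outputs, log_entry = characterize("sample.raw")
print(outputs["FileFormat"], outputs["FileSize"])  # RAW 16
os.remove("sample.raw")
```

The returned dictionary corresponds to the Outputs of the activity cell, while the log entry corresponds to its Effect; the mapping of this script to the Activity would be recorded via the Actor aspect.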
Figure 2 Definition of a Data Policy Activity for image files characterization

The problem of the policy composition out of the two granular policies outlined in Section 2.1 can be addressed with the help of the other classes of activities that we introduced earlier: Logical Switch and Control. For the sake of simplicity (as we are just going to illustrate how the policy modelling can be done), we will not be defining all aspects for these activities; e.g. we can omit Scope or Effect, but they may be required in a real policy modelling situation.

The Logical Switch activity will take File, FileSize and FileFormat as Inputs; a particular logic of handling file moves to either storage A or B, as well as file conversion, will be the Condition. The Activity yields a list of particular control statements (like "move File to storage A" or "convert File into JPG format") as Output. The shape of such a defined Logical Switch activity is illustrated by Figure 3.

Figure 3 Definition of a Logical Switch Activity for handling image files

The semantically clear definition of a Logical Switch Activity gives an idea of how we suggest addressing the problem of a policy composition from granular policy statements. The hope is that, if the logic of producing control statements is made explicit, as well as the control statements themselves, this will eliminate the ambiguity of a policy composed of granular policy statements.

A good question is what formalism, if any, will be adequate for the expression of the logic in the Condition of the Logical Switch. The short answer is: it depends on the policy engine implementation. In an extreme case, this Condition can be just a mandatory textual explanation (commentary) of the logic implemented by the Actor (which is omitted in Figure 3), i.e. by an executable function or a procedure or a script for a particular IT platform. Alternatively, a rules modelling language or workflow templates (and the appropriate engines for them) can be used – yet, in this case, the actual usage of these modelling languages or workflow templates would be limited to the policy logic enwrapped in the Logical Switch Activity, allowing freedom for different implementations of the other types of Activities involved in the policy definition.

How to express control statements in the Output is subject to particular implementations, too. The only consideration which is important for the moment – important both from conceptual and from implementation perspectives – is having the list of control statements as a clearly defined interface between the Logical Switch Activity and the Control Activity.

The Control Activity takes the list of control statements as Input and makes platform-specific function or procedure or script calls that implement the control statements. Actors for the Control Activity are particular functions / procedures / scripts, and its Effects are log and error files or messages – whatever is used for traceability in a particular implementation. The Condition is, similarly to the file characterization activity definition, a particular software platform or IT service where the Actors operate. Figure 4 presents an example of a diagram for the Control Activity.

Figure 4 Definition of a Control Activity for policy execution

Generic Data Policy Activities (such as data characterization) can be combined with Logical Switch Activities and Control Activities in a chain or a network of activities. For our example, the resulting chain is illustrated by Figure 5. It represents the full model of a certain data policy expressed as a chain of semantically clear activities with interfaces between them, as well as interfaces to activity implementations in particular IT services or software platforms.

It is worth mentioning once again that every aspect in the Figure 5 diagram (such as File, Size, Format, Script or Log) should be thought of not as a particular artefact or a value but as a semantic wrapper of an artefact or a value. As a particular model serialization, these semantic wrappers can be RDF statements about artefacts or values.
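The interface between the Logical Switch and Control Activities described above can be sketched in code; everything here (the statement names, the dispatch table, the compression ratio) is hypothetical and only illustrates how an explicit ordering of control statements removes the ambiguity of Section 2.1:

```python
# A Logical Switch turns file characteristics into an ordered list of
# control statements; a Control Activity dispatches each statement to a
# platform-specific call (e.g. an iRODS rule in a B2SAFE deployment).
SIZE_LIMIT_GB = 2  # the "X gigabytes" threshold from Section 2.1

def logical_switch(file_size_gb, file_format):
    # The explicit ordering (convert first, then place by the converted
    # size) is what makes the composed policy unambiguous.
    statements = []
    if file_format == "RAW":
        statements.append("convert-to-jpg")
        file_size_gb /= 10  # assumed compression ratio
    target = "A" if file_size_gb > SIZE_LIMIT_GB else "B"
    statements.append(f"move-to-storage-{target}")
    return statements

def control_activity(statements, calls):
    # `calls` is the platform-specific mapping; the returned strings stand
    # in for the Effects (logs) of a real implementation.
    return [calls[statement]() for statement in statements]

calls = {
    "convert-to-jpg": lambda: "converted",
    "move-to-storage-A": lambda: "moved to A",
    "move-to-storage-B": lambda: "moved to B",
}
print(control_activity(logical_switch(5, "RAW"), calls))   # ['converted', 'moved to B']
print(control_activity(logical_switch(5, "TIFF"), calls))  # ['moved to A']
```

The list returned by `logical_switch` plays the role of the clearly defined interface between the two Activities: the Control Activity needs no knowledge of the switching logic, only of how to execute each named statement on its platform.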
suggested approach and therefore such authoring tools
should be a part of a sensible IT architecture for data
policy management. In addition, what is required is a
repository where policy designs can be stored and
retrieved from.
Figure 5 Example of full policy definition
In real data policy modelling situations, it may be
necessary to define more than one instance of each
Activity type; as an example, there could be two Data
Characterization Activities defined (one for the file size
and another for the file format) in place of one in our
example. Nevertheless, even differently defined
Activities could be combined in a semantically clear
network representing the same data policy.
If the Activities in Figure 5 are clearly defined and sensibly combined into the Activity network, this eliminates the kind of ambiguity in policy definition and execution exemplified by the two interfering granular policies discussed in Section 2.1, so that the actual result of the policy implementation becomes predictable and can be formally validated.
One of the strengths of the suggested model is its combination of reasonable expressivity with high flexibility, as it is based on the idea of composing activities that can be (a) modelled differently, (b) implemented differently and (c) operated (executed) differently. In the above example, the scripts for file characterization and the scripts for policy execution can be implemented using different software and operated by different components of the same service, by different services, or even by different e-infrastructures.
The actual chain or network of activities, as well as the definition of each of them (i.e. the definition of all semantic wrappers), could be authored in a dedicated tool with a graphical user interface and RDF as the model serialization format. The development of such a tool has been beyond the resources available for this conceptual work; however, such a tool is worth mentioning as one of the elements of an IT architecture that can support data policy formulation, execution and validation.
4 IT architecture for activity-based policy management
The proposed IT architecture is presented in Figure 6, with the most essential components and information flows (those that would constitute a core operational platform for data policy management) designated as filled-in boxes and arrows; more advanced components and flows are designated as dashed boxes and arrows with a blank background.
Figure 6 IT architecture for activity-based policy management
As already suggested, having policy Activity authoring tools with a GUI and the possibility to serialize Activity networks in a semantically explicit format such as RDF is essential for good levels of adoption of the suggested approach, and therefore such authoring tools should be a part of a sensible IT architecture for data policy management. In addition, what is required is a repository where policy designs can be stored and retrieved.
The Activity network interpretation engine picks up Activity networks from the authoring tools or the repository and executes them. In order to execute Activity networks in a particular IT environment (software platforms and services), a mapping engine is required that maps Activities and their aspects (such as Conditions or Outputs) onto configuration files and executable scripts. In addition to this generic mapping engine, specific engines for logical conditions and control statements can be implemented. The Effects repository stores the Effect aspects of each Activity; it is a generalization of a logging service and contains semantically clear tracks of Activity execution. A policy search interface can be designed for searching and sharing data policies.
For the purposes of a data archive or data infrastructure audit, a policy validation engine is required that talks to the policy search interface and to the Effects repository. The actual validation can be based on matching the graphs of artefacts resulting from policy execution against the graphs of Activities in the policy design.
5 Conclusion
The problem of data policy modelling with reasonable crosswalks between high-level (read: textual) data policies and their machine-executable implementations has yet to find a satisfactory solution. The challenges of policy design and implementation are even bigger when collaborative data infrastructures are operated in combination with in-house software platforms.
The problem of semantically clear crosswalks and the problem of data policy implementation across organization-specific and external IT services can be addressed by the adoption of suitable policy modelling techniques and tools. The Activity Model [11] can be a reasonable means for the design of such tools, with the idea that data policies can be represented as networks of Activities with interconnected aspects of them.
This work has introduced extensions to the Activity Model in order to make it fit for the task of data policy
modelling. An example of using the Activity Model for the definition of a particular data policy has been given, and a possible IT architecture has been considered that can support data policy management based on Activity networks.
Acknowledgements
This work is supported by the EUDAT 2020 project, which receives funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 654065. The views expressed are those of the author and not necessarily of the project.
References
[1] Giaretta, D. Advanced Digital Preservation. Springer, Heidelberg (2011)
[2] Bunakov, V., Jones, C., Matthews, B., Wilson, M. Data authenticity and data value in policy-driven digital collections. OCLC Systems & Services: International digital library perspectives, vol. 30, issue 4, pp. 212-231 (2014). doi: 10.1108/OCLC-07-2013-0025. Open Access version of the preprint: http://purl.org/net/epubs/work/12299882
[3] EUDAT Collaborative Data Infrastructure. https://www.eudat.eu/eudat-cdi
[4] EUDAT services. https://www.eudat.eu/services-support
[5] EUDAT B2SAFE service. https://www.eudat.eu/b2safe
[6] iRODS: Integrated Rule-Oriented Data System. https://irods.org/
[7] EUDAT Data Policy Manager. https://github.com/EUDAT-B2SAFE/B2SAFE-DPM
[8] RuleML Wiki pages. http://wiki.ruleml.org/index.php/RuleML_Home
[9] SWRL: A Semantic Web Rule Language. https://www.w3.org/Submission/SWRL/
[10] ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance. http://vcvcomputing.com/provone/provone.html
[11] Bunakov, V. Core semantic model for generic research activity. In: 15th All-Russian Conference "Digital Libraries: Advanced Methods and Technologies, Digital Collections" (RCDL 2013), Yaroslavl, Russia, 14-17 Oct 2013. CEUR Workshop Proceedings (ISSN 1613-0073) 1108, 79-84 (2013). Persistent URL: http://purl.org/net/epubs/work/10938342
[12] SCAPE: Scalable Preservation Environments project. http://scape-project.eu/
[13] SCAPE Catalogue of Preservation Policy Elements. http://scape-project.eu/wp-content/uploads/2014/02/SCAPE_D13.2_KB_V1.0.pdf
[14] Practical Policy Implementations Report. http://dx.doi.org/10.15497/83E1B3F9-7E17-484A-A466-B3E5775121CC