Data policy as activity network

© Vasily Bunakov

Science and Technology Facilities Council, Harwell Campus, United Kingdom

vasily.bunakov@stfc.ac.uk

Abstract. The work suggests using a network of semantically clear interconnected activities for a formal yet flexible definition of policies in data archives and data infrastructures. The work is inspired by the needs of the EUDAT Collaborative Data Infrastructure and by the case of long-term digital preservation, but the suggested policy modelling technique is universal and can be considered for all sorts of data management that require clearly defined policies linked to machine-executable policy implementations.

Keywords: data management, long-term digital preservation, data policy, semantic modelling.

Proceedings of the XIX International Conference "Data Analytics and Management in Data Intensive Domains" (DAMDID/RCDL'2017), Moscow, Russia, October 10-13, 2017

1 Introduction

The problematics of advanced long-term digital preservation [1] have been in the focus of many collaborative projects and popular recommendations. However, relatively little attention has been paid to them in domain-specific projects that rely on data archiving, or in projects that develop scalable e-infrastructures aggregating data that comes from different user communities.

One of the problems that long-term digital preservation aims to address is having clear policies for the entire data lifecycle: from data ingestion by an archive or e-infrastructure service, through years-long data management with sensible data checks, transformations and moves, to data access and data dissemination to the end users.

One can argue that without clear data policies and means of their validation there is no such thing as long-term digital preservation, even in cases when the technology foundation used for an archive or an e-infrastructure is sound and well-supported. At the end of the day, every technology evolves – and at a brisk pace compared to the relatively long time during which many data assets are going to be useful – so data policies and the means of their expression should be semantically clear and, in a way, more permanent than the technology that underpins data management.

A strong case for policy-driven digital preservation, with extensive references to the prominent projects and popular methodologies, was made in [2].

In practice, quite a few data archives and e-infrastructures end up in a situation when they have a sound technology for managing data bits and acquire a decent number of users (a popular measure used by funders for judging e-infrastructure success) but do not have a reasonable data policy, let alone any machine-assisted reasoning over it. The users' trust in the archive or the e-infrastructure may be enough for their daily use, but there can be a substantial conceptual and technological gap in regard to data policy formulation, expression and execution.

Some larger projects and e-infrastructures are aware of this gap and do make efforts to close it by working on data policy implementation. An example of such an e-infrastructure is EUDAT [3] that has developed a number of operational services [4] and data pilots with user communities, and is now trying to express and apply policies to these services.

The prime candidate for applying data policies in EUDAT is the B2SAFE service [5] based on the iRODS platform [6]. B2SAFE developers are doing a very good job of building geographically and organizationally distributed data storage with data replication, integrity checks and other routine tasks of data management guided by iRODS machine-executable rules. They have also made their own effort on policies with the development of the Data Policy Manager [7], a software module with policies expressed via XML templates. There is a perceived need, though, for a more universal solution for policy management across all EUDAT services. The possible policy modelling approaches under consideration are RuleML [8], SWRL [9] and the ProvONE ontology [10]; the latter seems suitable not only for capturing data provenance after the execution of certain actions but also for the forward-looking design of data processing workflows, which can then potentially serve as a means of data policy modelling.

This work presents an alternative approach to those mentioned; it is based on the Research Activity Model [11], which is in fact quite universal and suitable for the expression of all sorts of activities, not necessarily related to research. The Research Activity Model is slightly extended and applied to the case of data policy modelling.

The main advantage of this alternative approach is its high modularity, which allows modelling policy elements and using them as building blocks for the semantically clear representation of a whole policy. The modularity of policy design is especially important in data infrastructures that commonly aggregate data coming from different user communities, often having their own business models, technical requirements, data formats and data lifecycles, which makes it difficult to design and adequately express the crosswalks between community-specific data policies and those for the data infrastructure.

Another advantage of the suggested approach is its ability to address the conceptual gap between policy formulation and policy implementation, as it may not be easy to translate a high-level policy (often in a textual form) into a machine-executable policy.

The modularity should allow high levels of inheritance and reuse of policy elements; it also helps to solve specific problems of policy formulation and validation when the textually same policy can be executed in different ways leading to different states of a data archive – a situation for which we provide an example.

The conceptual gap between policy formulation and policy implementation is addressed by the possibility to define policy-related Activities as "black boxes" with (initially) only interfaces defined; this can hopefully be done by policy makers themselves, without entirely delegating this policy design phase to policy implementers (software developers).
Implementation of a sensible data policy is a challenging task even within the boundaries of a particular organization. In a situation when the organization is using a collaborative data infrastructure along with its own organization-specific IT services, the implementation of a data policy is going to be even more intricate and is likely to rely on loosely coupled services. The approach to data policy modelling suggested in this work is going to address this challenge, along with alleviating the earlier mentioned problems of policy element reusability and policy application result predictability.

The work is inspired by the needs of the EUDAT Collaborative Data Infrastructure [3] and refers to it for illustration of certain ideas; the main incentive for the work was modelling policies for the case of long-term digital preservation. However, the suggested modelling technique is universal and can be considered for all archives or e-infrastructures that are interested in all sorts of data management (not only long-term digital preservation) requiring a clearly defined policy linked to machine-executable policy implementations.

Conceptual challenges of data policy modelling are discussed first, specifically the problem of policy decomposition into policy elements; then an example is given of how the Activity Model can be used for policy modelling. This is followed by suggestions on what IT architecture for data policy management will be required to support the suggested modelling techniques.

2 Data policy and a problem of its decomposition

2.1 Insufficiency of granular policy definition

Data policy is often created as a conventional textual document that contains certain statements about what should or should not be done with data, with implied or sometimes explicit logical "ANDs" and "ORs" that glue the statements together into an aggregated policy. This composite nature of policies is why it seems natural to break the policy document down into granular statements, model each statement using some formalism, and then execute the statements using some IT solution.

One of the most advanced efforts on data policy decomposition was performed by the SCAPE project [12] that created an extensive catalogue of preservation policy elements [13] which are in fact granular textual statements. These granular statements, which can be converted in a pretty straightforward way into machine-executable statements, are called control policies in SCAPE. Examples of control policies are: "information on preservation events should use the PREMIS metadata schema" or "original object creation date must be captured". The granular control policies relate to a higher-level procedural policy (a procedural policy on Provenance for the current example), which in turn relates to an even higher-level and most abstract guidance policy (a policy on Authenticity for the current example). The three-level structure of guidance policies, procedural policies and control policies constitutes a very well developed SCAPE digital preservation policy framework.

SCAPE stopped short of the actual implementation of control policies, so when EUDAT [3] decided to use the SCAPE framework for policy considerations, it was also decided to supplement this framework with the catalogue of practical data policies [14] developed by the RDA (Research Data Alliance) Practical Policy Working Group. The practical data policies in this catalogue are expressed as iRODS [6] functions specifically suitable for implementation in the EUDAT B2SAFE service [5] based on the iRODS platform.

Having well-defined control policies or practical policies is not enough, though, for semantically clear modelling of a data policy as a whole, as the application (execution) of a policy composed of granular machine-executable statements may lead to quite different outcomes depending on the order in which the granular policies are applied.

The problem of policy decomposition is in fact interrelated with the problem of policy validation. To illustrate this, let us consider a simple case when there are two easily identifiable policy statements contained in the same policy document which we want to decompose and validate through the execution of two granular policies. Let the statements in the composite policy (perhaps, but not necessarily, added one to another through some policy update by different policy managers) be:

[1] Image files having a size of more than X gigabytes should be stored in file storage A; otherwise they should be stored in file storage B.

[2] Image files of type RAW should be converted into JPG format.
If a certain file of type RAW is more than X gigabytes in size but becomes less than X when converted into JPG then, depending on the higher-level guiding policy and on the order in which these granular policies are applied in the actual service implementation, the result of the combined application of the two granular policies can be any of the following:

1. The file is moved as RAW to storage A and remains stored in A as RAW.
2. The file is moved as RAW to storage A, then converted into JPG, and remains stored in A.
3. The file is converted into JPG and stored in B.
4. The file is moved as RAW to storage A and remains stored in A as RAW; also a copy of it converted into JPG is stored in B.

This is to illustrate that validation of the data policy implementation is hard, as any of the listed outcomes may be considered right or wrong depending on the validator's point of view.

Let us also take into account that policy validation can be based on some statistical selection of samples (so that problematic boundary cases of RAW data sized only slightly over the X gigabytes threshold may not be selected in a sample and hence go unnoticed), or that a policy validation procedure may allow some tolerance towards a small amount of failed policy checks (so that even if a few files have ended up somewhere that a particular policy interpretation considers to be a wrong place, this does not trigger a policy violation alert).

So even if the data policy can be, seemingly successfully, decomposed into granular policies that are easy to define and validate as machine-executable statements, the actual result of the policy implementation does not necessarily match the intentions of policy designers or policy managers, as the backwards process of policy composition – assembling it from the granular policies (policy elements) – can be performed with substantial variations.
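The order dependence behind the outcomes listed above can be made concrete with a small sketch; the threshold value and the compression ratio below are assumptions made purely for the illustration, not part of the policy text:

    X = 1.0  # the "X gigabytes" threshold of statement [1]; value assumed

    def apply_placement(f):
        # Statement [1]: image files over X GB go to file storage A, otherwise to B.
        f["storage"] = "A" if f["size_gb"] > X else "B"

    def apply_conversion(f):
        # Statement [2]: RAW image files are converted into JPG format;
        # the compression ratio is an assumption made for the illustration.
        if f["format"] == "RAW":
            f["format"], f["size_gb"] = "JPG", f["size_gb"] * 0.1

    original = {"format": "RAW", "size_gb": 1.5}

    first = dict(original)
    apply_placement(first)    # 1.5 GB > X, so the RAW file goes to storage A
    apply_conversion(first)   # converted in place: outcome 2 of the list above

    second = dict(original)
    apply_conversion(second)  # now 0.15 GB
    apply_placement(second)   # 0.15 GB <= X, so the JPG goes to B: outcome 3

    print(first["storage"], second["storage"])  # A B

Which of the two end states is "correct" cannot be decided from the two statements alone; it is the composition, not the granular policies themselves, that is underspecified.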
2.2 Possible responses to the challenge of granular policies insufficiency

One possible response to the outlined challenge could be setting up an elaborated policy governance framework, i.e. well-defined business processes that allow human agents (policy managers) to look after the policy implementation, i.e. accumulate and analyse feedback from the environment where the policy is applied and supply the result of this analysis as updated requirements to the software developers who work on the actual software implementation of the policy. This approach requires a good organizational culture and a substantial human resource involved in data policy management and in policy implementation; documented requirements will serve as an interface between policy managers and policy implementers. Some "magic" should happen in between so that high-level policy definitions translate into an actual policy implementation in software code; this is why policy validation is likely to demand extensive software testing with specific policy-related test cases.

Another possible response is having an elaborated means of expression for the entire data policy (a sophisticated policy modelling language): both for the definition of granular policies and for the definition of the logic that binds the granular policies into the whole. An example of this approach is RuleML [8] that is considered a candidate for a detailed expression of data policy in the EUDAT e-infrastructure [3]. This approach requires a skilled human resource for policy modelling; the modeller, and the sophisticated model produced by her, then becomes an interface between policy managers and policy implementers (the role of the latter is less prominent than in the first approach, in the sense that software developers should not interpret requirements but just implement – or adopt – a certain engine that executes the formal rules defined by the savvy policy modeller).

The third possible response is that a certain formalism is used for the expression and, where necessary, recomposition of granular policies (policy elements) and for their assembly into the whole, with that formalism being reasonably friendly to machines as well as to humans. The humans – policy managers themselves or a not-so-skilled modeller – can use the formalism for a flexible policy definition that can be fairly easily modified depending on the true policy intentions and on the feedback received from the archive or e-infrastructure where the policy is implemented. The role of software developers is then to implement an engine for the formalism (quite similarly to the second approach). The machine just executes the policy expressed using that formalism.

The differences amongst the approaches are presented in Table 1; in essence, they are different "weights" (different levels of demand) for the skills of policy managers, policy modellers and policy implementers.

Table 1 Differences amongst policy modelling approaches

| Policy modelling approach | Demands for policy manager skills | Demands for policy modeller skills | Demands for policy implementer skills |
|---|---|---|---|
| Policy governance framework + requirements management + specific software testing | High | None (the policy modeller can be replaced by a business analyst or/and a software tester) | High |
| Policy modelling language | Low | High | Medium |
| Formalism for granular policy elements definition and composition | Medium | Medium | Medium |

The preferable approach could easily be the third one, as it empowers policy modellers themselves with reasonable means of policy expression and therefore can reduce the overheads and risks of communicating a policy from policy managers through modellers to implementers. A remote analogy of the third approach could be the proliferation of the SQL language which, despite its sophistication, has become a lingua franca not only of software engineers but is widely used by logistics and even sales departments in all sorts of business.
The formalism to be used for data policy expression should not be something as developed as SQL, though; neither should it be purely textual: it can be based on the idea of "building blocks" with a possible graphical representation of them, hence providing an easy-to-operate semantic wrapper for machine-executable statements. On the other hand (unlike SQL, which allows the actual data manipulation), these "building blocks" for data policy definition are likely to remain only a wrapper for the actual machine-executable implementations of granular policies, which will inevitably be specific to a particular service even within the same archive or e-infrastructure. As an example, for EUDAT B2SAFE [5] that is based on the iRODS platform [6], these granular implementations can be iRODS functions, while for other EUDAT services based on other software platforms the policy implementations can be something else. A common semantic wrapper will then be a reasonable means of clear policy modelling and of a clear definition of interfaces between policy "building blocks" across a variety of different IT services.

This work strongly prefers the third approach and suggests considering the Activity Model [11] for semantically clear modelling of data policies in all IT services within the same data archive or e-infrastructure, as well as for policy interoperability across different data archives and e-infrastructures.

3 Activity Model as a semantic wrapper for machine-executable policies

3.1 Activity Model in a nutshell

The Activity Model [11] was initially suggested for modelling granular research activities and combining them in networks so that, as an example, the output of one Activity can be the input of another one; such combined Activities may represent, e.g., certain phases in research data analysis. It has been clear, though, that the Activity Model can suit all sorts of activities as it is pretty generic; as an example, it may well suit modelling data provenance across different IT services within an e-infrastructure.

The main "building block" of the Activity Model is an "activity cell" represented by Figure 1, with its aspects (that can be thought of as incoming and outcoming relations) explained in Table 2.

Figure 1 Research activity "cell"; it can be used for semantic definition of any activity

Table 2 Activity Model aspects explained

| Aspect | Description | Example (research per se) | Example (research data analysis) |
|---|---|---|---|
| Input | Something that is taken in or operated on by Activity | Previous research | Raw data |
| Output | Something that is intentionally produced by Activity | Raw data | Derived (analyzed) data |
| Scope | Something that Activity is aimed at or deals with | Sample properties | One or more experiments |
| Condition | Something that affects or supports Activity, or gives it a specific context | Scientific instrument | IT environment |
| Actor | Something or somebody who participates in Activity | Investigator | Data analyst |
| Effect | Something that is a consequence of Activity | Environment pollution | New software module |

The full RDF serialization of the Activity Model is published in [11]; it is really simple and requires only RDF Schema and an "inverseOf" OWL statement for its expression, i.e. what is often referred to as RDFS Plus.

Activity "cells" can be combined in chains or networks, and not necessarily in a way that the Output of one Activity is the Input of another. As an example, a data management policy can be the Output of one Activity (policy design) and the Condition that affects another Activity, e.g. data replication in the archive.
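As a sketch of this kind of chaining – a policy produced by one Activity acting as the Condition of another – such a combination could be serialized along the following lines; the namespace URIs and instance names are illustrative, while the aspect properties follow the aspect names used in the listing of Section 3.3:

    from rdflib import Graph, Namespace, RDF

    AM = Namespace("http://example.org/am#")       # Activity Model namespace; URI illustrative
    EX = Namespace("http://example.org/example#")  # example instances; URI illustrative
    g = Graph()
    g.bind("am", AM)
    g.bind("ex", EX)

    # Policy design is an Activity whose Output is a data management policy ...
    g.add((EX.PolicyDesign, RDF.type, AM.Activity))
    g.add((EX.PolicyDesign, AM.hasOutput, EX.DataManagementPolicy))

    # ... and the same policy is the Condition of a data replication Activity.
    g.add((EX.DataReplication, RDF.type, AM.Activity))
    g.add((EX.DataReplication, AM.hasCondition, EX.DataManagementPolicy))

    print(g.serialize(format="turtle"))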
The model flexibility, where any aspect of one Activity can be matched with any aspect of another Activity, is supported by the fact that aspects do not have to have types associated with them.

3.2 Proposed extensions of the Activity Model

In order to use the Activity Model for data policy modelling, we will need to make a profile of the model by specifying certain types of Activity as subclasses (in the case of an RDF serialization of the model – RDFS subclasses). The suggested extensions are presented in Table 3. Conceptually, Generic Data Management Activities should cover the needs of data engineering that are related to machine-interpretable policy implementations, Logical Switch Activities should cover the needs of data analysis and machine-assisted reasoning, and Control Activities should cover the needs of IT services deployment and operation.

Table 3 Additions to the core Activity Model required for data policy modelling

| Type to add | Comment / Description |
|---|---|
| Generic Data Management Activity | Subclass of Activity for data policy definition. It can be considered a semantic wrapper for a variety of data handling Activities, e.g. Activities for data characterization or data transformation. |
| Logical Switch Activity | Subclass of Activity for logical switches of all sorts. |
| Control Activity | Subclass of Activity for an interface with a particular software platform where policies are executed. This is a semantic wrapper for the actual call to a platform-specific script or function. |

Depending on a particular operational environment (software platform where policies are executed), other parts of the Activity Model, e.g. its Inputs, Outputs, or Conditions, may require additional semantically clear extensions. However, it is unclear at the moment whether these potentially required extensions should be a part of the universal Activity Model profile for data policies, or whether it is better to introduce them as necessary, as parts of policy execution engine implementations on particular software platforms.
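The core of such a profile – the Table 3 types declared as RDFS subclasses of the core Activity class – could look as follows; the namespace URIs are illustrative, and the class names are taken from Table 3:

    from rdflib import Graph, Namespace, RDFS

    AM = Namespace("http://example.org/am#")      # core Activity Model; URI illustrative
    AMPP = Namespace("http://example.org/ampp#")  # data policy profile; URI illustrative
    g = Graph()
    g.bind("am", AM)
    g.bind("ampp", AMPP)

    # The three Activity types of Table 3, declared as RDFS subclasses of Activity.
    for name in ("GenericDataManagementActivity",
                 "LogicalSwitchActivity",
                 "ControlActivity"):
        g.add((AMPP[name], RDFS.subClassOf, AM.Activity))

    print(g.serialize(format="turtle"))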
Compared to modelling data policies with workflows, the suggested approach based on the definition of policy-related Activities should allow more loosely coupled implementations of policy management IT solutions. As an example, the "data engineering" part of policy implementation represented by a Generic Data Management Activity can be performed on a software platform fully controlled by a specific user community or organization (e.g. a research institution), the operation (the actual execution of control statements) represented by a Control Activity can be performed by a collaborative data infrastructure (e.g. by EUDAT CDI [3]), and the logic of combining policy elements represented by a Logical Switch Activity can be performed by either the organization or the data infrastructure, or by a third-party service.

If the policy were modelled by an executable workflow, it would require the presence of all three aspects – data engineering, reasoning and execution – in the same workflow, likely operated by a single universal workflow engine. This would mean not only an operational limitation but a conceptual / modelling limitation, too, as all the participants (stakeholders) of policy implementation would have to adhere to the conceptual framework and the format required by the workflow engine. Modelling with interconnected Activities as semantic wrappers to particular implementations leaves more freedom to conceptualize and to operate data policies that are going to be executed by loosely coupled IT services.

3.3 Examples of the Activity Model data policies profile application

The role of the suggested model extensions will be clearer from an example of their application to the modelling of a particular policy. The example will be a policy with the two granular statements about data movements depending on data size and data format that were considered in Section 2.1.

We will need to define first a File Characterization Activity:

    @prefix am:   <http://example.org/am#> .    # Activity Model namespace; URI illustrative
    @prefix ampp: <http://example.org/ampp#> .  # policy profile namespace; URI illustrative
    @prefix :     <http://example.org/policy#> .

    :GDMA_FileChar a ampp:GenericDataPolicyActivity .
    :GDMA_FileChar am:hasInput :File .
    :GDMA_FileChar am:hasOutput :FileSize .
    :GDMA_FileChar am:hasOutput :FileFormat .
    :GDMA_FileChar am:hasOutput :File .
    :GDMA_FileChar am:hasScope :ImageFiles .
    :GDMA_FileChar am:hasCondition :ServiceInstance .
    :GDMA_FileChar am:hasActor :CertainScript .
    :GDMA_FileChar am:hasEffect :FileCharLog .

In short, the GDMA_FileChar activity takes a file as an input and produces values for the file size and file format (which can be semantically clearly defined as necessary – e.g. with measurement units and format IDs in a file type registry) as outputs; the initial file is passed over as another output. To derive the file size and format, the activity uses CertainScript (which again can be semantically clearly defined as necessary – e.g. with references to a software repository).

As an additional outcome (better defined not as Output but as Effect) of the file characterization activity, we get the FileCharLog log file. The scope of the activity is defined as ImageFiles (so that other kinds of files can be handled by differently defined Characterization Activities; what "ImageFiles" actually means can be clearly defined with e.g. a reference to a certain taxonomy entry). The Condition is defined as ServiceInstance (which means that the Actor CertainScript operates in some particular IT service environment).

Mapping of an Activity to a particular software implementation can be performed using the Activity ID and a reference to a repository with a clear software identity, e.g. a software versioning repository.

The graphic representation of this Characterization Activity (which, in an ideal world, can be designed in a certain authoring tool with a graphical user interface producing the above RDF as a serialization) is illustrated by Figure 2.

Figure 2 Definition of a Data Policy Activity for image files characterization
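One conceivable shape of the mapping to a software identity mentioned above – the property choice and the repository URL are illustrative, not prescribed by the model – is a reference from the Actor to a tagged revision in a versioning repository:

    from rdflib import Graph, Namespace, RDFS, URIRef

    AM = Namespace("http://example.org/am#")      # URI illustrative
    EX = Namespace("http://example.org/policy#")  # URI illustrative
    g = Graph()

    # The Actor of the characterization Activity, pinned to a tagged revision
    # in a versioning repository (hypothetical URL) for a clear software identity.
    g.add((EX.GDMA_FileChar, AM.hasActor, EX.CertainScript))
    g.add((EX.CertainScript, RDFS.seeAlso,
           URIRef("https://example.org/vcs/file-characterization/tags/v1.0")))

    print(g.serialize(format="turtle"))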
The problem of the policy composition out of two granular policies outlined in Section 2.1 can be addressed with the help of the other classes of activities that we introduced earlier: Logical Switch and Control. For the sake of simplicity (as we are just going to illustrate how the policy modelling can be done), we will not be defining all aspects for these activities; e.g. we can omit Scope or Effect, but they may be required in a real policy modelling situation.

The Logical Switch activity will take File, FileSize and FileFormat as Inputs; a particular logic of handling file moves to either storage A or B, as well as file conversion, will be its Condition. The Activity yields a list of particular control statements (like "move File to storage A" or "convert File into JPG format") as Output. The shape of the so-defined Logical Switch activity is illustrated by Figure 3.

Figure 3 Definition of a Logical Switch Activity for handling image files

The semantically clear definition of a Logical Switch Activity gives an idea of how we suggest addressing the problem of policy composition from granular policy statements. The hope is that if the logic of producing control statements is made explicit, as well as the control statements themselves, this will eliminate the ambiguity of a policy composed of granular policy statements.

A good question is what formalism, if any, will be adequate for the expression of logic in the Condition of the Logical Switch. The short answer is: it depends on the policy engine implementation. In an extreme case, this Condition can be just a mandatory textual explanation (commentary) of the logic implemented by the Actor (which is omitted in Figure 3), i.e. by an executable function or a procedure or a script for a particular IT platform. Alternatively, rules modelling languages or workflow templates (and appropriate engines for them) can be used – yet, in this case, the actual usage of these modelling languages or workflow templates would be limited to the policy logic enwrapped in the Logical Switch Activity, allowing freedom for different implementations of the other types of Activities involved in the policy definition.

How to express control statements in the Output is subject to particular implementations, too. The only consideration which is important for the moment – important both from the conceptual and from the implementation perspectives – is having the list of control statements as a clearly defined interface between the Logical Switch Activity and the Control Activity.

The Control Activity takes the list of control statements as Input and makes platform-specific function or procedure or script calls that implement the control statements. The Actors for the Control Activity are particular functions / procedures / scripts, and its Effects are log and error files or messages – whatever is used for traceability in a particular implementation. The Condition is, similarly to the file characterization activity definition, a particular software platform or IT service where the Actors operate. Figure 4 presents an example of a diagram for the Control Activity.

Figure 4 Definition of a Control Activity for policy execution
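Following the pattern of the File Characterization listing above, the two activities could be sketched as follows; the namespace URIs and instance names are illustrative, and the ControlStatements resource stands for the list that serves as the interface between them:

    from rdflib import Graph, Namespace, RDF

    AM = Namespace("http://example.org/am#")      # URI illustrative
    AMPP = Namespace("http://example.org/ampp#")  # URI illustrative
    EX = Namespace("http://example.org/policy#")  # URI illustrative
    g = Graph()

    # Logical Switch: file descriptions in, control statements out;
    # the switching logic itself is carried by the Condition.
    g.add((EX.LS_ImageFiles, RDF.type, AMPP.LogicalSwitchActivity))
    for inp in (EX.File, EX.FileSize, EX.FileFormat):
        g.add((EX.LS_ImageFiles, AM.hasInput, inp))
    g.add((EX.LS_ImageFiles, AM.hasCondition, EX.FileHandlingLogic))
    g.add((EX.LS_ImageFiles, AM.hasOutput, EX.ControlStatements))

    # Control Activity: consumes the control statements; its Actors are the
    # platform-specific scripts, and its Effect is whatever is used for traceability.
    g.add((EX.CA_ImageFiles, RDF.type, AMPP.ControlActivity))
    g.add((EX.CA_ImageFiles, AM.hasInput, EX.ControlStatements))
    g.add((EX.CA_ImageFiles, AM.hasActor, EX.PlatformScripts))
    g.add((EX.CA_ImageFiles, AM.hasCondition, EX.ServiceInstance))
    g.add((EX.CA_ImageFiles, AM.hasEffect, EX.ExecutionLog))

    print(g.serialize(format="turtle"))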
Generic Data Policy Activities (such as data characterization) can be combined with Logical Switch Activities and Control Activities in a chain or a network of activities. For our example, the resulting chain is illustrated by Figure 5. It represents the full model of a certain data policy expressed as a chain of semantically clear activities with interfaces between them, as well as interfaces to activity implementations in particular IT services or software platforms.

It is worth mentioning once again that every aspect in the Figure 5 diagram (such as File, Size, Format, Script or Log) should be thought of not as a particular artefact or a value but as a semantic wrapper of an artefact or a value. In a particular model serialization, these semantic wrappers can be RDF statements about artefacts or values.

Figure 5 Example of full policy definition

In real data policy modelling situations, it may be necessary to define more than one instance of each Activity type; as an example, there could be two Data Characterization Activities defined (one for the file size and another for the file format) in place of the one in our example. Nevertheless, even differently defined Activities could be combined in a semantically clear network representing the same data policy.

If the Activities in Figure 5 are clearly defined and sensibly combined in the Activity network, this eliminates any ambiguity in policy definition and execution of the kind exemplified by the two interfering granular policies discussed back in Section 2.1, so that the actual result of the policy implementation becomes predictable and can be formally validated.

One of the strengths of the suggested model is the combination of its reasonable expressivity with its high flexibility, as it is based on the idea of composition of activities that can be a) modelled differently, b) implemented differently, and c) operated (executed) differently. In the above example, scripts for file characterization and scripts for policy execution can be implemented using different software and operated by different components of the same service, or by different services, or even by different e-infrastructures.

The actual chain or network of activities, as well as the definition of each of them (i.e. the definition of all semantic wrappers), could be done in a certain authoring tool with a graphic user interface and RDF as a model serialization format. Development of such a tool has been beyond the resources available for this conceptual work; however, such a tool is worth mentioning as one of the elements of an IT architecture that can support data policy formulation, execution and validation.

4 IT architecture for activity-based data policy management

The proposed IT architecture is presented by Figure 6, with the most essential components and information flows (those that would constitute a core operational platform for data policy management) designated as filled-in boxes and arrows; more advanced components and flows are designated as dashed boxes and arrows with a blank background.

As already suggested, having policy Activities authoring tools with a GUI and a possibility to serialize Activity networks in a semantically explicit format such as RDF is essential for good levels of adoption of the suggested approach, and therefore such authoring tools should be a part of a sensible IT architecture for data policy management. In addition, what is required is a repository where policy designs can be stored and retrieved from.

Figure 6 IT architecture for activity-based policy management

The Activity network interpretation engine picks up Activity networks from the authoring tools or the repository and executes them. In order to execute activity networks in a particular IT environment (software platforms and services), a mapping engine is required that maps Activities and their aspects (such as Conditions or Outputs) to configuration files and executable scripts.

In addition to this generic mapping engine, specific engines for logical conditions and control statements can be implemented. The Effects repository stores the Effect aspects of each Activity; it is a generalization of a logging service and contains semantically clear tracks of Activities execution. A policy search interface can be designed for searching and sharing data policies.

For the purposes of data archive or data infrastructure audit, a policy validation engine is required that talks to the policy search interface and to the Effects repository. The actual validation can be based on matching the graphs of artefacts resulting from policy execution against the graphs of Activities in the policy design.
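A minimal sketch of this graph-matching idea, assuming both the policy design and the Effects repository are available as RDF graphs (all names below are illustrative):

    from rdflib import Graph, Namespace

    AM = Namespace("http://example.org/am#")      # URI illustrative
    EX = Namespace("http://example.org/policy#")  # URI illustrative

    design = Graph()   # Effects promised by the policy design
    design.add((EX.CA_ImageFiles, AM.hasEffect, EX.ExecutionLog))

    effects = Graph()  # what the Effects repository recorded after execution
    effects.add((EX.GDMA_FileChar, AM.hasEffect, EX.FileCharLog))
    effects.add((EX.CA_ImageFiles, AM.hasEffect, EX.ExecutionLog))

    # Validation passes if every Effect expected by the design was recorded.
    missing = [t for t in design if t not in effects]
    print("policy valid" if not missing else f"deviations: {missing}")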
5 Conclusion

The problem of data policy modelling with reasonable crosswalks between high-level (read: textual) policies and their machine-executable implementations has yet to find a satisfactory solution. The challenges of policy design and implementation are even bigger when collaborative data infrastructures are operated in combination with in-house software platforms.

The problem of semantically clear crosswalks and the problem of data policy implementation across organization-specific and external IT services can be addressed by the adoption of certain policy modelling techniques and tools. The Activity Model [11] can be a reasonable means for the design of such tools, with the idea that data policies can be represented as networks of Activities with interconnected aspects.

This work has introduced extensions to the Activity Model in order to make it fit for the task of data policy modelling. An example of using the Activity Model for the definition of a particular data policy has been given, and a possible IT architecture has been considered that can support data policy management based on Activity networks.

Acknowledgements

This work is supported by the EUDAT 2020 project that receives funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 654065. The views expressed are those of the author and not necessarily of the project.

References

[1] Giaretta, D.: Advanced Digital Preservation. Springer, Heidelberg (2011)
[2] Bunakov, V., Jones, C., Matthews, B., Wilson, M.: Data authenticity and data value in policy-driven digital collections. OCLC Systems & Services: International digital library perspectives, vol. 30, issue 4, pp. 212-231 (2014). doi: 10.1108/OCLC-07-2013-0025. Open Access version of the preprint: http://purl.org/net/epubs/work/12299882
[3] EUDAT Collaborative Data Infrastructure. https://www.eudat.eu/eudat-cdi
[4] EUDAT services. https://www.eudat.eu/services-support
[5] EUDAT B2SAFE service. https://www.eudat.eu/b2safe
[6] iRODS: Integrated Rule-Oriented Data System. https://irods.org/
[7] EUDAT Data Policy Manager. https://github.com/EUDAT-B2SAFE/B2SAFE-DPM
[8] RuleML Wiki pages. http://wiki.ruleml.org/index.php/RuleML_Home
[9] SWRL: A Semantic Web Rule Language Combining OWL and RuleML. W3C Member Submission. https://www.w3.org/Submission/SWRL/
[10] ProvONE: A PROV Extension Data Model for Scientific Workflow Provenance. http://vcvcomputing.com/provone/provone.html
[11] Bunakov, V.: Core semantic model for generic research activity. In: 15th All-Russian Conference "Digital Libraries: Advanced Methods and Technologies, Digital Collections" (RCDL 2013), Yaroslavl, Russia, 14-17 Oct 2013. CEUR Workshop Proceedings (ISSN 1613-0073) 1108, 79-84 (2013). Persistent URL: http://purl.org/net/epubs/work/10938342
[12] SCAPE: Scalable Preservation Environments project. http://scape-project.eu/
[13] SCAPE Catalogue of Preservation Policy Elements. http://scape-project.eu/wp-content/uploads/2014/02/SCAPE_D13.2_KB_V1.0.pdf
[14] Practical Policy Implementations Report. http://dx.doi.org/10.15497/83E1B3F9-7E17-484A-A466-B3E5775121CC