Semantic Digital Rights Management for
            Controlled P2P RDF Metadata Diffusion

                          Roberto García, Giovanni Tummarello

                   GRIHO – Human-Computer Interaction Research Group
                      Universitat de Lleida, Spain - roberto@griho.net
                    SEMEDIA – Semantic Web and Multimedia Group
                   http://semedia.deit.univpm.it - g.tummarello@gmail.com


       Since the early works in the W3C Semantic Web initiative, RDF has been
       generically indicated as a potential basis for legally binding exchange of
       semantically structured information. In this paper we introduce and detail a
       procedural framework that could support such legally binding exchange. The
       proposed methodology is based on a Copyright Ontology, a copyright
       conceptualisation which includes concrete rights expression languages like
       MPEG-21 REL, and RDF model decomposition based on the Minimum Self
       Contained Graph theory. The procedure seems particularly useful when applied
       to P2P semantic web scenarios.


1.   Introduction

The knowledge representation capabilities of RDF are agnostic with respect to the
content and the purpose for which it is used. Since the early works in the W3C
Semantic Web initiative, however, a few use cases have stood out, among them the
idea that RDF could serve as a basis for the legally binding exchange of
semantically structured information [1].
   In this paper we address a scenario which is becoming more and more common both
on the Semantic Web and on “Web 2.0” websites: information does not simply go
directly from the source to the intended destination. Instead, information is mashed
up, aggregated, filtered, republished, annotated, etc. This happens notably with RSS
feeds, and increasingly on the Semantic Web with frameworks such as DBin [2], where
peers collect bits of RDF (related to resources of common interest) which can then be
redistributed to other peers or republished on the web.
   Clearly, however, not all data sources would agree to uncontrolled use
and redistribution of the content they produce. For example, a stock price web service
might be willing to provide real-time information to a subscriber as long as “it is not
publicly redistributed before 10 minutes”. Similarly, in a DBin P2P RDF group, a
user might want to give information to other peers “as long as it is redistributed only
to those who have a verified @deit.univpm.it address”.
    In such scenarios, simple access control to the information sources (e.g. password
protection) does not suffice, and a licence that is not machine readable (e.g. a fixed
licence that one accepts by ticking an “I understand the terms and conditions” checkbox
at sign-up time) does not allow any automatic and dynamic handling of such information
distribution scenarios.
   The procedure we discuss in this paper addresses such needs and enables a source
peer (from here on source) to provide a piece of RDF to a receiving peer (receiver) in
a manner which could provide the technical basis for legal protection.


2.   The proposed exchange procedure: outline

In this section we describe the procedure by which the source provides RDF to the
receiver along with a licence which specifies how such information may be used. The
procedure involves multiple steps and requires trust in the identity of the remote party,
i.e. the parties must know, or have a way to trace, the legal identity of the owner of the
public key used to verify the signatures on the licences. There are many ways by which
this can be achieved (e.g. via a certifying third party such as Verisign), so their
discussion is outside the scope of this paper.
   For the rest of the discussion we will use the term cite to indicate a pointer to the
information, e.g. a URL. A non-dereferenceable citation is a citation by way of,
for instance, a digital hash: a receiver can check that it refers to the information only
when it already holds the information itself, or via a third party. With the term quote
we indicate providing the information itself along with additional control information.
Denoting the source by S and the receiver by R, the exchange proceeds step by step as follows:

1) R makes a request to S. As a result of such request R expects S to give
   information expressed in RDF. Optionally, the request is digitally signed so as to
   provide S with a way to make a “personalized” licence offer.
2) S receives the request, creates the RDF for the answer and uses the minimum self
   contained graph (MSG) decomposition described in Section 2.1 to obtain a set of
   digital hashes that allow it to cite, in a non-dereferenceable way, the information
   it is willing to give. It then uses the hashes in a licence created with the
   methodology described in Section 3 and sends the result, from here on called the
   proposal, to R. Optionally, S signs the proposal so as to provide R with the guarantee
   that, if agreed, the answer will actually be provided under the specified terms.
3) R receives the proposal and, if it decides that the terms are agreeable, signs it and
   returns it to S. Optionally, thanks to the properties of MSGs, R can check whether the
   answer corresponds to information which is already locally known. In this case R
   could drop the request as not interesting, or proceed, e.g., in case it is important
   for R to prove that the information was in fact legally acquired.
4) S receives the signed proposal, stores it and replies with the answer computed in
   2). Optionally, the signed proposal might be countersigned to allow R to prove
   that the information was obtained by legal means.
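The four steps above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the DBin implementation: HMAC over a JSON encoding stands in for real public-key signatures (which would rely on keys certified as discussed above), each MSG is assumed to be already canonically serialized as a list of strings, and all names (`sign`, `verify`, `msg_id`, the keys and URIs) are illustrative.

```python
import hashlib
import hmac
import json
from typing import Dict, List


def sign(key: bytes, payload: Dict) -> str:
    # Stand-in for a digital signature: HMAC-SHA256 over a canonical JSON encoding.
    # HMAC is symmetric, so the verifier must know the signer's key; a real
    # deployment would use certified public-key signatures instead.
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: Dict, signature: str) -> bool:
    return hmac.compare_digest(sign(key, payload), signature)


def msg_id(msg: List[str]) -> str:
    # Hypothetical content-based ID: hash of the (already canonical) MSG lines.
    return hashlib.sha256("\n".join(sorted(msg)).encode("utf-8")).hexdigest()


S_KEY, R_KEY = b"source-key", b"receiver-key"  # hypothetical keys

# 1) R sends a (signed) request to S.
request = {"from": "R", "about": "http://example.org/item"}
request_sig = sign(R_KEY, request)

# 2) S checks the request, prepares the answer as a list of MSGs and builds a
#    proposal that cites them only through their hashes.
assert verify(R_KEY, request, request_sig)
answer = [['<http://example.org/item> <http://example.org/price> "42" .']]
proposal = {"licensee": "R", "msgs": [msg_id(m) for m in answer],
            "terms": "no public redistribution before 10 minutes"}
proposal_sig = sign(S_KEY, proposal)

# 3) R checks S's signature and, accepting the terms, signs the proposal.
assert verify(S_KEY, proposal, proposal_sig)
acceptance_sig = sign(R_KEY, proposal)

# 4) S checks the acceptance, stores it and ships the answer; R confirms that
#    the data it received is exactly the data cited by the agreed licence.
assert verify(R_KEY, proposal, acceptance_sig)
assert [msg_id(m) for m in answer] == proposal["msgs"]
```

R's optional check in step 3 amounts to comparing `proposal["msgs"]` with the IDs of the MSGs it already holds locally.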


2.1.   An introduction to the Minimum Self Contained Graph theory

In this section we illustrate the Minimum Self Contained Graph (MSG) theory.
The discussion extends the one first presented in [3] and provides the basis for a
precise understanding of the procedure.
   Let us first define the minimum “standalone” fragment of an RDF model. As
blank nodes are not addressable from outside a graph, they must always be considered
together with all surrounding statements, i.e. stored and transferred together with
these. MSGs are the smallest components of a lossless decomposition of a graph when
inference, such as that provided by OWL, is not taken into account; concepts such as
RDF Molecules [4] address the case where such inference is considered. We give here a
formal definition of MSG (Minimum Self-contained Graph) and cite some important
properties (for proofs, see [3]).
   Def 1. An RDF statement involves a name if it has that name as subject or object.
   Def 2. An RDF graph involves a name, if any of its statements involves that name.
   Def 3. Given an RDF statement s, the Minimum Self-contained Graph (MSG)
   containing that statement, written MSG(s), is the set of RDF statements comprised
   of the statement in question and, recursively, for all the blank nodes involved by
   statements included in the description so far, the MSG of all the statements
   involving such blank nodes.
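Def 3 translates directly into a closure computation. The following Python sketch is illustrative only: triples are plain string tuples, blank nodes are assumed to be written with the N-Triples `_:` prefix, and `msg_of` is a hypothetical helper name.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def involved_blanks(t: Triple) -> Set[str]:
    # Def 1: a statement involves the names in its subject and object positions.
    # Assumption: blank nodes are the terms starting with "_:".
    s, _, o = t
    return {x for x in (s, o) if x.startswith("_:")}


def msg_of(start: Triple, graph: List[Triple]) -> Set[Triple]:
    """Def 3: start from one statement and keep adding every statement that
    involves a blank node already reached, until nothing new is added."""
    result = {start}
    seen_blanks = set(involved_blanks(start))
    frontier = set(seen_blanks)
    while frontier:
        b = frontier.pop()
        for t in graph:
            if t not in result and b in involved_blanks(t):
                result.add(t)
                new = involved_blanks(t) - seen_blanks
                seen_blanks |= new
                frontier |= new
    return result


# Example graph: the first three statements are chained through _:x and _:y.
g = [
    ("ex:a", "ex:p", "_:x"),
    ("_:x", "ex:q", "_:y"),
    ("_:y", "ex:r", '"1"'),
    ("ex:a", "ex:s", "ex:b"),
]
```

Here `msg_of(g[0], g)` and `msg_of(g[2], g)` return the same three statements, as Theorem 1 below predicts, while `g[3]`, which involves no blank node, forms an MSG on its own.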
   It is possible to show, however, that any statement of an MSG can be chosen as the
starting one, yielding the same MSG, and this leads to a unique decomposition of the
RDF graph into MSGs.
   It is also possible to prove that:
   Theorem 1. If s and t are distinct statements and t belongs to MSG(s), then MSG(t) = MSG(s).
   Theorem 2. Each statement belongs to one and only one MSG.
   Corollary 1. An RDF model has a unique decomposition in MSGs.
   This is a consequence of Theorem 2 and of the determinism of the procedure.
As a consequence of Corollary 1, a graph can be incrementally transferred
between parties by decomposing it into MSGs and transferring it with a granularity
down to one MSG at a time. As a consequence of Theorem 2, such a transfer would be
maximally network efficient, as statements would never be repeated.
   Def 4. The RDF Neighbourhood (RDFN) of a resource is the graph composed of all
   the MSGs involving the resource itself.
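A whole graph can be decomposed into MSGs in a single pass, and Def 4 then reduces to a filter over the result. The Python sketch below is a hypothetical illustration, not the RDFContextTools implementation: it uses a union-find structure so that statements sharing a blank node end up in the same MSG, and it assumes `_:`-prefixed blank nodes and a graph without duplicate triples. `decompose` and `rdfn` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def decompose(graph: List[Triple]) -> List[List[Triple]]:
    """Partition a graph into MSGs: statements sharing a blank node
    (as subject or object) end up in the same MSG."""
    parent = list(range(len(graph)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_with_blank: Dict[str, int] = {}
    for i, (s, _, o) in enumerate(graph):
        for term in (s, o):
            if term.startswith("_:"):  # assumption: "_:" marks blank nodes
                if term in first_with_blank:
                    parent[find(i)] = find(first_with_blank[term])
                else:
                    first_with_blank[term] = i

    groups: Dict[int, List[Triple]] = defaultdict(list)
    for i, t in enumerate(graph):
        groups[find(i)].append(t)
    return list(groups.values())


def rdfn(resource: str, graph: List[Triple]) -> List[List[Triple]]:
    """Def 4: all the MSGs that involve the resource (as subject or object)."""
    return [m for m in decompose(graph)
            if any(resource in (s, o) for s, _, o in m)]


g = [
    ("ex:a", "ex:p", "_:x"),
    ("_:x", "ex:q", "_:y"),
    ("_:y", "ex:r", '"1"'),
    ("ex:a", "ex:s", "ex:b"),
]
```

Since the MSGs returned by `decompose` partition the graph (Theorem 2), shipping them one at a time never repeats a statement; `rdfn("ex:b", g)` returns only the single-statement MSG, whereas `rdfn("ex:a", g)` returns both MSGs.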

Content based identifiers for MSGs
MSGs are standalone RDF graphs. As such they can be processed with algorithms
such as canonical serialization. We use an implementation of the algorithm described
in [5], which is part of the RDFContextTools Java library [6], to obtain a canonical
string representing the MSG and then we hash it to an appropriate number of bits to
reasonably avoid collisions. This hash acts as a unique identifier for the MSG with the
fundamental property of being content based, which implies that two remote peers
would derive the same ID for the same MSG in their DB. Sets of such IDs are used to
identify the information covered in the licences.
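The identifier computation can be sketched in Python as follows. Note that this swaps in a different canonicalization technique from the one used by the authors: instead of the algorithm of [5], it brute-forces all relabellings of the blank nodes and keeps the lexicographically smallest sorted serialization. The result is canonical, but the cost is exponential in the number of blank nodes, so the sketch only suits small MSGs. SHA-256 is our own choice of hash, and `canonical_form` and `msg_id` are hypothetical names.

```python
import hashlib
from itertools import permutations
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]


def canonical_form(msg: Iterable[Triple]) -> str:
    """Try every relabelling of the blank nodes as _:c0, _:c1, ... and keep the
    smallest sorted line-based serialization. The set of candidates does not
    depend on the original blank labels, so the minimum is canonical."""
    triples = list(msg)
    blanks = sorted({term for s, _, o in triples for term in (s, o)
                     if term.startswith("_:")})
    best: Optional[str] = None
    for perm in permutations(range(len(blanks))):
        mapping = {b: f"_:c{i}" for b, i in zip(blanks, perm)}
        lines = sorted(" ".join(mapping.get(x, x) for x in t) + " ."
                       for t in triples)
        candidate = "\n".join(lines)
        if best is None or candidate < best:
            best = candidate
    assert best is not None  # permutations() always yields at least one tuple
    return best


def msg_id(msg: Iterable[Triple]) -> str:
    # Content-based identifier: hash of the canonical serialization.
    return hashlib.sha256(canonical_form(msg).encode("utf-8")).hexdigest()


# The same MSG held by two peers under different blank node labels...
a = [("ex:a", "ex:p", "_:x"), ("_:x", "ex:q", '"1"')]
b = [("_:n7", "ex:q", '"1"'), ("ex:a", "ex:p", "_:n7")]
# ...and an MSG that differs only in a literal.
c = [("ex:a", "ex:p", "_:x"), ("_:x", "ex:q", '"2"')]
```

Here `msg_id(a) == msg_id(b)`, which is exactly the property that lets two remote peers agree on an ID without exchanging the data, while `msg_id(c)` differs.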


3.   Semantic Digital Rights Management

   In recent years there has been considerable work and debate on Digital Rights
Management (DRM). A DRM system (DRMS) is composed of IT components and
services, along with the corresponding law, policies and business models, which strive
to enable controlled distribution of content and associated usage rights.
   It is important for different DRMSs to interoperate. One of the main initiatives for
DRM interoperability is the ISO/IEC MPEG-21 standardisation effort. Its main
interoperability components are the Rights Expression Language (REL), which is
defined by an XML grammar and is therefore syntax-based, and the MPEG-21 Rights
Data Dictionary (RDD), which captures the semantics of the terms employed in the
REL [7]. The RDD, however, does so without defining formal semantics [8].
   The limitations of a purely syntactic approach and the lack of formal semantics can
be overcome with a semantic approach based on ontologies [9]. Web ontologies are
used in order to benefit from the Semantic Web initiative efforts and to facilitate
integration in the Web context. The Copyright Ontology [10], of which we give an
overview here, is a conceptualisation effort based on OWL.
   The copyright domain is a very complex one and its conceptualisation is a very
challenging task. To facilitate it, the Copyright Ontology conceptualisation task has
been divided into three parts, each concentrating on a portion of the problem. The
conceptualisation starts by building a model for the most primitive part, the Creation
Model. The next step is to build the Rights Model and, finally, the Action Model on
top of the two previous ones. This section just sketches the main points of these
three models. For more details, see [11].
   The Creation Model defines the different forms a creation can take. These can be
classified into the three top categories common to many upper ontologies: Abstract, a
mental concept; Object, a continuant or endurant; and Process, an occurrent or
perdurant [12].
   The Rights Model follows the World Intellectual Property Organisation (WIPO)
recommendations in order to define the rights hierarchy. This hierarchy comprises the
economic rights plus the moral rights, as promoted by WIPO and adopted by all the
countries that adhere to the Berne Convention [13].
   The most relevant rights in the DRM context are the economic rights, as they are
related to the productive and commercial aspects of copyright. The Action Model
corresponds to the primitive actions that can be performed on the concepts defined in
the Creation Model and which are regulated by the rights in the Rights Model.
   For instance, the economic rights govern the following actions:
    • Reproduction Right: reproduce, commonly speaking copy.
    • Distribution Right: distribute. More specifically sell, rent and lend.
    • Public Performance Right: perform; it is regulated by copyright when it is a
        public performance and not a private one.
    • Fixation Right: fix, or record.
    • Communication Right: communicate when the subject is an object or
        retransmit when communicating a performance or previous communication,
        e.g. a re-broadcast. Other related actions, which depend on the intended
        audience, are broadcast or make available.
    • Transformation Right: derive. Some specialisations are adapt or translate.
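To make the mapping between economic rights and actions concrete, the following Python sketch encodes the list above as two dictionaries. It is a hypothetical, flattened encoding written for illustration, not the Copyright Ontology itself, which expresses these relations in OWL; the action strings and the helper `governing_right` are assumptions. It also ignores refinements such as performance being regulated only when public.

```python
from typing import Optional

# Hypothetical encoding of the economic rights listed above: each directly
# governed action maps to the right that regulates it.
RIGHT_FOR_ACTION = {
    "reproduce": "Reproduction", "copy": "Reproduction",
    "distribute": "Distribution",
    "perform": "Public Performance",
    "fix": "Fixation", "record": "Fixation",
    "communicate": "Communication", "retransmit": "Communication",
    "broadcast": "Communication", "make available": "Communication",
    "derive": "Transformation",
}

# Specialised actions point to the more general action they refine.
SPECIALISATION_OF = {
    "sell": "distribute", "rent": "distribute", "lend": "distribute",
    "adapt": "derive", "translate": "derive",
}


def governing_right(action: str) -> Optional[str]:
    # Walk up the specialisation chain until a directly governed action is found.
    while action not in RIGHT_FOR_ACTION and action in SPECIALISATION_OF:
        action = SPECIALISATION_OF[action]
    return RIGHT_FOR_ACTION.get(action)
```

For instance, `governing_right("rent")` yields `"Distribution"`, and in the P2P scenario discussed next the relevant look-ups would be `copy` (Reproduction) and `make available` (Communication).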
   The action concepts are complemented with a set of relations that link them to the
action participants. The relations are adopted from the linguistics field and they are
based on case roles [14].
   The previously introduced pool of primitive actions can be combined in order to
build different value chains in the copyright domain. This pool is complemented with
a set of axioms that restrict the ways actions, rights and creation types are related.
   The P2P RDF metadata diffusion scenario is governed by the Reproduction and
Communication Rights. The Reproduction Right governs the Copy action that
reproduces a piece of metadata from Peer A, where the piece resides originally, to
Peer B, where the piece also resides when the copy is completed.
   The Communication Right governs the generic action Communicate. This action
corresponds, among others, to the situation where the agent responsible for a peer
makes content available to others from the place and time individually chosen by
them. Therefore, in the context of P2P diffusion, this is the right required by a peer in
order to make a piece of metadata available for others to copy.
   In order to complete the action model, there are also the licensing actions Agree
and Disagree, the building blocks for any license, such as the one shown in Fig. 1.


          Fig. 1. Model for an agreement on a copy action pattern plus a condition

   The deontic operators are implicit in the agreement model. The agreement theme
corresponds to an implicit permission, i.e. the theme of an agreement is permitted.
The condition on the agreement theme corresponds to an obligation, i.e. in order to
fulfil the theme action it is necessary to satisfy the pattern defined by the condition
property object. Finally, it is also possible to model prohibitions. This can be done in
two ways: by agreeing on a negated pattern or by using the Disagree action.
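How the agreement model yields permissions, obligations and prohibitions can be sketched in Python as follows. The sketch rests on assumptions of our own: action patterns are plain predicates rather than OWL classes, and a matching Disagree takes precedence over any Agree, a conflict-resolution policy the model above does not prescribe. `Agree`, `Disagree`, `evaluate` and the action dictionaries are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union

Action = Dict[str, object]
Pattern = Callable[[Action], bool]


@dataclass
class Agree:
    theme: Pattern                       # agreed (hence permitted) action pattern
    condition: Optional[Pattern] = None  # obligation attached to the theme


@dataclass
class Disagree:
    theme: Pattern                       # disagreed (hence prohibited) pattern


def evaluate(action: Action, licences: List[Union[Agree, Disagree]]) -> str:
    # Assumption: any matching Disagree overrides every Agree.
    if any(isinstance(lic, Disagree) and lic.theme(action) for lic in licences):
        return "prohibited"
    matching = [lic for lic in licences
                if isinstance(lic, Agree) and lic.theme(action)]
    if any(lic.condition is None or lic.condition(action) for lic in matching):
        return "permitted"
    if matching:
        return "obligation unmet"  # theme matches but no condition is satisfied
    return "not authorised"


licences = [
    Agree(theme=lambda a: a["type"] == "copy",
          condition=lambda a: bool(a.get("attributed", False))),
    Disagree(theme=lambda a: a["type"] == "communicate"),
]
```

With these licences, an attributed copy is permitted, a copy without attribution leaves the obligation unmet, communicating is prohibited, and any other action is simply not authorised.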


3.1.     License Checking, an example

The main objective has been to provide a straightforward and efficient
implementation that makes extensive use of DL (Description Logic) reasoners.
   Licenses are modelled as OWL classes and the intended uses of copyrighted content
are modelled as instances. In order to check whether a usage (instance) is authorised
by a set of licenses (classes), a DL reasoner is used to classify the instance into the
available classes. If the instance is classified into a class that models an agreement,
i.e. the Agree class as specified in the Copyright Ontology, the usage is authorised.
   Suppose, for example, that we want to model a license that allows the agent
"granted" to copy the metadata "fragment0001" from "peerA" to "peerB", "peerC"
or "peerD". Additional restrictions are that it can be copied to at most 2 peers
simultaneously (as a result of an individual copy action) and that the copy can be
performed from January 1st 2006 to June 30th 2006.
   Table 1 shows the class pattern for the theme values of the license Agree. The
pattern is for Copy actions, so it is a subclass of Copy (1), and it is equivalent to the
class resulting from the intersection of the OWL restrictions in (2) to (5), which
constitute the necessary and sufficient conditions that trigger the classification of
authorised usage instances.

             Table 1. Class pattern for the actions authorised by the example license

Pattern ⊑    Copy                                                                       (1)
Pattern ≡    ∀pointInTime.≥ 2006-01-01T00:00:00, ≤ 2006-06-30T23:59:59 ⊓                (2)
             ∃agent.{granted} ⊓ ∃origin.{peerA} ⊓ ∃theme.{fragment0001} ⊓               (3)
             ( ≤ 2 recipient ) ⊓                                                        (4)
             ∀recipient.{peerC, peerD, peerB}                                           (5)
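As an illustration of what the classification against Table 1 checks, the following Python predicate tests a single copy action against rows (2) to (5). This is a swapped-in, closed-world sketch, not the DL-based checking described above: under OWL's open-world semantics the universal restriction in row (5) is only entailed when the set of recipients is known to be complete, whereas this sketch simply assumes the listed recipients are all of them. The theme identifier follows Table 1; `CopyAction` and `matches_pattern` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet


@dataclass(frozen=True)
class CopyAction:
    agent: str
    origin: str
    theme: str
    recipients: FrozenSet[str]   # distinct recipients of this single copy action
    point_in_time: datetime


START = datetime(2006, 1, 1, 0, 0, 0)
END = datetime(2006, 6, 30, 23, 59, 59)
ALLOWED_RECIPIENTS = frozenset({"peerB", "peerC", "peerD"})


def matches_pattern(a: CopyAction) -> bool:
    """Closed-world check mirroring rows (2)-(5) of Table 1."""
    return (
        START <= a.point_in_time <= END           # (2) licensing period
        and a.agent == "granted"                  # (3) agent,
        and a.origin == "peerA"                   #     origin
        and a.theme == "fragment0001"             #     and theme
        and len(a.recipients) <= 2                # (4) at most 2 recipients
        and a.recipients <= ALLOWED_RECIPIENTS    # (5) only the listed peers
    )
```

For example, copying to peerB and peerC in March 2006 matches, while copying to all three allowed peers at once, or after June 30th 2006, does not.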


3.2.     Implementation

The Semantic DRMS is implemented at two levels. The ground level relies on
OWL-DL and can be implemented with a common Description Logic reasoner. Pellet¹ has
been selected because it can reason over custom data types, which has been very
useful for checking licensing time ranges.
   This, however, must be complemented with a metalevel that implements the deontic
aspects that are implicit in the conceptual model. This metalevel guides the DL
checks that have to be performed in order to capture the semantics of the implicit
obligations, permissions and prohibitions. The metalevel has also been implemented
programmatically.
   The MSG theory and tools have been implemented in [2] on top of the Jena and
Sesame toolkits. The entire procedure described in this paper is covered by the
implementation of an upcoming version of the DBin platform [2], and it will also be
made available as a standalone library to be embedded in other applications that
exchange RDF.

1 Pellet OWL Reasoner, http://www.mindswap.org/2003/pellet


4.     Conclusions

The copyright ontology constitutes a complete framework for representing copyright
value chains and the associated flow of rights situations, agreements, offers, etc. This
general framework can be specialised and used in conjunction with the Minimum Self
Contained RDF graph theory to implement a P2P RDF diffusion mechanism which
could form the basis for legally binding agreements.
   The proposed methodology relies on standard Semantic Web tools. Licences
are implemented as an OWL-DL ontology, so an implementation only needs a
Description Logic classifier to determine whether an action is permitted.
   One could object that the proposed approach is limited to protection against
“verbatim” redistribution of information. While this is technically true (MSG IDs
would change with any simple modification, e.g., the insertion of a meaningless triple
attached to any blank node), it does not affect the validity and applicability of the
procedure. It is in fact long established that copyright laws protect not only the exact
representation of the protected work but also derived representations. The case is
similar to licensing a photo from a collection, changing a single pixel and
redistributing it as one’s own production beyond fair use limits.
   We believe this work can have wide applicability and cover real-world
requirements. The development of this idea was in fact motivated by the need to
support much-requested use cases in the Semantic Web P2P framework of DBin. As of
DBin version 0.4, information is exchanged solely on the basis of a URI-based
request: everything a peer knows that involves that URI (at MSG level) is shipped to
the requesting peer. Thanks to the procedure proposed in this paper, it will now be
possible to support important use cases involving information which should be
exchanged, but only under controlled conditions.


4.1.    Related Work

   While we consider DRM a natural approach for the purpose of this paper, there
exist several general policy systems which have been applied to SW scenarios.
Ontology-based approaches rely on the expressive capabilities of Description Logic
languages, such as OWL. DL reasoners can then be used to classify policies and
contexts and enable deductive inferences for policy checking.
   This is the approach for the Copyright Ontology implementation presented in this
paper. A generic policy language that also follows this approach is KAoS [15], which
can reason about licenses by ontological subsumption. However, KAoS requires
OWL-Full reasoning capabilities and its implementation is based on a theorem prover.
   In contrast, rule-based approaches take the perspective of Logic Programming to
encode policies as rules with variables. Rei is a policy framework based on rules [16].
Rules are expressed as triples following a pattern that is typical of logical languages
like Prolog. In fact, Rei is developed using the XSB Prolog engine. Rei thus overcomes
the limitation regarding variables and enables the definition of policies that refer to
dynamically determined values. However, this prevents it from exploiting the full
potential of the OWL language. In fact, the knowledge in Rei rules is treated separately
from OWL ontology knowledge due to its different syntactic form.
   To overcome the limitations of this trade-off between ontology-based and rule-based
policies, hybrid solutions have been proposed [17]. This is also the choice for the
Copyright Ontology implementation, where SWRL is used for some axioms and for
metalevel reasoning.


References


 1. Resource Description Framework (RDF): Concepts and Abstract Data Model. W3C
    Working Draft, 2002. http://www.w3.org/TR/2002/WD-rdf-concepts-20020829
 2. Tummarello, G.; Morbidoni, C.; Puliti, P.; Piazza, F.: "The DBin Semantic Web platform: an
    overview". WWW2005 Workshop on The Semantic Computing Initiative (SeC 2005), 2005
 3. Tummarello, G.; Morbidoni, C.; Puliti, P.; Piazza, F.: "Signing individual fragments of an
    RDF graph". World Wide Web Conference 2005, Poster Track, 2005
 4. Ding, L.; Finin, T.; Peng, Y.; Pinheiro da Silva, P.; McGuinness, D.: "Tracking RDF Graph
    Provenance using RDF Molecules". Proceedings of the Fourth International
    Semantic Web Conference, November 2005
 5. Carroll, J.: "Signing RDF Graphs". International Semantic Web Conference, 2003
 6. Tummarello, G.; Morbidoni, C.: "RDFContext Tools 0.2".
    http://semedia.deit.univpm.it/tiki-index.php?page=RdfContextTools
 7. Wang, X.; DeMartini, T.; Wragg, B.; Paramasivam, M.; Barlas, C.: "The MPEG-21 rights
    expression language and rights data dictionary". IEEE Transactions on Multimedia, Vol. 7,
    No. 3, pp. 408-417, 2005
 8. García, R.; Delgado, J.: "An Ontological Approach for the Management of Rights Data
    Dictionaries". In Moens, M. & Spyns, P. (ed.): "Legal Knowledge and Information
    Systems". IOS Press, Frontiers in Artificial Intelligence and Applications Vol. 134, 2005
 9. García, R.; Gil, R.; Delgado, J.: "A Web Ontologies Framework for Digital Rights
    Management". In press, Journal of Artificial Intelligence and Law, Springer, 2006
10. Copyright Ontology, http://rhizomik.net/ontologies/copyrightonto
11. García, R.: "A Semantic Web Approach to Digital Rights Management". PhD Thesis,
    Technologies Department, Universitat Pompeu Fabra, Barcelona, ES, 2006.
    http://rhizomik.net/~roberto/thesis
12. Niles, I.; Pease, A.: "Towards a Standard Upper Ontology". In Welty, C.; Smith, B. (eds.):
    Proceedings of the 2nd International Conference on Formal Ontology in Information
    Systems (FOIS), Maine, USA, 2001
13. Berne Convention, http://www.wipo.int/treaties/en/ip/berne
14. Sowa, J.F.: "Knowledge Representation. Logical, philosophical and computational
    foundations". Brooks Cole Publishing Co., 2000
15. Uszok, A., et al.: "KAoS policy management for semantic web services". IEEE Intelligent
    Systems, Vol. 19, No. 4, pp. 32-41, 2004
16. Kagal, L.: "A Policy Based Approach to Governing Autonomous Behavior in Distributed
    Environments". PhD Thesis, University of Maryland, Baltimore County, USA, 2004
17. Bradshaw, J.; Kagal, L.; Montanari, R.; Toninelli, A.: "Rule-based and ontology-based
    policies: Toward a hybrid approach to control agents in pervasive environments". In
    Proceedings of the ISWC2005 Semantic Web and Policy Workshop, 2005