=Paper=
{{Paper
|id=Vol-95/paper-6
|storemode=property
|title=Data-Centric Networks and Peer-to-Peer Databases
|pdfUrl=https://ceur-ws.org/Vol-95/06-nejdl.pdf
|volume=Vol-95
|dblpUrl=https://dblp.org/rec/conf/mmgps/Nejdl03
}}
==Data-Centric Networks and Peer-to-Peer Databases==
Data-Centric Networks
and
Peer-to-Peer Data Management
Wolfgang Nejdl
L3S / University of Hannover
Germany
Overview
Motivation and Background
Networks, Databases and the Web: DHTs, PEERS, et al
L3S P2P Background - KnowledgeWeb, Edutella, et al
Schema-Based Peer-to-Peer Networks
Resource Description Framework (RDF) and RDF Schema
Edutella Query Service / RDF Query Exchange Language RDF-QEL
Subscriptions
Efficient Routing / HyperCuP & Super-Peers
Distributed Query Processing
Access Control and Trust Negotiation
Summary and Conclusions
Wolfgang Nejdl 19/12/03 2
Evolution of Networks
from
Host-centric networks (URLs & low level routing)
Enabling and optimizing communication between
network hosts
to
Data-centric networks (Google & P2P)
Into the Coddian world of physical data independence
(invited talk Scott Shenker, VLDB 2003)
Where Databases meet Networks: PEERS
Where the Semantic Web Meets Databases: KnowledgeWeb
Scalability
Heterogeneity
Semantic Web Languages
Services
Dynamics
Knowledge Web
Where Databases & Rules meet the Semantic Web: REWERSE
Reasoning on the Web with Rules and Semantics
How to get and retrieve data?
Querying, reasoning and optimization
How to protect data?
Policy specification and evaluation
How to integrate data?
Reasoning and mediation
Selected applications for proof-of-concept purposes
Personalized Web systems
Web-based decision support
Bioinformatics Semantic Web
Where E-Learning meets Databases & Sem. Web: Edutella
Specify and implement an RDF-based metadata infrastructure for P2P networks
Developed as part of the open source peer-to-peer project JXTA
edutella.jxta.org
60+ contributors from various institutions
Building block for the EU/IST ELENA smart learning space
E-Learning + Infrastructures + Interoperability: PROLEARN
Working towards
innovative elearning resources
interoperable elearning resources and systems
sustainable elearning infrastructures and processes for SMEs
Schema-Based Peer-to-Peer Networks
[Figure: classification of systems. Axis labels: schema-based, fixed schema/keywords, key; local, distributed, peer-to-peer]
Categories shown: Database Systems, Schema-based P2P Systems, P2P Systems
Properties shown: User-definable schemas, Structured schemas, Query language, Decentralized control, Node autonomy, Transient peers, Self organization
Systems shown: ANY RDBMS, AMOSII, CHATTY WEB, OBJECTGLOBE, CONCEPTBASE, TSIMMIS, EDUTELLA, ONTOBROKER, PIAZZA, TUKWILA, DIRECTCONNECT, GNUTELLA, NAPSTER, KAZAA, P-GRID, CAN, CHORD
(system list not complete)
RDF / RDF Schema for Describing Distributed Resources
Basic Formalisms for the Semantic Web
URIs to identify resources
Combine resources and annotate resources with attributes, using tuples
Graph as basic model, easy to translate to logic facts
RDFS allows us to define the RDF vocabulary used (classes and attributes), and thus to represent simple semantic models
Possible extensions towards more expressive semantic descriptions, e.g. description logic (DAML+OIL / OWL)
Using RDF / RDFS in the P2P context
Distributed annotations for distributed resources
Flexible schema definitions, which can be uniquely identified and combined, as well as extended by additional properties
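The tuple and graph model described on this slide can be illustrated with a minimal sketch. The `Triple` type, the `to_fact` helper, the `example.org` resource URI and the literal values are assumptions made for illustration; they are not part of Edutella or of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: a resource annotated with an attribute value."""
    subject: str
    predicate: str
    obj: str


# Dublin Core property URIs; the course URI below is a made-up example.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
COURSE = "http://example.org/course/p2p-intro"


def to_fact(t: Triple) -> str:
    """Render a triple as a logic fact (hypothetical syntax)."""
    return f'triple("{t.subject}", "{t.predicate}", "{t.obj}").'


# Two peers annotate the same resource independently.
peer_a = {Triple(COURSE, DC_TITLE, "Introduction to P2P")}
peer_b = {Triple(COURSE, DC_CREATOR, "Example Author")}

# Because resources are identified by URIs, merging the
# distributed annotations is a plain set union of the graphs.
merged = peer_a | peer_b

for fact in sorted(to_fact(t) for t in merged):
    print(fact)
```

The set union works only because both peers use the same URI for the course, which is the role URIs play in the list above.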
RDF-QEL: RDF Query (Exchange) Language
Datalog-based Query Exchange Language (RDF-QEL)
RDF QEL1: conjunctive query up to
RDF QEL5: RDF QEL4 (SQL3) + general recursion
see Nejdl et al: "EDUTELLA: A P2P Networking Infrastructure Based on RDF", WWW 2002
Datalog is used as the internal data model (ECDM: Edutella Common Data Model) and provided as a set of Java classes
RDF is used to represent the queries transmitted between the peers
Wrappers for other RDF query languages (RQL, TRIPLE, etc.) and XML query languages (like XPath)
[Figure: Edutella query data flow, with the labels Edutella consumer, RDF QEL 1-5, RDF query, result, Datalog-based ECDM, local query, repository, Edutella provider]
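To make the lowest language level concrete, the following sketch evaluates a conjunctive query over a small set of triples. It is not RDF-QEL syntax and not the ECDM Java classes; the `?`-prefixed variable convention, the function names and the sample facts are all assumptions chosen for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumed convention: variables start with "?".
    return term.startswith("?")


def match(pattern: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `fact`, or return None."""
    result = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if result.setdefault(p, f) != f:
                return None  # variable already bound to another value
        elif p != f:
            return None  # constant mismatch
    return result


def evaluate(query: List[Triple], facts: Iterable[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns by successive joins."""
    facts = list(facts)
    bindings: List[Binding] = [{}]
    for pattern in query:
        extended: List[Binding] = []
        for b in bindings:
            for fact in facts:
                b2 = match(pattern, fact, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


FACTS = [
    ("book1", "title", "P2P Basics"),
    ("book1", "type", "Book"),
    ("book2", "title", "Datalog"),
    ("book2", "type", "Article"),
]

# "Find titles of all resources of type Book."
QUERY = [("?x", "type", "Book"), ("?x", "title", "?t")]

print(evaluate(QUERY, FACTS))
```

Higher levels of the language add features such as recursion, which this nested-loop join does not cover.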
Another Possibility: Don't query, subscribe
Subscriptions are a good idea, too (get the NYTimes each morning, get new teaching material on P2P topologies …)
Example: Selective Information Dissemination in P2P-DIET
Instead of Queries and Answers we need
Profile forwarding
Notification forwarding / Filtering
Advertisement forwarding
Dynamicity of P2P network → storing notifications / rendezvous
See e.g. Koubarakis et al: Selective Information Dissemination in P2P Networks: Problems and Solutions, SIGMOD Record, Special P2P Issue, September 2003 as well as ongoing work to integrate P2P-DIET and Edutella
See also Terpstra, Buchmann et al: A P2P-Approach to Content-Based Publish/Subscribe
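The ideas of profiles, notification filtering and stored notifications can be sketched with a single in-memory broker. This is only an illustration under strong simplifying assumptions (one broker, exact-match attribute profiles); the `Broker` class and its methods are hypothetical and do not reflect the P2P-DIET or Edutella implementations.

```python
from collections import defaultdict
from typing import Dict, List, Set

Notification = Dict[str, str]
Profile = Dict[str, str]  # attribute -> required value


class Broker:
    """Hypothetical broker that filters notifications against profiles."""

    def __init__(self) -> None:
        self.profiles: Dict[str, Profile] = {}
        self.online: Set[str] = set()
        self.stored: Dict[str, List[Notification]] = defaultdict(list)
        self.delivered: Dict[str, List[Notification]] = defaultdict(list)

    def subscribe(self, peer: str, profile: Profile) -> None:
        self.profiles[peer] = profile

    def connect(self, peer: str) -> None:
        # Rendezvous: hand over everything stored while the peer was away.
        self.online.add(peer)
        self.delivered[peer].extend(self.stored.pop(peer, []))

    def disconnect(self, peer: str) -> None:
        self.online.discard(peer)

    def publish(self, notification: Notification) -> None:
        for peer, profile in self.profiles.items():
            # Filtering: every attribute in the profile must match.
            if all(notification.get(k) == v for k, v in profile.items()):
                target = self.delivered if peer in self.online else self.stored
                target[peer].append(notification)


broker = Broker()
broker.subscribe("alice", {"topic": "P2P topologies"})
broker.publish({"topic": "P2P topologies", "title": "New teaching material"})
broker.publish({"topic": "Cooking", "title": "Unrelated"})
broker.connect("alice")  # alice was offline; the stored notification arrives now
print(broker.delivered["alice"])
```

In a real P2P setting the profiles, notifications and advertisements would be forwarded between peers rather than held by one broker.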
P2P and Efficient Routing
How do peer-to-peer networks scale?
Requirements:
Symmetric topology (every node is a root)
Low network diameter (small worlds property, should be O(log n))
Limited node degrees (number of peer-connections from a node, should be O(log n))
Load balancing of traffic
Efficient broadcast (receive broadcast messages only once)
Adaptable to dynamic number of peers
HyperCuP Peer-to-Peer Topology
Details: see e.g. Schlosser, Sintek, Decker, Nejdl: "HyperCuP – Shaping Up Peer-to-Peer Networks", 2nd Intl. WS on Agents and P2P Computing, 2002
Hypercube Topology
Broadcast Algorithm
Annotate each message with the "dimension" of the peer-to-peer connection it was sent over, and forward it only along "higher" dimensions
Properties
Network diameter, characteristic path length and number of neighbors per node are O(log_b N)
Fault tolerant, vertex-symmetric
Extensions
Dynamic hypercube
Base=N hypercube
Cayley graphs
[Figure: hypercube of eight nodes (1 to 8) with edges labeled by dimension 0, 1, 2; a broadcast proceeds in Step 1, Step 2, Step 3]
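The broadcast rule from this slide can be simulated in a few lines. The sketch assumes a complete binary hypercube whose nodes are labeled 0 to 2^d - 1, so that the neighbor along dimension i differs in bit i. That labeling and the function name are assumptions, and the dynamic-hypercube maintenance from HyperCuP is not modeled.

```python
from collections import Counter, deque
from typing import Dict, Tuple


def hypercube_broadcast(dimensions: int, origin: int) -> Tuple[Counter, Dict[int, int]]:
    """Simulate a broadcast in a complete binary hypercube.

    Returns how often each node received the message and the hop
    count at which it arrived.
    """
    received: Counter = Counter()
    hops: Dict[int, int] = {origin: 0}
    # Each queue entry: (node, dimension the message arrived on).
    # The origin uses -1 so that it sends along every dimension.
    queue = deque([(origin, -1)])
    while queue:
        node, via = queue.popleft()
        # Forward only along dimensions higher than the incoming one.
        for dim in range(via + 1, dimensions):
            neighbor = node ^ (1 << dim)  # flip bit `dim`
            received[neighbor] += 1
            hops[neighbor] = hops[node] + 1
            queue.append((neighbor, dim))
    return received, hops


received, hops = hypercube_broadcast(dimensions=3, origin=0)
print(sorted(received.items()))  # every other node appears exactly once
print(max(hops.values()))        # number of steps needed
```

Because every node is reached by flipping bits in strictly increasing order, there is exactly one delivery path per node, which is why no duplicate messages occur.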
Chord Neighbor and Route selection Algorithms
[Figure: Chord ring over the 3-bit identifiers 000, 001, 010, 011, 100, 101, 110, 111]
d(000, 001) = 1
d(000, 010) = 2
d(000, 100) = 4
Neighbor selection: i-th neighbor at distance 2^i
Route selection: pick neighbor closest to destination
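The two rules can be sketched for the fully populated 3-bit ring shown on this slide. The assumption that every identifier is occupied is a simplification (a real Chord ring is sparse and uses successor nodes), and the function names are hypothetical.

```python
from typing import List


def distance(a: int, b: int, bits: int) -> int:
    """Clockwise distance from a to b on a ring of 2**bits identifiers."""
    return (b - a) % (1 << bits)


def neighbors(node: int, bits: int) -> List[int]:
    """Neighbor selection: the i-th neighbor lies at distance 2**i."""
    return [(node + (1 << i)) % (1 << bits) for i in range(bits)]


def route(source: int, dest: int, bits: int) -> List[int]:
    """Route selection: always hop to the neighbor closest to dest."""
    path = [source]
    while path[-1] != dest:
        current = path[-1]
        nxt = min(neighbors(current, bits), key=lambda n: distance(n, dest, bits))
        path.append(nxt)
    return path


BITS = 3
print([format(n, "03b") for n in neighbors(0b000, BITS)])
print([format(n, "03b") for n in route(0b000, 0b111, BITS)])
```

Each hop removes at least the largest power of two that fits in the remaining clockwise distance, so a route needs at most `bits` hops.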
Super-Peer Networks
Observation: Peers vary significantly in availability, bandwidth, processing power, etc.
Create network backbone from highly available and powerful peers to distribute load better.
See also Yang, Garcia-Molina: Improving Search in P2P Systems, Intl. Conf. on Distributed Computing Systems, Vienna, 2002, or file sharing networks like KaZaA
Super-Peers and Routing Indices
Nejdl et al. Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-To-Peer Networks. WWW 2003
Extension to Distributed Query Processing
Interleave P2P techniques and query processing
Push abstract query plans through the super peer network
Super peers pick and expand those parts of the query plan that can be executed locally
On the fly distribution and expansion of query plans
See Brunkhorst, Dhraief, Kemper, Nejdl, Wiesner: Distributed Queries and Query Optimization in Schema-Based P2P-Systems, VLDB-P2P-Workshop
Query Optimization exploits clustering strategies
Access-path clustering: attribute-based clustering using per-attribute hypercubes (using the hypercube as a balanced n-ary search tree) (see Dhraief, Kemper, Nejdl, Wiesner: Distributed Queries and Query Optimization in Schema-Based P2P Systems, submitted)
Access Control and Automated Trust Negotiation
Goal → protect resources from unauthorized access
Establish trust between strangers
Initial trust among nodes is not necessary
No need for prior registration
Use and interchange of credentials: online analogue to the paper credentials in real life.
Negotiation according to policies
Access control policies can be used on both sides (requester and provider)
Delegation
Automated Trust Negotiation → iterative exchange of digital credentials.
Iterative disclosure of policies and credentials
Credentials and Policies
Property-based credentials
Describe one or more properties / attributes of the owner asserted by the issuer, signed with the private key of the issuer
As credentials contain sensitive information, they are not shown until the other party demonstrates that it is qualified to have such sensitive information.
Access Control Policies
Protect a resource or a credential
Specify credentials that the other negotiation participant must provide in order to get access
Several policies can be involved during the negotiation.
Several policies for the same resource or credential.
Policies can be protected like any other resource.
Example "Alice & E-Learn"
Step 1: Alice requests to access E-Learn's free Spanish course
Step 2: E-Learn replies with policy protecting this resource
Requests police badge to prove police officer status
Requests driver's license to prove California residence status
Step 3: Alice views her driver's license as non-critical, but needs to protect her police officer credential
Discloses driver's license
Requests E-Learn membership proof from the Better Business Bureau
Step 4: E-Learn agrees
Discloses Better Business Bureau membership card
Step 5: Alice finds her policy satisfied
Discloses police badge
Step 6: E-Learn finds its policy satisfied
Makes Spanish course available
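The six steps above can be replayed with a small simulation. The sketch assumes an eager strategy in which each party discloses every credential whose policy is already satisfied. The function name, the credential identifiers and the dictionary encoding of policies are hypothetical, and the sketch ignores signatures, delegation and policy protection.

```python
from typing import Dict, List, Set, Tuple

# A policy is the set of credentials the OTHER party must have disclosed.
Policies = Dict[str, Set[str]]


def negotiate(resource_policy: Set[str], requester: Policies,
              provider: Policies) -> Tuple[bool, List[Tuple[str, str]]]:
    """Iteratively disclose credentials until access is granted or stuck."""
    shown_by_requester: Set[str] = set()
    shown_by_provider: Set[str] = set()
    log: List[Tuple[str, str]] = []
    progress = True
    while progress and not resource_policy <= shown_by_requester:
        progress = False
        for name, policy in requester.items():
            if name not in shown_by_requester and policy <= shown_by_provider:
                shown_by_requester.add(name)
                log.append(("Alice", name))
                progress = True
        for name, policy in provider.items():
            if name not in shown_by_provider and policy <= shown_by_requester:
                shown_by_provider.add(name)
                log.append(("E-Learn", name))
                progress = True
    return resource_policy <= shown_by_requester, log


# Alice's license is unprotected; her badge requires proof of
# Better Business Bureau (BBB) membership from the other side.
alice: Policies = {"driver_license": set(), "police_badge": {"bbb_membership"}}
elearn: Policies = {"bbb_membership": set()}
course_policy = {"police_badge", "driver_license"}

granted, log = negotiate(course_policy, alice, elearn)
print(granted)
for party, credential in log:
    print(party, "discloses", credential)
```

If the provider's membership proof were itself protected by a credential Alice cannot supply, the loop would stop without progress and access would be denied.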
Automated Trust Negotiation among Peers on the Web
Design policy language to express trust negotiation
Delegation, policy protection, negotiation strategies
Based on guarded distributed logic programs
Develop run-time system for automated trust negotiation
Based on Prolog meta interpreter embedded as Java library in Applet / Server (WWW) or Peer-to-Peer (Edutella) environment
Currently two application areas
eLearning (ELENA, EU/FP5 @ L3S)
Emergency management (ITR @ DAIS/UIUC (M. Winslett))
See e.g. Yu, Winslett, and Seamons. Supporting Structured Credentials and Sensitive Policies through Interoperable Strategies for Automated Trust Negotiation, ACM Transactions on Information and System Security, February 2003.
See also: Nejdl, Olmedilla, Winslett: Automated Trust Negotiation among Peers on the Semantic Web, submitted
Summary and Conclusions
Schema-based P2P networks and P2P-based data management infrastructures build upon traditional P2P networks and distributed / heterogeneous database research, while posing new challenges and offering additional functionalities
Building blocks are flexible / extendable schema languages, expressive query and reasoning languages, efficient network topologies as well as routing and clustering algorithms, data integration and mediation functionalities, query optimization, and last but not least, decentralized access control and trust negotiation mechanisms
See also SIGMOD Record September 2003, Special P2P Issue: Nejdl, Siberski, Sintek: "Design Issues and Challenges for RDF- and Schema-Based Peer-to-Peer Systems"