=Paper=
{{Paper
|id=Vol-95/paper-1
|storemode=property
|title=Metadata Management in the EU DataGrid
|pdfUrl=https://ceur-ws.org/Vol-95/01-mccance.pdf
|volume=Vol-95
|dblpUrl=https://dblp.org/rec/conf/mmgps/McCance03
}}
==Metadata Management in the EU DataGrid==
Metadata Management in the
European DataGrid Project
Gavin McCance
University of Glasgow
European DataGrid Project
GridPP Project
DataGrid is a project funded by the European Union
GridPP is funded by PPARC
MMGPS – 16 December 2003 – Metadata Management in EDG
Outline
Classes of metadata in EDG
Grid internal metadata
Application specific metadata
Products
Replica catalogues
Spitfire
Technology details
Future Work
Gavin McCance – University of Glasgow MMGPS – 16 December 2003 – Metadata Management in EDG – n° 2
Types of Metadata
Two types of metadata used in EDG WP2
Grid internal metadata
Metadata on files (size, checksum, etc)
Metadata on logical names (application specific)
Application specific general metadata
Not related to logical filenames
Bookkeeping databases
Data Catalogues
Image metadata
etc
Grid Internal
Replication Metadata
Replica Location Problem
Given a logical file identifier, how do we find all
the replicas of that file on the Grid?
Driven by two use-cases:
a) Particle physics – multiple replicas of the same file so that
the data are always near the compute resources, for
data-hungry applications
b) Earth Observation/Medical – convenient mechanism for
logical namespace. Don’t need to know the physical
location of the files.
Replica Metadata
Logical filename to storage (physical) filename
mapping
Diagram: several Logical Aliases point to one GUID (Replica Metadata Catalog), and the GUID points to several Physical File Replicas (Replica Location Service).
Replica Location Service (RLS)
Optimised to answer 2 very specific queries:
“for a given GUID, give me all the replicas”
“for a given GUID, give me all locally
available replicas”
Scalability achieved by:
Each site has a Local Replica Catalog (LRC) containing
mappings for files located at the given site
Each site runs a Replica Location Index (RLI) which
contains a Bloom-filter hashmap for all GUIDs in all
LRCs
Architecture…
Diagram: one Replica Location Index above three Local Replica Catalogs, at Site 1, Site 2 and Site 3.
Architecture…
Each LRC updates the RLI on every other site.
Diagram: Site 1, Site 2 and Site 3 each run a Replica Location Index and a Local Replica Catalog.
Sequence to answer the query
for a given GUID, give me all locally available
replicas
simply contact the Local Replica Catalog.
for a given GUID, give me all the replicas
contact Replica Location Index to retrieve all LRCs
potentially having a mapping for the given GUID:
GUID → List of LRCs
contact each LRC in the list to retrieve all replicas
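
The two query sequences above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the EDG implementation: the class names, method names, GUIDs and `srm://…example.org` file names are all assumptions made for the example, and a plain set per site stands in for the Bloom filter.

```python
from dataclasses import dataclass, field


@dataclass
class LocalReplicaCatalog:
    """Per-site catalog: GUID -> physical replicas stored at that site."""
    site: str
    replicas: dict = field(default_factory=dict)

    def register(self, guid, physical_name):
        self.replicas.setdefault(guid, []).append(physical_name)

    def lookup(self, guid):
        return list(self.replicas.get(guid, []))


@dataclass
class ReplicaLocationIndex:
    """Index: GUID -> sites whose LRC may hold a mapping for it.

    A set of GUIDs per site stands in for the Bloom filter, so this
    toy index never reports a site that lacks the GUID.
    """
    guids_by_site: dict = field(default_factory=dict)

    def update_from(self, lrc):
        self.guids_by_site[lrc.site] = set(lrc.replicas)

    def candidate_sites(self, guid):
        return sorted(s for s, guids in self.guids_by_site.items() if guid in guids)


def all_replicas(guid, rli, lrcs_by_site):
    """Global query: ask the RLI for candidate LRCs, then ask each LRC."""
    found = []
    for site in rli.candidate_sites(guid):
        # With a real Bloom filter a candidate may be a false positive;
        # the LRC lookup then simply returns an empty list.
        found.extend(lrcs_by_site[site].lookup(guid))
    return found


# Hypothetical three-site setup.
lrcs = {name: LocalReplicaCatalog(name) for name in ("site1", "site2", "site3")}
lrcs["site1"].register("guid-42", "srm://site1.example.org/data/f42")
lrcs["site3"].register("guid-42", "srm://site3.example.org/data/f42")
lrcs["site2"].register("guid-7", "srm://site2.example.org/data/f7")

rli = ReplicaLocationIndex()
for lrc in lrcs.values():
    rli.update_from(lrc)

local = lrcs["site1"].lookup("guid-42")          # local query: LRC only
everywhere = all_replicas("guid-42", rli, lrcs)  # global query: RLI, then LRCs
```

The local query never touches the index, which is why it stays cheap however many sites exist.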
Bloom Filter Indexing
Advantages:
High level of scalability
Fast
Not a memory-intensive hash
Disadvantages:
Only fulfills “EQUALITY” type queries, i.e. no wildcards
Non-deterministic, i.e. there are a small number of
false positives to be dealt with
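
To make these trade-offs concrete, here is a minimal Bloom filter sketch in Python. It is a generic textbook construction, not the filter used in the RLS: the class name, the sizes and the choice of salted SHA-256 as the hash family are assumptions for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array; membership test by exact key only."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


index = BloomFilter()
index.add("guid-42")
present = index.might_contain("guid-42")  # True: added keys are never missed

# A deliberately saturated one-bit filter shows the false-positive case.
saturated = BloomFilter(num_bits=1, num_hashes=1)
saturated.add("guid-42")
false_positive = saturated.might_contain("guid-never-added")  # True
```

The filter stores only bits, never the keys, so its memory use is fixed. That is also why it can answer only exact-match questions: a wildcard has no key to hash.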
Replica Metadata Catalog (RMC)
Stores GUID metadata:
logical file names (human readable)
small number of user-defined attributes ~O(10)
Attributes are natively typed:
string, float, int, date
RMC
Used to do GUID selection based on application-specific metadata
Subsequently use the RLS to find the physical replica based
on the GUID
Currently a centralised catalog
though work ongoing with Oracle Streams for replicated
architectures
Work on clustering and replication for high availability
solutions
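
As a rough illustration of GUID selection on typed attributes, the sketch below uses an in-memory SQLite table. The table layout, column names and sample rows are invented for the example and say nothing about the actual RMC schema. For brevity it stores one logical file name per GUID, whereas the RMC allows several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE guid_metadata (
        guid        TEXT PRIMARY KEY,
        lfn         TEXT NOT NULL,   -- human-readable logical file name
        run_number  INTEGER,         -- int attribute
        energy_gev  REAL,            -- float attribute
        detector    TEXT,            -- string attribute
        taken_on    TEXT             -- date attribute, stored as ISO 8601 text
    );
""")
rows = [
    ("guid-1", "lfn:/demo/run100.dat", 100, 7.5, "barrel", "2003-11-02"),
    ("guid-2", "lfn:/demo/run101.dat", 101, 12.0, "endcap", "2003-11-20"),
    ("guid-3", "lfn:/demo/run102.dat", 102, 13.5, "barrel", "2003-12-01"),
]
conn.executemany("INSERT INTO guid_metadata VALUES (?, ?, ?, ?, ?, ?)", rows)


def select_guids(conn, min_energy, since):
    """Return GUIDs whose attributes satisfy a typed range predicate."""
    cur = conn.execute(
        "SELECT guid FROM guid_metadata "
        "WHERE energy_gev >= ? AND taken_on >= ? ORDER BY guid",
        (min_energy, since),
    )
    return [guid for (guid,) in cur]


selected = select_guids(conn, 10.0, "2003-11-15")  # ['guid-2', 'guid-3']
```

Each selected GUID would then be handed to the RLS to locate the physical replicas. Unlike the Bloom-filter index, a typed catalog like this can evaluate range predicates.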
Application Specific
General Metadata
Spitfire: Technology Demonstrator
Capabilities:
Simple Grid-enabled front-end for any remote RDBMS
through secure Web Services (SOAP-RPC)
Provides sample generic RDBMS methods that may easily
be customized with little additional development
WSDL interfaces
Web Browser integration (data browser servlet)
GSI authentication
Local authorization module
Not suitable for the retrieval of LARGE result sets
Status: current version 2.1
Used by EU DataGrid Earth Observation and Biomedical
applications.
Spitfire Sample API
Spitfire Sample API based upon common SQL
operations. Use the Spitfire Grid service where you
might have used JDBC before.
Provides DB query operations, update operations,
and schema update operations.
Provides browser servlet to expose specific views of
the data to web based clients.
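
The three groups of operations (query, update, schema update) might look something like the sketch below. It is not the Spitfire API: the class and method names are hypothetical, and a local SQLite connection stands in for the remote web service and its backing RDBMS.

```python
import sqlite3


class SampleDatabaseService:
    """Hypothetical stand-in for a remote database service (runs SQLite locally)."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")

    # Schema update operation.
    def create_table(self, table, columns):
        # Table and column names are interpolated directly, so they are
        # assumed to come from trusted configuration, not from end users.
        cols = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
        self._conn.execute(f"CREATE TABLE {table} ({cols})")

    # Update operation.
    def insert(self, table, row):
        names = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        self._conn.execute(
            f"INSERT INTO {table} ({names}) VALUES ({marks})", tuple(row.values())
        )
        self._conn.commit()

    # Query operation.
    def select(self, table, fields, where="1=1", params=()):
        sql = f"SELECT {', '.join(fields)} FROM {table} WHERE {where}"
        return self._conn.execute(sql, params).fetchall()


service = SampleDatabaseService()
service.create_table("images", {"id": "INTEGER", "modality": "TEXT"})
service.insert("images", {"id": 1, "modality": "MRI"})
service.insert("images", {"id": 2, "modality": "CT"})
result = service.select("images", ["id"], "modality = ?", ("MRI",))  # [(1,)]
```

The client calls a small set of generic methods rather than opening a database connection itself, which is the sense in which such a service replaces direct JDBC-style access.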
Technology details
All services implemented as secure web services
WSDL exposed allowing auto-client generation
Supplied clients: Java, C++
Others have successfully used Perl and Python clients with our
WSDL
SSL secure authentication using Grid Proxy
certificates (GSI, but NOT httpg)
‘Medium-grained’ authorization including web-based
administration tool:
‘medium-grained’ meaning each method can be
allowed/denied based on patterns of distinguished names
and VOMS capabilities.
can interpret grid-map files
can interpret VOMS credentials and capabilities contained
therein
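
A per-method allow/deny check on distinguished-name patterns could be sketched as follows. The rule format, method names and DNs are assumptions for illustration only. Shell-style wildcards from Python's `fnmatch` module stand in for whatever pattern syntax the real module uses, and VOMS capabilities are not modelled.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table: (method, DN pattern, allow?). First match wins.
RULES = [
    ("select", "/C=UK/O=eScience/*", True),
    ("update", "/C=UK/O=eScience/OU=Glasgow/*", True),
    ("update", "*", False),
]


def is_allowed(method, dn, rules=RULES):
    """Return True if the first rule matching (method, dn) allows the call."""
    for rule_method, pattern, allow in rules:
        if rule_method == method and fnmatchcase(dn, pattern):
            return allow
    return False  # no matching rule: deny by default


alice = "/C=UK/O=eScience/OU=Glasgow/CN=alice"
bob = "/C=UK/O=eScience/OU=Edinburgh/CN=bob"

alice_can_update = is_allowed("update", alice)  # True
bob_can_update = is_allowed("update", bob)      # False
bob_can_select = is_allowed("select", bob)      # True
```

Decisions are made per method rather than per table or per row, which is what makes the scheme medium-grained.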
Deployment
Tested and deployed on
Tomcat/MySQL
Tomcat/Oracle9i
Oracle9iAS/Oracle9i
Testing ongoing for Tomcat/DB2.
Future Work
Plan to work together with DAIS working group of
GGF to ensure that our services can be re-factored
into DAIS-compliant services.
Should be fairly easy since we are starting from web
services.
Plan to work more closely with applications in order
to refine the metadata interface, or just to enable
their existing metadata applications to be ‘on the
Grid’.
Security modules
Flow diagram, labels in request order:
HTTP + SSL: request + client certificate arrives at Tomcat (SSLServerSocketFactory)
TrustManager: is the certificate signed by a trusted CA? (Trusted CAs: spitfire-cacerts.jks)
TrustManager: has the certificate been revoked? (revoked cert repository)
Authentication using standard GSI certs or proxies; the TrustManager checks validity and revocation
Servlet chain: Security Servlet, then Authorisation module
Role-based authorisation: does the user specify a role?
no: find the default role (role repository: specific and default roles)
yes: is the role ok?
Map role to connection id (connection / role mappings)
Request + connection id passed on to the Oracle XSQL servlet (.xsql files)
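
The authorisation steps in the diagram (pick the requested or default role, check it, map it to a connection id) can be sketched as below. The repository contents, role names and connection ids are invented for the example. Certificate validation is assumed to have already succeeded before this code runs.

```python
# Hypothetical role repository: DN -> (default role, roles the user may assume).
ROLE_REPOSITORY = {
    "/O=Grid/CN=alice": ("reader", {"reader", "writer"}),
    "/O=Grid/CN=bob": ("reader", {"reader"}),
}

# Hypothetical mapping from role to database connection id.
CONNECTION_MAPPINGS = {"reader": "conn-readonly", "writer": "conn-readwrite"}


def resolve_connection(dn, requested_role=None):
    """Return the connection id for an already-authenticated user."""
    if dn not in ROLE_REPOSITORY:
        raise PermissionError(f"unknown user: {dn}")
    default_role, allowed_roles = ROLE_REPOSITORY[dn]
    # Does the user specify a role? If not, fall back to the default.
    role = requested_role if requested_role is not None else default_role
    # Role ok?
    if role not in allowed_roles:
        raise PermissionError(f"role {role!r} not permitted for {dn}")
    # Map role to connection id.
    return CONNECTION_MAPPINGS[role]


default_conn = resolve_connection("/O=Grid/CN=alice")           # 'conn-readonly'
writer_conn = resolve_connection("/O=Grid/CN=alice", "writer")  # 'conn-readwrite'
```

The request would then be forwarded together with the connection id, so the database layer never sees the user's certificate.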