=Paper=
{{Paper
|id=Vol-513/paper-9
|storemode=property
|title=Security-oriented Portals for the Life Sciences
|pdfUrl=https://ceur-ws.org/Vol-513/paper09.pdf
|volume=Vol-513
|dblpUrl=https://dblp.org/rec/conf/iwsg/SinnottDJMSW09
}}
==Security-oriented Portals for the Life Sciences==
IWPLS’09
*R.O. Sinnott, T. Doherty, J. Jiang, S. McCafferty, A. Stell, J. Watt
National e-Science Centre
University of Glasgow
Glasgow G12 8QQ
r.sinnott@nesc.gla.ac.uk
*To whom correspondence should be addressed.

ABSTRACT

Motivation: The life sciences are broad in scope and cover multi- and inter-disciplinary domains as well as the biological domain. These domains can, for example, involve researchers from the clinical, social, geo-spatial and computer sciences amongst others, e.g. in understanding genetic variations across a population as might be undertaken through a genome-wide association study. Given this, it is essential that portals for these communities are targeted to the individual expertise of the particular domain scientists. Thus tools available to a bioinformatician through a portal might well be meaningless to a social scientist and vice versa. Furthermore, certain domains demand that fine-grained access control on data is supported. In this paper we outline how a portfolio of life science related projects at the National e-Science Centre (NeSC) at the University of Glasgow has benefited from security-oriented portals focused upon ease of access, configuration and usage, where data providers are assumed to be autonomous and able to make their own local fine-grained access control decisions. We describe the basic technologies that underlie these solutions and outline specific case studies in their application in the areas of depression, self-harm and suicide, and in the area of paediatric endocrinology, focusing in particular on rare diseases associated with sex development.

1 INTRODUCTION

The life sciences have much to benefit from the application of e-Science and Grid based technologies. Seamless access to large scale high performance computing (HPC) facilities; technologies and processes that help in coping with the explosion of data sets in high throughput post-genomic life sciences; and support for multi- and inter-disciplinary research communities in tackling major research questions are some of the most obvious benefits. Despite this, the impact envisaged in the application of e-Science and Grids in the life sciences domain has not been as major as first hoped for. There are a variety of reasons for this. These include:
• the complexity of the middleware that is used;
• the lack of clear understanding and expression of life science research community requirements;
• a moving set of life science research questions;
• a moving set of scientific data and understanding of that data;
• a perceived lack of security of Grids and issues this has on data access and release.

This list is not complete and there are doubtless other issues that could be brought to bear on life science community take-up. Some of these are described in more detail in [1]. In this paper we argue that many of these issues can now be tackled. In particular we focus upon solutions that tackle several major uptake considerations: usability of the e-Infrastructures that are developed; support for inter-disciplinary research; and security considerations for all protagonists involved in life science research. We claim that this is now possible through security-oriented portals targeted to the specific needs of life science researchers. This is demonstrated through two case studies in the area of public health and genetics of rare diseases.

The rest of this paper is structured as follows. Section 2 describes portal based technologies and how they have been applied at the National e-Science Centre (NeSC) at the University of Glasgow (www.nesc.gla.ac.uk). Section 3 describes technology support for secure access to and configuration of portals. Section 4 describes the application of these solutions to particular case studies. Finally we draw some conclusions on the work as a whole and outline areas of future development.

2 GRID PORTALS

Web portals provide a single point of access where a variety of information is aggregated and personalised to individuals to improve their experience in accessing and using a range of Internet resources. Common features of web portals include support for categorization of web content and advanced search facilities. Grid portals build upon the general web portal model to deliver the benefits of Grid computing to virtual communities of users, providing a single access point to Grid services and resources. Web 2.0 based solutions, whether wikis, social networking capabilities or lightweight tools (e.g. for visualisation or mash-ups), can also be made available through portals.

The major difference between a Web portal and a Grid portal is that Grid portals provide a single point of access for Grid resources specific to a given domain, rather than more general Internet-based web pages or content. Grid portals provide end users with a customized view of software and hardware resources specific to their particular problem domain. This customization can be based upon the privileges that end users have. This can be used to restrict or authorize access to collections of remote services and data sets. Grid portals should ideally allow researchers to focus on their research problems by making the Grid a transparent extension of their desktop computing environment.

The development of targeted portals for the life sciences in the large, i.e. without focusing on particular specializations, offers a direct way in which a rich variety of applications and resources can be made available in a transparent manner to users who do not wish to become Grid experts. Given the depth and range of life science research currently being pursued, it is highly unlikely that a single one-stop shop portal could be established for all researchers.

That said, Grid-based life science portal solutions should meet the following set of requirements:
• Usability – the portal should be developed with both the experienced and inexperienced research communities in mind. This might benefit from use of backend servers to manage user certificates and Grid credentials required across the life science resources.
• Single sign-on – in addition to secure access to a Grid-based life science portal itself, seamless access to a range of life science resources without the need for multiple authentications should be supported. Access should, of course, depend on user privileges. The concept of single sign-on is one of the characterizing features of the Grid.
• Interoperability – it should be possible for research communities to develop their own services using potentially different middleware on their own local resources, but be able to make these available to remote researchers through portal technologies.
• Support for research – it is essential that the services and data sets made available through the portals meet the real needs of life science researchers. Their input and feedback should drive the design and development of these portals and their content.
• Support for collaboration – Grid-based life science portals should facilitate collaboration between researchers at all levels – within an institution, between institutions, across national and international levels.
• Portal administration and management – user communities should be able to establish and ultimately manage their own services and their own user base. The shared resources underlying these communities need to be autonomous, however, and under the control of these communities.
• Monitoring – administrators and users should have direct access to monitoring information about various aspects of the Grid for their virtual organization, for their institution, and for those using the shared resources. This might include notifications of new data sets or new tools of interest to life science communities.
• Security – certain life science communities and data providers demand fine-grained security for tailoring access and usage of Grid resources. This might be based on specific roles particular to a virtual organization.

Whilst it is possible to develop hand-crafted portals, recent advances in this area have resulted in Grid portal frameworks which facilitate re-use of code and support various forms of structuring portal pages. Grid portal frameworks provide a set of basic functionalities and infrastructure for developing further portal components as plug-ins. Common components are offered for security (e.g. access management), for personalisation (e.g. user/group profiles), and for different presentation capabilities (e.g. JSP, XSP, XML/XSLT).

Portals themselves provide access to families of portlets or other hosted applications. Portlets are typically Java-based web components managed by a portlet container that processes requests and generates dynamic content. Portals use portlets as pluggable user interface components, providing a presentation or access layer to systems. Portlets support modular and user centric web applications. Portlets are the building blocks of portals and are typically small units of functionality within a portal. Each portlet typically provides an interface to a Grid service offering some well defined functionality. Users and administrators of communities, or virtual organisations more generally, can build customized environments by adding portlets. As we shall see in section 3, for advanced scenarios, security techniques can be used to authorize use of particular portlets or the resources available to the services accessible via those portlets.

It is worth noting that to support portlet and portal interoperability, the portal community and wider industry have developed two key standards of relevance to the Grid community: the Java Portlet Specification (JSR-168) and Web Services for Remote Portlets (WSRP). JSR-168 enables interoperability among portlets and portals. The specification defines the contract between a portlet and portlet container, and a set of portlet APIs that address personalization, presentation, and security. The specification also defines how to package portlets in portal applications. WSRP allows plug-and-play of content sources (portlets) within portals and other aggregating web applications. WSRP standardizes
the consumption of web services in portal front-ends, and the way in which content providers write web services for portals. This allows content producers to maintain control over the code that formats the presentation of their content. By reducing the cost for aggregators to access their content, WSRP improves the integration of content sources into pages for end users.

WSRP and JSR-168 are complementary specifications. JSR-168 defines a standard portlet API for Java-based portals. WSRP allows content to be hosted in the environment most suitable for its execution, while still being easily accessed by content aggregators. Second generation Grid portals can be produced from pluggable (JSR-168 compliant) Grid portlets. Running inside a portlet container, portlets can be added or removed, thus providing administrators with the ability to customize access and usage of Grid services at portal level. A portal built from Grid portlets can provide users with the ability to integrate services provided by different Grid-enabling technologies. This aspect is critical to the success of life sciences since a range of distributed services will likely be developed by different communities and institutions, and subsequently made accessible through common research specific portals (VREs).

The NeSC at Glasgow has developed a large portfolio of JSR-168 compliant portlets that are available in numerous portals supporting various life science research communities. Furthermore, through the Open Middleware Infrastructure Institute (OMII-UK) Security Portlets project (SPAM-GP) at NeSC Glasgow (www.nesc.gla.ac.uk/projects/omii-sp) we have also developed a collection of JSR-168 compliant security-oriented portlets that support simple, user-driven portal security based upon information provided by the Internet2 Shibboleth technology (http://shibboleth.internet2.edu) and the UK Access Management Federation (www.ukfederation.org.uk).

3 SECURITY INFRASTRUCTURES

As mentioned, many communities are dissuaded from accessing and using Grid resources due to the complexity of the middleware and associated processes. Much of this stems from the demands to acquire and use X.509 based certificates as used to support the public key infrastructure (PKI) allowing single sign-on to distributed Grid resources. The issues with PKIs, including their limitations, are discussed in [2-4]. To overcome this, the UK and many international communities are moving to federated access control based upon the Internet2 Shibboleth technologies.

When a user attempts to access a Shibboleth protected service, or Service Provider (SP) more generally, they are typically redirected to a WAYF (Where Are You From) server that exists as part of the federation and that asks the user to pick their home Identity Provider (IdP) from a list of known and trusted sites. The service provider site already has a pre-established trust relationship with each home site, and trusts the home site to authenticate its users properly.

After the user has picked their home site, their browser is redirected to their site's authentication server, e.g. an LDAP repository, and the user is invited to log in. After successful authentication, the home site redirects the user back to the SP and the message carries a digitally signed Security Assertion Markup Language (SAML) authentication assertion message from the home site, asserting that the user has been successfully authenticated (or not!) by a particular means. The actual authentication mechanism used is specific to the IdP.

If the digital signature on the SAML authentication assertion is verified and the user has successfully authenticated themselves at their home site, then the SP has a trusted message providing it with a temporary pseudonym for the user (the handle), the location of the attribute authority at the IdP site and the service provider URL that the user was previously trying to access. The resource site then returns the handle to the IdP's attribute authority in a SAML attribute query message and is returned a signed SAML attribute assertion message. The Shibboleth trust model is that the target site trusts the IdP to manage each user's attributes correctly, in whatever way it wishes. So the returned SAML attribute assertion message, digitally signed by the origin, provides proof to the target that the authenticated user does have these attributes.

The attributes used in this assertion may then be used to authorise the user to access particular areas of the resource site. Once authenticated through Shibboleth, the notion of single sign-on is supported whereby a user may redirect their browser to other protected Shibboleth resources with no need for re-authentication.

Underlying Shibboleth-based SAML token exchanges are a core set of attributes based upon the eduPerson object class (www.educause.edu/eduperson/) that are pre-agreed across the federation so that an SP can make its own local access control decision. It is essential that interoperability exists between attribute authorities issuing attribute assertions, policy writers defining access policies, and access decision functions that make decisions based on the initiator's attributes and the site's target and resource policy.

However, given the fact that Grids can be used to establish e-Infrastructures and more security-oriented VOs, the ability to have VO specific attributes defined and embedded in core eduPerson attributes is highly desirable. The most likely attribute for this purpose is the eduPersonEntitlement attribute. The eduPersonEntitlement attribute can utilise structured XML data representative of large scale Grid infrastructure users and IdPs. This might include the VO they are involved in, the roles that they might have in that VO etc.
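The local access control decision that an SP makes from the released attributes can be sketched as follows. This is a minimal illustration only: the `<VO>:<role>` entitlement syntax, the IdP host name and the resource name are assumptions made for the example (the paper does not fix a concrete encoding), and verification of the SAML signature is assumed to have already happened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entitlement:
    vo: str
    role: str


def parse_entitlements(attributes):
    """Extract (VO, role) pairs from eduPersonEntitlement values.

    Assumes a hypothetical "<VO>:<role>" syntax; other values are ignored.
    """
    found = set()
    for value in attributes.get("eduPersonEntitlement", []):
        vo, sep, role = value.partition(":")
        if sep and vo and role:
            found.add(Entitlement(vo, role))
    return found


def sp_authorise(attributes, issuer, trusted_idps, resource_policy, resource):
    """Local SP decision: trusted IdP plus at least one required entitlement."""
    if issuer not in trusted_idps:
        return False  # attributes from unknown IdPs are not accepted
    required = resource_policy.get(resource, set())  # unknown resource: deny
    return bool(parse_entitlements(attributes) & required)


# Hypothetical policy and attribute release
policy = {"smr04-advanced": {Entitlement("DAMES", "DAMES_SMR04_adv")}}
released = {"eduPersonEntitlement": ["DAMES:DAMES_SMR04_adv", "not-an-entitlement"]}
decision = sp_authorise(released, "idp.example.ac.uk", {"idp.example.ac.uk"},
                        policy, "smr04-advanced")
```

The design point is that the IdP only asserts attributes; whether those attributes are sufficient remains a decision taken by the SP against its own policy.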
The SPAM-GP project exploits Shibboleth-based access to Grid portals and provides tooling that allows Shibboleth information, including VO-specific attributes, to be utilized. In particular SPAM-GP developed a family of JSR-168 portlets that support:
• scoped attribute management (SCAMP), which allows restricted and syntactically correct manipulation of the Shibboleth attribute acceptance policy, streamlining the subset of IdPs from whom a portal will accept user attributes across the federation;
• creation and usage of X509 attribute certificates (ACP) to allow distributed service providers to make their own local authorisation decisions when users attempt to invoke remote (protected) services;
• content configuration, allowing dynamic configuration of portal content based on Shibboleth attributes and knowledge of available services. Once authenticated to a portal via Shibboleth, users are presented with a filtered view of available portlets (and hence access to a restricted set of services). This portlet has been targeted specifically to extend the GridSphere portal framework.

The implementation of these portlets and the infrastructure that is required to support them is described in detail in [5]. Our focus here is to show how these portlets support life science researchers. We demonstrate this in the ESRC funded DAMES and EU FW7 EuroDSD projects.

4 CASE STUDIES

4.1 DAMES

The social sciences, as with many other domains, are awash with data. The ESRC funded DAMES project is developing systems that will help in tackling the problems facing this community. The DAMES project as a whole has a variety of themes. These include occupational data management; ethnic and minority data management; educational data management and the one of concern here: e-Health data management. The project is exploring the data management challenges associated with access to a wide range of data sets across multiple e-Health related disciplines. These include clinical data sets such as the Scottish Morbidity Records (SMR) covering hospital admissions, mental health and psychosis, cancer and birth and death related data sets; Census related data sets and geospatial data sets.

Scotland is especially well placed to support e-Health related research. Clinical information has been captured and curated for over 30 years in Scotland and an extensive record of the health of the Scottish population exists. The DAMES project has focused in particular upon data sets associated with mental health and is exploring the issues related to depression, self-harm and suicide. In particular it is attempting to answer questions related to the factors that can impact upon individuals suffering from some form of mental health problem. What is the impact of living alone on suicide rates? What is the impact on suicide rates of access to prescription drugs? What is the impact on suicide rates of access to park land? There are, in short, a multitude of research questions that could potentially be answered if the appropriate data resources could be integrated and analysed.

To support these kinds of scenarios the various data sets have been acquired and made available through a targeted DAMES portal. Ideally the data sets themselves would remain at the remote data provider sites; however, in the first instance we have been given direct access to randomized copies of the associated clinical data sets. This was made possible through the previous MRC funded Virtual Organisations for Trials and Epidemiological Studies (VOTES – www.nesc.ac.uk/hub/projects/votes) project. The data sets are identical in structure to the actual SMR data sets, i.e. they have the same schema. However the data itself has been pseudo-anonymised to remove patient identifying information. Nevertheless much of this information can be used directly for building proof of concept systems.

To begin with, a family of portlets has been developed that offers user interfaces to targeted data services. For simplicity, we associate a particular portlet with a given role as returned by the Shibboleth IdP. Using the content configuration portlet from SPAM-GP it is possible to associate possession of roles with associated portlets. It is of course feasible to extend this so that possession of a given role can be used to give access to a family of portlets and an associated set of group permissions inside the portal framework.

Originally the DAMES work was based upon the GridSphere technologies (www.gridsphere.org); more recently, however, we have targeted the Liferay portal environment (www.liferay.com). This was primarily due to the lack of continued support for the GridSphere framework.

The portlets themselves allow targeted access to a subset of the variables available in the particular data sets. Thus in the case of the mental health related scenarios, these include Census variables associated with general health of the population and SMR variables associated with mental health, including subsets of SMR01 (hospital admissions data), SMR04 (mental health and psychosis data) and SMR99 (death related variables where the cause of death was indicated as suicide).

To prove the conceptual approach of secure portlet based access to distributed data sets, we developed multiple different portlets for each data set and associated these with particular roles. In particular we identified advanced and basic roles. Thus to see the advanced portlets for SMR04 a user would have to be in possession of DAMES_SMR04_adv. The primary difference between the advanced and basic roles was in the variables that could be selected, with advanced roles having a much greater set of variables that could be selected. The DAMES portal, as configured using the content configuration portlet (CCP) for an individual user with advanced privileges, is shown in Figure 1 (left), with the results of running a query shown in Figure 1 (right). This result data set shows the number of individuals who have committed suicide across Scotland who have been at least once to a hospital due to mental health related reasons. This result data set also includes geospatial information. This includes partial postcodes and/or output areas where the individual lived or was treated at that time.
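The association between roles and portlets described above can be sketched as a simple lookup. This is a minimal sketch, not the SPAM-GP implementation: only the role name DAMES_SMR04_adv comes from the text, while the other role names and all portlet identifiers are hypothetical.

```python
# Role-to-portlet map. Only DAMES_SMR04_adv is named in the paper;
# the remaining roles and all portlet identifiers are hypothetical.
ROLE_PORTLETS = {
    "DAMES_SMR04_adv": ["smr04_advanced_query"],
    "DAMES_SMR04_basic": ["smr04_basic_query"],
    "DAMES_SMR01_basic": ["smr01_basic_query"],
}


def visible_portlets(roles):
    """Return the portlets a user may see, in a stable order, without duplicates."""
    seen = []
    for role in sorted(roles):
        for portlet in ROLE_PORTLETS.get(role, []):  # unknown roles grant nothing
            if portlet not in seen:
                seen.append(portlet)
    return seen


# Roles as they might be released by the Shibboleth IdP
advanced_view = visible_portlets({"DAMES_SMR04_adv", "some_unrelated_role"})
```

Extending a role so that it unlocks a family of portlets, as the text suggests, only requires listing several portlet identifiers under the same role.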
Figure 1: Census & SMR Data Portlet Query Interfaces (left) and Returned/Joined Results (right)

Key to this approach is understanding the actual data sets themselves, that is, knowing which data sets in which data resources are related. At present the clinical data sets have a unique Community Health Index (CHI) number associated with them. This unique variable was rolled out for all individuals in receipt of health care across Scotland mid-2006. However, other data sets, e.g. the Census data, do not include the CHI. Nevertheless there are other data fields that are common, e.g. geospatial output area statistics.

Building upon this information, DAMES has developed further portlets that support overlaying this health related data across geospatial boundaries. To support this, the project has acquired shapefiles from EDINA (www.edina.ac.uk) which can subsequently be used for rendering and overlaying a variety of data across a given geospatial boundary. These shapefiles cover health authorities in Scotland and Scotland as a whole. The results of overlaying the above data sets across Scotland are shown in Figure 2. We note that access to these shapefiles themselves is restricted as they are under license to EDINA by the commercial organization Ordnance Survey. Thus access is only for registered UK academics who have agreed to the terms and conditions of the license.

Figure 2: Geospatial Visualisation of Suicide and Mental Health Data Sets Across Scotland based on representative SMR data

To demonstrate the functionality of the SPAM-GP ACP portlet, we specifically defined a data service that had its own local access and usage policy. In reality, it is highly unlikely that any given data provider will simply delegate access to their data sets to a portal hosted at NeSC in Glasgow. To address this, a local policy was defined and enforced on access to the SMR99 data sets. In particular this service would only allow access to the SMR99 data variables for users with associated digitally signed and recognised attribute certificates. Users would thus use the ACP portlet to create an X509 attribute certificate based upon the information that was delivered to the portal through Shibboleth, i.e. the role that they need to use to access a particular remote data set (in this case DAMES_SMR04_adv) would need to be digitally signed and stored in an attribute authority that the remote data provider trusts. In this case, the attribute authority is an LDAP server associated with the DAMES portal port.

To support this process, the back end of the portal hosts a Grid credential repository – a MyProxy server. This is completely transparent to the user, however. The username and password to activate the credential in the MyProxy server are sent through as part of the Shibboleth SAML assertion. This credential repository is used at the individual invocation of a remote service. The service needs to identify the individual user attempting to access the resource (authentication). Based on this, it then needs to ensure that the user has the appropriate privileges to run that particular query, i.e. it pulls the digitally signed attribute certificate from the attribute authority that its local policy recognises.

More information on the DAMES work is described in [6-7]. The project is currently applying for ethical permissions to access and use actual SMR data sets to realise the scenarios outlined previously.
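The pull-based check that a remote data service performs against a trusted attribute authority can be sketched as below. This is an illustration under strong simplifying assumptions: an in-memory class stands in for the LDAP attribute authority, and an HMAC over a shared key stands in for the digital signature on an X509 attribute certificate (the Python standard library has no attribute certificate support). None of the names are taken from the SPAM-GP code.

```python
import hashlib
import hmac


def sign_role(key: bytes, user: str, role: str) -> str:
    """Stand-in for signing an attribute certificate (HMAC, not X509)."""
    message = f"{user}|{role}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


class AttributeAuthority:
    """In-memory stand-in for the LDAP attribute authority."""

    def __init__(self):
        self._store = {}

    def publish(self, user, role, signature):
        self._store.setdefault(user, []).append((role, signature))

    def lookup(self, user):
        return list(self._store.get(user, []))


def service_allows(authority, key, user, required_role):
    """Data-provider side: pull the user's signed roles and verify them locally."""
    for role, signature in authority.lookup(user):
        expected = sign_role(key, user, role)
        if role == required_role and hmac.compare_digest(signature, expected):
            return True
    return False  # no valid, recognised role: deny by default


# Hypothetical usage: the portal publishes a signed role, the service checks it.
key = b"demo-key-shared-for-illustration-only"
authority = AttributeAuthority()
authority.publish("alice", "DAMES_SMR04_adv",
                  sign_role(key, "alice", "DAMES_SMR04_adv"))
authority.publish("mallory", "DAMES_SMR04_adv", "00")  # forged entry
```

The decision stays with the data provider: the portal can publish whatever it likes, but only entries that verify against material the provider trusts lead to access.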
4.2 EuroDSD

Disorders of sex development (DSD) are a set of disorders affecting the genito-urinary tract and, in many instances, the endocrine-reproductive system. The physical representations of DSD are manifest in different ways. In the case of newborns this can be through children born with ambiguous genitalia; in the case of teenagers this can be in their lack of entry into puberty. The condition is extremely rare and can be a fraught and potentially stigmatizing condition for the patients and families involved [8].

At the heart of much DSD research is the androgen receptor, which is involved in controlling gene expression for genes associated with DSD and in determining the sex phenotype. In particular much DSD research is focused around androgen receptor co-factors that modulate sex steroid action [9].

Given the rarity of the condition, there is an associated scarcity of data on DSD. The EU funded EuroDSD project (www.eurodsd.eu) has been established to help improve the understanding and provision of therapy and potential treatments or interventions associated with DSD, and increase knowledge of the genetic and biochemical profiles which characterize DSD. Until now, European research and the data sets associated with DSD could best be categorized as being fragmented. There was no possibility to co-ordinate multiple, complementary DSD research areas in order to make a significant step forward in the understanding of DSD. Rather, different centres and countries had their own fragmented support networks for patients and their families. The intention of EuroDSD is to bridge the gap between a variety of disciplines including clinical medicine, biochemistry, molecular genetics and molecular biology, through cross-national integration of resources and expertise. Currently, the EuroDSD project involves six European countries: the UK, France, the Netherlands, Germany, Italy and Sweden.

The central component of the EuroDSD project is the development of a Virtual Research Environment (VRE), designed to facilitate secure, flexible collaboration using state of the art technologies. This technical work is led by the National e-Science Centre (NeSC – www.nesc.ac.uk) in Glasgow.

The EuroDSD VRE currently comprises a portal which hosts a registry which allows for secure upload and searching of clinical case information. This information is based on a common data model agreed through the European Society for Paediatric Endocrinology (ESPE), and in particular the work undertaken through the Scottish Genital Anomaly Network (SGAN - http://www.sgan.nhsscotland.com/). An example of this information is shown in Figure 3.

Figure 3: Core data schema based on the SGAN template

This information covers patient specific data including the referring clinician; the year of birth of the patient and specific clinical information and associated presentation information. For example, whether the karyotype has been established and if there are any further complications with DSD – often DSD has other physiological presentations, e.g. related to facial anomalies. The user interface to upload patient specific information into the registry is shown in Figure 4.

Figure 4: Registry upload portlet making use of the Struts2 framework

We note that the system and data models have been developed to be non-patient identifying. This has been a key part of the work as a whole. The only identifying information that exists in the registry is the contact details for the clinician. An optional local patient identifier can also be included, but once again the linkage of this with any given individual cannot be achieved without detailed local information on the patient. Indeed numerous other aspects
have been considered with regard to patient security. For the VRE runs through a Shibboleth-based authentication
example, the date of birth has been refined to year of birth process. One issue we have faced with this is that users at
only and exact birth weights have been refined to approxi- institutions that are not involved in a Access Management
mate weights in kilograms. All of these considerations were Federation have been offered a virtual home at NeSC-
documented and put before ethics committees so that a deci- Glasgow. This model works and has up to now been found
sion on data collection could occur. This agreement has now to scale reasonably well for local administrators. However
happened and the data is being actively collected. At the when the VRE is used by 100-1000’s of DSD researchers
time of writing the registry contains patient data on 267 then this modus operandi may need to be revisited.
cases from numerous centres across Europe. There are cur- Based on consortium security requirements, each user has
rently 40 registered users of the VRE from multiple sites. It been assigned with a set of EuroDSD-specific roles, which
should also be noted that non-EuroDSD partners are also are subsequently used for fine-grained access control. These
now requesting access to the VRE to add their own clinical include EuroDSD_investigator; EuroDSD_contributor and
data. This includes researchers from countries outwith the EuroDSD_researcher. These roles are subsequently used to
European Union. define and enforce the privileges of the user inside of the
Each patient that is entered into the system has a unique portal. A user with the EuroDSD_contributor role has the
registry identifier created and clinicians are expected to privilege to upload, edit, and withdraw cases in the registry,
keep a local record of this in case of future follow up from whilst users assigned the EuroDSD_researcher role can
collaborators, i.e. it is this information that can be used for only search patient data. EuroDSD_investigators are able to
record linkage. add, edit, delete and search data sets. In addition to these
The portal itself has been implemented using the Apache roles, the project also supports the scope of the search. Thus
Struts2 (http://struts.apache.org/2.x/) framework; the Spring some sites are only able to add data for local centre users;
Framework (http://www.springsource.org/) and the Hiber- others only able to add data for national data sharing; others
nate framework (http://www.hibernate.org/). For maximum for sharing across all of EuroDSD and others for all partners
flexibility, the portal has been based upon a multi-tiered in EuroDSD and potentially with other international col-
architecture. The upper layer is the user/presentation layer laborators. The scope of data sharing is defined at data entry
and provides multiple ways to display information, e.g. time by the contributors as part of the consent process, i.e.
HTML, JSP, AJAX, etc. Other layers are used to control confirming that they have consent to add the patient data
targeted workflows and to process EuroDSD specific logic. and the level of data sharing that the consent allows.
Finally the lower layer is used to give access to the DSD Every operation in the VRE (search, upload, edit, delete)
data sets themselves and support for persistence itself. is secure and implemented according to roles assigned to
In this system architecture, Struts2 acts as the controller individuals. HTTPS is used to encrypt communications be-
of the VRE (aligned with the Model View Controller para- tween user browsers and the VRE, and between the VRE to
digm) and provides presentation of the data. The Spring data sources. End users are completely oblivious of the un-
framework provides a wide range of services for EuroDSD derlying security used to enforce access and use of clinical
business and workflow logics. Finally Hibernate, at the data made available in the VRE.
lower layer, provides data persistence services to the VRE. The VRE itself has also been developed with data valida-
This solution significantly improved the overall flexibil- tion and usability aspects incorporated. Where possible,
ity, testability, and maintainability of the VRE. It was also selection boxes are used instead of text fields. This simpli-
compatible with most existing frameworks and standards, fies data validation and ensures adherence with agreed data
including JSR-168 portlets. Based on this n-tier architecture, formats (aligned with the ESPE core data model) and avoid-
business logic is properly stripped from the presentation and ing data entry confusion. JavaScript is used in the user inter-
underlying data layers thus providing better maintainability face to provide data validation and automatic calculations
and enhancements. This model also allows further flexibility when possible, e.g. calculating an External Masculinisation
for integration with further resources and tools. Score based on user selections of five clinical fields. Invalid
To improve system availability, dynamic VRE configura- data and actions are shown immediately when moving to
tion is support. Thus it is normally the case that changes of next data entry fields or on request submission. AJAX tech-
configuration will typically require the restart of the Web nology is also used to provide coherent flow of information
server containers hosting the services, and hence stopping without the need for page refreshing to interrupt user ex-
the services themselves. However, EuroDSD VRE configu- perience.
ration is dynamic and can be loaded into the VRE through To support discussion and feedback between the scientists
scripts and property files. involved in DSD, the VRE hosts a wiki. This wiki is directly
In terms of portal and VRE security, the Shibboleth sys- linked with the registry itself. Through the wiki, scientists
tem is fully integrated into the VRE and everything inside of are able to directly comment on specific cases in the registry
and discuss aspects of the cases themselves or the treatment of the individuals, for example.
The work on the EuroDSD VRE is still on-going. The current work is focused upon genetic analysis modules. These provide support for understanding the genetic variations of children with DSD. This includes the genes that have been screened for; the mutations/anomalies in genes that have been found; whether further analysis of these genes has been undertaken (including whether results have been or are being published), as well as new and upcoming screening and analysis methods, e.g. based upon mass spectroscopy. The genetics analysis module also typifies the structuring and standardization that EuroDSD is attempting to achieve. The field is characterized by multiple genes with synonyms. To address this, the genetics analysis module has implemented a core list of widely accepted genes from which the contributors can select. Free-text entry has thus been avoided, which prevents typographical mistakes as well as errors caused by upper/lower case variants of gene names representing different genes.
Further modules are also being explored. These include modules specific to the surgical treatments of patients with DSD. The classification of surgical treatments and the outcomes of these treatments are subject to international variation between partners.
For centres with a catalogue of DSD cases, the effort to manually enter these cases is off-putting. To address this we are supporting systems that allow bulk upload of patient data. The common technology that is used by most sites is Excel. Services have been implemented that allow Excel spreadsheets developed according to the ESPE core data model to be automatically incorporated into the registry.
Finally we note that some centres have their own local databases and resources. We are in the process of making these available through the EuroDSD VRE. These include patient steroid and metabolomic databases, for example.
5 CONCLUSIONS AND FUTURE WORK
The work described in this paper has outlined how security-oriented portals can be supported that are aligned with the requirements of a wide range of life science researchers and communities. From experience of numerous projects in NeSC we are acutely aware that usability of e-Infrastructure has to be factored in from the outset. In this regard, Shibboleth-based access, security-oriented portal configuration and support for federated access control at remote, autonomous data providers have been found to offer a model that is closely aligned with many researcher user access requirements. However we also recognize that these models are often non-trivial for data providers such as the NHS. They are, for example, extremely reluctant to open up their firewalls, irrespective of whether or not a service exists at their side that has associated authorisation policies. Given this, we are also exploring alternative data access and usage models. This includes the VANGUARD system [10], which has been demonstrated as a proof of concept system and which will be refined and hardened as part of the Wellcome Trust funded Scottish Health Informatics Platform for Research (SHIP) project.
We also note that the work described here is being applied in numerous other research domains. This includes the EPSRC funded nanoCMOS project (www.nanocmos.ac.uk), the recently funded ENROLLER project (www.enroller.org.uk) and the NeISS project (www.neiss.org.uk).
ACKNOWLEDGEMENTS
This work has been funded from a variety of sources including the UK Engineering and Physical Sciences Research Council (EPSRC), the Economic and Social Research Council (ESRC), the European Union, the Joint Information Systems Committee (JISC), the Medical Research Council (MRC) and the Wellcome Trust. We gratefully acknowledge their support.
REFERENCES
[1] R.O. Sinnott, M.M. Bayer, J. Koetsier, A.J. Stell, Grid Infrastructures for Secure Access to and Use of Bioinformatics Data: Experiences from the BRIDGES Project, 1st International Conference on Availability, Reliability and Security (ARES'06), Vienna, Austria, April 2006.
[2] R.O. Sinnott, O. Ajayi, J. Jiang, A.J. Stell, J. Watt, User-oriented Security Supporting Inter-disciplinary Life Science Research across the Grid, New Generation Computing, Special Edition on Life Science Grids, editors A. Konagaya, P. Arzberger, T.W. Tan, R. Sinnott, D. Angulo, pp 339-354, Vol. 25 No. 4, 2007.
[3] R.O. Sinnott, Grid Security, National Centre for e-Social Science book, Grid Computing: Technology, Service and Application, CRC Press, May 2008.
[4] R.O. Sinnott, O. Ajayi, A.J. Stell, J. Watt, J. Jiang, User Oriented Access to Secure Biomedical Resources through the Grid, Life Science Grid Conference, Yokohama, Japan, October 2006.
[5] J. Watt, R.O. Sinnott, J. Jiang, T. Doherty, C. Higgins, M. Koutroumpas, Tool Support for Security-oriented Virtual Research Collaborations, IEEE International Workshop on Security in e-Science and e-Research (ISSR-09), Chengdu, China, August 2009.
[6] C. Higgins, R.O. Sinnott, T. Doherty, M. Koutroumpas, J. Watt, A.C. Hume, A.G.D. Turner, Spatial Data e-Infrastructure, Proceedings of International Conference on e-Social Science, Cologne, Germany, June 2009.
[7] R.O. Sinnott, T. Doherty, C. Higgins, P. Lambert, S. McCafferty, A. Stell, K.J. Turner, J.P. Watt, Supporting Security-oriented, Inter-disciplinary Research: Crossing the Social, Clinical and Geospatial Domains, in Proceedings of International Conference on e-Social Science, Cologne, Germany, June 2009.
[8] O. Hiort, P.M. Holterhus, U. Thyen, The basis of gender assignment in disorders of somatosexual differentiation, Horm Res 64 (Suppl. 2), 2005, 18-22.
[9] J.H. Bebermeier, J. Brooks, S. DePrimo, R. Werner, U. Deppe, J. Demeter, O. Hiort, P.M. Holterhus, Cell-line and tissue specific signatures of androgen receptor coregulator transcription, J Mol Med, 2006, 84: 919-31.
[10] A. Stell, R.O. Sinnott, O. Ajayi, J. Jiang, Designing Privacy for a Scalable Electronic Healthcare Linkage System, IEEE International Conference on Information Privacy, Security, Risk and Trust (PASSAT 2009), Vancouver, Canada, August 2009.
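As an illustration only, the EuroDSD role and data sharing scope model described in this paper can be sketched in a few lines of Python. This is not the VRE implementation, which is built on Struts2, Spring and Hibernate. The role names and their privileges are taken from the text; the class and function names, the treatment of "add" as equivalent to "upload", and the exact visibility rule applied for each sharing scope are assumptions made for the purposes of the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SEARCH = "search"
    UPLOAD = "upload"      # assumption: "add" in the text is treated as upload
    EDIT = "edit"
    WITHDRAW = "withdraw"
    DELETE = "delete"


class Scope(Enum):
    LOCAL = "local centre"
    NATIONAL = "national"
    EURODSD = "all of EuroDSD"
    INTERNATIONAL = "EuroDSD and international collaborators"


# Privileges per role, as described in the text.
ROLE_PRIVILEGES = {
    "EuroDSD_contributor": {Action.UPLOAD, Action.EDIT, Action.WITHDRAW},
    "EuroDSD_researcher": {Action.SEARCH},
    "EuroDSD_investigator": {Action.UPLOAD, Action.EDIT,
                             Action.DELETE, Action.SEARCH},
}


@dataclass(frozen=True)
class User:                # hypothetical user record
    name: str
    roles: frozenset
    centre: str
    country: str
    eurodsd_partner: bool


@dataclass(frozen=True)
class Case:                # hypothetical registry entry
    registry_id: str
    centre: str
    country: str
    scope: Scope           # chosen by the contributor at data entry time


def is_permitted(user, action):
    """True if any of the user's roles grants the action."""
    return any(action in ROLE_PRIVILEGES.get(role, set())
               for role in user.roles)


def is_in_scope(user, case):
    """Assumed visibility rule for each sharing scope."""
    if case.scope is Scope.LOCAL:
        return user.centre == case.centre
    if case.scope is Scope.NATIONAL:
        return user.country == case.country
    if case.scope is Scope.EURODSD:
        return user.eurodsd_partner
    return True            # INTERNATIONAL: any registered user


def search(user, cases):
    """Return only the cases that the user's roles and scope allow."""
    if not is_permitted(user, Action.SEARCH):
        raise PermissionError(f"{user.name} may not search the registry")
    return [case for case in cases if is_in_scope(user, case)]
```

In this sketch a user holding only the EuroDSD_contributor role cannot search, while a EuroDSD_researcher sees only those cases whose consented sharing scope covers them. Keeping the role check separate from the scope check mirrors the separation in the text between role privileges and the consent-driven scope set at data entry.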