10th International Workshop on Science Gateways (IWSG 2018), 13-15 June 2018

Towards a Science Gateway Reference Architecture

Marlon E. Pierce, Science Gateways Research Center, Indiana University, Bloomington, Indiana USA
Mark A. Miller, San Diego Supercomputer Center, San Diego, California USA
Emre H. Brookes, University of Texas Health Science Center San Antonio, San Antonio, Texas USA
Mona Wong, San Diego Supercomputer Center, San Diego, California USA
Enis Afgan, Johns Hopkins University, Baltimore, Maryland USA
Yan Liu, University of Illinois, Urbana-Champaign, Illinois USA
Sandra Gesing, Center for Research Computing, University of Notre Dame, Notre Dame, IN, USA
Maytal Dahan, Texas Advanced Computing Center, Austin, Texas USA
Suresh Marru, Science Gateways Research Center, Indiana University, Bloomington, Indiana USA
Tony Walker, Research Technologies, Indiana University, Bloomington, Indiana USA

Abstract—Science gateways have been developed over the last twenty years and have grown into a large community of practice, as evidenced by international workshops and conferences. Because of the diversity of approaches to creating science gateways and the ever-changing landscape of technologies, the community lacks a common definition for the term “science gateway” itself and common terminology for describing the common components of a gateway architecture. Instead, a wide range of definitions and understandings exist and are used in different communities; this is evident, for example, in discussions of whether science gateways are the same as virtual research environments. This paper attempts to address these issues by focusing on how science gateways support scientific research and considering the consequences for cyberinfrastructure.

Keywords—science gateways, cyberinfrastructure

I. INTRODUCTION

Science gateways are commonly described as user-centric environments that enable broader and deeper use of advanced computing resources, storage, data collections, and scientific applications. Gateways include graphical user interfaces (frequently Web browser-based), application programming interfaces (APIs), and middleware that provide access to software and data. Many modern gateways integrate diverse technologies, pulling together databases, messaging systems, content management systems, identity management systems, job submission systems, data catalogs, and other components into a unified working environment. Gateways are described in a significant body of literature [1-6]. There are many highly successful science gateways that support thousands of scientific users [7-12, 24]. An increasing number of gateway frameworks and platforms [13-17] exist to help create new gateways. Efforts like the XSEDE Gateway Cookbook [19, 20] have taken a cookbook approach to summarize the architecture and motivations of these gateways and provide an overview, but did not normalize these disconnected recipes into a cohesive gateway definition.

We believe it is useful to step back from these operational descriptions of gateways to examine the reasons why gateways exist and how they may continue to thrive in the future, regardless of evolving technologies. Identifying these central propositions may lead to a stronger definition of science gateways, clarify terminology used by gateway practitioners, and clarify the relationships of science gateways to other types of cyberinfrastructure and distributed systems. We hope this will be useful to the community and also to those outside the field, including those interested in joining the field, operators of advanced computing and cloud resources, developers of non-gateway middleware such as workflow systems, and decision makers at universities and government agencies who recognize the need to provide science gateway capabilities to support their researchers but who are not yet familiar with the community. Additionally, we hope this paper will be useful to scientific researchers themselves, since science gateways are created for them to help make their research processes more efficient across system, organizational, and national boundaries.

II. SCIENCE GATEWAYS AND SCIENTIFIC RESEARCH

At their core, science gateways are created to support scientific research, either directly or indirectly through education and dissemination. The exact nature of this support for research depends on the specific gateway, but the general concept is useful as a starting point for describing a set of characteristic features of science gateways.

Scientific research consists of a) exploration of the current state of a research field, b) formulation of testable hypotheses or research questions, c) design and execution of experiments, d) management and sharing of data and metadata about experiments, e) analysis of experiments and development of conclusions, and f) communication of the conclusions and supporting methods through a broadening circle of colleagues, culminating in broadly available formal publications that are accessible to the community and reproducible, hopefully leading to the start of a new cycle of research. Strong or at least promising findings form the basis of future investigations. Important results of scientific research that are not typically included in the traditional sequential description include dead ends, ambiguous or opposing results, and accidental discoveries that happen during a research effort and alter its course. One may think of research as a mapping exercise of new terrain, with the unfortunate limitation that only the routes to specific destinations reach formal publication.

We now wish to make a specific connection between scientific research and the capabilities offered by science gateways. Publication, in its broadest sense, is the keystone activity of scientific research, in which conclusions (including negative ones) enter the community for larger inspection, debate, verification, and extension; see Fig. 1. A science gateway is a software implementation that creates a specific set of capabilities, based in large part on access to externally managed resources, to support the creation, sharing, publication, and broad distribution of scientific research results. “Publication” may mean the traditional processes used by peer-reviewed journals and the custom of pre-publication used in many scientific fields to circulate results quickly, stake a claim to a particular finding, and solicit initial feedback. It may also mean the broader dissemination of findings through non-traditional means that are associated with research altmetrics [18].

Many science gateways measure their success by the scientific publications that they support; nanoHUB, for example, calculates its own h-index (https://nanohub.org/citations). We believe it is useful to examine more directly how science gateways can broadly support the scientific publication process and how this should be reflected in a reference architecture.

We see two important roles for science gateways in the publication process: they can support research as it is currently practiced, including training in practices through education, and they can help the research process expand beyond some of the limits imposed by conventional journal publication practices. Gateways can provide both restricted and public access to all experiments and analysis, supporting “actionable” or “living” publications [21-23], support replication of results, and generally promote wider, interactive discussion of new findings among researchers.

Fig. 1. A symbolic representation of the underpinnings science gateways provide in supporting individual steps of a research activity and collection of results for science to deliver a more accessible environment for the researchers.

We will now work through some of the implications of this assumption for key features that science gateways may choose to implement. We note that some gateways may seek to encapsulate the entire publication process in their implementations, but we do not view this as the end goal for all gateways. A gateway may, for example, share data within a community or publicly, and thus support the “publication” of data sets that are easily referenceable. A gateway may also provide a software-as-a-service reference implementation as an essential component of a publication, assisting both reviewers and readers.

Recognition of Users: Gateways provide authentication, authorization, and identity management. This is conventionally presented as necessary to protect valuable backend resources like supercomputers. By thinking instead of science gateways as researcher-centric cyberinfrastructure, we see an expanded purpose for identity management. Recognition of the user is a prerequisite for the capabilities that follow, all of which support the research process. This recognition may be simple authentication or it may incorporate the notion of user roles that distinguish different categories of users and the functions and information each role is allowed to access. Additionally, gateways may allow project-based user groups to support relationships among users.

Integration of Services: Gateways act as agents that integrate scientific and other services for their users. These services may be implemented by the gateway itself (such as access to supercomputers), or they may be provided by external entities, including other gateways. For example, a gateway may help a user search and retrieve a data set from an external catalog service provider and then take action on it, using it as input to a simulation.

Organization of User Interactions: Gateways help scientific users search and explore data sets and conduct computational experiments. The latter include both input and output data and metadata that others may explore. It is thus useful to organize these interactions into “sessions”. Sessions capture the state of a user’s interactions with the system. Sessions may be organized into hierarchies or other structures, and they can vary in their level of prescriptive detail: a session could be implemented as a set of arbitrary or predefined key-value pairs or structures. Sessions may also be annotatable; that is, the researcher may annotate a session to explain the purpose, insights, etc., of the session. These annotations may be part of the session (a comment field), or they may be external. The latter option requires the session and perhaps its elements to have pointers; this is simply the hypertext concept, in which the annotation hypertext uses URLs to refer to supporting or related digital entities.

Persistence of User Interactions: Science gateways allow researchers to recover previous sessions after initial interaction. Gateways support this feature to provide reproducibility or repeatability, help users organize their results, and avoid unnecessary repetition. Persistent sessions allow users to check their work and assist gateway operators in diagnosing user-reported issues. Persistence is also needed to support annotation. In practice, full persistence is frequently limited because of large file storage limitations and other practical considerations. However, it is still possible to retain the metadata of a session, and the metadata can be used to replicate the experiment. We note the relationship here to the concept of provenance [25, 26], although we wish to avoid specific implementation considerations.

Sharing and Publication of User Interactions: Sessions are a core implementation concept for many science gateways. A specific scientific publication may be supported by many different sessions, perhaps from many different researchers. Sessions and their constituent elements should therefore be sharable. This may be done in widening circles: graduate students may review a set of computational experiments with their advisors and colleagues before depositing preprints in public archives and sending papers to journals for peer review. Gateways should therefore provide sharing mechanisms for results that map to these access levels. The publication may directly or in supplementary material reference the supporting sessions in the science gateways, exposing them to the community at large. A research paper itself may or may not be written using tools provided by the science gateway (such as an electronic notebook [10]), but the gateway should provide a way for the results that are used in the paper (and its drafting process) to be accessed and reviewed by other researchers.

In summary, the publication process, in the general sense of communicating scientific results, interpretations, and evidence in a convincing manner to a professionally skeptical audience, is central to scientific research. Science gateways can help with the publication process by supporting the management and organization of experimental results, the review process, and replicability. They may further help expand the publication process beyond its conventional limitations by supporting the dissemination of unpublished results. The latter may include experiments that are deemed failures or preliminary results that are later discarded after the researcher focused on a different aspect of the problem.

III. CONSIDERATIONS FOR A SCIENCE GATEWAY REFERENCE ARCHITECTURE

The conceptual features defining a science gateway need to be embodied in a framework implementation and delivered as a functional service to its users. The implementation of these features largely depends on the science domain a gateway serves, so that it can accommodate the research workflow of its domain community. In this section, we look at a cross-section of such implementations to extract a consensus set of gateway attributes that help define general elements needed by a science gateway. These attributes are often interconnected and depend on each other to offer the comprehensive user experience science gateways target. However, a science gateway does not need to possess all the attributes, particularly not at the outset, and they can instead be built or cultivated as the complexity and size of the community a gateway serves grows (see Fig. 2).

● Services are the internal components a gateway requires to operate. For example, they handle authentication and authorization or manage retrieval, caching, and persistence of datasets in cases of federated storage configurations. Services also include session management and support for sharing. Overall, services represent the “glue” that makes up the gateway framework and allows higher-level features to be exposed to the researchers.

● Workspace represents the main interface to the gateway, allowing the researcher to interact with the gateway’s services. The workspace focuses on improving accessibility of the tools and services exposed by the gateway in a way most suitable to the specific domain and different roles of researchers. Ideally, these role-specific views offered by the dashboard are customizable and take into consideration the researcher’s vocabulary and workflow preferences. By offering an easy-to-use interface, the workspace lowers the barrier of entry by abstracting complicated tasks into easy-to-understand interfaces.

● Integration is characterized by the ability of a gateway to connect multiple disparate elements (e.g., tools, resources) into a unified interface. This interface can be utilized internally by the workspace and services as well as by other gateways or applications through its API. The integration layer focuses on translating gateway-specific representation of data, inputs, etc. to the format required by the actual tool or resources for performing the requested action while exposing a well-defined, documented, and consistent interface to its users.

● Infrastructure is typically the workhorse of a gateway. While the workspace exposes the available functionality in an accessible format, the gateway must interface to a specific and often highly specialized set of compute and storage resources by performing the necessary configurations, data transfers, authentication, etc. As the complexity or popularity of a gateway increases, the gateway may also need to accommodate scaling by implementing more robust or diverse infrastructure.

● Community is a pivotal element of a gateway that drives its success. The gateway needs to facilitate collaboration of its members as well as provide means of offering support. With time, if successful at forming a devoted community, the gateway can also benefit from community contributions, which help fuel future direction and sustainability.

Fig. 2. Interconnected attributes of a science gateway.

IV. EVALUATION

To illustrate the concepts in the previous sections, we apply them to the well-known science gateway nanoHUB. nanoHUB is a science and engineering gateway for enabling and broadening nanotechnology research and education. As a well-established gateway with over one million visitors using hundreds of simulation tools and running millions of simulation jobs a year, the architecture of nanoHUB has been evolving for years to incorporate a rich set of community requirements and gateway capabilities. However, the initial reference architecture concepts proposed in this paper can be well applied to capture major components in this gateway.

In the nanoHUB community, a few users contribute content and tools while the majority of the users rely on nanoHUB content for research and education purposes. User-oriented design and development principles determine the advancement of nanoHUB. The gateway provides a comprehensive framework to monitor user activities, cite and publish contents, and make content reproducible through cached jobs and Jupyter notebooks. Gateway development takes the “adopt-and-adapt” strategy based on user requirements. Broad community usage is a major success metric of this gateway.

A user’s workspace in nanoHUB is customized based on user preference and usage history. As a strategy to promote user engagement in content creation, the citation and publication framework keeps track of content usage and publishes DOI-indexed citations on nanoHUB, which is also indexed by Web of Science (WOS) and Google Scholar. A recommendation system has been developed to learn user patterns in order to better serve nanoHUB content.

The nanoHUB user environment is backed by an effective integration process executed by a sizeable developer team. The integration workload for nanoHUB is enormous, including technology integration, a well-engineered content and tool development process, cyberinfrastructure (computing venues) integration, gateway operation, etc. As a result, HUBzero captures nanoHUB middleware components and is released as a platform approach for supporting general gateway development and hosting for other science domains.

A major goal of nanoHUB infrastructure provisioning is to provide sufficient computing power for nano simulations. Various computing venues are continuously incorporated for that purpose, including local clusters, high-performance (e.g., XSEDE), high-throughput (e.g., OSG), volunteer computing (BOINC), and cloud computing resources.

nanoHUB simulation tools and contents are mainly accessed by end users on the nanoHUB portal. To accommodate future growth of nanoHUB, its gateway team is developing a universal service-oriented architecture based on REST web service protocols. The service computing approach will scale nanoHUB content/tool access and make the gateway programmable.

V. DISCUSSION, CONCLUSIONS, AND FUTURE DIRECTIONS

It is the intention of this paper to consider some defining features of science gateways and promote greater discussion within the community on what exactly a science gateway is and does. Rather than basing these on operational considerations, we posit that science gateways are user-centric cyberinfrastructure that supports scientific research by federating access to diverse remote resources. Focusing on support for a broadly defined publication process, we identify the following key characteristics: recognition of users, integration of services on behalf of the user, creation of sessions to enable the user to track work, persistent archiving of sessions, and sharing and publication of sessions and their content to a broader community.

This definition may help science gateways to clarify their mission to themselves and to stakeholders in science, engineering, and scholarship. This may also help determine areas that need further development within specific projects and across the community. It may provide guidance to gateways on how to measure success, and how they want to present this measurement to stakeholders ranging from users to funding agencies. In particular, many science gateways measure success through supported scientific publications, but gateways should consider ways to make this support richer and more direct, such as through the use of persistent identifiers, stronger guarantees on persistence, integration with publication mechanisms like FigShare, richer metadata, and more powerful sharing mechanisms.

A criticism of this paper’s thesis is that its emphasis on supporting scientific publication does not address all of the potential uses of a gateway, such as the support for education. We use the term “publication” in the broad sense of making scientific data, results, and conclusions available to an increasingly broad circle, and science education is a form of practice publication through problem solving. We thus believe the current discussion can accommodate education-centric gateways.

Another objection is that “recognition of users” excludes many science gateways and gateway-like services that do not require authentication or have notions of sessions. Gateways support scientific research through a reproducible sequence of steps to produce a particular state (such as a particular simulation result). Sessions represent this state, and some form of identification allows the user to manage the session. Gateways that do not support identities and sessions explicitly can still support scientific research, but the steps in the creation of the state must be communicated through some means outside the gateway, such as a written description. Using sessions associated with identities is a more straightforward mechanism.

Another interesting variation may be a science gateway that helps manage other cyberinfrastructure. A gateway may allow users, for example, to dynamically create and manage resources, which are in turn used for scientific research. These gateways may not directly support scientific publications, but they may still implement some of the basic abstractions described here.

Future work is to consider a comprehensive survey of the larger community to map the concepts of this paper to specific gateways. The basic ideas described in this paper may also serve as the basis of a reference architecture for gateways that can be developed using the mechanisms of The Open Group Architecture Framework [27].

REFERENCES

1. Wilkins-Diehr, N., Gannon, D., Klimeck, G., Oster, S. and Pamidighantam, S., 2008. TeraGrid science gateways and their impact on science. Computer, 41(11).
2. Wilkins-Diehr, N., 2007. Special issue: science gateways—common community interfaces to grid resources. Concurrency and Computation: Practice and Experience, 19(6), pp.743-749.
3. Lawrence, K.A., Zentner, M., Wilkins-Diehr, N., Wernert, J.A., Pierce, M., Marru, S. and Michael, S., 2015. Science gateways today and tomorrow: positive perspectives of nearly 5000 members of the research community. Concurrency and Computation: Practice and Experience, 27(16), pp.4252-4268.
4. Gesing, S. and Wilkins-Diehr, N., 2015. Science gateway workshops 2014 special issue conference publications. Concurrency and Computation: Practice and Experience, 27(16), pp.4247-4251.
5. Wilkins-Diehr, N., Gesing, S. and Kiss, T., 2015. Science gateway workshops 2013 special issue conference publications. Concurrency and Computation: Practice and Experience, 27(2), pp.253-257.
6. Gesing, S., Wilkins-Diehr, N., Barker, M. and Pierantoni, G., 2016. Science Gateway Workshops 2015 Special Issue Conference Publications. Journal of Grid Computing, 14(4), pp.495-498.
7. Afgan, E., Baker, D., Van den Beek, M., Blankenberg, D., Bouvier, D., Čech, M., Chilton, J., Clements, D., Coraor, N., Eberhard, C. and Grüning, B., 2016. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic acids research, 44(W1), pp.W3-W10.
8. Blankenberg, D., Kuster, G.V., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A. and Taylor, J., 2010. Galaxy: a web-based genome analysis tool for experimentalists. Current protocols in molecular biology, pp.19-10.
9. Miller, M.A., Pfeiffer, W. and Schwartz, T., 2010, November. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. In Gateway Computing Environments Workshop (GCE), 2010 (pp. 1-8). IEEE.
10. Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B.E., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J.B., Grout, J., Corlay, S. and Ivanov, P., 2016, May. Jupyter Notebooks-a publishing format for reproducible computational workflows. In ELPUB (pp. 87-90).
11. Klimeck, Gerhard, Michael McLennan, Sean P. Brophy, George B. Adams III, and Mark S. Lundstrom. "nanoHUB.org: Advancing education and research in nanotechnology." Computing in Science & Engineering 10, no. 5 (2008): 17-23.
12. Nakandala, S., Pamidighantam, S., Yodage, S., Doshi, N., Abeysinghe, E., Kankanamalage, C.P., Marru, S. and Pierce, M., 2016, July. Anatomy of the SEAGrid Science Gateway. In Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale (p. 40). ACM.
13. Savelyev, A., Brookes, E., 2017, GenApp: Extensible tool for rapid generation of web and native GUI applications, Future Generation Computer Systems, DOI: 10.1016/j.future.2017.09.069.
14. Pierce, M.E., Marru, S., Gunathilake, L., Wijeratne, D.K., Singh, R., Wimalasena, C., Ratnayaka, S. and Pamidighantam, S., 2015. Apache Airavata: design and directions of a science gateway framework. Concurrency and Computation: Practice and Experience, 27(16), pp.4282-4291.
15. Kacsuk, P., Farkas, Z., Kozlovszky, M., Hermann, G., Balasko, A., Karoczkai, K. and Marton, I., 2012. WS-PGRADE/gUSE generic DCI gateway framework for a large variety of user communities. Journal of Grid Computing, 10(4), pp.601-630.
16. McLennan, M. and Kennell, R., 2010. HUBzero: a platform for dissemination and collaboration in computational science and engineering. Computing in Science & Engineering, 12(2).
17. Goff, S.A., Vaughn, M., McKay, S., Lyons, E., Stapleton, A.E., Gessler, D., Matasci, N., Wang, L., Hanlon, M., Lenards, A. and Muir, A., 2011. The iPlant collaborative: cyberinfrastructure for plant biology. Frontiers in plant science, 2, p.34.
18. Priem, J., Taraborelli, D., Groth, P. and Neylon, C., 2010. Altmetrics: A manifesto.
19. Marru, Suresh, Rion Dooley, Nancy Wilkins-Diehr, Marlon Pierce, Mark Miller, Sudhakar Pamidighantam, and Julie Wernert. "Authoring a Science Gateway Cookbook." In Cluster Computing (CLUSTER), 2013 IEEE International Conference on, pp. 1-3. IEEE, 2013.
20. https://www.xsede.org/web/gateways/gateways-cookbook
21. Broggini, F., Dellinger, J., Fomel, S. and Liu, Y., 2017. Reproducible research: Geophysics papers of the future—Introduction.
22. Brinckman, A., Chard, K., Gaffney, N., Hategan, M., Jones, M.B., Kowalik, K., Kulasekaran, S., Ludäscher, B., Mecum, B.D., Nabrzyski, J. and Stodden, V., 2018. Computing environments for reproducibility: Capturing the “Whole Tale”. Future Generation Computer Systems.
23. Bánáti, A., Kacsuk, P. and Kozlovszky, M., 2017. Reproducibility analysis of scientific workflows. Acta Polytechnica Hungarica, 14(2), pp.201-217.
24. Quinn, P.J., Barnes, D.G., Csabai, I., Cui, C., Genova, F., Hanisch, B., Kembhavi, A., Kim, S.C., Lawrence, A., Malkov, O. and Ohishi, M., 2004, September. The International Virtual Observatory Alliance: recent technical developments and the road ahead. In Optimizing Scientific Return for Astronomy through Information Technologies (Vol. 5493, pp. 137-146). International Society for Optics and Photonics.
25. Simmhan, Y.L., Plale, B. and Gannon, D., 2005. A survey of data provenance in e-science. ACM Sigmod Record, 34(3), pp.31-36.
26. Moreau, L., Clifford, B., Freire, J., Futrelle, J., Gil, Y., Groth, P., Kwasnikowska, N., Miles, S., Missier, P., Myers, J. and Plale, B., 2011. The open provenance model core specification (v1.1). Future generation computer systems, 27(6), pp.743-756.
27. Haren, V., 2011. TOGAF Version 9.1. Van Haren Publishing.
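
As an illustration of the session concept described in Section II, the following minimal Python sketch shows one way a session could be represented as key-value pairs with internal and external annotations, a hierarchy of child sessions, and sharing in widening circles. It is a sketch under stated assumptions rather than a description of any existing gateway: the names `AccessLevel`, `Session`, `can_view`, and `metadata` are hypothetical, and the three access levels are only one possible mapping of the sharing circles discussed in the paper.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional, Set


class AccessLevel(IntEnum):
    """Widening circles of sharing (hypothetical levels for illustration)."""
    PRIVATE = 0  # visible to the owner only
    GROUP = 1    # visible to a project group, e.g., advisors and colleagues
    PUBLIC = 2   # visible to the community at large


@dataclass
class Session:
    """State of one user's interaction, held as arbitrary key-value pairs."""
    session_id: str
    owner: str
    values: Dict[str, str] = field(default_factory=dict)
    comment: str = ""  # annotation stored inside the session
    annotation_urls: List[str] = field(default_factory=list)  # external annotations
    children: List["Session"] = field(default_factory=list)   # session hierarchy
    access: AccessLevel = AccessLevel.PRIVATE
    group: Set[str] = field(default_factory=set)

    def can_view(self, user: Optional[str]) -> bool:
        """Return True if `user` (None means anonymous) may see this session."""
        if self.access == AccessLevel.PUBLIC:
            return True
        if user is None:
            return False
        if user == self.owner:
            return True
        return self.access == AccessLevel.GROUP and user in self.group

    def metadata(self) -> dict:
        """Metadata-only record that can be kept when large files cannot."""
        return {
            "session_id": self.session_id,
            "owner": self.owner,
            "values": dict(self.values),
            "comment": self.comment,
            "annotation_urls": list(self.annotation_urls),
            "children": [child.session_id for child in self.children],
        }


if __name__ == "__main__":
    run = Session("s1", owner="alice", values={"tool": "demo", "steps": "100"})
    run.comment = "Baseline run before changing the mesh size."
    run.group.add("bob")
    run.access = AccessLevel.GROUP  # share with the project group first
    print(run.can_view("bob"), run.can_view("carol"))
    print(run.metadata())
```

In this sketch, identification of the user is what makes the access check possible, which mirrors the paper's point that recognition of users is a prerequisite for persistence and sharing of sessions.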