Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021


  EXPERIENCE IN ORGANIZING FLEXIBLE ACCESS TO
 REMOTE COMPUTING RESOURCES FROM JUPYTERLAB
  ENVIRONMENT USING TECHNOLOGIES OF EVEREST
            AND TEMPLET PROJECTS
                       S. Vostokin1,a, S. Popov1, O. Sukhoroslov2,3
                               1
                                   Samara National Research University
    2
        Institute for Information Transmission Problems of the Russian Academy of Sciences
                                            3
                                                HSE University

                                         E-mail: a easts@mail.ru


The paper describes the experience of building distributed web applications based on the interactive
computing technologies of the Jupyter project. The new architecture of such applications is proposed,
considering the possibility of deploying a Jupyter notebook server separately from computing
resources, and the possibility to interact with several computing resources simultaneously. These
features are implemented using the Everest platform for resource integration and the Templet SDK for
accessing the platform from Jupyter notebooks. Two examples of computing and data processing
applications built on this architecture are discussed. The proposed solutions are designed to automate
resource-intensive computing activities in scientific and research projects.

Keywords: interactive computing, Project Jupyter, many-task application, distributed
computing


                                                       Sergei Vostokin, Stefan Popov, Oleg Sukhoroslov


                                                             Copyright © 2021 for this paper by its authors.
                    Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


                                                    558
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021


1. Introduction
         New tools for automating workflows in the fields of data science, scientific computing, and
machine learning are under active development. One of the significant advances is the interactive web-
based development environment JupyterLab. JupyterLab allows one to quickly create a convenient
multi-window web interface for a distributed application that runs from a browser and does not require
local installation. However, the following problem exists: scientific computing applications require not
only a rich user interface but flexible access to a wide range of computing resources. The standard
solution of the problem – deploying JupyterLab where the computation is done – doesn't work in the
two important cases: (a) there is no technical feasibility of such deployment; (b) a distributed
application needs to work with several resources at the same time. In the article, we present a solution
that covers these use cases, an alternative to commercial cloud solutions such as Google Colab,
Yandex DataSphere, JetBrains Datalore, which are also based on Project Jupyter [1].


2. Method for building the distributed application
        In the solution, we use simple, affordable, but resource-limited JupyterLab deployment
options. The first one is the public cloud deployment based on MyBynder.org public service. The
second one is the virtual machine deployment in a private cloud powered by The Littlest JupyterHub.
In both variants of deployment, the JupyterLab server implements the interface (in the form of a
Jupyter notebook) and starts the orchestrator.
        We implement an orchestrator for managing tasks in many-task applications using the Templet
SDK [2]. This software development kit is a research project of Samara National Research University.
The orchestrator implements a variant of the actor model designed to manage tasks in many-task
applications.
        The orchestrator accesses the Everest platform through the REST protocol to execute tasks.
The Everest platform [3,4] was developed by the Institute for Information Transmission Problems of
the Russian Academy of Sciences to manage the execution of tasks on remote computing resources.
Everest server allows users to attach their resources and define resource access policy; distributes
application and launches tasks across resources; and returns the results of tasks to the orchestrator,
which generates the following tasks in accordance with the calculation logic.


3. Components of the distributed application
        A more detailed view of the solution is shown on Fig. 1. The figure shows the components of
a distributed application and their deployment. The application has three parts. The first part is
deployed on JupyterLab in the docker container by using MyBynder.org (or directly on The Littlest
JupyterHub server). This part contains a Jupyter notebook to define application workflow; an
orchestrator to dynamically form DAG of tasks; and the Templet runtime library that uses libcurl to
communicate with the Everest platform. The second part is the Everest platform server. On the
platform server, we have created a special Everest application and a code to be run on resources. The
Everest application's main function is to check the validity of the REST request and to map the request
to the command line. The third part is the Everest resource agent which is installed and running on
each resource used in the computation. The resource agent deploys code from the Everest application
onto the resource and invokes it using the command line.


                                                   559
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021


                                        Figure 1. Application architecture
        The Fig.2 shows the architecture of the application in terms of its deployment. The
deployment starts with the registration of computing resources of the application on the Everest
platform and obtaining access tokens for agent programs through the Everest web interface (step 1).
        The next (step 2) is the installation of application components. This installation is performed
through the web interface of the Everest platform.
        After running and setting Windows 7 virtual machines in the corporate cloud of Samara
University is performed (step 3). It includes installing agent programs on them using the access tokens
and verifying the activity of agent programs through the web interface on the Everest platform.
        Then, at step 4, if necessary, the user uploads data to a file server in the corporate cloud of
Samara University. This can be performed through one of the virtual machines.
        At step 5 the user launches the application orchestrator from the GitHub code repository via
the web interface. This action automatically activates the Binder service (step 6) to build a docker
container with the application orchestrator running in the JupyterLab environment.
        Finally, the Binder deploys the docker container in the Google Cloud and returns the link to
the web interface of the orchestrator to the web terminal of the application user (step 7). After that, the
user launches the orchestrator via the web interface and starts processing (step 8).
        During the processing, the application orchestrator sends commands to launch the next tasks to
the Everest platform server and polls the status of previously launched tasks (step 9). At the same
time, the Everest platform server distributes tasks for execution to free virtual machines through
resource agent programs (step 10).


                                        Figure 2. Deployment procedure


                                                   560
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and
                           Education" (GRID'2021), Dubna, Russia, July 5-9, 2021


4. Sample applications: compute-intensive and data-intensive cases
        We have developed two examples to demonstrate the practical use of the described distributed
application architecture.
        The first example relates to the area of heterogeneous compute-intensive applications [5]. The
application is used to study dynamical systems based on the calculation of Lyapunov exponents. The
practical purpose of the application is to find the parameters of a dynamical system at which chaotic
behavior occurs. The purposes of using our architecture in this example are: to hide the heterogeneous
nature of the components of the application written in C ++ and the Maple system language from the
end-user; to enable all corporate licenses of Samara University at the same time to parallelize the
parametric scanning process; to flexibly customize the scanning process through the JupyterLab web
interface. The application implements the "bag of tasks" algorithmic skeleton.
        The second example relates to the area of data-intensive applications [6]. The application is
used to build a frequency dictionary for the Twitter microblogs. The practical purpose of such an
application is to track the dynamics of the vocabulary to learn the focus of public attention in the
subject area of interest. The purposes of using our architecture in this example are: to show the
possibility of non-dedicated computing resources utilization in a data processing task; to implement
processing based on a complex graph of task dependencies, generated programmatically using the
Templet SDK. The application implements the algorithmic skeleton called "asynchronous round-robin
tournament" [7].

5. Conclusion
        We have implemented a distributed application architecture that allows one to work via the
JupyterLab web interface without local installation, deploy JupyterLab separately from computing
resources, and run complex workflow scenarios involving parallel computing on multiple resources.
        As a potential future optimization, to minimize dependency on the JupyterLab deployment
method, we plan to implement the JupyterLab session as an Everest job that can be launched via
special Everest application.

References
[1] Project Jupyter. Available at: https://jupyter.org. (accessed 14.09.2021)
[2] The Templet Project. Available at: https://github.com/the-templet-project. (accessed 14.09.2021)
[3] The Everest Project. Available at: http://everest.distcomp.org. (accessed 14.09.2021)
[4] Sukhoroslov, O. Volkov, S. Afanasiev, A. A Web-Based Platform for Publication and Distributed
Execution of Computing Applications // 14th International Symposium on Parallel and Distributed
Computing (ISPDC). IEEE, 2015, pp. 175-184.
[5] Popov, S.N. Vostokin, S.V. Doroshin, A.V. Dynamical systems analysis using many task
interactive cloud computing // Journal of Physics: Conference Series, 2020, vol. 1694, issue 1.
[6] Vostokin, S.V. Bobyleva, I.V. Implementation of frequency analysis of twitter microblogging in a
hybrid cloud based on the Binder, Everest platform and the Samara University virtual desktop service
// CEUR Workshop Proceedings, 2020, vol. 2667, pp. 162-165.
[7] Vostokin, S.V. Bobyleva, I.V. Asynchronous round-robin tournament algorithms for many-task
data processing applications // International Journal of Open Information Technologies, ISSN: 2307-
8162, vol. 8, no. 4, 2020.


                                                   561