=Paper= {{Paper |id=Vol-1800/paper2 |storemode=property |title=SLA-aware Interactive Workflow Assistant for HPC Parameter Sweeping Experiments |pdfUrl=https://ceur-ws.org/Vol-1800/paper2.pdf |volume=Vol-1800 |authors=Bruno Silva,Marco Netto,Renato Cunha |dblpUrl=https://dblp.org/rec/conf/sc/SilvaNC16 }} ==SLA-aware Interactive Workflow Assistant for HPC Parameter Sweeping Experiments== https://ceur-ws.org/Vol-1800/paper2.pdf
              SLA-aware Interactive Workflow Assistant for
                HPC Parameter Sweeping Experiments

                             Bruno Silva, Marco A. S. Netto, Renato L. F. Cunha
                                                       IBM Research




ABSTRACT                                                             existing work does not exploit human knowledge/feedback
A common workflow in science and engineering is to (i) setup         and DOE techniques to assist users on parameter selection
and deploy large experiments with tasks comprising an ap-            decisions under these submission-execution-analysis work-
plication and multiple parameter values; (ii) generate in-           flows with Service Level Agreement (SLA) constraints.
termediate results; (iii) analyze them; and (iv) reprioritize           Over the years, optimization methods have been created
the tasks. These steps are repeated until the desired goal           to solve complex problems in industry and academia. How-
is achieved, which can be the evaluation/simulation of com-          ever, fully automatic solutions face some obstacles to obtain
plex systems or model calibration. Due to time and cost              satisfactory results due to the following reasons [20]:
constraints, sweeping all possible parameter values of the              • It is hard to obtain/create an optimization model that
user application is not always feasible. Experimental Design              reflects all aspects of a real-world problem, especially
techniques can help users reorganize submission-execution-                when multi-criteria objectives are involved;
analysis workflows to bring a solution in a more timely man-
ner. This paper introduces a novel tool that leverages users’           • Even when the optimization model is adequate, the
feedback on analyzing intermediate results of parameter sweep-            time/cost to find an optimal solution may violate user’s
ing experiments to advise them about their strategies on                  time or cost restrictions;
parameter selections tied to their SLA constraints. We eval-
uated our tool with three applications of distinct domains              • The analyst expertise and creativity are hard to code.
and search space shapes. Our main finding is that users with              Then, for problems that involve complex and impor-
submission-execution-analysis workflows can benefit from their            tant decisions (e.g., financial trading decisions), even
interaction with intermediate results and adapt themselves                fully automatic solutions must be, in the end, validated
according to their domain expertise and SLA constraints.                  by humans to be adopted.
                                                                        Generally, it is prohibitive to run application instances
Keywords                                                             with all possible parameter values due to time and cost
interactive optimization workflow; parametric sweeping ap-           constraints—cost, in particular, becomes a key factor con-
plication; design of experiments; high performance comput-           sidering execution of these applications in outsourced envi-
ing; user expertise; SLAs                                            ronments (e.g., public clouds). Additionally, optimization
                                                                     techniques can be adopted to suggest parameter values in
                                                                     order to reduce the number of evaluated scenarios and conse-
                                                                     quently reduce the experiment costs. Therefore, SLA-aware
1.   INTRODUCTION                                                    prediction methods and mechanisms to suggest parameter
  Evaluation of test scenarios and model calibration are             values would help engineers and scientists to evaluate para-
common in several industries including finance, aerospace,           metric applications.
health, and energy. Normally users run applications that                In this work, interactive optimization methods are pro-
contain a set of parameters, each able to assume a broad             posed to take advantage of human expertise/creativity and
range of values. These applications are known as parame-             processing power to solve optimization problems. We study
ter sweeping applications or parametric applications. In this        how the decision maker’s experience can be leveraged to find
context, it is necessary to run several application instances        solutions for parameter sweeping applications. The analyst’s
varying their parameter values to find a combination that            decisions can be impacted by the DOE analysis that shows
meets a given specification or optimization criteria.                which parameters have more impact on the output. For in-
  A popular practice in this context is to employ Design of          stance, an experiment explorer tool can show which param-
Experiments (DOE) techniques [21] to select parameter val-           eter values cause more impact on experiment results and the
ues to be evaluated. These methods help users understand             user can select suitable parameter value combinations based
the impact of each parameter value on the generated output.          on this information.
As users can check the results produced by each execution,              We introduce a tool, called Copper (Cognitive Optimizer
they can utilize the intermediate results to select the next         and Parameter Explorer), to help users in the evaluation of
parameter values to be evaluated. The knowledge on the im-           search spaces considering human feedback and domain ex-
pact of parameter values helps users explore the search space        pertise. As new samples are evaluated, Copper updates the
more effectively. Nevertheless, to the best of our knowledge,        impact of each parameter value on the results and generates


Copyright held by the author(s).
                                                                 4
         WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah



                                                                          intermediate            job          experiment
                                       submission        execution
                                                                         result analysis    reprioritization   completion
                                             (1)          (2)   (3)          (4) (5)              (5) (6)
                           (1)
                    SLA                                                                                                    execute jobs
                                      high level user workflow steps
              - response time
                                                                                                                       predict SLA violation
              - cost
                                                                                                                    create alternative strategies
                                                                                                                                                      (3)
              - result quality
                                                                     trigger execution of jobs (2)                                                          search space
                                                                  manually change job priorities                               Copper
                                                                change trust on the DOE technique
                                                                                                   (6)
                 analyst                                                                                                    DFO strategies


                                                                                                                    Computing infrastructure
                                                                                                                 (on-premise clusters, cloud, grid)
                                                                   notify SLA violation     (4)
                     (5)                                         suggest search strategy

              analyze intermediate results
                       track SLA
                modify search strategy
                                                                results of completed jobs


      Figure 1: User workflow with job submission, job execution, and intermediate result analysis steps.


new samples for evaluation by using derivative free optimiza-                                          produced results, what they called simulation-analysis loop.
tion (DFO) methods [4]. Moreover, clustering techniques are                                            Netto et al. [23] introduced a scheduler system able to au-
employed to evaluate the regions of search space in which the                                          tomatically offer more resources to parametric application
user is interested and to propose the user to change her ap-                                           jobs based on the quality of their intermediate generated re-
proach when the SLA is predicted to be violated. The main                                              sults so as users could get faster to their desired goal. More
contributions of this paper are therefore:                                                             recently, Mattoso et al. [18] surveyed the use of steering in
                                                                                                       the context of High Performance Computing (HPC) scien-
     • A tool to assist users in the evaluation of scenarios with                                      tific workflows highlighting a tighter integration between the
       parameter sweep applications using human feedback                                               user and the underlying workflow execution system. An-
       and domain expertise (§ 3);                                                                     other area to help users is the development of workflow man-
     • Clustering methods to evaluate user behavior and pro-                                           agement systems [6, 17].
       pose strategy changes in case of imminent SLA viola-                                               An important research topic related to our work is opti-
       tions (§ 3.2);                                                                                  mization assisted by humans [19, 20]. Meignan et al. [20]
                                                                                                       provided a detailed survey and taxonomy of efforts in in-
     • Evaluation of the tool using three applications from                                            teractive optimization applied to operations research. They
       different domains and search space shapes (§ 4).                                                explored the different roles a user can have in an optimiza-
                                                                                                       tion process, such as adjusting or adding new constraints
2.     BACKGROUND                                                                                      and objective, helping on the optimization process itself,
  This section presents research efforts related to our work                                           and guiding the optimization process by providing informa-
and an overview of the problem investigated in this paper.                                             tion related to decision variables. Meignan and Knust [19]
                                                                                                       proposed a system that employs analyst feedback on the op-
2.1     Related Work                                                                                   timization to use as long-term preferences for future runs.
   Assisting users in executing applications has been a re-                                            Nascimento and Eades [7] proposed a framework for hu-
search goal of several groups. Related to our work there are                                           mans to assist the optimization process via inserting do-
efforts from multiple areas: computational steering, human-                                            main knowledge, escaping from local minimum, reducing the
guided search, workflow management, design of experiment,                                              search space to be explored, and avoiding ambiguity for op-
interactive optimization, among others.                                                                timal multi-solutions. Researchers have also explored visu-
   Computational steering [5, 11–13, 22, 32] aims at provid-                                           alization techniques to add humans in the optimization pro-
ing users with tools that enable parameter reconfiguration                                             cess [3, 9, 15, 29]. WorkWays [24, 25] is a science gateway
while experiments are in progress. Parker and Johnson [26]                                             with human-in-the-loop support for running and managing
introduced a system called SCIRun that uses a dataflow pro-                                            scientific workflows.
gramming model and visual programming to simplify the                                                     Several efforts in the design of experiments happened over
tasks of creating, debugging, optimizing, and controlling                                              the last years. Kleijnen et al. [16] developed a survey and a
complex scientific simulations. Van Wijk et al. [30] high-                                             user guide on advances in this area until 2005. Using frac-
lighted that the implementation of computational steering                                              tional factorial design technique, Abramson et al. [1] devel-
in practice is hard. To overcome this problem they im-                                                 oped a system to facilitate parameter exploration and sup-
plemented an environment in which a data manager entity                                                port for abstracting the underlying computing platform.
facilitates the interaction between the application and the                                               Our research is based on existing work of these aforemen-
steering components. Chin et al. [2] incorporated computa-                                             tioned efforts in order to build a tool able to provide feed-
tional steering in mesoscale lattice Boltzmann simulations                                             back to the user on how her interaction with the parameter
and showed the benefits of their work. They discussed that                                             sweeping experiments impacts the defined time (and cost)
large scale simulations require not only computational re-                                             constraints.
sources but tools to manage these simulations and their




                                                                                              5
       WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah




                                Figure 2: Overview of Copper and application adapter.


2.2    Problem Description                                            the output of that process.
   Problems from several industries are tackled by users, also           Analysts may have initial insights about the regions of
named analysts, running complex computer simulations and              search space that lead to good results. However, these in-
optimizers that constitute a software with a set of parame-           sights may be wrong, and can change as long as intermediate
ters, where each parameter can receive different values. We           results are evaluated. For several problems, the codification
consider applications with n parameters and each parame-              of human insights is not feasible due to the following rea-
ter has a finite discrete domain Di where i ∈ {1, · · · , n}. A       sons. These insights come from observations of historical
function f : D1 × · · · × Dn → R is adopted to evaluate the           data which may not be available. Even when this data is
quality of parameter values and is specific for each paramet-         available, the time to analyze and codify the human insight
ric application. The considered optimization problem P is             may not be affordable. Human insights can be based on
described as follows:                                                 domain expertise, and the analyst has no experience/time
                                                                      to explain or codify this knowledge. Therefore, a mixed
                                                                      strategy that combines human expertise and optimization
                        max f (x)                                     approaches may be useful to help analysts in the evaluation
                  P
                        x ∈ D1 × · · · × Dn                           of complex optimization problems.
   We assume the user is able to generate f (x) according to             In this paper, we introduce a tool and a set of techniques
the simulation output. For instance, if the output g(x) of a          to help scientists and engineers to properly prioritize their
simulation process is a real value that should be minimized,          experiments, which can comprise traditional HPC jobs, or
then f (x) can be assigned as f (x) = −g(x) and the problem           jobs following high throughput computing [27]. By doing so,
definition remains the same. If the parametric application            they can find the expected solution in a more cost-effective
a generates multiple outputs (e.g., a : D1 × · · · × Dn →             way, and perhaps discard unnecessary work to be processed.
O1 × · · · × Om ), a wrapper function h : O1 × · · · × Om →              Figure 1 presents the submission-execution-analysis work-
R should be employed to represent the behavior of f (i.e.,            flow investigated in this scenario. In Step 1, the user submits
f (x) = h(a(x))).                                                     to Copper a request to run an application with her SLA con-
   For instance, suppose the analyst wants to calibrate a sim-        straints, which can contain deadline, cost, or result quality
ulation model that generates a list of predicted values over          restrictions. In this work we focus on deadline constraints.
a given period. In this case, the output of the parametric            Others restrictions (e.g., cost or energy) can be mapped into
application corresponds to a list of values, one for each time        the time it takes to execute jobs on the computing infras-
instant. A wrapper function (e.g., Root Mean Squared Er-              tructure. In a scenario where Copper is executed in the
ror) can be employed to generate a single quality value to            cloud, cost may become an important factor to be consid-
compare the difference between the generated values and a             ered. Users may decide to change their execution strategies
reference series.                                                     so as to meet cost constraints based on the cloud provider in-
   Running all possible values for each parameter can be              stance prices and charging model having job execution time
unfeasible—even with large high performance computing ma-             and resource requirements as input. Another example, an
chines. Therefore, users need strategies to select a subset of        energy utilization rate can be employed to define the max-
values for each parameter that covers parts of the explo-             imum amount of energy to run the experiments depending
ration space that will give them enough information to an-            on the execution time. The user also describes in the request
swer their questions. Such questions can be: what is the best         the parameters and possible values to run her application,
set of parameter values that provides a (close-to-)optimal            the job submission strategy (e.g., random search or range of
solution or what are the parameter relationships that have            parameter values), and the specification of the computing
more influence on the phenomenon being analyzed. The                  environment (e.g. number of processors and their configura-
strategies for parameter-value selections can be defined by           tion). In Step 2, Copper requests the resource management
Design of Experiments (DOE), which corresponds to a sys-              system to configure the environment and creates compute
tematic method to determine the relationship between pa-              jobs corresponding to parameter-value pairs.
rameters (also known as features) affecting a process and                Once jobs are executing, Copper monitors the state of the




                                                                  6
          WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah



running/completed/pending jobs to predict SLA violation                  Figure 3 shows Copper in the submission-execution-analysis
(Step 3). If SLA is about to be violated, a new strategy              workflow. Initially, the analyst selects the first jobs to be ex-
is defined to run the pending jobs. If that is the case, a            ecuted (initialEvaluation()). Those jobs can be selected at
notification of SLA violation and a new strategy is sent to           random by Copper or strategically selected by the analyst.
the user (Step 4) who, based on this information, can act             Next, Copper submits the jobs to the computational infras-
on the strategy to execute her jobs (Step 5). Meanwhile,              tructure (submitJobs()). Once the first jobs are finished,
the analyst can also manually change job priorities based             Copper updates the internal DOE model (updateModel())
on her domain expertise and on her trust on the ongoing               and presents the results to the analyst.
strategy (Step 6). As Copper exploits the analyst exper-                 After the initial phase, in the Evaluation Loop the ana-
tise and her feedback is not instantaneous, the job priority          lyst submits the jobs and evaluates the results. The analyst
asynchronously changes to avoid unnecessary delays in trig-           initially receives a list of job suggestions (jobSuggestion())
gering new jobs. This is particularly important as Copper             from Copper and adapts the list by including or removing
is designed to manage long execution time jobs while users            jobs according to her expertise (includeUserJobs()). Then,
follow their submission-execution-analysis workflows.                 the jobs are submitted, the Copper model is updated, and
                                                                      the results are presented to the analyst (similar to the initial
     The specific problems tackled in paper are therefore:            phase). Whenever the analyst considers the results reached
                                                                      the desired goal, the loop stops. If the ongoing strategy
      • How to suggest jobs for execution to the analyst and          is predicted to lead to an SLA violation, the result of the
        exploit her expertise to get faster problem resolution?       jobSuggestion() method returns the violation warning and
      • How to predict SLA violations based on clustering             the new samples to be evaluated according to the analyst’s
        strategies and pending/running/completed jobs?                predefined strategy (§ 3.2).

      • How to suggest to the analyst to change her strategy
        to avoid SLA violations based on intermediate results
        and her expertise?
                                                                            Analyst                    Copper   Computing Infrastructure
3.     ARCHITECTURE AND METHODS
   The proposed solution comprises the Copper tool, a com-                       initialEvaluation()
puting infrastructure, and a web client. Copper is a ser-                                                   submitJobs()
vice, which can be executed in the cloud or on-premise, re-
sponsible for learning the user strategy, suggesting strategy                                                 results
changes in case of probable SLA violations, and proposing                                                   updateModel()
job executions to the analyst. A simple adapter should be
created on the computational infrastructure to connect ex-
isting user applications and Copper. This component is spe-                              results
cific to each application and allows Copper to trigger job
execution by using REST commands. It is also responsible                       Evaluation Loop
for sending intermediate results and the list of parameters
                                                                                   jobSuggestion()
with their possible values to Copper.
   Figure 2 presents an overview of the Copper architecture.                           suggestions
The manager orchestrates the Copper operation, reads the
results, and stores them in a database as soon as jobs are                         includeUserJobs()
executed. The SLA evaluator and sampler receive those in-
termediate results to update the selected DFO method state
and cluster models. In order the produce new job candidates                           evaluateJobs()
(§ 3.1), the sampler utilizes DFO strategies such as gener-                                                 submitJobs()
alized greedy randomized adaptive search (GRASP), gen-
eral pattern search (GPS), or particle swarm (PS) optimiza-                                                   results
tion [4, 10]. The analyst selects the optimization strategy                                                 updateModel()
according to its preference. In this paper, we implemented a
GRASP variation that employs DOE to estimate the quality
of a given sample. As the focus of this paper is related to                              results
the interaction between the analyst and Copper, the com-
parison between the developed GRASP version and other                              isFinished()
optimization approaches is beyond the scope of this paper.
   For SLA violation prediction (§ 3.1), the SLA evaluator
employs clustering methods to estimate the number of pend-
ing jobs according to the user strategy. This number is com-
pared to the number of jobs that can be executed based on
the remaining time/budget defined in the SLA contract. If
                                                                      Figure 3: Copper in the submission-execution-
the user has no resources to evaluate all pending jobs ac-
                                                                      analysis workflow.
cording to a given strategy, Copper sends a warning and
suggests the user to change her approach.




                                                                  7
       WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah



 Algorithm 1: sampling - proposed sampling method.                     number of generated samples (S). The greediness level (β)
  Input : M , P , β, n                                                 is a real value contained in [0, 1] and represents how random
  Output: samples                                                      is the selection of parameter values according to their esti-
1 solution ← construction(M , β , P );
                                                                       mated quality. If the greediness level is close to one, only
2 samples ← localSearch(M , P , n, solution);
                                                                       high-quality parameter values can be selected. On the other
3 return samples;
                                                                       hand, β values close to zero allow the selection of a broad
                                                                       range of values including low-quality values.
                                                                          The construction phase is presented in Algorithm 2 and
                                                                       finds a candidate based on the greedy heuristic. Initially, an
  Algorithm 2: construction - algorithm for creating a                 empty solution is created (Line 1) to return the list of values
  solution candidate.                                                  for each parameter of the result solution. For each parameter
   Input : M , P                                                       p (Line 2), a reduced candidate list (RCL) is created and this
   Output: solution                                                    list is randomly sampled to get a value that composes the
1 solution ← ∅;
                                                                       final solution.
2 foreach p of P do
                                                                          We use the full factorial DOE method [21] to evaluate
3      RCL ← {e ∈ listValues(p, M ) | q(e) ≥                           the quality of each parameter value. q min and q max are the
        q min + β(q max − q min )};                                    minimum and maximum quality values of a given parameter
4      solution[indexOf(p)] ← getRandomElement(RCL);                   value estimated by the DOE method. The RCL is formed by
5 end                                                                  all feasible parameter values that can be used to construct
6 return solution;                                                     the partial solution. In order to insert a parameter value in
                                                                       the RCL (Line 3) its quality level (q) should be higher or
                                                                       equal than a threshold q min + β(q max − q min ). Observe that
  Algorithm 3: localSearch - find neighbors of a given                 if β ∼ 1, the quality threshold approximates to q max and
  solution.                                                            only the best quality results will be selected (pure greedy
   Input : M , P , n, solution                                         approach). On the other hand, if β ∼ 0, the quality thresh-
   Output: solutions                                                   old is close to q min and most of the parameter values can be
1 solutions ← ∅;                                                       used to compose the final solution (i.e., random approach).
2 for i ← 1 to n do                                                       Whenever part of the search space has enough evaluated
3      solutions[i] ← getNeighbor(M , P , solution);                   samples to apply the full factorial, the DOE model is up-
4 end                                                                  dated. Next, a component of RCL is randomly sampled to
5 return solutions;                                                    compose the final solution (Line 4), and finally the solution
                                                                       is returned (Line 6).
                                                                          The last phase of the sampling algorithm corresponds to
                                                                       the local search (Algorithm 3). In this phase, n samples
3.1    Default Sampling Method - GRASP                                 close to the candidate solution found in the previous phase
   Combinatorial optimization corresponds to a class of prob-          are selected. The getNeighbor() function is employed to get
lems that consist of finding an optimal solution from a fi-            a solution candidate neighbor by using euclidean distance
nite set of objects [31]. Generally, brute-force approaches            between samples.
cannot be applied to solve those problems due to time or
cost constraints. Therefore, analysts or computer programs             3.2    SLA violation prediction method
should employ heuristics to select candidate solutions that
                                                                          The method to predict SLA violations is based on cluster-
will probably represent good (or optimal) solutions. This
                                                                       ing techniques. We identify the clusters of the search space
section presents how Copper generates evaluation samples 1
                                                                       that the analyst is interested in and compare the number of
to the analyst using GRASP algorithm. This method is
                                                                       pending jobs in these clusters to the number of jobs that can
based on the original GRASP algorithm [10] but instead of
                                                                       be executed according to the SLA contract. In this work, we
automatically evaluating the produced samples, the analyst
                                                                       use Density-Based Spatial Clustering of Applications with
decides whether the samples will be assessed or not. This
                                                                       Noise (DBSCAN) [14] method to identify the analyst strat-
sampling approach corresponds to the default method to
                                                                       egy. This clustering method is attractive as the analyst
suggest evaluation of the search space samples to the user.
                                                                       does not need to inform the number of clusters before eval-
Our approach is not strictly tied to this optimization method
                                                                       uation [31]. The DBSCAN algorithm groups samples with
and other techniques can be adopted to generate user sam-
                                                                       many neighbors determining high density clusters. Samples
ples (e.g., GPS or PS).
                                                                       that lie in low-density clusters are defined as outliers and
   The GRASP sampling method is presented in Algorithm
                                                                       may not be classified in any cluster. Here this algorithm
1, which comprises two phases: construction and local search.
                                                                       is adopted to determine previously evaluated clusters of the
The construction phase builds a feasible solution for each al-
                                                                       search space with good solutions and suggests these clusters
gorithm execution (Line 1). Next, the local search generates
                                                                       in case of possible SLA violations. The user defines how the
the neighbors of the previous solution (Line 2). Finally, the
                                                                       pending jobs will be distributed over the identified clusters.
set of results is returned (including the initial solution). The
                                                                          Figure 4 presents the basic idea of the proposed solution.
algorithm inputs are the following: the DOE model (M ),
                                                                       Suppose the search space is composed of two parameters
the list of parameters (P ), the greediness level (β), and the
                                                                       and each particular combination (v1, v2) represents a job
1
 Samples and jobs represent an instance of the application             that will be submitted to the computational infrastructure.
with a set of parameter values. These terms are used in the            v1 and v2 are possible values for parameters 1 and 2 respec-
paper interchangeably.                                                 tively. As long as the analyst submits new jobs, the clus-




                                                                   8
                            WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah



                                                                                                                       4.1.1    Applications
                         job not executed             job executed + bad result
                                                                                        interest regions                 Here is a short description of the applications we used in
                         job executed + good result   best results (not executed yet)
                                                                                                                       the evaluation with an overview of their input parameters
                                                                                                                       and output users are interested in.
                                                                                                                         1. ifm: The Integrated Flood Model (IFM) [28] is a




                                                                                                    search space
                                                                                                                       hydrological model aimed at providing high resolution flood
parameter 2 (values)




                                                                                                                       forecasts. IFM contains a soil and an overland routing model
                                                                                                                       where the soil model estimates the surface-runoff based on
                                                                                                                       incoming precipitation, soil, and land use properties. When
                                                                                                                       IFM is deployed to calculate flooding predictions, it requires
                       parameter 1 (values)                                                                            a calibration which consists in running several simulations
                                                                                                                       that match data collected by real sensors. To evaluate IFM,
                                                                                                                       we varied the parameters related to incoming precipitation
Figure 4: Example of search space with a set of in-
                                                                                                                       and soil properties with the goal of minimizing the difference
terest clusters.
                                                                                                                       between simulated and real data captured by sensors.
                                                                                                                          2. schedsim: Scheduler Simulator (SchedSim) is a sim-
tering approach identifies the clusters of the search space                                                            ulator to assist in the creation of policies to manage High
that provide good results. The quality of job results can be                                                           Performance Computing clusters. It accepts a variety of
previously defined by using a threshold value or the analyst                                                           parameters including number of processors in the cluster,
can qualify each job result (e.g., good or bad) whenever it                                                            partitions, and scheduling algorithms. Tuning the sched-
becomes available.                                                                                                     uler and cluster properties to meet client business goals is
   In this example, the analyst is searching the best result in                                                        not a trivial task and several scenarios must be executed to
three clusters of the search space. The clusters are identified                                                        achieve that. For the evaluation, we had the following input:
by the clustering algorithm and the number of pending jobs                                                             (i) a workload containing historical data of jobs submitted
is counted. If this number is higher than the remaining jobs                                                           to a cluster; (ii) a fixed number of cluster processors; and
defined by the SLA the user should change her strategy. In                                                             (iii) two variable partitions. We wanted to know what was
this case, the number of pending jobs is 15 (including the                                                             the partition sizes and which jobs (requested time and al-
best result). If the analyst can run 15 jobs, at most, jobs due                                                        located processors) should go to such partitions having the
to SLA restrictions, Copper advises the analyst to change                                                              following goals: (i) minimize the overall job response time
her strategy.                                                                                                          and (ii) small jobs (under 1h requested time) should wait no
   The strategy can be defined considering exploration and                                                             more than 3 hours in the cluster waiting queue.
exploitation aspects. We define exploitation level (ex ) as the
proportion of pending jobs that will be executed considering                                                              3. mazerunner: Maze Runner is an in-house created
samples inside the detected clusters. For instance, if the                                                             game in which simulated runner has to find the way out of
exploitation level is 1.0 all the pending jobs will be executed                                                        a maze. The objective of the game is to find a runner con-
taking the parameters contained in the clusters of interest.                                                           figuration (e.g., velocity, movement strategy) that leads to a
If the exploitation level is 0, pending jobs in not explored                                                           shorter time to find the maze exit. There are enemies in the
clusters (outside the clusters) will be given higher priority.                                                         maze, and the runner needs to escape from them. Addition-
   Considering all possible pending jobs in all clusters (i.e.                                                         ally, the runner can slip on obstacles and stay vulnerable for
ex ×#pending jobs), the user defines which ones of these jobs                                                          some time depending on her velocity. This game is simple
will execute for each cluster. The default approach utilizes                                                           and intuitive and was created to demonstrate concepts of
weighted average according the mean quality of evaluated                                                               the proposed assistant for a broader audience.
jobs inside the clusters. For jobs outside the clusters (i.e.                                                             The search spaces related to the applications used in this
(1 − ex ) × #pending jobs), the default GRASP strategy re-                                                             work are illustrated in Figure 5. Multiple parameter com-
mains the same (§ 3.1).                                                                                                binations are represented in a single axis and each square
                                                                                                                       corresponds to a parameter combination. The quality of a
                                                                                                                       parameter configuration is represented as a given color (dark
4.                       EVALUATION                                                                                    colors represent better results). For instance, the top-right
  In this section, we evaluate the Copper effectiveness in                                                             square of Figure 5c presents the (V1 1, V2 1, V3 2, V4 3 )
helping users run optimization and computer simulations                                                                parameter configuration of maze runner. In this work, we
under SLA constraints. We first describe three applications                                                            assume the output of parametric applications as black box
served as use cases and metrics analyzed to understand the                                                             functions, therefore we will not enter in details about the
benefits of the assistant. In particular, we are interested                                                            meaning of parameters and their values.
in analyzing three major aspects considering submission-                                                                  By observing the search spaces of the three applications,
execution-analysis workflows: (i) the interaction of Copper                                                            SchedSim presents a well behaved variation of the output
with analysts of different expertise levels; (ii) how the an-                                                          when the parameters values change. A similar behavior hap-
alyst confidence on Copper affects the evaluation of user                                                              pens to the IFM search space. However, for higher values
applications; (iii) the impact of Copper SLA violation pre-                                                            of P1 the output presents no variation. This ‘flat’ search
dictions on user experiments.                                                                                          space region reduces the Copper capacity to predict good
                                                                                                                       parameter values. Regarding the Maze Runner search space,
4.1                        Experiment Setup                                                                            configurations whose parameter P1 = V1 3 provide better
  This section presents an overview of the setup to evaluate                                                           results. However, the output variation does not have the
Copper.                                                                                                                same gradual characteristics as in the other applications.




                                                                                                                   9
        WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah




                                                                                                                                                                                                                                                    V2_10
                                                                                                                                                                                                                                                            V2_11
                                                                                                                                                                                                                                                                    V2_12
                                                                                                                                                                                                                                                                            V2_13
                                                                                                                                                                                                                                                                                    V2_14
                                                                                                                                                                                                                                                                                             V2_15
                                                                                                                                                                                                                                                                                                     V2_16
                                                                                                                                                                                                                                                                                                             V2_17
                                                                                                                                                                                                                                                                                                                     V2_18
                                                                                                                                                                                                                                                                                                                             V2_19
                                                                                                                                                                                                                                                                                                                                     V2_20
                                                                                                                                                                                    V2_1
                                                                                                                                                                                           V2_2
                                                                                                                                                                                                  V2_3
                                                                                                                                                                                                         V2_4
                                                                                                                                                                                                                V2_5
                                                                                                                                                                                                                       V2_6
                                                                                                                                                                                                                              V2_7
                                                                                                                                                                                                                                      V2_8
                                                                                                                                                                                                                                             V2_9
         66.0
         71.0
         76.0
         81.0
         86.0
         91.0
         96.0
          1.0
          6.0
         11.0
         16.0
         21.0
         26.0
         31.0
         36.0
         41.0
         46.0
         51.0
         56.0
         61.0
         66.0
         71.0
         76.0
         81.0
         86.0
         91.0
         96.0
          1.0
          6.0
         11.0
         16.0
         21.0
         26.0
         31.0
         36.0
         41.0
         46.0
         51.0
         56.0
         61.0
         66.0
         71.0
         76.0
         81.0
         86.0
         91.0
         96.0
                                                     V4_1




                                                            V4_2




                                                                          V4_3




                                                                                         V4_4




                                                                                                   V4_5




                                                                                                                 V4_6
                                                                                                                           P4




                                                                                                                                                                                                           V3_10
                                                                                                                                                                                                                   V3_11
                                                                                                                                                                                                                              V3_12
                                                                                                                                                                                                                                       V3_13
                                                                                                                                                                                                                                                V3_14
                                                                                                                                                                                                                                                            V3_15
                                                                                                                                                                                                                                                                     V3_16
                                                                                                                                                                                                                                                                               V3_17
                                                                                                                                                                                                                                                                                            V3_18
                                                                                                                                                                                                                                                                                                     V3_19
                                                                                                                                                                                                                                                                                                               V3_20
                                                                                                                                        V3_1
                                                                                                                                               V3_2
                                                                                                                                                      V3_3
                                                                                                                                                             V3_4
                                                                                                                                                                    V3_5
                                                                                                                                                                           V3_6
                                                                                                                                                                                  V3_7
                                                                                                                                                                                           V3_8
                                                                                                                                                                                                  V3_9
                                                                                                                                                                                                                                                                                                                             P3




                                                {
                                                {
                                                {
                                                {
                                                {
                                                {
                                                 V3_2




                                                 V3_2




                                                 V3_2




                                                 V3_2




                                                 V3_2




                                                 V3_2
                                                 V3_1




                                                 V3_1




                                                 V3_1




                                                 V3_1




                                                 V3_1




                                                 V3_1
                                                                                                                           P3




                                                 V3_5




                                                 V3_5




                                                 V3_5




                                                 V3_5




                                                 V3_5




                                                 V3_5
                                                 V3_3




                                                 V3_3




                                                 V3_3




                                                 V3_3




                                                 V3_3




                                                 V3_3
                                                 V3_6




                                                 V3_6




                                                 V3_6




                                                 V3_6




                                                 V3_6




                                                 V3_6
                                                 V3_4




                                                 V3_4




                                                 V3_4




                                                 V3_4




                                                 V3_4




                                                 V3_4
                                                                                                                                                                                                                                                                                                                             P2




                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                        {
                                                                                                                                            1.0
                                                                                                                                            6.0
                                                                                                                                          11.0
                                                                                                                                          16.0
                                                                                                                                          21.0
                                                                                                                                          26.0
                                                                                                                                          31.0
                                                                                                                                          36.0
                                                                                                                                          41.0
                                                                                                                                          46.0
                                                                                                                                          51.0
                                                                                                                                          56.0
                                                                                                                                          61.0
                                                                                                                                          66.0
                                                                                                                                          71.0
                                                                                                                                          76.0
                                                                                                                                          81.0
                                                                                                                                          86.0
                                                                                                                                          91.0
                                                                                                                                          96.0
                                                                                                                                            1.0
                                                                                                                                            6.0
                                                                                                                                          11.0
                                                                                                                                          16.0
                                                                                                                                          21.0
                                                                                                                                          26.0
                                                                                                                                          31.0
                                                                                                                                          36.0
                                                                                                                                          41.0
                                                                                                                                          46.0
                                                                                                                                          51.0
                                                                                                                                          56.0
                                                                                                                                          61.0
                                                                                                                                          66.0
                                                                                                                                          71.0
                                                                                                                                          76.0
                                                                                                                                          81.0
                                                                                                                                          86.0
                                                                                                                                          91.0
                                                                                                                                          96.0
                                                                                                                                           1.0
                                                                                                                                           6.0
                                                                                                                                          11.0
                                                                                                                                          16.0
                                                                                                                                          21.0
                                                                                                                                          26.0
                                                                                                                                          31.0
                                                                                                                                          36.0
                                                                                                                                          41.0
                                                                                                                                          46.0
                                                                                                                                          51.0
                                                                                                                                          56.0
                                                                                                                                          61.0
                                                                                                                                          66.0
                                                                                                                                          71.0
                                                                                                                                          76.0
                                                                                                                                          81.0
                                                                                                                                          86.0
                                                                                                                                          91.0
                                                                                                                                          96.0




                                                                                                                                           1.0
                                                                                                                                           6.0
                                                                                                                                          11.0
                                                                                                                                          16.0
                                                                                                                                          21.0
                                                                                                                                          26.0
                                                                                                                                          31.0
                                                                                                                                          36.0
                                                                                                                                          41.0
                                                                                                                                          46.0
                                                                                                                                          51.0
                                                                                                                                          56.0
                                                                                                                                          61.0
                                                                                                                                          66.0
                                                                                                                                          71.0
                                                                                                                                          76.0
                                                                                                                                          81.0
                                                                                                                                          86.0
                                                                                                                                          91.0
                                                                                                                                          96.0
                                                                                                                                           1.0
                                                                                                                                           6.0
                                                                                                                                         11.0
                                                                                                                                         16.0
                                                                                                                                         21.0
                                                                                                                                         26.0
                                                                                                                                         31.0
                                                                                                                                         36.0
                                                                                                                                         41.0
                                                                                                                                         46.0
                                                                                                                                         51.0
                                                                                                                                         56.0
                                                                                                                                         61.0
                                                                                                                                         66.0
                                                                                                                                         71.0
                                                                                                                                         76.0
                                                                                                                                         81.0
                                                                                                                                         86.0
                                                                                                                                         91.0
                                                                                                                                         96.0




                                                                                                                                           1.0
                                                                                                                                           6.0
                                                                                                                                         11.0
                                                                                                                                         16.0
                                                                                                                                         21.0
                                                                                                                                         26.0
                                                                                                                                         31.0
                                                                                                                                         36.0
                                                                                                                                         41.0
                                                                                                                                         46.0
                                                                                                                                         51.0
                                                                                                                                         56.0
                                                                                                                                         61.0
                                                                                                                                         66.0
                                                                                                                                         71.0
                                                                                                                                         76.0
                                                                                                                                         81.0
                                                                                                                                         86.0
                                                                                                                                         91.0
                                                                                                                                         96.0
                                                                                                                                           1.0
                                                                                                                                           6.0
                                                                                                                                         11.0
                                                                                                                                         16.0
                                                                                                                                         21.0
                                                                                                                                         26.0
                                                                                                                                         31.0
                                                                                                                                         36.0
                                                                                                                                         41.0
                                                                                                                                         46.0
                                                                                                                                         51.0
                                                                                                                                         56.0
                                                                                                                                         61.0
                                                                                                                                         66.0
                                                                                                                                         71.0
                                                                                                                                         76.0
                                                                                                                                         81.0
                                                                                                                                         86.0
                                                                                                                                         91.0
                                                                                                                                         96.0
                                                                                                                                           1.0
                                                                                                                                           6.0
                                                                                                                                         11.0
                                                                                                                                         16.0
                                                                                                                                         21.0
                                                                                                                                         26.0
                                                                                                                                         31.0
                                                                                                                                         36.0
                                                                                                                                         41.0
                                                                                                                                         46.0
                                                                                                                                         51.0
                                                                                                                                         56.0
                                                                                                                                         61.0
                                                                                                                                         66.0
                                                                                                                                         71.0
                                                                                                                                         76.0
                                                                                                                                         81.0
                                                                                                                                         86.0
                                                                                                                                         91.0
                                                                                                                                         96.0
                                                                                                                                           1.0
                                                                                                                                           6.0
                                                                                                                                         11.0
                                                                                                                                         16.0
                                                                                                                                         21.0
                                                                                                                                         26.0
                                                                                                                                         31.0
                                                                                                                                         36.0
                                                                                                                                         41.0
                                                                                                                                         46.0
                                                                                                                                         51.0
                                                                                                                                         56.0
                                                                                                                                         61.0
                                                                                                                                         66.0
                                                                                                                                         71.0
                                                                                                                                         76.0
                                                                                                                                         81.0
                                                                                                                                         86.0
                                                                                                                                         91.0
                                                                                                                                         96.0




                                                                                                                                           1.0
                                                                                                                                           6.0
                                                                                                                                         11.0
                                                                                                                                         16.0
                                                                                                                                         21.0
                                                                                                                                         26.0
                                                                                                                                         31.0
                                                                                                                                         36.0
                                                                                                                                         41.0
                                                                                                                                         46.0
                                                                                                                                         51.0
                                                                                                                                         56.0
                                                                                                                                         61.0
                                                                                                                                         66.0
                                                                                                                                         71.0
                                                                                                                                         76.0
                                                                                                                                         81.0
                                                                                                                                         86.0
                                                                                                                                         91.0
                                                                                                                                         96.0
                                                                                                                                          1.0
                                                                                                                                          6.0
                                                                                                                                         11.0
                                                                                                                                         16.0
                                                                                                                                         21.0
                                                                                                                                         26.0
                                                                                                                                         31.0
                                                                                                                                         36.0
                                                                                                                                         41.0
                                                                                                                                         46.0
                                                                                                                                         51.0
                                                                                                                                         56.0
                                                                                                                                         61.0
                                                                                                                                         66.0
                                                                                                                                         71.0
                                                                                                                                         76.0
                                                                                                                                         81.0
                                                                                                                                         86.0
                                                                                                                                         91.0
                                                                                                                                         96.0




                                                                                                                                          1.0
                                                                                                                                          6.0
                                                                                                                                         11.0
                                                                                                                                         16.0
                                                                                                                                         21.0
                                                                                                                                         26.0
                                                                                                                                         31.0
                                                                                                                                         36.0
                                                                                                                                         41.0
                                                                                                                                         46.0
                                                                                                                                         51.0
                                                                                                                                         56.0
                                                                                                                                         61.0
                                                                                                                                         66.0
                                                                                                                                         71.0
                                                                                                                                         76.0
                                                                                                                                         81.0
                                                                                                                                         86.0
                                                                                                                                         91.0
                                                                                                                                         96.0
                                                                                                                                          1.0
                                                                                                                                          6.0
                                                                                                                                        11.0
                                                                                                                                        16.0
                                                                                                                                        21.0
                                                                                                                                        26.0
                                                                                                                                        31.0
                                                                                                                                        36.0
                                                                                                                                        41.0
                                                                                                                                        46.0
                                                                                                                                        51.0
                                                                                                                                        56.0
                                                                                                                                        61.0
                                                                                                                                        66.0
                                                                                                                                        71.0
                                                                                                                                        76.0
                                                                                                                                        81.0
                                                                                                                                        86.0
                                                                                                                                        91.0
                                                                                                                                        96.0




                                                                                                                                          1.0
                                                                                                                                          6.0
                                                                                                                                        11.0
                                                                                                                                        16.0
                                                                                                                                        21.0
                                                                                                                                        26.0
                                                                                                                                        31.0
                                                                                                                                        36.0
                                                                                                                                        41.0
                                                                                                                                        46.0
                                                                                                                                        51.0
                                                                                                                                        56.0
                                                                                                                                        61.0
                                                                                                                                        66.0
                                                                                                                                        71.0
                                                                                                                                        76.0
                                                                                                                                        81.0
                                                                                                                                        86.0
                                                                                                                                        91.0
                                                                                                                                        96.0




                                                                                                                                          1.0
                                                                                                                                          6.0
                                                                                                                                        11.0
                                                                                                                                        16.0
                                                                                                                                        21.0
                                                                                                                                        26.0
                                                                                                                                        31.0
                                                                                                                                        36.0
                                                                                                                                        41.0
                                                                                                                                        46.0
                                                                                                                                        51.0
                                                                                                                                        56.0
                                                                                                                                        61.0
                                                                                                                                        66.0
                                                                                                                                        71.0
                                                                                                                                        76.0
                                                                                                                                        81.0
                                                                                                                                        86.0
                                                                                                                                        91.0
                                                                                                                                        96.0
                                                                                                                                          1.0
                                                                                                                                          6.0
                                                                                                                                        11.0
                                                                                                                                        16.0
                                                                                                                                        21.0
                                                                                                                                        26.0
                                                                                                                                        31.0
                                                                                                                                        36.0
                                                                                                                                        41.0
                                                                                                                                        46.0
                                                                                                                                        51.0
                                                                                                                                        56.0
                                                                                                                                        61.0
                                                                                                                                        66.0
                                                                                                                                        71.0
                                                                                                                                        76.0
                                                                                                                                        81.0
                                                                                                                                        86.0
                                                                                                                                        91.0
                                                                                                                                        96.0




                                                                                                                                          1.0
                                                                                                                                          6.0
                                                                                                                                        11.0
                                                                                                                                        16.0
                                                                                                                                        21.0
                                                                                                                                        26.0
                                                                                                                                        31.0
                                                                                                                                        36.0
                                                                                                                                        41.0
                                                                                                                                        46.0
                                                                                                                                        51.0
                                                                                                                                        56.0
                                                                                                                                        61.0
                                                                                                                                        66.0
                                                                                                                                        71.0
                                                                                                                                        76.0
                                                                                                                                        81.0
                                                                                                                                        86.0
                                                                                                                                        91.0
                                                                                                                                        96.0




                                                                                                                                          1.0
                                                                                                                                          6.0
                                                                                                                                        11.0
                                                                                                                                        16.0
                                                                                                                                        21.0
                                                                                                                                        26.0
                                                                                                                                        31.0
                                                                                                                                        36.0
                                                                                                                                        41.0
                                                                                                                                        46.0
                                                                                                                                        51.0
                                                                                                                                        56.0
                                                                                                                                        61.0
                                                                                                                                        66.0
                                                                                                                                        71.0
                                                                                                                                        76.0
                                                                                                                                        81.0
                                                                                                                                        86.0
                                                                                                                                        91.0
                                                                                                                                        96.0
                                                                                                                                          1.0
                                                                                                                                          6.0
                                                                                                                                        11.0
                                                                                                                                        16.0
                                                                                                                                        21.0
                                                                                                                                        26.0
                                                                                                                                        31.0
                                                                                                                                        36.0
                                                                                                                                        41.0
                                                                                                                                        46.0
                                                                                                                                        51.0
                                                                                                                                        56.0
                                                                                                                                        61.0
                                                                                                                                        66.0
                                                                                                                                        71.0
                                                                                                                                        76.0
                                                                                                                                        81.0
                                                                                                                                        86.0
                                                                                                                                        91.0
                                                                                                                                        96.0
                                                                                                                                         1.0
                                                                                                                                         6.0
                                                                                                                                        11.0
                                                                                                                                        16.0
                                                                                                                                        21.0
                                                                                                                                        26.0
                                                                                                                                        31.0
                                                                                                                                        36.0
                                                                                                                                        41.0
                                                                                                                                        46.0
                                                                                                                                        51.0
                                                                                                                                        56.0
                                                                                                                                        61.0
                                                                                                                                        66.0
                                                                                                                                        71.0
                                                                                                                                        76.0
                                                                                                                                        81.0
                                                                                                                                        86.0
                                                                                                                                        91.0
                                                                                                                                        96.0
                                                                                                                                 V1_1


                                          {
                                              V2_1
                                              V2_2                                                                               V1_2
                                              V2_3                                                                               V1_3
                                   V1_1
                                              V2_4
                                              V2_5                                                                               V1_4



                                          {
                                              V2_6
                                              V2_1                                                                               V1_5
                                              V2_2                                                                               V1_6
                                              V2_3
                                   V1_2                                                                                          V1_7
                                              V2_4
                                              V2_5
                                                                                                                                 V1_8



                                          {
                                              V2_6
                                              V2_1                                                                               V1_9
                                              V2_2
                                              V2_3                                                                              V1_10
                                   V1_3
                                              V2_4
                                              V2_5
                                                                                                                                V1_11
                                                                                                                                V1_12


                                          {
                                              V2_6
                                              V2_1
                                              V2_2                                                                              V1_13
                                              V2_3                                                                              V1_14
                                   V1_4
                                              V2_4
                                              V2_5                                                                              V1_15



                                          {
                                              V2_6
                                              V2_1                                                                              V1_16
                                              V2_2                                                                              V1_17
                                              V2_3
                                   V1_5                                                                                         V1_18
                                              V2_4
           Normalized Goal [0,1]




                                              V2_5
                                                                                                                                V1_19



                                          {
                                              V2_6
                                              V2_1                                                                              V1_20
                                              V2_2
                                              V2_3                                                                              V1_21
                                   V1_6       V2_4
                                              V2_5
                                                                                                                                V1_22
                                              V2_6

                                   P1 P2                                                                                         P1                                    b) SchedSim search space
                                                                   a) IFM search space
                                                                                                V4_1                      V4_2                               V4_3
                                                                                                                                                                                              P4
                                                                                      {
                                                                                      {
                                                                                      {
                                                                                        V3_1



                                                                                                       V3_2



                                                                                                                   V3_1



                                                                                                                                 V3_2



                                                                                                                                                      V3_1



                                                                                                                                                                              V3_2
                                                                                                                                                                                                  P3




                                                                          {
                                                                                 V2_1

                                                                   V1_1          V2_2

                                                                                 V2_3




                                                                          {
                                                                                 V2_1
                                                                   V1_2          V2_2

                                                                                 V2_3




                                                                          {
                                                                                 V2_1
                                                                   V1_3          V2_2

                                                                                 V2_3
                                                                   P1            P2                       c) Maze Runner search space


Figure 5: Search spaces for IFM, SchedSim and Maze Runner applications. Each application has n parameters
(P 1 · · · P n) and each parameter P j can assume m values (V j 1, · · · , V j m). The search spaces sizes for IFM,
SchedSim, and Maze Runner are 6 × 6 × 6 × 6 = 1296, 22 × 20 × 20 = 8800, and 3 × 3 × 2 × 3 = 54, respectively.


4.1.2    Experiments Description                                                                                                 mal solutions. Therefore, we divided the agents into three
   We consider software agents to simulate the human be-                                                                         categories based on the insight quality level. The first group
havior [19] and divide the experiments into two main groups.                                                                     of agents are related to users that have a previous intuition
First, the agents evaluate the tool with no SLA requirements                                                                     about good parameter values, and this intuition may lead
(§ 4.2). In this case, the assessment aims at finding how                                                                        to bad solutions (bad insights). The second group contains
many jobs are necessary to find an optimal solution. The                                                                         agents that have previous ideas about the parameter values
software agent only interacts with Copper to obtain sugges-                                                                      and these values result in proper solutions (good insights).
tions of jobs to be executed and no SLA-aware assessment                                                                         The last group of agents represents users with no insight
is performed.                                                                                                                    about the clusters of search space that may contain the op-
   The second evaluation is performed to find the best result                                                                    timal solution (no insights).
considering a limited number of jobs (§ 4.3). Therefore,                                                                            For each agent insight level, we vary the user confidence
agents interact with Copper to obtain strategy suggestions                                                                       level on the job suggestions that come from Copper and the
if the ongoing experiments tend to extrapolate the number                                                                        capacity of an agent to learn from intermediate results. The
of jobs established in the SLA.                                                                                                  confidence level determines the percentage of jobs that are
                                                                                                                                 submitted taking into account Copper suggestions. For in-
                                                                                                                                 stance, if the user submits 100 jobs and the confidence level
4.2     Finding an Optimal Solution                                                                                              is 40%, then 40 jobs are submitted based on Copper recom-
  In this evaluation, we assess the number of jobs that must                                                                     mendation and the rest is triggered based on the agent intu-
be executed to find an optimal solution. The agents have                                                                         ition. This intuition corresponds to a subset of search space
insights about which parameter values may result in opti-                                                                        in which the analyst believes the optimal result is contained.




                                                                                                                           10
          WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah



                                                   Agent does not learn from intermediate results
                                                   Agent learns from intermediate results



    350                                       90                                               160
    300                                       80                                               140
                                              70
    250                                                                                        120
                                              60
     200                                                                                       100
                                              50
    150                                       40                                                80
                                              30                                                60
    100
                                              20                                                40
     50                                       10                                                20
      0                                        0                                                 0
   3500                                      300                                               700
   3000                                      250                                               600
   2500                                                                                        500
                                             200
   2000                                                                                        400
                                             150
   1500                                                                                        300
                                             100
   1000                                                                                        200
    500                                       50                                               100
      0                                        0                                                     0
     50                                       25                                                    40
                                                                                                    35
     40                                       20
                                                                                                    30
     30                                       15                                                    25
                                                                                                    20
     20                                       10
                                                                                                    15
     10                                        5                                                    10
                                                                                                     5
      0                                        0                                                     0
           10 20 30 40 50 60 70 80 90 100          10 20 30 40 50 60 70 80 90 100                        10 20 30 40 50 60 70 80 90 100
                        Percentage of samples from advisor (agent confidence on Copper - confidence level)

Figure 6: # Jobs to find an optimal solution as a function of agent confidence level for all applications. 100%
confidence level experiments correspond to the proposed GRASP solution without analyst intervention.


We evaluate agents that learn or not from intermediate re-                for these applications. For agents with bad insights, as the
sults. When the agent learns from intermediate results, it                confidence level on recommendations made by Copper in-
increases the probability of selecting a given parameter value            crease, the number of jobs to find an optimal result reduces.
if this value leads to good intermediate results and decreases            Then, users with bad insights may have benefits when using
the likelihood of choosing a given value if this parameter                Copper. For Agents that do not learn from intermediate
value leads to bad results. As the focus of this experiment               results, SchedSim results present the most significant differ-
is to present how the framework takes benefits from users                 ence between relying on Copper to provide the jobs param-
recommendations, we do not explore in details here how the                eters or not. Increasing the confidence in suggestions made
agents learn from intermediate results.                                   by Copper also reduces the number of jobs to find the opti-
   Figure 6 shows the percentage of jobs that are submitted               mal configuration in IFM and Maze Runner. However, this
according to Copper recommendation (confidence level) for                 difference is less significant than the previous one. As the
all applications as a function of the number of jobs that must            SchedSim search space has interesting characteristics about
be submitted to find an optimal solution. As the proposed                 the objective function growing related to the input param-
sampling algorithm is stochastic, multiple experiments were               eters, the proposed sampling algorithm takes less steps to
performed to find a 95% confidence interval of the mean                   find the optimal parameter configuration.
number of jobs to find an optimal result. The bootstrap                      For bad insight agents that learn from intermediate re-
method [8] was adopted to find the confidence interval and                sults, there is a significant difference between learning from
the sample size for each result is 500—this number is high                intermediate results or not. For example, bad insight agents
and reduces the size of the confidence interval. The bar                  for SchedSim with 10% confidence level presents a big differ-
graphs show the mean value and the confidence interval for                ence between learning from intermediate results or not. This
each experiment.                                                          behavior can be explained by the interesting characteristics
   The first, second, and third rows of Figure 6 are related              of ShedSim search space. Figure 5 shows that low values of
respectively to IFM, SchedSim, and Maze Runner results.                   SchedSim parameters P1, P2, and P3 lead to good results.
The three columns of the same figure, shows the experi-                   The agents learn that these values lead to good results and
ments for agents with bad, good and no previous insights                  increase the probability of choosing combinations of these




                                                                     11
       WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah



values.
                                                                                     1.00                Best Results
   For agents with good insights, the number of jobs to find
the optimal solution increases with the agent confidence                             0.98
level. This result is the opposite of the previous one and
indicates that users with good insights may have a signifi-                          0.96
cant reduction of steps to find an optimal solution. Agents                          0.94                                         50 Jobs




                                                                       Quality (%)
with good insights about the behavior of SchedSim and Maze                                                                        100 Jobs
Runner present a regular growth in the number of jobs to                             0.92                                         150 Jobs
find the optimal solution when the confidence level on Cop-                                                                       200 Jobs
                                                                                     0.90
per increases. However, IFM does not present this charac-
teristic as there is an abrupt increase from 90 to 100 percent                       0.88
of confidence level. Whenever 100% confidence level is em-
                                                                                     0.86
ployed, the agent only takes Copper suggestions to submit
new jobs. In this case, Copper needs more jobs to select pa-                                 0.0    0.33         0.66       1.0
                                                                                                       Exploitation Level
rameters outside the flat area of IFM search space. As 90%
confidence level agents submit 10% of jobs based on user in-
sight. The DOE model is updated considering the non-flat               Figure 7: Best results for experiments with SLA
areas (agent good insight) and speeds-up the optimization              restrictions - IFM.
process. For agents with good insights, there is no expres-
sive difference between learning from intermediate results or                        1.00                Best Results
not. As the number of jobs to find the optimal solution is
reduced, the number of samples is not sufficient to update                           0.99
the user preference and significantly affect the result.
   For agents with no insights, the agent’s efficiency is im-                        0.98
pacted by the confidence level. Whenever the confidence                                                                           30 Jobs
level on Copper increases the results get better. However,             Quality (%)   0.97
                                                                                                                                  60 Jobs
this impact is less notable to no insight agents when com-                                                                        90 Jobs
pared to the results of bad insight agents. Similarly to the                                                                      120 Jobs
                                                                                     0.96
bad insight agents, there is a difference between learning
from intermediate results.                                                           0.95

4.3    Restricted Experiments                                                        0.94
                                                                                             0.0    0.33         0.66       1.0
   This set of experiments utilizes the same agents to find the                                        Exploitation Level
best effort solution but with limited number of jobs that can
be executed due to SLA constraints. We assume that all jobs            Figure 8: Best results for experiments with SLA
take the same time to execute, and the SLA constraints cor-            restrictions - SchedSim.
respond only to time requirements. Other conditions such as
costs or energy consumption can be used by mapping these
                                                                                     1.000               Best Results
requirements into the number of submitted jobs. The qual-
ity of a result is given by the ratio of the result found and
the optimal result.                                                                  0.998
   For each experiment set, we changed the maximum num-
ber of jobs that can be submitted and the exploitation level.
                                                                                     0.996                                        15 Jobs
                                                                       Quality (%)




The exploitation level is defined as the proportion of pend-                                                                      20 Jobs
ing jobs that will be submitted considering samples inside                                                                        25 Jobs
the clusters defined in § 3.2. As the output of an experi-                           0.994                                        30 Jobs
ment is not deterministic, each bar in the chart presents the
mean value and the 95% confidence related to 500 runs for                            0.992
each maximum jobs, and exploitation level pair. In these ex-
periments, the proposed clustering method is applied when
                                                                                     0.990
the number of submitted jobs reaches 70% of the maximum                                       0.0   0.33         0.66       1.0
number of jobs and the agent accepts the Copper recom-                                                 Exploitation Level
mendation for all experiments. The restricted experiment
results for IFM, SchedSim, and Maze Runner are respec-                 Figure 9: Best results for experiments with SLA
tively presented in Figures 7, 8, and 9.                               restrictions - Maze Runner.
   The IFM experiment set examines the best effort scenario
with a limited number of jobs. In this case, instead of pre-
ferring exploration or exploration of sampling space, it is            jobs is reduced (in this case 50 jobs). Based on these results
preferable to adopt a mixed strategy to find good param-               we can conclude that for this experiment it is preferable to
eter values. For instance, for 50 jobs the experiment with             adopt a mixed strategy to evaluate the search space. There-
exploitation level of 33% presents better quality when com-            fore, experienced analysts can take advantage of combined
pared to the others. Similar to the other experiment sets,             search strategies of exploitation/exploration to achieve bet-
the variation of results quality is higher when the number of          ter results.




                                                                  12
        WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah



   For the SchedSim results (Figure 8), it is possible to ob-               gineering using many-task computing. IEEE Transac-
serve that for a reduced number of jobs (i.e., 30) the exploita-            tions on Parallel and Distributed Systems, 22(6):960–
tion level has a significant influence on the output quality.               973, 2011.
The impact of exploitation level decreases as the number of
submitted jobs increase because the probability of finding               [2] J. Chin, J. Harting, S. Jha, P. V. Coveney, A. R.
better results are higher if more jobs are executed. For this                Porter, and S. M. Pickles. Steering in computational
experiment set, it is preferable to explore the search space                 science: mesoscale modelling and simulation. Contem-
instead of concentrating the submitted jobs in the cluster                   porary Physics, 44(5):417–434, 2003.
areas. This result suggests that the best results are spread             [3] L. Colgan, R. Spence, and P. Rankin. The cock-
through the solution search space.                                           pit metaphor. Behaviour & Information Technology,
   Figure 9 presents the results of experiments with a re-                   14(4):251–263, 1995.
stricted number of jobs related to Maze Runner. Differ-
ent from the previous application, whenever the exploitation             [4] A. R. Conn, K. Scheinberg, and L. N. Vicente. Intro-
level is higher, the results have better quality. On the other               duction to derivative-free optimization, volume 8. Siam,
hand, when the SLA constraint is relaxed (more jobs are                      2009.
allowed) the impact of exploitation levels becomes less im-
portant. We can conclude that the best results are grouped               [5] B. K. Danani and B. D. D’Amora. Computational steer-
in the solution search space, then strategies that exploit in-               ing for high performance computing: applications on
termediate results can be adopted to get better results.                     blue gene/q system. In Proceedings of the Symposium
                                                                             on High Performance Computing, pages 202–209. Soci-
                                                                             ety for Computer Simulation International, 2015.
5.   CONCLUSIONS
   Several areas in science and engineering rely on many ex-             [6] E. Deelman, D. Gannon, M. Shields, and I. Taylor.
ecutions of a software system with different values for each                 Workflows and e-science: An overview of workflow sys-
supported parameter. These executions are responsible for                    tem features and capabilities. Future Generation Com-
evaluating test scenarios and calibrating models. Normally,                  puter Systems, 25(5):528–540, 2009.
users follow a workflow where they setup a set of exper-
iments, generate intermediate results, analyze them, and                 [7] H. A. D. do Nascimento and P. Eades. User hints: a
reprioritize jobs for the next batch. Reprioritization is a                  framework for interactive optimization. Future Gener-
key step as executing all possible values for all supported                  ation Computer Systems, 21(7):1177–1191, 2005.
parameters is usually not feasible, especially under cost and            [8] B. Efron and R. Tibshirani. An Introduction to the
deadline constraints determine by strict SLAs.                               Bootstrap. Chapman & Hall/CRC Monographs on
   This work introduced a tool to help users in executing                    Statistics & Applied Probability. Taylor & Francis,
their software systems by exploring their domain expertise                   1994.
and design of experiment techniques. By learning user strate-
gies on sweeping parameters, our tool is able to detect if               [9] A. Endert, M. S. Hossain, N. Ramakrishnan, C. North,
the ongoing strategy is able to meet SLA requirements and                    P. Fiaux, and C. Andrews. The human is the loop: new
suggest how jobs can be reprioritized in case an SLA viola-                  directions for visual analytics. Journal of intelligent
tion is predicted to happen while users run their submission-                information systems, 43(3):411–435, 2014.
execution-analysis workflows. Our main lessons while devel-
oping the tool and evaluating with three applications are:              [10] T. A. Feo and M. G. Resende. Greedy randomized
(i) it is important to facilitate user interaction with an au-               adaptive search procedures. Journal of global optimiza-
tomated design of experiment technique as users may bring                    tion, 6(2):109–133, 1995.
domain expertise that can reach desired results faster; (ii) as         [11] M. Garcı́a, J. Duque, P. Boulanger, and P. Figueroa.
the desired solution may be found by mixing exploration and                  Computational steering of cfd simulations using a
exploitation strategies (e.g. breadth-first search and depth-                grid computing environment. International Journal
first search), users can benefit from a tool that can identify               on Interactive Design and Manufacturing (IJIDeM),
if the ongoing strategy is correct and can meet SLA con-                     9(3):235–245, 2015.
straints; (iii) evaluations with a restricted number of exper-
iments are highly impacted by Copper suggestions. Then,                 [12] W. Gu, J. Vetter, and K. Schwan. An annotated bibli-
the proposed tool and methods are especially suitable for                    ography of interactive program steering. ACM Sigplan
evaluation of parametric applications with tight deadlines.                  Notices, 29(9):140–148, 1994.
                                                                        [13] E. Kail, P. Kacsuk, and M. Kozlovszky. A novel ap-
Acknowledgements                                                             proach to user-steering in scientific workflows. In Pro-
We would like to thank Miguel Paredes Quinones and the                       ceedings of the International Symposium on Applied
anonymous reviewers for their valuable comments. This                        Computational Intelligence and Informatics (SACI),
work has been partially supported by FINEP/MCTI un-                          pages 233–236. IEEE, 2015.
der grant no. 03.14.0062.00.
                                                                        [14] K. Khan, S. U. Rehman, K. Aziz, S. Fong, and S. Saras-
                                                                             vady. Dbscan: Past, present and future. In Appli-
REFERENCES                                                                   cations of Digital Information and Web Technologies
 [1] D. Abramson, B. Bethwaite, C. Enticott, S. Garic, and                   (ICADIWT), 2014 Fifth International Conference on
     T. Peachey. Parameter exploration in science and en-                    the, pages 232–238. IEEE, 2014.




                                                                   13
       WORKS 2016 Workshop, Workflows in Support of Large-Scale Science, November 2016, Salt Lake City, Utah



[15] G. W. Klau, N. Lesh, J. Marks, and M. Mitzen-                  [24] H. Nguyen and D. Abramson. WorkWays: Interac-
     macher. Human-guided search. Journal of Heuristics,                 tive workflow-based science gateways. In Proceedings of
     16(3):289–310, 2010.                                                the International Conference on E-Science (e-Science).
                                                                         IEEE, 2012.
[16] J. P. Kleijnen, S. M. Sanchez, T. W. Lucas, and T. M.
     Cioppa. State-of-the-art review: a user’s guide to             [25] H. A. Nguyen, D. Abramson, T. Kipouros, A. Janke,
     the brave new world of designing simulation experi-                 and G. Galloway. WorkWays: interacting with scien-
     ments. INFORMS Journal on Computing, 17(3):263–                     tific workflows. Concurrency and Computation: Prac-
     289, 2005.                                                          tice and Experience, 27(16):4377–4397, 2015.
[17] B. Ludäscher, I. Altintas, C. Berkley, D. Higgins,            [26] S. G. Parker and C. R. Johnson. SCIRun: a scientific
     E. Jaeger, M. Jones, E. A. Lee, J. Tao, and Y. Zhao.                programming environment for computational steering.
     Scientific workflow management and the kepler system.               In Proceedings of the 1995 ACM/IEEE conference on
     Concurrency and Computation: Practice and Experi-                   Supercomputing, page 52. ACM, 1995.
     ence, 18(10):1039–1065, 2006.
                                                                    [27] I. Raicu, I. Foster, M. Wilde, Z. Zhang, K. Iskra,
[18] M. Mattoso, J. Dias, K. A. Ocaña, E. Ogasawara,
                                                                         P. Beckman, Y. Zhao, A. Szalay, A. Choudhary, P. Lit-
     F. Costa, F. Horta, V. Silva, and D. de Oliveira. Dy-
                                                                         tle, et al. Middleware support for many-task comput-
     namic steering of HPC scientific workflows: A sur-
                                                                         ing. Cluster Computing, 13(3):291–314, 2010.
     vey. Future Generation Computer Systems, 46:100–113,
     2015.                                                          [28] S. Singhal, S. Aneja, F. Liu, L. V. Real, and T. George.
                                                                         IFM: A scalable high resolution flood modeling frame-
[19] D. Meignan and S. Knust. Interactive optimization with
                                                                         work. In Proceedings of the International European
     long-term preferences inference on a shift scheduling
                                                                         Conference on Parallel and Distributed Computing
     problem. In Proceedings of the 14th European Meta-
                                                                         (Euro-Par). Springer, 2014.
     heuristics Workshop, 2013.
[20] D. Meignan, S. Knust, J.-M. Frayret, G. Pesant, and            [29] L. Tweedie, B. Spence, D. Williams, and R. Bhogal.
     N. Gaud. A review and taxonomy of interactive                       The attribute explorer. In Conference companion on
     optimization methods in operations research. ACM                    Human factors in computing systems, pages 435–436.
     Transactions on Interactive Intelligent Systems (TiiS),             ACM, 1994.
     5(3):17, 2015.                                                 [30] J. J. Van Wijk, R. Van Liere, and J. D. Mulder. Bring-
[21] D. C. Montgomery. Design and analysis of experiments.               ing computational steering to the user. In Scientific
     John Wiley & Sons, 2008.                                            Visualization Conference, 1997, pages 304–304. IEEE,
                                                                         1997.
[22] J. D. Mulder, J. J. van Wijk, and R. van Liere. A
     survey of computational steering environments. Future          [31] L. A. Wolsey and G. L. Nemhauser. Integer and com-
     generation computer systems, 15(1):119–129, 1999.                   binatorial optimization. John Wiley & Sons, 2014.
[23] M. A. S. Netto, A. Breda, and O. de Souza. Scheduling          [32] H. Wright, R. H. Crompton, S. Kharche, and
     complex computer simulations on heterogeneous non-                  P. Wenisch. Steering and visualization: Enabling tech-
     dedicated machines: a case study in structural bioin-               nologies for computational science. Future Generation
     formatics. In Proceedings of the IEEE International                 Computer Systems, 26(3):506–513, 2010.
     Symposium on Cluster Computing and the Grid (CC-
     Grid 2005). IEEE, 2005.




                                                               14