=Paper= {{Paper |id=Vol-2978/saml-paper6 |storemode=property |title=Self-Adaptation for Machine Learning Based Systems |pdfUrl=https://ceur-ws.org/Vol-2978/saml-paper6.pdf |volume=Vol-2978 |authors=Maria Casimiro,Paolo Romano,David Garlan,Gabriel A. Moreno,Eunsuk Kang,Mark Klein |dblpUrl=https://dblp.org/rec/conf/ecsa/Casimiro0GMKK21 }}
Self-Adaptation for Machine Learning Based Systems
Maria Casimiro1,2 , Paolo Romano2 , David Garlan1 , Gabriel A. Moreno3 , Eunsuk Kang1 and
Mark Klein3
1 Institute for Software Research, Carnegie Mellon University, Pittsburgh, PA, USA
2 INESC-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal
3 Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, USA


Abstract
Today’s world is witnessing a shift from human-written software to machine-learned software, with the rise of systems that rely on machine learning. These systems typically operate in non-static environments, which are prone to unexpected changes, as is the case of self-driving cars and enterprise systems. In this context, machine-learned software can misbehave. Thus, it is paramount that these systems are capable of detecting problems with their machine-learned components and of adapting themselves to maintain desired qualities. For instance, a fraud detection system that cannot adapt its machine-learned model to efficiently cope with emerging fraud patterns or changes in the volume of transactions is subject to losses of millions of dollars. In this paper, we take a first step towards the development of a framework aimed at self-adapting systems that rely on machine-learned components. We describe: (i) a set of causes of machine-learned component misbehavior and a set of adaptation tactics inspired by the literature on machine learning, motivating them with the aid of a running example; (ii) the required changes to the MAPE-K loop, a popular control loop for self-adaptive systems; and (iii) the challenges associated with developing this framework. We conclude the paper with a set of research questions to guide future work.

Keywords
Self-adaptive systems, Machine Learning, Model degradation



SAML’21: International Workshop on Software Architecture and Machine Learning, September 13–17, 2021, Växjö, Sweden
maria.casimiro@tecnico.ulisboa.pt (M. Casimiro); romano@inesc-id.pt (P. Romano); garlan@cs.cmu.edu (D. Garlan); gmoreno@sei.cmu.edu (G. A. Moreno); eunsukk@andrew.cmu.edu (E. Kang); mk@sei.cmu.edu (M. Klein)
© 2021 Copyright for this paper by Carnegie Mellon University and the authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

The field of self-adaptive systems (SAS) is an extensive and active research area that has made steady improvements for years. SAS react to environment changes, faults and internal system issues to improve the system’s behavior, utility and/or dependability [1]. These systems usually adopt an architecture, known as the MAPE-K loop, which monitors the system, decides when it needs adaptation, selects the best course of action to improve the system, and executes it [2]. The actions available for the system to execute are usually called tactics. The literature on SAS spans a broad range of systems, such as enterprise systems and cyber-physical systems (CPS).

In parallel with the maturing of SAS research, a new class of systems has emerged: supervised and semi-supervised machine learning (ML) based systems are now becoming ubiquitous. Such systems embed one or more components, whose behavior is derived from training data, into a larger system containing traditional computational entities (web services, databases, operator interfaces). Examples include: fraud detection, which uses a classifier to detect fraudulent transactions [3]; medical diagnosis, which relies on ML for classifying types of diseases of sick patients [4]; self-driving cars, which use ML to determine whether they should stop based on how distant they are from the car in front [5]; robots, which rely on ML models to predict the amount of remaining battery power [6]; and targeted advertisement services, which rely on recommender systems to show users items that they may find interesting [7].

For such systems, adaptation poses a key concern. In addition to the reasons that traditional systems must adapt (faults, changing requirements, unexpected loads, etc.), ML-based components may fail to perform as expected, thereby reducing system utility. For instance, changes in a system’s operating environment can introduce drifts in the input data of the ML models, making them less accurate [8], or attacks may attempt to subvert the intended functionality of the system [9].

Thankfully, a large number of techniques have been developed by the ML community for adapting supervised ML models, and these could in principle be used as adaptation tactics in a self-adaptive system. They range from off-line, from-scratch model retraining and replacement, at one extreme, to incremental approaches performed in-situ, at the other [10, 11, 12, 13, 14, 15]. And more techniques are being developed constantly.

Unfortunately, determining when and how to take advantage of such tactics to perform adaptation is highly non-trivial. First, there is a large number of possible adaptation tactics that could potentially be applied to an ML component, but not all approaches work with
all forms of supervised ML models. For example, some training models may allow a system to selectively “forget” certain inputs, while others do not. Similarly, some ML models support transfer learning to incrementally update a learnt model, but not all do.

Second, the value of investing in improving the accuracy of an ML component is strongly context-dependent, often depending on both the domain and timing considerations. For example, while a medical diagnosis system may support model retraining at run time, the latency of this tactic may make it infeasible for self-driving cars, which rely instead on swifter tactics (such as replacing the ML component entirely) that can address real-time system response requirements. In a different mode of operation, however, both types of tactics may be available, e.g., if the self-driving car is stopped (parked mode of operation), it may be feasible to retrain an underperforming model without compromising safety.

Third, calculating the costs and benefits of these tactics is difficult, particularly in a whole-system context, where improving a particular component’s performance may or may not improve overall system utility. Costs include time, resources (processing, memory, power), and service disruption. Benefits derive, for instance, from increased accuracy or fairness of the ML component, which can in turn lead to better performing down-stream components and support overall business goals (e.g., by improving advertisement revenue). Both costs and benefits can be hard to quantify, however, and hence hard to reason about when determining whether an ML adaptation tactic makes sense.

We argue, therefore, that in order to harness the potential of the rich space of ML adaptation mechanisms, it is necessary to develop methods that can reason about which tactics are available to adapt the ML component, which are the most effective to employ in a given context so that system utility is maximized, and how to integrate them into modern adaptive systems architectures. Specifically, in this paper we attempt to bring some clarity to this emerging but critical aspect of SAS by outlining (i) a set of causes of ML component performance degradation and a set of adaptation tactics derived from research on ML (§ 3); (ii) architectural and algorithmic changes required to incorporate effective ML adaptation into the MAPE-K loop, a popular framework for monitoring and controlling self-adaptive systems (§ 4); and (iii) the modeling and engineering challenges associated with realizing the full potential for adaptation of ML-based systems (§ 4). We conclude with a set of open research questions.

2. Background & Related Work

Current literature on SAS focuses on managed systems that do not embed (nor rely upon) ML models [16]. That is, although the self-adaptation mechanism (i.e., the managing system) may rely on ML to perform a given function (e.g., decide the tactic to execute), the actual system that is adapted (i.e., the managed system) does not rely on any ML component. These systems have at their disposal a set of tactics that, for instance, change a system’s architecture (e.g., adding/removing servers) or the quality of the service they provide (e.g., increasing/decreasing the rendering quality of images) in response to environment changes. Usually, tactic outcomes have some uncertainty that can be modeled via probabilistic methods given assumptions on the underlying hardware/software platforms and their characteristics. Further, one can measure the properties of such systems through the use of metrics such as latency, throughput and content quality.

Determining the costs and benefits of such adaptation tactics has been well researched and there are numerous techniques and algorithms for that end [17]. However, new challenges arise when considering managed systems that depend on ML models. Not only are we missing a well-understood and generally applicable set of tactics that SAS can use to adapt ML-based systems, but also the properties of ML components, such as accuracy and fairness, may not change consistently with the tactic that is executed. For example, if we retrain an ML model, its accuracy is not always affected in the same way, but may depend on the samples available to retrain the model, on the duration of the retraining process, and on the model’s hyper-parameters. Similarly, model fairness may also be affected in different ways due to the training samples that are fed during re-training [18].

To improve the self-adaptive capabilities of systems and their performance, recent research has proposed SASs that rely on ML techniques and models to adapt the system [19, 20]. Specifically, ML is used in the adaptation manager to: update adaptation policies, predict resource usage, update run-time models, reduce adaptation spaces, predict anomalies, and collect knowledge. Additionally, learning is typically leveraged to improve the Analysis and Plan components of the MAPE-K loop [19].

In this paper, we focus on the problem of how to leverage self-adaptation to correct and adapt supervised ML components of a managed system, while increasing overall utility of ML-based systems when their ML components are underperforming. This vision is aligned with the one presented by Bures [21], in which the author claims that “self-adaptation should stand in equal-to-equal relationship to AI. It should both benefit from AI and enable AI.” Extending this vision further, we argue that the techniques developed in this context could also be applied, in a recursive fashion, to self-adapt adaptation managers that rely on ML components to enhance their effectiveness and robustness. For instance, a planner that relies on ML to reduce the adaptation space could have its own self-adaptation manager to ensure that the ML component is working as expected.
The vision presented in this paper differs from work on collective SAS, since we are targeting systems with only one agent and with a centralized learning process, whereas that line of research focuses on systems with multiple agents that can share knowledge with each other.

Differently, our vision ties the field of lifelong/continual learning [22, 23], which deals with open-world problems, to the field of self-adaptive systems. In fact, dealing with open-world changes was identified by Gheibi et al. [19] as an open problem in the SAS domain. Specifically, Lifelong Learning deals with the problem of leveraging past knowledge to learn a new task better, and Continual Learning is focused on solving the problem of maintaining the accuracy of old tasks when learning new tasks [23]. The techniques developed in this domain can be leveraged by SASs to improve ML components when unexpected changes occur in the environment or when the performance of the ML component is degraded and affects overall system utility. Overall, our focus is on SASs and on how to integrate techniques from these research domains into a generic, yet rigorous/principled framework that can decide which ML component to adapt, how, and when. The next section provides details on possible causes of ML component degradation and repair tactics inspired by this field of research.

3. Adaptation of ML-based Systems

We now motivate the need for self-adaptive ML-based systems through an example from the enterprise systems domain. Then, we present a set of possible causes for ML component performance degradation and a set of adaptation tactics.

3.1. Running Example – Fraud Detection System

Consider a fraud detection system that relies on ML models for scoring credit/debit card transactions. The score attributed by the ML model is then used by a rule-based model to decide whether transactions are legitimate or fraudulent. Typical clients of companies that provide fraud detection services are banks and merchants. In this setting, system utility is typically defined based on attributes such as the cost of losing clients due to incorrectly declined transactions, fairness (no client is declined more often) [18], and the overall cost of service level agreement (SLA) violations (these systems have strict SLAs to process transactions in real time, e.g., at most 200ms on the 99.999th percentile of the latencies’ distribution [3]). While cost and revenue are directly affected by the ML model’s mispredictions, response time is affected by model complexity, i.e., more complex models may introduce higher latencies that compromise SLAs. However, the impact of these mispredictions varies not only from client to client, with whom different SLAs may have been agreed upon, but also in time, since during specific periods, e.g., Black Friday, the volume of transactions is substantially altered. During busy days such as these, adapting the ML models responsible for fraud detection so that they are less strict and reduce false alarms is crucial in order to preserve system utility. However, this adaptation entails a delicate trade-off, since less strict models can allow fraudulent transactions to be accepted. Further, these systems are subject to constantly evolving fraud patterns, to which the ML models must adapt [24].

3.2. Causes of Degradation of ML Components’ Accuracy

We now focus on problems that deteriorate the performance of ML components such that they are no longer able to maintain system utility at a desired level. In particular, we present two classes of problems, which, we argue, are general enough to be representative of most of the issues addressed by the existing ML literature.

Data-set Shift. When the distribution of the inputs to a model changes, such that it becomes substantially different from the distribution on which the model was trained, we find ourselves in the presence of a problem commonly known as data-set shift [8, 11, 10, 25]. As recent work has shown, not all data-set shifts are malign [10]. As such, an effective SAS should not only detect shifts, but also be able to assess their actual impact on system utility. In a fraud detection system, data-set shift occurs when new fraud patterns emerge (e.g., charges at a particular merchant), or when patterns of legitimate transactions change, for instance due to busy shopping days like Black Friday and Christmas [24]. Although the actual features used for classification may not change, their distribution does. This means that different values of the features now characterize legitimate and fraudulent transactions.

Incorrect Data. This problem arises when there are samples in the model’s training set that are incorrectly labeled [26] or when test data is tampered with, thus leading the model to mispredict when certain inputs arrive. The former can happen, for instance, when unsupervised techniques are used to label examples in order to bootstrap the training set of a second supervised model [26]. Incorrect data can also make their way into a model’s training set due to attackers that intentionally pollute it so as to cause the ML component to incorrectly predict outputs for certain inputs [12, 9]. For instance, in the fraud detection case, security breaches could lead to
Table 1
Examples of general adaptation tactics for ML-based systems with their strengths (‘+’) and weaknesses (‘–’).

Component Replacement: replace an under-performing component by one that better matches the current environment.
  + Fast and inexpensive, when possible
  – Non ML-based estimators may not be available in all scenarios
  – Alternative estimators, when available, may be more robust but less precise

Human-based Labeling [14]: rely on a human to classify some incoming samples or to correct the labeling of samples in the training set.
  + Accuracy of human-based labels expected to be high
  – Expert knowledge may be expensive to obtain and/or introduce unacceptable latency

Transfer Learning [27]: reuse knowledge gathered previously on different tasks/problems to accelerate the learning of new tasks.
  + Less data-hungry than plain retrain
  – Effectiveness dependent on the similarities between old and new tasks/data
  – Computationally intensive process

Unlearning [13]: remove samples that are no longer representative from the training set and from the model.
  + Fast when ratio between data to forget and data-set size is small
  – Cost/latency for identifying examples to unlearn can be large and context-dependent

Retrain [15]: retrain with new data and maybe choose new values for the ML model’s hyper-parameters.
  + Generic and robust method
  – Effective only once a relatively large number of instances of the new data are available
  – Computationally intensive process
  – Accuracy and latency of the retrain process may vary significantly
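To make the detection problem of § 3.2 concrete, a monitor could flag data-set shift by comparing the training-time and live input distributions with a two-sample Kolmogorov–Smirnov distance. The sketch below is only an illustration under our own assumptions: the univariate setting, the function names, and the 0.2 threshold are ours, and a real deployment would calibrate the threshold and assess the shift's impact on utility, as the paper argues.

```python
import bisect
import random

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_sample, x):
        # Fraction of points <= x in a sorted sample.
        return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

    # The gap can only be maximal at an observed sample point.
    return max(abs(ecdf(a, v) - ecdf(b, v)) for v in a + b)

def shift_detected(reference, live, threshold=0.2):
    """Flag a potential data-set shift when the KS distance between the
    training-time (reference) and live inputs exceeds a threshold.
    The threshold is illustrative, not a universally valid choice."""
    return ks_statistic(reference, live) > threshold

random.seed(42)
reference = [random.gauss(0, 1) for _ in range(500)]      # training-time inputs
live_same = [random.gauss(0, 1) for _ in range(500)]      # same distribution
live_drifted = [random.gauss(2, 1) for _ in range(500)]   # drifted inputs
print(shift_detected(reference, live_same))
print(shift_detected(reference, live_drifted))
```

Note that this only detects a shift in one feature's marginal distribution; characterizing the type of shift, and whether it is malign [10], requires further analysis.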


poisoning the data used for training ML models, hence causing them to make incorrect predictions.

3.3. Repair Tactics

Table 1 illustrates a collection of tactics that can be used to deal with issues introduced by ML-based components. These tactics were inspired by research on ML [22, 14, 27, 13, 15]. Next, we describe the tactics presented in the table, motivating them with scenarios in which they can be applied and discussing their costs and benefits.

Component replacement. This tactic assumes the existence of a repository of components and respective meta-data that can be analyzed to determine if there exists a component that is better suited for the current system state. For example, when the volume of transactions changes, for instance on special days such as Black Friday, ML models may consider the increased frequency of transactions as an indicator of fraud and erroneously flag legitimate transactions as fraudulent. Such mispredictions can lead to significant financial losses [3], thus requiring timely fixes and rendering the use of high-latency tactics infeasible (note that in this context, transactions need to be accepted/rejected within milliseconds [3]). As such, only low-latency tactics can be applied. An example is to replace the underperforming models with rule-based models, e.g., developed by experts for specific situations, and/or to switch to previously trained models that are known to perform well in similar conditions. A benefit of this tactic, whenever it is available, is to enable a swift reaction to data-set shifts. Its main cost depends on the latency and resources used for the analysis of the candidate replacement components available in the repository.

Human-based labeling. Humans are often able to recognize patterns, problems, and objects more accurately than ML components [14]. Thus, depending on the domain, humans may play a role in correcting these components or giving them correct samples [14]. For instance, whenever the ML component suspects a transaction of being fraudulent, it can be automatically canceled. Then, the user can be informed of the decision and asked whether the transaction should be authorized or declined in the future. Another possibility is to add humans to the loop when adding samples to the ML component’s training set. In this scenario, an expert can be asked to review the most uncertain classifications so as to improve the quality of the training samples. In the former scenario, the benefits are easily quantifiable, since the risk of accepting a possibly fraudulent transaction can be measured via its economic value. However, users may get annoyed if their transactions are canceled too often, to the extent that they may stop purchasing using that credit card provider. As for relying on experts to review uncertain classifications, having an on-demand expert performing this task is expensive and the latency of the manual labeling process may be unacceptable.

Transfer learning. Transfer learning (TL) techniques leverage knowledge obtained when performing previous tasks that are similar to the current one so that learning the current task becomes easier [27]. Suppose that: (i) a fraud detection company has a set of clients (such as banks), (ii) the company has a unique ML model for each client, so that it complies with data privacy regulations¹, and (iii) one of its clients is affected by a new attack pattern, which is eventually learned by that client’s model. In this scenario, TL techniques [29, 27] can be used to improve the other clients’ models so that they can react to the same attack. Estimating the benefits of executing this tactic for a given client boils down to estimating the likelihood that this client may suffer the same attack. Yet, the execution of this tactic typically implies high computational costs (e.g., if cloud resources are used) and non-negligible latency, which may render this tactic economically unfavorable, or even inadequate, e.g., if the attack on a different client is imminent and the TL process is slow.

¹ Since privacy is important in this domain, there are techniques that can be used to deal with the problem of ensuring data confidentiality and anonymity in information transfer between clients [28].
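The recurring trade-off in these tactic descriptions (latency budgets ruling out slow tactics, and net benefit weighed against cost) can be sketched as a simple utility-based selector. The tactic names, the numbers, and the additive gain-minus-cost utility model below are illustrative assumptions of ours, not a decision procedure proposed in the paper.

```python
from dataclasses import dataclass

@dataclass
class Tactic:
    name: str
    latency_s: float       # expected time to execute the tactic
    cost: float            # e.g., monetary cost of compute resources
    expected_gain: float   # expected utility improvement if applied

def best_tactic(tactics, latency_budget_s):
    """Return the feasible tactic with the highest net benefit
    (gain - cost), or None if no tactic fits the latency budget."""
    feasible = [t for t in tactics if t.latency_s <= latency_budget_s]
    return max(feasible, key=lambda t: t.expected_gain - t.cost, default=None)

# Hypothetical catalog; all figures are made up for illustration.
catalog = [
    Tactic("component replacement", latency_s=0.1,  cost=0.0, expected_gain=1.0),
    Tactic("transfer learning",     latency_s=600,  cost=2.0, expected_gain=4.0),
    Tactic("full retrain",          latency_s=3600, cost=5.0, expected_gain=9.0),
]
print(best_tactic(catalog, latency_budget_s=1.0).name)   # tight budget: fast tactic
print(best_tactic(catalog, latency_budget_s=7200).name)  # relaxed budget: retrain
```

In practice both the gains and the costs are uncertain and context-dependent, as discussed above, so a realistic planner would reason over probability distributions rather than point estimates.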

Unlearning. This tactic corresponds to unlearning
data that no longer reflects the current environment/state
of the system and its lineage, thus eliminating the effect
of that data on current predictions [13], while avoiding
a full model retrain. A key problem that stands in the
way of the execution of this tactic is the identification of
incorrect labels. For instance, in a fraud detection system,
incorrectly classified transactions may all be eventually
identified for “free”, although with large latencies, when
users review their credit card statements. Conversely, in
scenarios in which the identification of incorrect sam-
ples is not readily available, one may leverage automatic
techniques, such as the one described in [30], which are
faster but typically less accurate. As such, the cost and complexity of this task vary depending on the context. Then, after identifying the incorrect samples, the model must be updated to accurately reflect the correct data. At this point, the advantage of unlearning techniques with respect to a typical full model retrain is the time savings (up to 9.5 × 10⁴) that can be achieved [13].

Retrain and/or hyper-parameter optimization. This is a general tactic that involves retraining the model with new data that reflects recent relevant data-set drifts, e.g., a new kind of attack in a fraud detection system. There are many types of retraining, ranging from a simple model refresh (incorporate new data using old hyper-parameters) to a full retrain (including hyper-parameter optimization, possibly encompassing different model types/architectures), which imply different computational costs and can benefit the model’s accuracy to different extents. In the presence of data-set shift, when there is new data that already incorporates the new input distribution, this tactic often represents a simple, yet possibly expensive, approach to deal with this problem. The benefits of this tactic depend on the type of retrain process and on the quality of the new data. As for its cost, if retraining is performed on the cloud, it can be directly converted to the economic cost of renting the virtual machines, and several techniques exist to predict such costs [31, 32].

Figure 1: MAPE-K loop over an ML-based system with a mix of ML and non-ML components, with specific challenges for each MAPE-K stage. White arrows represent dependencies between components.

4. MAPE-K Loop for ML-Based Systems

When the MAPE-K loop actuates over a system composed of non-ML and ML components (Figure 1), we argue that each stage of the MAPE-K loop should be revised to effectively leverage tactics such as the ones mentioned.

4.1. Monitor

The Monitor stage has to keep track of the inputs used when querying ML components because shifts of the input distributions may affect the predictions. For instance, the detection of out-of-distribution inputs may mean that there has been a change in the environment and thus the model used by some ML component may no longer be representative of the current environment. The challenge here is not only detecting the occurrence of shifts in a timely and reliable fashion, but also how to effectively characterize them — since different types of shifts require different reaction methods. As in other SAS, typical attributes that contribute to the system’s utility (e.g., latency, throughput) or the satisfaction of required system properties must be monitored. In addition to these, the Monitor stage must also gather the outputs of the ML component to account for situations in which changes in the inputs go by unnoticed, perhaps because they are too slow, but that manifest themselves faster in the outputs [33]. Examples of outputs to monitor are, for instance, shifts in the output distribution, model’s accu-
                                                            racy and error – obtained by comparing predictions with
In SAS, the MAPE-K loop typically actuates over a system
                                                            real outcomes. A relevant challenge here is that often real
composed of non-ML components. To enable the devel-
                                                            outcomes are only known after a long time, if ever. For in-
opment of self-adaptive ML-based systems, in which the
                                                            stance, in fraud detection, false negatives (i.e., undetected
real fraud) are known only when users file a complaint, and false positives are normally undetectable (since no feedback is obtained for transactions that are legitimate but rejected by the system). Approaches such as those proposed in [33, 11, 34] provide a good starting point for the implementation of a Monitor for self-adaptive ML-based systems.

Challenges. Monitoring input and output distributions requires keeping track of a multitude of features and parameters that would otherwise be disregarded. This is already challenging due to the amount of data that needs to be stored, maintained, and analyzed. Finding suitable frequencies at which to gather these data, and adapting them in the face of evolving time constraints, is an even bigger challenge in time-critical domains [35, 11].

4.2. Analyze

The Analyze stage is responsible for determining whether degradations of the prediction quality of ML components are affecting (or are predicted to affect) other system components and system utility to such an extent that adaptation may be required. To accomplish this, one can leverage techniques developed by the ML community to detect possible issues in the inputs and outputs of the model [8, 11, 10, 33], errors in its training set [36], and the appearance of new features relevant for prediction [37]. These techniques must then be adjusted to the particular case of each system, which includes adapting them to different ML models and tasks.

Challenges. Estimating the impact of an ML component on other system components and on system utility can be challenging, because (mis)predictions often affect the system's utility/dependability in ways that are not only application- but also context-dependent. For instance, during periods with higher transaction volumes, such as Black Friday, mispredictions have a higher impact on system utility, since during these periods it is more critical to accurately detect fraud while maximizing accepted transactions. Architectural models can capture the information flows among components, but the challenge is to estimate how the uncertainty in the output of the ML components propagates throughout the system.

4.3. Plan

The Plan stage is responsible for identifying which adaptation tactics (if any) to employ to address issues with ML components affecting the system. As with other self-adaptation approaches, this reasoning should consider the costs and benefits of each viable tactic. Further, most of the proposed tactics have a non-negligible latency, which needs to be accounted for as in latency-aware approaches [38]. An additional concern is that some of these tactics may require considerable resources to execute, either in the system itself or offloaded; Plan must account for this impact or cost.

For ML-based systems that rely on multiple ML components, whenever a system property is (or is expected to be) violated, or when system utility decreases, fault localization may be required to understand which component is underperforming and should be repaired or replaced [39].

Challenges. Although there are several approaches [31, 40] that attempt to predict the time/cost of training ML models, this is a complex problem that is strongly influenced by the type of ML models considered, their hyper-parameters, and the underlying (cloud) infrastructure. These techniques represent a natural starting point to estimate the costs and benefits of adaptation tactics such as the ones presented. Yet, developing techniques for predicting the costs/benefits of complex tactics, e.g., unlearning, remains an open challenge. One interesting direction is to exploit techniques for estimating the uncertainty [25] of ML models to quantify both the likelihood of a model's mispredictions and the potential benefits deriving from employing corrective adaptation tactics. Certain ML models can directly estimate their own uncertainty [41], or additional techniques (e.g., ensembles [42]) can be used to obtain uncertainty estimates. Still, existing techniques can suffer from significant shortcomings in practical settings [25].

Finally, tactics that modify ML components are computationally expensive (e.g., have non-negligible latency). Thus, Plan must have mechanisms to verify that the system can execute the tactic without compromising other components/properties, or even the entire system.

4.4. Execute

To execute a given adaptation tactic, the Execute stage must have access to mechanisms to improve or replace the ML component and/or its training set. As in the conventional MAPE-K loop, we require implementations of adaptation tactics that are not only efficient to execute, but also have predictable costs/benefits and are resilient to run-time exceptions.

Challenges. A key challenge is how to enhance the predictability of the execution of ML adaptation tactics, which often require processing large volumes of data (e.g., to re-train a large-scale model), possibly under stringent timing constraints. We argue that the SAS community would benefit from the availability of open-source software frameworks that implement a range of generic adaptation tactics for ML components.
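Such a framework of generic tactics could, for instance, expose a uniform interface that pairs each tactic with a cost/benefit estimator for the Plan stage to query. The sketch below is purely illustrative: the `AdaptationTactic` interface, the `drift_severity` context field, and all numeric cost/benefit values are assumptions of this example, not part of any existing framework.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class TacticEstimate:
    """Predicted effect of running a tactic in the current context."""
    utility_gain: float  # expected recovery of system utility
    cost: float          # expected execution cost (e.g., cloud rental)


class AdaptationTactic(ABC):
    """Uniform interface a tactic framework could expose to the MAPE-K loop."""
    name: str

    @abstractmethod
    def estimate(self, context: dict) -> TacticEstimate:
        """Predict cost/benefit so that the Plan stage can compare tactics."""

    @abstractmethod
    def execute(self, model):
        """Apply the tactic and return the adapted ML component."""


class ModelRefresh(AdaptationTactic):
    """Cheap tactic: re-fit on new data, keeping the old hyper-parameters."""
    name = "refresh"

    def estimate(self, context):
        # Toy cost/benefit model: a refresh recovers less utility but is cheap.
        return TacticEstimate(utility_gain=0.4 * context["drift_severity"], cost=1.0)

    def execute(self, model):
        return model  # placeholder: a real tactic would re-fit the model here


class FullRetrain(AdaptationTactic):
    """Expensive tactic: full retrain, including hyper-parameter optimization."""
    name = "full_retrain"

    def estimate(self, context):
        return TacticEstimate(utility_gain=0.9 * context["drift_severity"], cost=6.0)

    def execute(self, model):
        return model  # placeholder: a real tactic would launch a training job


def plan(tactics, context):
    """Plan stage: choose the tactic with the highest positive net benefit."""
    best, best_net = None, 0.0
    for tactic in tactics:
        est = tactic.estimate(context)
        net = est.utility_gain - est.cost
        if net > best_net:
            best, best_net = tactic, net
    return best  # None means "do not adapt"


# Under mild drift only the cheap refresh pays off; severe drift justifies a retrain.
mild = plan([ModelRefresh(), FullRetrain()], {"drift_severity": 5.0})
severe = plan([ModelRefresh(), FullRetrain()], {"drift_severity": 12.0})
```

Here `plan` realizes the cost/benefit reasoning discussed for the Plan stage: it returns the tactic with the highest expected net benefit, or `None` when no tactic is worth its cost.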
This would allow one to mask complexity and promote the interoperability and comparability of SAS. Further, it would also provide an opportunity to assemble, in a common framework, techniques that have been proposed over many years in different areas of the AI/ML literature.

4.5. Knowledge

Finally, the Knowledge module is responsible for maintaining information that reflects what is known about the environment and the system. For ML-based systems, the Knowledge component should evolve in order to keep track of the costs/benefits of each tactic on the affected ML components and on the system's utility. This corresponds to: gathering knowledge on how each tactic altered an ML component and on the context in which the tactic was executed; and maintaining meta-information on training sets, for instance characterizing the most important features for predicting the costs and benefits of the different tactics. This added knowledge should be leveraged to improve the decision-making process and, thus, improve adaptation. With such knowledge of how each tactic altered an ML component, and of the context in which it was executed, the Analyze and Plan stages can make more effective decisions on when to adapt and which tactic to execute, respectively. Finally, for a tactic that replaces underperforming ML components with non-ML-based ones, Knowledge must contain a repository of the available components and their meta-data. This meta-data, we argue, should provide information to enable reasoning on whether the necessary preconditions for a safe and timely reconfiguration hold.

5. Conclusions and Future Work

This work introduced a vision for a new breed of self-adaptive frameworks that brings together techniques developed in the ML literature (used here as adaptation tactics) and reasons about the cost/benefit trade-offs of each, with the end goal of adapting degraded ML components of ML-based systems to maintain system utility. With the aid of a running example, we showed how different adaptation tactics can be applied to repair ML models when different real-life situations hinder system utility. Further, we identified a set of key requirements that should be supported by the various elements of the classic MAPE-K control loop, as well as a set of challenging research problems. Finally, we highlight the following research questions as directions for future work: (i) How to estimate the costs and benefits of each tactic? (ii) How to reason about the impact of ML mispredictions on system utility? (iii) How do changes to one ML component impact the other components in the system? (iv) How to reason about the long-term impacts of adaptation tactics on system utility?

Acknowledgments

Support for this research was provided by Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) through the Carnegie Mellon Portugal Program under Grant SFRH/BD/150643/2020 and via projects with references POCI-01-0247-FEDER-045915, POCI-01-0247-FEDER-045907, and UIDB/50021/2020. This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. DM21-0052

References

[1] B. H. C. Cheng, et al., Software Engineering for Self-Adaptive Systems: A Research Roadmap, Springer, 2009.
[2] J. O. Kephart, D. M. Chess, The vision of autonomic computing, Computer 36 (2003).
[3] B. Branco, et al., Interleaved sequence RNNs for fraud detection, in: Procs. of KDD, 2020.
[4] B. J. Erickson, et al., Machine learning for medical imaging, Radiographics 37 (2017).
[5] Z. Chen, X. Huang, End-to-end learning for lane keeping of self-driving cars, in: Procs. of IV, 2017.
[6] P. Jamshidi, et al., Machine learning meets quantitative planning: Enabling self-adaptation in autonomous robots, in: Procs. of SEAMS, 2019.
[7] H.-T. Cheng, et al., Wide & deep learning for recommender systems, in: Procs. of DLRS, 2016.
[8] J. Quionero-Candela, et al., Dataset shift in machine learning, The MIT Press, 2009.
[9] T. Gu, et al., BadNets: Evaluating backdooring attacks on deep neural networks, IEEE Access 7 (2019).
[10] S. Rabanser, et al., Failing loudly: An empirical study of methods for detecting dataset shift, in: Procs. of NIPS, 2019.
[11] F. Pinto, et al., Automatic model monitoring for data streams, arXiv preprint arXiv:1908.04240 (2019).
[12] L. Huang, et al., Adversarial machine learning, in: Procs. of AISec, 2011.
[13] Y. Cao, J. Yang, Towards making systems forget with machine unlearning, in: Procs. of S&P, IEEE, 2015.
[14] B. Miller, et al., Reviewer integration and performance measurement for malware detection, in: Procs. of DIMVA, 2016.
[15] Y. Wu, et al., DeltaGrad: Rapid retraining of machine learning models, in: Procs. of ICML, 2020.
[16] C. Krupitzer, et al., A survey on engineering approaches for self-adaptive systems (2018).
[17] K. Ervasti, A survey on network measurement: Concepts, techniques, and tools (2016).
[18] A. F. Cruz, et al., A bandit-based algorithm for fairness-aware hyperparameter optimization, CoRR abs/2010.03665 (2020).
[19] O. Gheibi, et al., Applying machine learning in self-adaptive systems: A systematic literature review, arXiv preprint arXiv:2103.04112 (2021).
[20] T. R. D. Saputri, S.-W. Lee, The application of machine learning in self-adaptive systems: A systematic literature review, IEEE Access 8 (2020).
[21] T. Bureš, Self-adaptation 2.0, in: Procs. of SEAMS, 2021.
[22] D. L. Silver, Q. Yang, L. Li, Lifelong machine learning systems: Beyond learning algorithms, in: 2013 AAAI Spring Symposium Series, 2013.
[23] B. Liu, Learning on the job: Online lifelong and continual learning, in: Procs. of the AAAI Conference on Artificial Intelligence, volume 34, 2020.
[24] D. Aparício, et al., ARMS: Automated rules management system for fraud detection, arXiv preprint arXiv:2002.06075 (2020).
[25] Y. Ovadia, et al., Can you trust your model's uncertainty? Evaluating predictive uncertainty under dataset shift, in: Procs. of NIPS, 2019.
[26] D. Wu, et al., A highly accurate framework for self-labeled semisupervised classification in industrial applications, IEEE TII 14 (2018).
[27] S. J. Pan, Q. Yang, A survey on transfer learning, IEEE TKDE 22 (2009).
[28] Y. Liu, et al., A secure federated transfer learning framework, Procs. of IS 35 (2020).
[29] K. Swersky, et al., Multi-task Bayesian optimization, Procs. of NIPS 26 (2013).
[30] Y. Cao, et al., Efficient repair of polluted machine learning systems via causal unlearning, in: Procs. of Asia CCS, 2018.
[31] M. Casimiro, et al., Lynceus: Cost-efficient tuning and provisioning of data analytic jobs, in: Procs. of ICDCS, 2020.
[32] P. Mendes, et al., TrimTuner: Efficient optimization of machine learning jobs in the cloud via sub-sampling, in: MASCOTS, 2020.
[33] X. Zhou, et al., A Framework to Monitor Machine Learning Systems Using Concept Drift Detection, Springer, 2019.
[34] Z. Yang, M. H. Asyrofi, D. Lo, BiasRV: Uncovering biased sentiment predictions at runtime, CoRR abs/2105.14874 (2021).
[35] E. Bartocci, et al., Specification-based monitoring of cyber-physical systems: a survey on theory, tools and applications, in: Lectures on Runtime Verification, Springer, 2018.
[36] Z. Abedjan, et al., Detecting data errors: Where are we and what needs to be done?, Procs. of VLDB 9 (2016).
[37] D. Papamartzivanos, et al., Introducing deep learning self-adaptive misuse network intrusion detection systems, IEEE Access 7 (2019).
[38] G. A. Moreno, et al., Flexible and efficient decision-making for proactive latency-aware self-adaptation, ACM Trans. Auton. Adapt. Syst. 13 (2018).
[39] A. Christi, et al., Evaluating fault localization for resource adaptation via test-based software modification, in: Procs. of QRS, 2019.
[40] O. Alipourfard, et al., CherryPick: Adaptively unearthing the best cloud configurations for big data analytics, in: Procs. of NSDI, 2017.
[41] M. A. Osborne, et al., Gaussian processes for global optimization, in: LION, 2009.
[42] L. Breiman, Bagging predictors, Machine Learning 24 (1996).