
Self-Adaptation for Machine Learning Based Systems

Maria Casimiro

0 1

Paolo Romano

David Garlan

Gabriel A. Moreno

Eunsuk Kang

Mark Klein

2

0 INESC-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal
1 Institute for Software Research, Carnegie Mellon University, Pittsburgh, PA, USA
2 Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, USA

Today's world is witnessing a shift from human-written software to machine-learned software, with the rise of systems that rely on machine learning. These systems typically operate in non-static environments, which are prone to unexpected changes, as is the case of self-driving cars and enterprise systems. In this context, machine-learned software can misbehave. Thus, it is paramount that these systems are capable of detecting problems with their machine-learned components and of adapting themselves to maintain desired qualities. For instance, a fraud detection system that cannot adapt its machine-learned model to efficiently cope with emerging fraud patterns or changes in the volume of transactions is subject to losses of millions of dollars. In this paper, we take a first step towards the development of a framework for self-adapting systems that rely on machine-learned components. We describe: (i) a set of causes of machine-learned component misbehavior and a set of adaptation tactics inspired by the literature on machine learning, motivating them with the aid of a running example; (ii) the required changes to the MAPE-K loop, a popular control loop for self-adaptive systems; and (iii) the challenges associated with developing this framework. We conclude the paper with a set of research questions to guide future work.

Keywords: Self-adaptive systems, Machine Learning, Model degradation

1. Introduction

The field of self-adaptive systems (SAS) is an extensive and active research area that has made steady improvements for years. SAS react to environment changes, faults and internal system issues to improve the system's behavior, utility and/or dependability [1]. These systems usually adopt an architecture, known as the MAPE-K loop, which monitors the system, decides when it needs adaptation, selects the best course of action to improve the system, and executes it [2]. The actions available for the system to execute are usually called tactics. The literature on SAS spans a broad range of systems such as enterprise systems and cyber-physical systems (CPS).

In parallel with the maturing of SAS research, a new class of systems has emerged: supervised and semi-supervised machine learning (ML) based systems are now becoming ubiquitous. Such systems embed one or more components, whose behavior is derived from training data, into a larger system containing traditional computational entities (web services, databases, operator interfaces). Examples include: fraud detection, which uses a classifier to detect fraudulent transactions [3]; medical diagnosis, which relies on ML for classifying types of diseases of sick patients [4]; self-driving cars, which use ML to determine whether they should stop based on how distant they are from the car in front [5]; robots, which rely on ML models to predict the amount of remaining battery power [6]; and targeted advertisement services, which rely on recommender systems to show users items that they may find interesting [7].

For such systems, adaptation poses a key concern. In addition to the reasons that traditional systems must adapt (faults, changing requirements, unexpected loads, etc.), ML-based components may fail to perform as expected, thereby reducing system utility. For instance, changes in a system's operating environment can introduce drifts in the input data of the ML models making them less accurate [8], or attacks may attempt to subvert the intended functionality of the system [9].

Thankfully, there is a large number of emerging techniques that have been developed by the ML community for adapting supervised ML models and that could in principle be used as adaptation tactics in a self-adaptive system. These range from off-line, from-scratch model retraining and replacement, at one extreme, to incremental approaches performed in-situ, at the other [10, 11, 12, 13, 14, 15]. And more techniques are being developed constantly.

Unfortunately, determining when and how to take advantage of such tactics to perform adaptation is highly non-trivial. First, there is a large number of possible adaptation tactics that could potentially be applied to an ML component, but not all approaches work with all forms of supervised ML models. For example, some training models may allow a system to selectively "forget" certain inputs, while others do not. Similarly, some ML models support transfer learning to incrementally update a learnt model, but not all do.

Second, the value of investing in improving the accuracy of an ML component is strongly context-dependent, often depending on both the domain and timing considerations. For example, while a medical diagnosis system may support model retraining at run time, the latency of this tactic may make it infeasible for self-driving cars, which rely instead on swifter tactics (such as replacing the ML component entirely) that can address real-time system response requirements. In a different mode of operation, however, both types of tactics may be available, e.g., if the self-driving car is stopped (parked mode of operation), it may be feasible to retrain an underperforming model without compromising safety.

Third, calculating the costs and benefits of these tactics is difficult, particularly in a whole-system context, where improving a particular component's performance may or may not improve overall system utility. Costs include time, resources (processing, memory, power), and service disruption. Benefits derive for instance from increased accuracy or fairness of the ML component, which can in turn lead to better performing down-stream components and support overall business goals (e.g. by improving advertisement revenue). Both costs and benefits can be hard to quantify, however, and hence to reason about when determining whether an ML adaptation tactic makes sense.

We argue, therefore, that in order to harness the potential of the rich space of ML adaptation mechanisms, it is necessary to develop methods that can reason about which tactics are available to adapt the ML component, which are the most effective to employ in a given context so that system utility is maximized, and how to integrate them into modern adaptive systems architectures. Specifically, in this paper we attempt to bring some clarity to this emerging but critical aspect of SAS by outlining (i) a set of causes of ML component performance degradation and a set of adaptation tactics derived from research on ML (§ 3); (ii) architectural and algorithmic changes required to incorporate effective ML adaptation into the MAPE-K loop, a popular framework for monitoring and controlling self-adaptive systems (§ 4); and (iii) the modeling and engineering challenges associated with realizing the full potential for adaptation of ML-based systems (§ 4). We conclude with a set of open research questions.

SAML'21: International Workshop on Software Architecture and Machine Learning, September 13–17, 2021, Växjö, Sweden
maria.casimiro@tecnico.ulisboa.pt (M. Casimiro); romano@inesc-id.pt (P. Romano); garlan@cs.cmu.edu (D. Garlan); gmoreno@sei.cmu.edu (G. A. Moreno); eunsukk@andrew.cmu.edu (E. Kang); mk@sei.cmu.edu (M. Klein)
© 2021 Copyright for this paper by Carnegie Mellon University and the authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

2. Background & Related Work

Current literature on SAS focuses on managed systems that do not embed (nor rely upon) ML models [16]. That is, although the self-adaptation mechanism (i.e. managing system) may rely on ML to perform a given function (e.g., decide the tactic to execute), the actual system that is adapted (i.e. the managed system) does not rely on any ML component. These systems have at their disposal a set of tactics that, for instance, change a system's architecture (e.g., adding/removing servers) or the quality of the service they provide (e.g., increasing/decreasing the rendering quality of images) in response to environment changes. Usually, tactic outcomes have some uncertainty that can be modeled via probabilistic methods given assumptions on the underlying hardware/software platforms and their characteristics. Further, one can measure the properties of such systems through the use of metrics such as latency, throughput and content quality.

Determining the costs and benefits of such adaptation tactics has been well researched and there are numerous techniques and algorithms for that end [17]. However, new challenges arise when considering managed systems that depend on ML models. Not only are we missing a well-understood and generally applicable set of tactics that SAS can use to adapt ML-based systems, but also the properties of ML components, such as accuracy and fairness, may not change consistently with the tactic that is executed. For example, if we retrain an ML model, its accuracy is not always affected in the same way, but may depend on the samples available to retrain the model, on the duration of the retraining process, and on the model's hyper-parameters. Similarly, model fairness may also be affected in different ways due to the training samples that are fed during re-training [18].

To improve the self-adaptive capabilities of systems and their performance, recent research has proposed SASs that rely on ML techniques and models to adapt the system [19, 20]. Specifically, ML is used in the adaptation manager to: update adaptation policies, predict resource usage, update run-time models, reduce adaptation spaces, predict anomalies, and collect knowledge. Additionally, learning is typically leveraged to improve the Analysis and Plan components of the MAPE-K loop [19].

In this paper, we focus on the problem of how to leverage self-adaptation to correct and adapt supervised ML components of a managed system, while increasing overall utility of ML-based systems when their ML components are underperforming. This vision is aligned with the one presented by Bureš [21] in which the author claims that "self-adaptation should stand in equal-to-equal relationship to AI. It should both benefit from AI and enable AI." Extending this vision further, we argue that the techniques developed in this context could also be applied, in a recursive fashion, to self-adapt adaptation managers that rely on ML components to enhance their effectiveness and robustness. For instance, a planner that relies on ML to reduce the adaptation space could have its own self-adaptation manager to ensure that the ML component is working as expected.

The vision presented in this paper differs from work on collective SAS since we are targeting systems with only one agent and with a centralized learning process, whereas this line of research focuses on systems with multiple agents that can share knowledge with each other.

Differently, our vision ties in the field of lifelong/continual learning [22, 23], which deals with open-world problems, with the field of self-adaptive systems. In fact, dealing with open-world changes was identified by Gheibi et al. [19] as an open problem in the SAS domain. Specifically, Lifelong Learning deals with the problem of leveraging past knowledge to learn a new task better and Continual Learning is focused on solving the problem of maintaining the accuracy of old tasks when learning new tasks [23]. The techniques developed in this domain can be leveraged by SASs to improve ML components when unexpected changes occur in the environment or when the performance of the ML component is degraded and affects overall system utility. Overall, our focus is on SASs and on how to integrate techniques from these research domains into a generic, yet rigorous/principled framework that can decide which ML component to adapt, how and when. The next section provides details on possible causes of ML component degradation and repair tactics inspired by this field of research.

3. Adaptation of ML-based Systems

We now motivate the need for self-adaptive ML-based systems through an example from the enterprise systems domain. Then, we present a set of possible causes for ML component performance degradation and a set of adaptation tactics.

3.1. Running Example – Fraud Detection System

Consider a fraud detection system that relies on ML models for scoring credit/debit card transactions. The score attributed by the ML model is then used by a rule-based model to decide whether transactions are legitimate or fraudulent. Typical clients of companies that provide fraud detection services are banks and merchants. In this setting, system utility is typically defined based on attributes such as the cost of losing clients due to incorrectly declined transactions, fairness (no client is declined more often) [18] and the overall cost of service level agreement (SLA) violations (these systems have strict SLAs to process transactions in real time, e.g. at most 200ms on the 99.999th percentile of the latencies' distribution [3]). While cost and revenue are directly affected by the ML model's mispredictions, response time is affected by model complexity, i.e., more complex models may introduce higher latencies that compromise SLAs. However, the impact of these mispredictions varies not only from client to client, with whom different SLAs may have been agreed upon, but also in time, since during specific periods, e.g., Black Friday, the volume of transactions is substantially altered. During busy days such as these, adapting the ML models responsible for fraud detection so that they are less strict and reduce false alarms is crucial in order to preserve system utility. However, this adaptation entails a delicate trade-off, since less strict models can allow fraudulent transactions to be accepted. Further, these systems are subject to constantly evolving fraud patterns, to which the ML models must adapt [24].

3.2. Causes of Degradation of ML Components' Accuracy

We now focus on problems that deteriorate the performance of ML components such that they are no longer able to maintain system utility at a desired level. In particular, we present two classes of problems, which, we argue, are general enough to be representative of most of the issues addressed by the existing ML literature.

Data-set Shift. When the distribution of the inputs to a model changes, such that it becomes substantially different from the distribution on which the model was trained, we find ourselves in the presence of a problem commonly known as data-set shift [8, 11, 10, 25]. As recent work has shown, not all data-set shifts are malign [10]. As such, an effective SAS should not only detect shifts, but also be able to assess their actual impact on system utility. In a fraud detection system, data-set shift occurs when new fraud patterns emerge (e.g., charges at a particular merchant), or when patterns of legitimate transactions change, for instance due to busy shopping days like Black Friday and Christmas [24]. Although the actual features used for classification may not change, their distribution does. This means that different values of the features now characterize legitimate and fraudulent transactions.

Incorrect Data. This problem arises when there are samples in the model's training set that are incorrectly labeled [26] or when test data is tampered with, thus leading the model to mispredict when certain inputs arrive. The former can happen, for instance, when unsupervised techniques are used to label examples in order to bootstrap the training set of a second supervised model [26]. Incorrect data can also make their way into a model's training set due to attackers that intentionally pollute it so as to cause the ML component to incorrectly predict outputs for certain inputs [12, 9]. For instance, in the fraud detection case, security breaches could lead to poisoning the data used for training ML models, hence causing them to make incorrect predictions.

3.3. Repair Tactics

Table 1 illustrates a collection of tactics that can be used to deal with issues introduced by ML-based components. These tactics were inspired by research on ML [22, 14, 27, 13, 15]. Next, we describe the tactics presented in the table, motivating them with scenarios in which they can be applied and discussing their costs and benefits.

Component replacement. This tactic assumes the existence of a repository of components and respective meta-data that can be analyzed to determine if there exists a component that is better suited for the current system state. For example, when the volume of transactions changes, for instance in special days such as Black Friday, ML models may consider the increased frequency of transactions as an indicator of fraud and erroneously flag legitimate transactions as fraudulent. Such mispredictions can lead to significant financial losses [3], thus requiring timely fixes and rendering the use of high latency tactics infeasible (note that in this context, transactions need to be accepted/rejected within milliseconds [3]). As such, only low latency tactics can be applied. An example is to replace the underperforming models with rule-based models, e.g., developed by experts for specific situations, and/or to switch to previously trained models that are known to perform well in similar conditions. A benefit of this tactic, whenever it is available, is to enable a swift reaction to data-set shifts. Its main cost depends on the latency and resources used for the analysis of the candidate replacing components available in the repository.

Human-based labeling. Humans are often able to recognize patterns, problems, and objects more accurately than ML components [14]. Thus, depending on the domain, humans may play a role in correcting these components or giving them correct samples [14]. For instance, whenever the ML component suspects a transaction of being fraudulent, it can be automatically canceled. Then, the user can be informed of the decision and asked whether the transaction should be authorized or declined in the future. Another possibility is to add humans to the loop when adding samples to the ML component's training set. In this scenario, an expert can be asked to review the most uncertain classifications so as to improve the quality of the training samples. In the former scenario, the benefits are easily quantifiable, since the risk of accepting a possibly fraudulent transaction can be measured via its economic value. However, users may get annoyed if their transactions are canceled too often, to the extent that they may stop purchasing using that credit card provider. As for relying on experts to review uncertain classifications, having an on-demand expert performing this task is expensive and the latency of the manual labeling process may be unacceptable.

Transfer learning. Transfer learning (TL) techniques leverage knowledge obtained when performing previous tasks that are similar to the current one so that learning the current task becomes easier [27]. Suppose that: (i) a fraud detection company has a set of clients (such as banks), (ii) the company has a unique ML model for each client, so that it complies with data privacy regulations¹, and (iii) one of its clients is affected by a new attack pattern, which is eventually learned by that client's model. In this scenario, TL techniques [29, 27] can be used to improve the other clients' models so that they can react to the same attack. Estimating the benefits of executing this tactic for a given client boils down to estimating the likelihood that this client may suffer the same attack. Yet, the execution of this tactic typically implies high computational costs (e.g., if cloud resources are used) and non-negligible latency, which may render this tactic economically unfavorable, or even inadequate, e.g., if the attack on a different client is imminent and the TL process is slow.

¹ Since privacy is important in this domain, there are techniques that can be used to deal with the problem of ensuring data confidentiality and anonymity in information transfer between clients [28].

Unlearning. This tactic corresponds to unlearning data that no longer reflects the current environment/state of the system and its lineage, thus eliminating the effect of that data on current predictions [13], while avoiding a full model retrain. A key problem that stands in the way of the execution of this tactic is the identification of incorrect labels. For instance, in a fraud detection system, incorrectly classified transactions may all be eventually identified for "free", although with large latencies, when users review their credit card statements. Conversely, in scenarios in which the identification of incorrect samples is not readily available, one may leverage automatic techniques, such as the one described in [30], which are faster but typically less accurate. As such, the cost and complexity of this task vary depending on the context.

Then, after identifying the incorrect samples, the model must be updated to accurately reflect the correct data. At this point, the advantage of unlearning techniques with respect to a typical full model retrain is the time savings (up to 9.5 × 10^4) that can be achieved [13].

Retrain and/or hyper-parameter optimization. This is a general tactic that involves retraining the model with new data that reflects recent relevant data-set drifts, e.g., a new kind of attack in a fraud detection system. There are many types of retraining, ranging from a simple model refresh (incorporating new data using old hyper-parameters) to a full retrain (including hyper-parameter optimization, possibly encompassing different model types/architectures), which imply different computational costs and can benefit the model's accuracy to different extents. In the presence of data-set shift, when there is new data that already incorporates the new input distribution, this tactic often represents a simple, yet possibly expensive, approach to deal with this problem. The benefits of this tactic are dependent on the type of retrain process and on the quality of the new data. As for its cost, if retraining is performed on the cloud, it can be directly converted to the economic cost of renting the virtual machines, and several techniques exist to predict such costs [31, 32].
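To make the cost/benefit reasoning about these tactics more concrete, here is a minimal Python sketch that converts a cloud retrain into a monetary cost and then picks the tactic with the highest net benefit among those that fit a deadline. The cost formula (VMs × hours × hourly price), the tactic names, and every number are assumptions made for illustration; they are not taken from the paper or from the cost predictors it cites.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tactic:
    name: str
    latency_s: float      # time until the tactic takes effect
    cost: float           # monetary cost of executing the tactic
    expected_gain: float  # expected utility gain, in the same currency


def cloud_retrain_cost(n_vms: int, hours: float, price_per_vm_hour: float) -> float:
    """Assumed cost model: rented VMs, times duration, times hourly price."""
    return n_vms * hours * price_per_vm_hour


def choose_tactic(tactics: List[Tactic], deadline_s: float) -> Optional[Tactic]:
    """Pick the feasible tactic with the largest positive net benefit, if any."""
    feasible = [t for t in tactics if t.latency_s <= deadline_s]
    best = max(feasible, key=lambda t: t.expected_gain - t.cost, default=None)
    if best is None or best.expected_gain - best.cost <= 0:
        return None
    return best


# Hypothetical tactics for the fraud detection example.
tactics = [
    Tactic("replace_with_rules", 1.0, 10.0, 400.0),
    Tactic("model_refresh", 1800.0, cloud_retrain_cost(2, 0.5, 3.0), 900.0),
    Tactic("full_retrain", 21600.0, cloud_retrain_cost(8, 6.0, 3.0), 1500.0),
]

for deadline in (60.0, 3600.0, 86400.0):
    picked = choose_tactic(tactics, deadline)
    print(deadline, picked.name if picked else None)
```

The same set of tactics yields a different choice as the deadline relaxes, which is the context dependence discussed above: a swift replacement wins under tight timing, while a refresh or full retrain wins when more time is available.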

4. MAPE-K Loop for ML-Based Systems

In SAS, the MAPE-K loop typically actuates over a system composed of non-ML components. To enable the development of self-adaptive ML-based systems, in which the MAPE-K loop actuates over a system composed of non-ML and ML components (Figure 1), we argue that each stage of the MAPE-K loop should be revised to effectively leverage tactics such as the ones mentioned.
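As a point of reference for the stages discussed in this section, the sketch below shows one iteration of a MAPE-K loop in Python in which the monitored data mixes a conventional metric (latency) with an ML-specific one (model error). The function names, thresholds, and tactic labels are hypothetical stand-ins, and the loop is deliberately simplified; it is not the architecture proposed in the paper.

```python
from typing import Callable, Dict, List, Optional


def mape_k_step(
    monitor: Callable[[], Dict[str, float]],
    analyze: Callable[[Dict[str, float], dict], bool],
    plan: Callable[[Dict[str, float], dict], Optional[str]],
    execute: Callable[[str], None],
    knowledge: dict,
) -> Optional[str]:
    """Run one Monitor-Analyze-Plan-Execute iteration over shared Knowledge."""
    observations = monitor()
    knowledge["last_observations"] = observations
    if not analyze(observations, knowledge):
        return None  # no adaptation needed
    tactic = plan(observations, knowledge)
    if tactic is not None:
        execute(tactic)
        # Record what was done so later iterations can learn from it.
        knowledge.setdefault("history", []).append(tactic)
    return tactic


# Hypothetical stand-ins for a fraud detection system.
executed: List[str] = []
knowledge = {"max_latency_ms": 200.0, "max_model_error": 0.05}


def monitor() -> Dict[str, float]:
    # A non-ML metric and an ML metric observed side by side.
    return {"latency_ms": 90.0, "model_error": 0.12}


def analyze(obs: Dict[str, float], k: dict) -> bool:
    return (obs["latency_ms"] > k["max_latency_ms"]
            or obs["model_error"] > k["max_model_error"])


def plan(obs: Dict[str, float], k: dict) -> Optional[str]:
    if obs["model_error"] > k["max_model_error"]:
        return "retrain_model"  # ML-specific tactic
    return "add_server"         # conventional architectural tactic


result = mape_k_step(monitor, analyze, plan, executed.append, knowledge)
print(result, executed)
```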

4.1. Monitor

The Monitor stage has to keep track of the inputs used when querying ML components because shifts of the input distributions may affect the predictions. For instance, the detection of out-of-distribution inputs may mean that there has been a change in the environment and thus the model used by some ML component may no longer be representative of the current environment.
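One simple way to flag a change in an input distribution is to compare a recent window of a feature against the values seen at training time. The Python sketch below does this with a two-sample Kolmogorov-Smirnov statistic implemented with the standard library. This is a generic statistical check swapped in for illustration, not one of the detectors cited in the paper, and the feature values and the `threshold` of 0.3 are hypothetical; a real monitor would more likely rely on a significance test and cover many features.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # Both empirical CDFs only change at sample points, so checking those suffices.
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def shift_detected(reference: Sequence[float], current: Sequence[float],
                   threshold: float = 0.3) -> bool:
    """Flag a shift when the statistic exceeds a (hypothetical) fixed threshold."""
    return ks_statistic(reference, current) > threshold


# Made-up transaction amounts: training data, a regular day, and a busy day.
training = [10, 12, 15, 20, 22, 25, 30, 35, 40, 50]
regular_day = [11, 14, 18, 21, 26, 29, 33, 38, 45, 48]
busy_day = [60, 75, 80, 90, 95, 110, 120, 150, 200, 300]

print(shift_detected(training, regular_day))
print(shift_detected(training, busy_day))
```

Such a check only says that the inputs changed, not whether the change harms system utility, so its result is an input to the Analyze stage rather than a trigger for adaptation on its own.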

The challenge here is not only detecting the occurrence of shifts in a timely and reliable fashion, but also how to effectively characterize them, since different types of shifts require different reaction methods. As in other SAS, typical attributes that contribute to the system's utility (e.g., latency, throughput) or the satisfaction of required system properties must be monitored. In addition to these, the Monitor stage must also gather the outputs of the ML component to account for situations in which changes in the inputs go by unnoticed, perhaps because they are too slow, but that manifest themselves faster in the outputs [33]. Examples of outputs to monitor are, for instance, shifts in the output distribution, and the model's accuracy and error, obtained by comparing predictions with real outcomes. A relevant challenge here is that often real outcomes are only known after a long time, if ever. For instance, in fraud detection, false negatives (i.e., undetected real fraud) are known only when users file a complaint and false positives are normally undetectable (since no feedback is obtained for transactions that are legitimate but rejected by the system). Approaches such as those proposed in [33, 11, 34] provide a good starting point for the implementation of a Monitor for self-adaptive ML-based systems.

Challenges. Monitoring input and output distributions requires keeping track of a multitude of features and parameters which would otherwise be disregarded. This is already challenging due to the amount of data that needs to be stored, maintained, and analyzed. Finding suitable frequencies to gather these data and adapting them in the face of evolving time constraints is an even bigger challenge in time-critical domains [35, 11].

4.2. Analyze

The Analyze stage is responsible for determining whether degradations of the prediction quality of ML components are affecting (or predicted to affect) other system components and system utility to such an extent that adaptation may be required. To accomplish this, one can leverage techniques developed by the ML community to detect possible issues in the inputs and outputs of the model [8, 11, 10, 33], errors in its training set [36] and the appearance of new features relevant for prediction [37]. These techniques must then be adjusted for the particular case of each system, which includes adapting them to different ML models and tasks.

Challenges. Estimating the impact of an ML component on other system components and on system utility can be challenging because often (mis)predictions affect the system's utility/dependability in ways that are not only application- but also context-dependent. For instance, during periods with higher transaction volumes, such as on Black Friday, mispredictions have higher impact on system utility, since during these periods it is more critical to accurately detect fraud, while maximizing accepted transactions. Architectural models can capture the information flows among components, but the challenge is to estimate how the uncertainty in the output of the ML components propagates throughout the system.

4.3. Plan

The Plan stage is responsible for identifying which adaptation tactics (if any) to employ to address issues with ML components affecting the system. As with other self-adaptation approaches, this reasoning should consider the costs and benefits of each viable tactic. Further, most of the proposed tactics have a non-negligible latency, which needs to be accounted for as in latency-aware approaches [38]. An additional concern is that some of these tactics may require a considerable use of resources to execute, either in the system itself or offloaded. This requires Plan to account for this impact or cost.

For ML-based systems that rely on multiple ML components, whenever a system property is (expected to be) violated or when system utility decreases, fault localization may be required to understand which component is underperforming and should be repaired/replaced [39].

Challenges. Although there are several approaches [31, 40] that attempt to predict the time/cost of training ML models, this is a complex problem that is strongly influenced by the type of ML models considered, their hyper-parameters and the underlying (cloud) infrastructure. These techniques represent a natural starting point to estimate the costs and benefits of adaptation tactics such as the ones presented. Yet, developing techniques for predicting the costs/benefits of complex tactics, e.g. unlearning, remains an open challenge. One interesting direction is to exploit techniques for estimating the uncertainty [25] of ML models to quantify both the likelihood of models' mispredictions as well as the potential benefits deriving from employing corrective adaptation tactics. Certain ML models can directly estimate their own uncertainty [41], or additional techniques (e.g. ensembles [42]) can be used to obtain uncertainty estimations. Still, existing techniques can suffer from significant shortcomings in practical settings [25].

Finally, tactics that modify ML components are computationally expensive (e.g., non-negligible latency). Thus, Plan must have mechanisms to verify that the system can execute the tactic without compromising other components/properties, or even the entire system.

4.4. Execute

To execute a given adaptation tactic, the Execute stage must have access to mechanisms to improve or replace the ML component and/or its training set. As in the conventional MAPE-K loop, we require implementations of adaptation tactics that are not only efficient to execute, but also have predictable costs/benefits and are resilient to run-time exceptions.

Challenges. A key challenge is how to enhance the predictability of the execution of the ML adaptation tactics, which often require the processing of large volumes of data (e.g., to re-train a large scale model) possibly under stringent timing constraints. We argue that the community of SAS would benefit from the availability of open-source software frameworks that implement a range of generic adaptation tactics for ML components. This would allow one to mask complexity, promote interoperability and comparability of SAS. Further, it would also provide an opportunity to assemble, in a common framework, techniques that have been proposed over many years in different areas of the AI/ML literature.

4.5. Knowledge

Finally, the Knowledge module is responsible for maintaining information that reflects what is known about the environment and the system. For ML-based systems, the Knowledge component should evolve in order to keep track of the costs/benefits of each tactic on the affected ML components and system's utility. This corresponds to: gathering knowledge on how each tactic altered an ML component and on the context in which the tactic was executed; and meta information on training sets, for instance characterizing the most important features for predicting the costs and benefits of the different tactics.

This added knowledge should be leveraged to improve the decision making process and, thus, improve adaptation. By gathering knowledge on how each tactic altered an ML component and on the context in which the tactic was executed, the Analyze and Plan stages can take more effective decisions on when to adapt and which tactic to execute, respectively. Finally, for a tactic that replaces underperforming ML components with non ML-based ones, Knowledge must contain a repository of the available components and their meta-data. This meta-data, we argue, should provide information to enable reasoning on whether the necessary preconditions to enable a safe and timely reconfiguration hold.

Acknowledgments

Support for this research was provided by Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) through the Carnegie Mellon Portugal Program under Grant SFRH/BD/150643/2020 and via projects with references POCI-01-0247-FEDER-045915, POCI-01-0247-FEDER-045907, and UIDB/50021/2020. This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. DM21-0052

References

[16] C. Krupitzer, et al., A survey on engineering approaches for self-adaptive systems (2018).
[17] K. Ervasti, A survey on network measurement: Concepts, techniques, and tools (2016).
[18] A. F. Cruz, et al., A bandit-based algorithm for fairness-aware hyperparameter optimization, CoRR abs/2010.03665 (2020).
[19] O. Gheibi, et al., Applying machine learning in self-adaptive systems: A systematic literature review, arXiv preprint arXiv:2103.04112 (2021).
[20] T. R. D. Saputri, S.-W. Lee, The application of machine learning in self-adaptive systems: A systematic literature review, IEEE Access 8 (2020).
[21] T. Bureš, Self-adaptation 2.0, in: 2021 International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS), 2021.
[22] D. L. Silver, Q. Yang, L. Li, Lifelong machine learning systems: Beyond learning algorithms, in: 2013 AAAI spring symposium series, 2013.
[23] B. Liu, Learning on the job: Online lifelong and continual learning, in: Procs. of the AAAI Conference on Artificial Intelligence, volume 34, 2020.
[24] D. Aparício, et al., Arms: Automated rules management system for fraud detection, arXiv preprint arXiv:2002.06075 (2020).
[25] Y. Ovadia, et al., Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift, in: Procs. of NIPS, 2019.
[26] D. Wu, et al., A highly accurate framework for self-labeled semisupervised classification in industrial applications, IEEE TII 14 (2018).
[27] S. J. Pan, Q. Yang, A survey on transfer learning, IEEE TKDE 22 (2009).
[28] Y. Liu, et al., A secure federated transfer learning framework, Procs. of IS 35 (2020).
[29] K. Swersky, et al., Multi-task bayesian optimization, Procs. of NIPS 26 (2013).
[30] Y. Cao, et al., Efficient repair of polluted machine learning systems via causal unlearning, in: Procs. of Asia CCS, 2018.
[31] M. Casimiro, et al., Lynceus: Cost-efficient tuning and provisioning of data analytic jobs, in: Procs. of ICDCS, 2020.
[32] P. Mendes, et al., TrimTuner: Efficient optimization of machine learning jobs in the cloud via sub-sampling, in: MASCOTS, 2020.
[33] X. Zhou, et al., A Framework to Monitor Machine Learning Systems Using Concept Drift Detection, Springer, 2019.
[34] Z. Yang, M. H. Asyrofi, D. Lo, BiasRV: Uncovering biased sentiment predictions at runtime, CoRR abs/2105.14874 (2021). arXiv:2105.14874.
[35] E. Bartocci, et al., Specification-based monitoring of cyber-physical systems: a survey on theory, tools and applications, in: Lectures on Runtime Verification, Springer, 2018.
[36] Z. Abedjan, et al., Detecting data errors: Where are we and what needs to be done?, Procs. of VLDB 9 (2016).
[37] D. Papamartzivanos, et al., Introducing deep learning self-adaptive misuse network intrusion detection systems, IEEE Access 7 (2019).
[38] G. A. Moreno, et al., Flexible and efficient decision-making for proactive latency-aware self-adaptation, ACM Trans. Auton. Adapt. Syst. 13 (2018).
[39] A. Christi, et al., Evaluating fault localization for resource adaptation via test-based software modification, in: Procs. of QRS, 2019.
[40] O. Alipourfard, et al., Cherrypick: Adaptively unearthing the best cloud configurations for big data analytics, in: Procs. of NSDI, 2017.
[41] M. A. Osborne, et al., Gaussian processes for global optimization, in: LION, 2009.
[42] L. Breiman, Bagging predictors, in: Machine Learning, volume 24, Springer, 1996.