=Paper=
{{Paper
|id=Vol-2978/saml-paper6
|storemode=property
|title=Self-Adaptation for Machine Learning Based Systems
|pdfUrl=https://ceur-ws.org/Vol-2978/saml-paper6.pdf
|volume=Vol-2978
|authors=Maria Casimiro,Paolo Romano,David Garlan,Gabriel A. Moreno,Eunsuk Kang,Mark Klein
|dblpUrl=https://dblp.org/rec/conf/ecsa/Casimiro0GMKK21
}}
==Self-Adaptation for Machine Learning Based Systems==
Maria Casimiro1,2, Paolo Romano2, David Garlan1, Gabriel A. Moreno3, Eunsuk Kang1 and Mark Klein3

1 Institute for Software Research, Carnegie Mellon University, Pittsburgh, PA, USA
2 INESC-ID, Instituto Superior Técnico, University of Lisbon, Lisbon, Portugal
3 Software Engineering Institute, Carnegie Mellon University, Pittsburgh, PA, USA

Abstract: Today's world is witnessing a shift from human-written software to machine-learned software, with the rise of systems that rely on machine learning. These systems typically operate in non-static environments, which are prone to unexpected changes, as is the case of self-driving cars and enterprise systems. In this context, machine-learned software can misbehave. Thus, it is paramount that these systems are capable of detecting problems with their machine-learned components and of adapting themselves to maintain desired qualities. For instance, a fraud detection system that cannot adapt its machine-learned model to efficiently cope with emerging fraud patterns or changes in the volume of transactions is subject to losses of millions of dollars. In this paper, we take a first step towards the development of a framework aimed at self-adapting systems that rely on machine-learned components. We describe: (i) a set of causes of machine-learned component misbehavior and a set of adaptation tactics inspired by the literature on machine learning, motivating them with the aid of a running example; (ii) the required changes to the MAPE-K loop, a popular control loop for self-adaptive systems; and (iii) the challenges associated with developing this framework. We conclude the paper with a set of research questions to guide future work.

Keywords: Self-adaptive systems, Machine Learning, Model degradation

SAML'21: International Workshop on Software Architecture and Machine Learning, September 13–17, 2021, Växjö, Sweden. Contact: maria.casimiro@tecnico.ulisboa.pt (M. Casimiro); romano@inesc-id.pt (P. Romano); garlan@cs.cmu.edu (D. Garlan); gmoreno@sei.cmu.edu (G. A. Moreno); eunsukk@andrew.cmu.edu (E. Kang); mk@sei.cmu.edu (M. Klein). © 2021 Copyright for this paper by Carnegie Mellon University and the authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction
The field of self-adaptive systems (SAS) is an extensive and active research area that has made steady improvements for years. SAS react to environment changes, faults, and internal system issues to improve the system's behavior, utility and/or dependability [1]. These systems usually adopt an architecture, known as the MAPE-K loop, which monitors the system, decides when it needs adaptation, selects the best course of action to improve the system, and executes it [2]. The actions available for the system to execute are usually called tactics. The literature on SAS spans a broad range of systems, such as enterprise systems and cyber-physical systems (CPS).

In parallel with the maturing of SAS research, a new class of systems has emerged: supervised and semi-supervised machine learning (ML) based systems are now becoming ubiquitous. Such systems embed one or more components, whose behavior is derived from training data, into a larger system containing traditional computational entities (web services, databases, operator interfaces). Examples include: fraud detection, which uses a classifier to detect fraudulent transactions [3]; medical diagnosis, which relies on ML for classifying types of diseases of sick patients [4]; self-driving cars, which use ML to determine whether they should stop based on how distant they are from the car in front [5]; robots, which rely on ML models to predict the amount of remaining battery power [6]; and targeted advertisement services, which rely on recommender systems to show users items that they may find interesting [7].

For such systems, adaptation poses a key concern. In addition to the reasons that traditional systems must adapt (faults, changing requirements, unexpected loads, etc.), ML-based components may fail to perform as expected, thereby reducing system utility. For instance, changes in a system's operating environment can introduce drifts in the input data of the ML models, making them less accurate [8], or attacks may attempt to subvert the intended functionality of the system [9].

Thankfully, there is a large number of emerging techniques that have been developed by the ML community for adapting supervised ML models and that could in principle be used as adaptation tactics in a self-adaptive system. These range from off-line, from-scratch model retraining and replacement, at one extreme, to incremental approaches performed in situ, at the other [10, 11, 12, 13, 14, 15]. And more techniques are being developed constantly.

Unfortunately, determining when and how to take advantage of such tactics to perform adaptation is highly non-trivial. First, there is a large number of possible adaptation tactics that could potentially be applied to an ML component, but not all approaches work with all forms of supervised ML models. For example, some training models may allow a system to selectively "forget" certain inputs, while others do not. Similarly, some ML models support transfer learning to incrementally update a learnt model, but not all do.

Second, the value of investing in improving the accuracy of an ML component is strongly context-dependent – often depending on both the domain and timing considerations. For example, while a medical diagnosis system may support model retraining at run time, the latency of this tactic may make it infeasible for self-driving cars, which rely instead on swifter tactics (such as replacing the ML component entirely) that can address real-time system response requirements. In a different mode of operation, however, both types of tactics may be available: e.g., if the self-driving car is stopped (parked mode of operation), it may be feasible to retrain an underperforming model without compromising safety.

Third, calculating the costs and benefits of these tactics is difficult, particularly in a whole-system context, where improving a particular component's performance may or may not improve overall system utility. Costs include time, resources (processing, memory, power), and service disruption. Benefits derive, for instance, from increased accuracy or fairness of the ML component, which can in turn lead to better performing down-stream components and support overall business goals (e.g., by improving advertisement revenue). Both costs and benefits can be hard to quantify, however, and hence to reason about when determining whether an ML adaptation tactic makes sense.

We argue, therefore, that in order to harness the potential of the rich space of ML adaptation mechanisms, it is necessary to develop methods that can reason about which tactics are available to adapt the ML component, which are the most effective to employ in a given context so that system utility is maximized, and how to integrate them into modern adaptive systems architectures. Specifically, in this paper we attempt to bring some clarity to this emerging but critical aspect of SAS by outlining (i) a set of causes of ML component performance degradation and a set of adaptation tactics derived from research on ML (§ 3); (ii) architectural and algorithmic changes required to incorporate effective ML adaptation into the MAPE-K loop, a popular framework for monitoring and controlling self-adaptive systems (§ 4); and (iii) the modeling and engineering challenges associated with realizing the full potential for adaptation of ML-based systems (§ 4). We conclude with a set of open research questions.

2. Background & Related Work

Current literature on SAS focuses on managed systems that do not embed (nor rely upon) ML models [16]. That is, although the self-adaptation mechanism (i.e., the managing system) may rely on ML to perform a given function (e.g., decide the tactic to execute), the actual system that is adapted (i.e., the managed system) does not rely on any ML component. These systems have at their disposal a set of tactics that, for instance, change a system's architecture (e.g., adding/removing servers) or the quality of the service they provide (e.g., increasing/decreasing the rendering quality of images) in response to environment changes. Usually, tactic outcomes have some uncertainty that can be modeled via probabilistic methods, given assumptions on the underlying hardware/software platforms and their characteristics. Further, one can measure the properties of such systems through the use of metrics such as latency, throughput and content quality. Determining the costs and benefits of such adaptation tactics has been well researched and there are numerous techniques and algorithms for that end [17]. However, new challenges arise when considering managed systems that depend on ML models. Not only are we missing a well-understood and generally applicable set of tactics that SAS can use to adapt ML-based systems, but also the properties of ML components, such as accuracy and fairness, may not change consistently with the tactic that is executed. For example, if we retrain an ML model, its accuracy is not always affected in the same way, but may depend on the samples available to retrain the model, on the duration of the retraining process, and on the model's hyper-parameters. Similarly, model fairness may also be affected in different ways depending on the training samples that are fed during re-training [18].

To improve the self-adaptive capabilities of systems and their performance, recent research has proposed SASs that rely on ML techniques and models to adapt the system [19, 20]. Specifically, ML is used in the adaptation manager to: update adaptation policies, predict resource usage, update run-time models, reduce adaptation spaces, predict anomalies, and collect knowledge. Additionally, learning is typically leveraged to improve the Analysis and Plan components of the MAPE-K loop [19].

In this paper, we focus on the problem of how to leverage self-adaptation to correct and adapt supervised ML components of a managed system, while increasing the overall utility of ML-based systems when their ML components are underperforming. This vision is aligned with the one presented by Bures [21], in which the author claims that "self-adaptation should stand in equal-to-equal relationship to AI. It should both benefit from AI and enable AI." Extending this vision further, we argue that the techniques developed in this context could also be applied, in a recursive fashion, to self-adapt adaptation managers that rely on ML components to enhance their effectiveness and robustness. For instance, a planner that relies on ML to reduce the adaptation space could have its own self-adaptation manager to ensure that the ML component is working as expected.
The vision presented in this paper differs from work on collective SAS, since we are targeting systems with only one agent and with a centralized learning process, whereas that line of research focuses on systems with multiple agents that can share knowledge with each other. Differently, our vision connects the field of lifelong/continual learning [22, 23], which deals with open-world problems, with the field of self-adaptive systems. In fact, dealing with open-world changes was identified by Gheibi et al. [19] as an open problem in the SAS domain. Specifically, Lifelong Learning deals with the problem of leveraging past knowledge to learn a new task better, and Continual Learning is focused on solving the problem of maintaining the accuracy of old tasks when learning new tasks [23]. The techniques developed in this domain can be leveraged by SASs to improve ML components when unexpected changes occur in the environment or when the performance of the ML component is degraded and affects overall system utility. Overall, our focus is on SASs and on how to integrate techniques from these research domains into a generic, yet rigorous/principled framework that can decide which ML component to adapt, how, and when. The next section provides details on possible causes of ML component degradation and repair tactics inspired by this field of research.
3. Adaptation of ML-based Systems

We now motivate the need for self-adaptive ML-based systems through an example from the enterprise systems domain. Then, we present a set of possible causes for ML component performance degradation and a set of adaptation tactics.

3.1. Running Example – Fraud Detection System

Consider a fraud detection system that relies on ML models for scoring credit/debit card transactions. The score attributed by the ML model is then used by a rule-based model to decide whether transactions are legitimate or fraudulent. Typical clients of companies that provide fraud detection services are banks and merchants. In this setting, system utility is typically defined based on attributes such as the cost of losing clients due to incorrectly declined transactions, fairness (no client is declined more often) [18], and the overall cost of service level agreement (SLA) violations (these systems have strict SLAs to process transactions in real time, e.g., at most 200 ms at the 99.999th percentile of the latency distribution [3]). While cost and revenue are directly affected by the ML model's mispredictions, response time is affected by model complexity, i.e., more complex models may introduce higher latencies that compromise SLAs. However, the impact of these mispredictions varies not only from client to client, with whom different SLAs may have been agreed upon, but also in time, since during specific periods, e.g., Black Friday, the volume of transactions is substantially altered. During busy days such as these, adapting the ML models responsible for fraud detection so that they are less strict and reduce false alarms is crucial in order to preserve system utility. However, this adaptation entails a delicate trade-off, since less strict models can allow fraudulent transactions to be accepted. Further, these systems are subject to constantly evolving fraud patterns, to which the ML models must adapt [24].

3.2. Causes of Degradation of ML Components' Accuracy

We now focus on problems that deteriorate the performance of ML components such that they are no longer able to maintain system utility at a desired level. In particular, we present two classes of problems which, we argue, are general enough to be representative of most of the issues addressed by the existing ML literature.

Data-set Shift. When the distribution of the inputs to a model changes, such that it becomes substantially different from the distribution on which the model was trained, we find ourselves in the presence of a problem commonly known as data-set shift [8, 11, 10, 25]. As recent work has shown, not all data-set shifts are malign [10]. As such, an effective SAS should not only detect shifts, but also be able to assess their actual impact on system utility. In a fraud detection system, data-set shift occurs when new fraud patterns emerge (e.g., charges at a particular merchant), or when patterns of legitimate transactions change, for instance due to busy shopping days like Black Friday and Christmas [24]. Although the actual features used for classification may not change, their distribution does. This means that different values of the features now characterize legitimate and fraudulent transactions.

Incorrect Data. This problem arises when there are samples in the model's training set that are incorrectly labeled [26], or when test data is tampered with, thus leading the model to mispredict when certain inputs arrive. The former can happen, for instance, when unsupervised techniques are used to label examples in order to bootstrap the training set of a second supervised model [26]. Incorrect data can also make their way into a model's training set due to attackers that intentionally pollute it so as to cause the ML component to incorrectly predict outputs for certain inputs [12, 9]. For instance, in the fraud detection case, security breaches could lead to poisoning of the data used for training ML models, hence causing them to make incorrect predictions.
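The data-set shift problem described above can be made concrete with a small detector. The sketch below is a minimal, pure-Python illustration: it flags a possible shift when the two-sample Kolmogorov–Smirnov statistic between the training-time and live distributions of one feature exceeds a threshold. The threshold and transaction amounts are invented for illustration; a production monitor would apply a proper statistical test per feature, along the lines of the shift-detection work cited above [10].

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of the two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    values = sorted(set(a) | set(b))
    d = 0.0
    for v in values:
        cdf_a = sum(1 for x in a if x <= v) / len(a)
        cdf_b = sum(1 for x in b if x <= v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def detect_shift(train_feature, live_feature, threshold=0.3):
    """Flag a possible data-set shift when the KS statistic between
    the training-time and live distributions exceeds a threshold
    (the 0.3 default is illustrative, not a recommended value)."""
    return ks_statistic(train_feature, live_feature) > threshold

# Transaction amounts seen at training time vs. during a busy period.
train_amounts = [10, 12, 11, 9, 13, 10, 12, 11]
busy_amounts = [95, 110, 102, 99, 120, 105, 98, 101]
```

Here `detect_shift(train_amounts, busy_amounts)` fires because the live amounts occupy an entirely different range, while comparing the training sample with itself does not.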
3.3. Repair Tactics

Table 1 illustrates a collection of tactics that can be used to deal with issues introduced by ML-based components. These tactics were inspired by research on ML [22, 14, 27, 13, 15]. Next, we describe the tactics presented in the table, motivating them with scenarios in which they can be applied and discussing their costs and benefits.

Table 1: Examples of general adaptation tactics for ML-based systems with their strengths ('+') and weaknesses ('–').

Component Replacement — Replace an under-performing component by one that better matches the current environment.
+ Fast and inexpensive, when possible
– Non ML-based estimators may not be available in all scenarios
– Alternative estimators, when available, may be more robust but less precise

Human-based Labeling [14] — Rely on a human to classify some incoming samples or to correct the labeling of samples in the training set.
+ Accuracy of human-based labels expected to be high
– Expert knowledge may be expensive to obtain and/or introduce unacceptable latency

Transfer Learning [27] — Reuse knowledge gathered previously on different tasks/problems to accelerate the learning of new tasks.
+ Less data-hungry than plain retrain
– Effectiveness dependent on the similarities between old and new tasks/data
– Computationally intensive process

Unlearning [13] — Remove samples that are no longer representative from the training set and from the model.
+ Fast when the ratio between data to forget and data-set size is small
– Cost/latency for identifying examples to unlearn can be large and context-dependent

Retrain [15] — Retrain with new data and maybe choose new values for the ML model's hyper-parameters.
+ Generic and robust method
– Effective only once a relatively large number of instances of the new data are available
– Computationally intensive process
– Accuracy and latency of the retrain process may vary significantly

Component replacement. This tactic assumes the existence of a repository of components and respective meta-data that can be analyzed to determine if there exists a component that is better suited for the current system state. For example, when the volume of transactions changes, for instance on special days such as Black Friday, ML models may consider the increased frequency of transactions as an indicator of fraud and erroneously flag legitimate transactions as fraudulent. Such mispredictions can lead to significant financial losses [3], thus requiring timely fixes and rendering the use of high-latency tactics infeasible (note that, in this context, transactions need to be accepted/rejected within milliseconds [3]). As such, only low-latency tactics can be applied. An example is to replace the underperforming models with rule-based models, e.g., developed by experts for specific situations, and/or to switch to previously trained models that are known to perform well in similar conditions. A benefit of this tactic, whenever it is available, is to enable a swift reaction to data-set shifts. Its main cost depends on the latency and resources used for the analysis of the candidate replacing components available in the repository.

Human-based labeling. Humans are often able to recognize patterns, problems, and objects more accurately than ML components [14]. Thus, depending on the domain, humans may play a role in correcting these components or giving them correct samples [14]. For instance, whenever the ML component suspects a transaction of being fraudulent, it can be automatically canceled. Then, the user can be informed of the decision and asked whether the transaction should be authorized or declined in the future. Another possibility is to add humans to the loop when adding samples to the ML component's training set. In this scenario, an expert can be asked to review the most uncertain classifications so as to improve the quality of the training samples. In the former scenario, the benefits are easily quantifiable, since the risk of accepting a possibly fraudulent transaction can be measured via its economic value. However, users may get annoyed if their transactions are canceled too often, to the extent that they may stop purchasing with that credit card provider. As for relying on experts to review uncertain classifications, having an on-demand expert performing this task is expensive, and the latency of the manual labeling process may be unacceptable.
Transfer learning. Transfer learning (TL) techniques leverage knowledge obtained when performing previous tasks that are similar to the current one, so that learning the current task becomes easier [27]. Suppose that: (i) a fraud detection company has a set of clients (such as banks); (ii) the company has a unique ML model for each client, so that it complies with data privacy regulations (since privacy is important in this domain, there are techniques that can be used to deal with the problem of ensuring data confidentiality and anonymity in information transfer between clients [28]); and (iii) one of its clients is affected by a new attack pattern, which is eventually learned by that client's model. In this scenario, TL techniques [29, 27] can be used to improve the other clients' models so that they can react to the same attack. Estimating the benefits of executing this tactic for a given client boils down to estimating the likelihood that this client may suffer the same attack. Yet, the execution of this tactic typically implies high computational costs (e.g., if cloud resources are used) and non-negligible latency, which may render this tactic economically unfavorable, or even inadequate, e.g., if the attack on a different client is imminent and the TL process is slow.
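A minimal illustration of the warm-start idea behind this tactic: initialize the target client's model with the source client's weights and take a few gradient steps on the target's own samples, instead of training from scratch. The tiny logistic-regression learner, weights, and data below are all invented for illustration; real TL techniques [27, 29] are far richer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune(weights, data, lr=0.5, epochs=50):
    """Warm-start from another client's (weights, bias) and take a few
    stochastic gradient steps of logistic regression on the target
    client's (features, label) samples."""
    w, b = list(weights[0]), weights[1]
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y          # gradient of the log-loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Source model already separates the new attack pattern reasonably well;
# a handful of target-client samples adapt it instead of retraining fully.
source = ([1.0, -1.0], 0.0)
target_data = [([2.0, 0.0], 1), ([0.0, 2.0], 0),
               ([1.5, 0.5], 1), ([0.5, 1.5], 0)]
w, b = fine_tune(source, target_data)
```

The cost/benefit trade-off from the text maps onto `epochs` and the amount of target data: fewer steps mean lower latency but a smaller accuracy gain.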
Unlearning. This tactic corresponds to unlearning data that no longer reflects the current environment/state of the system, and its lineage, thus eliminating the effect of that data on current predictions [13], while avoiding a full model retrain. A key problem that stands in the way of the execution of this tactic is the identification of incorrect labels. For instance, in a fraud detection system, incorrectly classified transactions may all be eventually identified for "free", although with large latencies, when users review their credit card statements. Conversely, in scenarios in which the identification of incorrect samples is not readily available, one may leverage automatic techniques, such as the one described in [30], which are faster but typically less accurate. As such, the cost and complexity of this task vary depending on the context. Then, after identifying the incorrect samples, the model must be updated to accurately reflect the correct data. At this point, the advantage of unlearning techniques with respect to a typical full model retrain is the time savings (up to 9.5 × 10^4×) that can be achieved [13].

Retrain and/or hyper-parameter optimization. This is a general tactic that involves retraining the model with new data that reflects recent relevant data-set drifts, e.g., a new kind of attack in a fraud detection system. There are many types of retraining, ranging from a simple model refresh (incorporating new data while keeping the old hyper-parameters) to a full retrain (including hyper-parameter optimization, possibly encompassing different model types/architectures), which imply different computational costs and can benefit the model's accuracy to different extents. In the presence of data-set shift, when there is new data that already incorporates the new input distribution, this tactic often represents a simple, yet possibly expensive, approach to deal with this problem. The benefits of this tactic are dependent on the type of retrain process and on the quality of the new data. As for its cost, if retraining is performed on the cloud, it can be directly converted to the economic cost of renting the virtual machines, and several techniques exist to predict such costs [31, 32].

4. MAPE-K Loop for ML-Based Systems

In SAS, the MAPE-K loop typically actuates over a system composed of non-ML components. To enable the development of self-adaptive ML-based systems, in which the MAPE-K loop actuates over a system composed of non-ML and ML components (Figure 1), we argue that each stage of the MAPE-K loop should be revised to effectively leverage tactics such as the ones mentioned.

Figure 1: MAPE-K loop over an ML-based system with a mix of ML and non-ML components, with specific challenges for each MAPE-K stage. White arrows represent dependencies between components.
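The revised loop can be sketched as a minimal skeleton. The stage bodies below are deliberately naive placeholders (an error-rate trigger in Analyze and a net-benefit pick in Plan, with a tactic vocabulary borrowed from Table 1); they only show how the four stages share the Knowledge base, not the techniques discussed in the following subsections.

```python
class MapeK:
    """Skeleton of a MAPE-K loop over an ML component (illustrative
    structure only; stage logic is a placeholder, not a real design)."""

    def __init__(self, knowledge):
        self.knowledge = knowledge            # shared by all stages

    def monitor(self, inputs, outputs):
        self.knowledge["recent_inputs"] = inputs
        self.knowledge["recent_outputs"] = outputs

    def analyze(self):
        # Placeholder: adapt when the observed error rate drifts
        # above a tolerance recorded in the knowledge base.
        err = self.knowledge.get("error_rate", 0.0)
        return err > self.knowledge.get("tolerance", 0.1)

    def plan(self):
        # Placeholder: pick the tactic with the best benefit-minus-cost,
        # among those whose benefit exceeds their cost.
        tactics = self.knowledge.get("tactics", {})
        viable = {name: t["benefit"] - t["cost"]
                  for name, t in tactics.items() if t["benefit"] > t["cost"]}
        return max(viable, key=viable.get) if viable else None

    def execute(self, tactic):
        self.knowledge.setdefault("history", []).append(tactic)
        return tactic

    def step(self, inputs, outputs):
        self.monitor(inputs, outputs)
        if self.analyze():
            tactic = self.plan()
            if tactic is not None:
                return self.execute(tactic)
        return None
```

Note how Execute writes its decision back into Knowledge: that history is exactly the per-tactic experience that § 4.5 argues should feed future Analyze/Plan decisions.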
4.1. Monitor

The Monitor stage has to keep track of the inputs used when querying ML components, because shifts of the input distributions may affect the predictions. For instance, the detection of out-of-distribution inputs may mean that there has been a change in the environment, and thus the model used by some ML component may no longer be representative of the current environment. The challenge here is not only detecting the occurrence of shifts in a timely and reliable fashion, but also how to effectively characterize them — since different types of shifts require different reaction methods. As in other SAS, typical attributes that contribute to the system's utility (e.g., latency, throughput) or the satisfaction of required system properties must be monitored. In addition to these, the Monitor stage must also gather the outputs of the ML component to account for situations in which changes in the inputs go by unnoticed, perhaps because they are too slow, but manifest themselves faster in the outputs [33]. Examples of outputs to monitor are, for instance, shifts in the output distribution, and the model's accuracy and error – obtained by comparing predictions with real outcomes. A relevant challenge here is that often real outcomes are only known after a long time, if ever. For instance, in fraud detection, false negatives (i.e., undetected real fraud) are known only when users file a complaint, and false positives are normally undetectable (since no feedback is obtained for transactions that are legitimate but rejected by the system). Approaches such as those proposed in [33, 11, 34] provide a good starting point for the implementation of a Monitor for self-adaptive ML-based systems.

Challenges. Monitoring input and output distributions requires keeping track of a multitude of features and parameters which would otherwise be disregarded. This is already challenging due to the amount of data that needs to be stored, maintained, and analyzed. Finding suitable frequencies at which to gather these data, and adapting them in the face of evolving time constraints, is an even bigger challenge in time-critical domains [35, 11].

4.2. Analyze

The Analyze stage is responsible for determining whether degradations of the prediction quality of ML components are affecting (or are predicted to affect) other system components and system utility to such an extent that adaptation may be required. To accomplish this, one can leverage techniques developed by the ML community to detect possible issues in the inputs and outputs of the model [8, 11, 10, 33], errors in its training set [36], and the appearance of new features relevant for prediction [37]. These techniques must then be adjusted for the particular case of each system, which includes adapting them to different ML models and tasks.
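One cheap proxy analysis, suggested by the output-monitoring discussion above, is to watch the model's flag rate when ground truth is delayed or missing: a large swing relative to the historical rate hints that the ML component may be degrading and that deeper analysis is worthwhile. The sketch below is a minimal illustration; the ratio threshold is an invented placeholder, not a calibrated value.

```python
def output_rate_shift(baseline_flag_rate, recent_outputs, ratio=2.0):
    """Flag a suspicious swing in the model's positive (fraud-flag) rate
    relative to its historical baseline. `recent_outputs` is a list of
    0/1 decisions; `ratio` is an illustrative sensitivity knob."""
    recent_rate = sum(recent_outputs) / len(recent_outputs)
    if recent_rate == 0:
        return baseline_flag_rate > 0
    return (recent_rate > ratio * baseline_flag_rate
            or recent_rate < baseline_flag_rate / ratio)
```

A swing in either direction matters: over-flagging suggests legitimate-traffic shift (e.g., Black Friday), while under-flagging may hide new fraud patterns.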
Challenges. Estimating the impact of an ML component on other system components and on system utility can be challenging, because often (mis)predictions affect the system's utility/dependability in ways that are not only application- but also context-dependent. For instance, during periods with higher transaction volumes, such as on Black Friday, mispredictions have a higher impact on system utility, since during these periods it is more critical to accurately detect fraud while maximizing accepted transactions. Architectural models can capture the information flows among components, but the challenge is to estimate how the uncertainty in the output of the ML components propagates throughout the system.

4.3. Plan

The Plan stage is responsible for identifying which adaptation tactics (if any) to employ to address issues with ML components affecting the system. As with other self-adaptation approaches, this reasoning should consider the costs and benefits of each viable tactic. Further, most of the proposed tactics have a non-negligible latency, which needs to be accounted for, as in latency-aware approaches [38]. An additional concern is that some of these tactics may require a considerable use of resources to execute, either in the system itself or offloaded. This requires Plan to account for this impact or cost. For ML-based systems that rely on multiple ML components, whenever a system property is (expected to be) violated or when system utility decreases, fault localization may be required to understand which component is underperforming and should be repaired/replaced [39].

Challenges. Although there are several approaches [31, 40] that attempt to predict the time/cost of training ML models, this is a complex problem that is strongly influenced by the type of ML models considered, their hyper-parameters, and the underlying (cloud) infrastructure. These techniques represent a natural starting point to estimate the costs and benefits of adaptation tactics such as the ones presented. Yet, developing techniques for predicting the costs/benefits of complex tactics, e.g., unlearning, remains an open challenge. One interesting direction is to exploit techniques for estimating the uncertainty [25] of ML models to quantify both the likelihood of models' mispredictions as well as the potential benefits deriving from employing corrective adaptation tactics. Certain ML models can directly estimate their own uncertainty [41], or additional techniques (e.g., ensembles [42]) can be used to obtain uncertainty estimates. Still, existing techniques can suffer from significant shortcomings in practical settings [25]. Finally, tactics that modify ML components are computationally expensive (e.g., have non-negligible latency). Thus, Plan must have mechanisms to verify that the system can execute the tactic without compromising other components/properties, or even the entire system.
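The latency-aware cost/benefit reasoning described for Plan can be sketched as a simple expected-utility filter over the tactics of Table 1: discard tactics that cannot finish before the deadline, then pick the one with the best net benefit. All benefit, cost, and latency figures below are invented for illustration.

```python
def select_tactic(tactics, deadline_s):
    """Latency-aware planning sketch: among tactics expected to finish
    before the deadline, pick the one maximizing expected utility gain
    minus execution cost; return None if no tactic is worthwhile."""
    feasible = [t for t in tactics if t["latency_s"] <= deadline_s]
    if not feasible:
        return None
    best = max(feasible, key=lambda t: t["benefit"] - t["cost"])
    return best["name"] if best["benefit"] > best["cost"] else None

# Illustrative numbers only: expected utility gain, execution cost,
# and expected latency for three of the tactics from Table 1.
tactics = [
    {"name": "retrain",           "benefit": 8.0, "cost": 3.0, "latency_s": 3600},
    {"name": "component_replace", "benefit": 4.0, "cost": 0.5, "latency_s": 1},
    {"name": "unlearning",        "benefit": 5.0, "cost": 2.0, "latency_s": 300},
]
```

This reproduces the paper's running-example reasoning: with a millisecond-to-second deadline only component replacement survives the latency filter, whereas with a generous deadline the higher-net-benefit retrain wins.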
4.4. Execute

To execute a given adaptation tactic, the Execute stage must have access to mechanisms to improve or replace the ML component and/or its training set. As in the conventional MAPE-K loop, we require implementations of adaptation tactics that are not only efficient to execute, but also have predictable costs/benefits and are resilient to run-time exceptions.

Challenges. A key challenge is how to enhance the predictability of the execution of the ML adaptation tactics, which often require the processing of large volumes of data (e.g., to re-train a large-scale model), possibly under stringent timing constraints. We argue that the SAS community would benefit from the availability of open-source software frameworks that implement a range of generic adaptation tactics for ML components. This would allow one to mask complexity and promote interoperability and comparability of SAS. Further, it would also provide an opportunity to assemble, in a common framework, techniques that have been proposed over many years in different areas of the AI/ML literature.

4.5. Knowledge

Finally, the Knowledge module is responsible for maintaining information that reflects what is known about the environment and the system. For ML-based systems, the Knowledge component should evolve in order to keep track of the costs/benefits of each tactic on the affected ML components and on the system's utility. This corresponds to: gathering knowledge on how each tactic altered an ML component and on the context in which the tactic was executed; and meta-information on training sets, for instance characterizing the most important features for predicting the costs and benefits of the different tactics. This added knowledge should be leveraged to improve the decision-making process and, thus, improve adaptation. By gathering knowledge on how each tactic altered an ML component and on the context in which the tactic was executed, the Analyze and Plan stages can make more effective decisions on when to adapt and which tactic to execute, respectively. Finally, for a tactic that replaces underperforming ML components with non-ML-based ones, Knowledge must contain a repository of the available components and their meta-data. This meta-data, we argue, should provide information to enable reasoning on whether the necessary preconditions for a safe and timely reconfiguration hold.
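One possible shape for such Knowledge entries, with hypothetical field names: log each tactic execution together with its context and observed effect, so that Plan can estimate a tactic's benefit from past runs. This is a minimal sketch of the bookkeeping, not a proposed schema.

```python
from dataclasses import dataclass

@dataclass
class TacticRecord:
    """One Knowledge-base entry describing a past tactic execution and
    its observed effect (field names are illustrative assumptions)."""
    tactic: str
    context: dict            # e.g. workload level, day type
    accuracy_before: float
    accuracy_after: float
    cost: float              # e.g. cloud cost or execution latency

    @property
    def benefit(self):
        return self.accuracy_after - self.accuracy_before

def mean_benefit(records, tactic):
    """Average observed benefit of a tactic: a simple predictor that
    Analyze/Plan could consult when estimating costs and benefits."""
    gains = [r.benefit for r in records if r.tactic == tactic]
    return sum(gains) / len(gains) if gains else 0.0
```

Richer predictors could condition on `context` (e.g., a Black Friday retrain may behave differently from a quiet-day retrain), which is exactly why the context is stored alongside the outcome.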
Chess, The vision of autonomic tion. By gathering knowledge on how each tactic altered computing, Computer 36 (2003). an ML component and on the context in which the tactic [3] B. Branco, et al., Interleaved sequence rnns for was executed, the Analyze and Plan stages can take more fraud detection, in: Procs. of KDD, 2020. effective decisions on when to adapt and which tactic to [4] B. J. Erickson, et al., Machine learning for medical execute, respectively. Finally, for a tactic that replaces imaging, Radiographics 37 (2017). underperforming ML components with non ML-based [5] Z. Chen, X. Huang, End-to-end learning for lane ones, Knowledge must contain a repository of the avail- keeping of self-driving cars, in: Procs. of IV, 2017. able components and their meta-data. This meta-data, we [6] P. Jamshidi, et al., Machine learning meets quan- argue, should provide information to enable reasoning titative planning: Enabling self-adaptation in au- on whether the necessary preconditions to enable a safe tonomous robots, in: Procs. of SEAMS, 2019. and timely reconfiguration hold. [7] H.-T. Cheng, et al., Wide & deep learning for rec- ommender systems, in: Procs. of DLRS, 2016. 5. Conclusions and Future Work [8] J. Quionero-Candela, et al., Dataset shift in machine learning, The MIT Press, 2009. This work introduced a vision for a new breed of self- [9] T. Gu, et al., Badnets: Evaluating backdooring adaptive frameworks that brings together techniques attacks on deep neural networks, IEEE Access 7 developed by the ML literature (used here as adaptation (2019). tactics), and reasons about the cost/benefits trade offs of [10] S. Rabanser, et al., Failing loudly: An empirical each, with the end goal of adapting degraded ML com- study of methods for detecting dataset shift, in: ponents of ML-based systems to maintain system utility. Procs. of NIPS, 2019. With the aid of a running example we showed how dif- [11] F. 
Pinto, et al., Automatic model monitoring for data ferent adaptation tactics can be applied to repair ML streams, arXiv preprint arXiv:1908.04240 (2019). models when different real-life situations hinder system [12] L. Huang, et al., Adversarial machine learning, in: utility. Further, we identified a set of key requirements Procs. of AISec, 2011. that should be supported by the various elements of the [13] Y. Cao, J. Yang, Towards making systems forget classic MAPE-K control loop and a set of challenging with machine unlearning, in: Procs. of S&P, IEEE, research problems. Finally, we highlight the following 2015. research questions as directions for future work: (i) How [14] B. Miller, et al., Reviewer integration and perfor- to estimate the costs and benefits of each tactic? (ii) How mance measurement for malware detection, in: to reason about the impact of ML mispredictions on sys- Procs. of DIMVA, 2016. tem utility? (iii) How do changes to one ML component [15] Y. Wu, et al., DeltaGrad: Rapid retraining of ma- impact the other components in the system? (iv) How to chine learning models, in: Procs. of ICML, 2020. reason about the long-term impacts of adaptation tactics on system utility? [16] C. Krupitzer, et al., A survey on engineering ap- tion, Springer, 2018. proaches for self-adaptive systems (2018). [36] Z. Abedjan, et al., Detecting data errors: Where are [17] K. Ervasti, A survey on network measurement: we and what needs to be done?, Procs. of VLDB 9 Concepts, techniques, and tools (2016). (2016). [18] A. F. Cruz, et al., A bandit-based algorithm [37] D. Papamartzivanos, et al., Introducing deep learn- for fairness-aware hyperparameter optimization, ing self-adaptive misuse network intrusion detec- CoRR abs/2010.03665 (2020). tion systems, IEEE Access 7 (2019). [19] O. Gheibi, et al., Applying machine learning in self- [38] G. A. 
Moreno, et al., Flexible and efficient decision- adaptive systems: A systematic literature review, making for proactive latency-aware self-adaptation, arXiv preprint arXiv:2103.04112 (2021). ACM Trans. Auton. Adapt. Syst. 13 (2018). [20] T. R. D. Saputri, S.-W. Lee, The application of ma- [39] A. Christi, et al., Evaluating fault localization for chine learning in self-adaptive systems: A system- resource adaptation via test-based software modifi- atic literature review, IEEE Access 8 (2020). cation, in: Procs. of QRS, 2019. [21] T. Bureš, Self-adaptation 2.0, in: 2021 International [40] O. Alipourfard, et al., Cherrypick: Adaptively un- Symposium on Software Engineering for Adaptive earthing the best cloud configurations for big data and Self-Managing Systems (SEAMS), 2021. analytics, in: Procs. of NSDI, 2017. [22] D. L. Silver, Q. Yang, L. Li, Lifelong machine learn- [41] M. A. Osborne, et al., Gaussian processes for global ing systems: Beyond learning algorithms, in: 2013 optimization, in: LION, 2009. AAAI spring symposium series, 2013. [42] L. Breiman, Bagging predictors, in: Machine Learn- [23] B. Liu, Learning on the job: Online lifelong and con- ing, volume 24, Springer, 1996. tinual learning, in: Procs. of the AAAI Conference on Artificial Intelligence, volume 34, 2020. [24] D. Aparício, et al., Arms: Automated rules man- agement system for fraud detection, arXiv preprint arXiv:2002.06075 (2020). [25] Y. Ovadia, et al., Can you trust your model's un- certainty? evaluating predictive uncertainty under dataset shift, in: Procs. of NIPS, 2019. [26] D. Wu, et al., A highly accurate framework for self- labeled semisupervised classification in industrial applications, IEEE TII 14 (2018). [27] S. J. Pan, Q. Yang, A survey on transfer learning, IEEE TKDE 22 (2009). [28] Y. Liu, et al., A secure federated transfer learning framework, Procs. of IS 35 (2020). [29] K. Swersky, et al., Multi-task bayesian optimization, Procs. of NIPS 26 (2013). [30] Y. 
Cao, et al., Efficient repair of polluted machine learning systems via causal unlearning, in: Procs. of Asia CCS, 2018. [31] M. Casimiro, et al., Lynceus: Cost-efficient tuning and provisioning of data analytic jobs, in: Procs. of ICDCS, 2020. [32] P. Mendes, et al., TrimTuner: Efficient optimiza- tion of machine learning jobs in the cloud via sub- sampling, in: MASCOTS, 2020. [33] X. Zhou, et al., A Framework to Monitor Machine Learning Systems Using Concept Drift Detection, Springer, 2019. [34] Z. Yang, M. H. Asyrofi, D. Lo, BiasRV: Uncovering biased sentiment predictions at runtime, CoRR abs/2105.14874 (2021). arXiv:2105.14874. [35] E. Bartocci, et al., Specification-based monitoring of cyber-physical systems: a survey on theory, tools and applications, in: Lectures on Runtime Verifica-