Software Configuration Diagnosis – A Survey of Existing Methods and Open Challenges

Artur Andrzejak1 and Gerhard Friedrich2 and Franz Wotawa3

1 Heidelberg University, Germany, email: artur.andrzejak@informatik.uni-heidelberg.de
2 University Klagenfurt, Austria, email: Gerhard.Friedrich@aau.at
3 TU Graz, Institute for Software Technology, Austria, email: wotawa@ist.tugraz.at

Abstract. As software systems become more complex and feature-rich, configuration mechanisms are needed to adapt them to different execution environments and usage profiles. As a consequence, failures due to erroneous configuration settings are becoming more common, calling for effective mechanisms for diagnosis, repair, and prevention of such issues. In this paper, we survey approaches for diagnosing software configuration errors, methods for debugging these errors, and techniques for testing against such issues. In addition, we outline current challenges of isolating and fixing faults in configuration settings, including improving fault localization, handling the case of multi-stack systems, and configuration verification at runtime.

1 Introduction

Tackling software configuration errors is recognized as an important research problem which has been investigated by many groups from academia and industry, e.g., see [51]. In a recent study [52], the authors report empirical findings on the impact of configuration errors in practice. In particular, a study of over 500 real-world configuration issues revealed that this type of problem constituted the largest percentage (31%) of high-severity support requests. Moreover, a significant portion of these issues (16% to 47%) rendered systems fully unavailable or caused severe performance degradation. Other studies [30] and incident reports [5] also confirm that detecting and correcting configuration errors in software is of great importance for practical applications.

In this paper, we focus on providing an overview of current research in the area of software configuration diagnosis comprising fault detection, fault localization, and correction. Besides discussing research articles dealing with software configuration errors, we further discuss open issues and challenges that are worth tackling in future research activities. While the excellent survey [51] has a broader scope and also includes aspects such as configuration-free/easy-to-configure systems, hardening against configuration errors, automating deployment and monitoring etc., we consider in this paper primarily diagnosis aspects. We also cover the most recent state-of-the-art work like diagnosing cross-stack configuration errors [32]. In summary, this survey attempts to offer a compact and focused introduction to this research area, thus serving as a good starting point for further contributions.

Although there has also been work dealing with configurations and configuration errors for systems comprising hardware and software, we focus on methods and tools that have been developed within the area of software configuration. Dealing with software configuration only allows for extracting and straightforwardly using information from programs, which could hardly be obtained when considering hardware. As a consequence, there are many approaches that work exclusively in the software configuration domain. Nevertheless, there are also approaches that can be generalized to serve diagnosis of system configuration as well. Especially when it comes to large software comprising millions of lines of source code and also to cases where source code is not available, approaches have to follow a more black-box oriented approach. This approach also enables diagnosis in the case of hardware or of systems in general where hardware and software are investigated together.

In more detail, given a program, its configuration parameters (or settings), and an execution environment, a software configuration error arises when the parameters assume incorrect values. The configuration parameters might specify multiple aspects of system behavior, including adaptation to the execution environment (paths, network settings, ...), functionality (enabled/disabled components, logging, ...), performance and resource policies (cache sizes, number of threads, ...), security settings, and others. Consequently, erroneous configuration settings can cause failures of multiple types: complete crashes, partially disabled functionality, performance issues, inappropriate resource usage, or security threats. A frequent scenario of a configuration error is a parameter value which does not fit the specific execution environment. For example, a path to a working directory of the application is specified, but the user executing the program does not have write access to this directory, causing the program to crash (or at least to terminate with an exception).

In the context of this survey, we consider the configuration error diagnosis problem in its most general form: detecting the root causes, i.e., isolating the configuration parameters with inappropriate values, and providing means for repair in terms of identifying correct values or value ranges for these parameters (or adapting the execution environment). This definition implies that we do not target diagnosis of "traditional" software bugs, since we assume that a repair is possible without code changes. Note that it might be difficult to decide whether a failure should be attributed to a configuration problem or a software bug, and this challenge remains one of the open issues (see Section 3). For example, if a failure-triggering sequence of statements in a faulty program is executed only because of a certain parameter setting, the subsequent failure might appear to be caused by a configuration error.

We organize this paper as follows: We first discuss in Section 2 previous research works dealing with software configuration diagnosis. In the following Section 3 we present open research challenges that have not been tackled so far. We discuss threats to validity in Sec. 4. Finally, we summarize the content and the findings of this paper (Section 5).

2 Previous Work on Software Configuration Diagnosis

In this section, we discuss research work that has been published in the area of software configuration diagnosis. We obtained the papers by searching relevant digital libraries from IEEE and ACM. We further focused on the most recent work in this area, not older than 10 years. Hence, we do not claim the survey to comprise all papers in the context of software configuration errors (for a more comprehensive collection see [51]). However, the presented papers are intended to give an overview of the current research directions in software configuration diagnosis and of the methods and techniques used for this purpose.

In order to present the discussed papers in an accessible way, we classify the papers according to the following categories: (i) diagnosing single-layer configuration errors, (ii) diagnosing cross-stack configuration errors, (iii) diagnosing using configuration knowledge, and (iv) other aspects of software configuration diagnosis. Single-layer configuration errors are errors found in one-component applications like MySQL, Hive, or Spark. Typically, such applications have one common configuration file/database and are developed as an integral project. Cross-stack configuration errors occur in multi-component applications or software stacks like LAMP (Linux, Apache Web Server, MySQL, PHP, Wordpress/Drupal), J2EE, or MEAN.

The rationale behind these categories is the following. Most previous work is available for diagnosing single-layer configuration errors, and this case offers an opportunity for an overview of existing diagnosis approaches. Diagnosis of cross-stack configuration errors poses additional challenges. In some cases, the source code of stack components might not be available, precluding usage of general program analysis techniques. More frequently, cross-stack configuration errors are caused by a mismatch between the configuration settings within separate components [32, 33]. To diagnose such issues, knowledge about the interactions between the components is needed.

If formal knowledge about configurations is available, i.e., configuration rules or constraints, diagnosis can be performed using this knowledge. Such formal knowledge bases may be applicable for single-layer or cross-stack applications.

Finally, there are other aspects that cannot be assigned to one of the former categories, for example testing configurable systems or optimization of software based on configuration parameters.

2.1 Diagnosing Single-Layer Configuration Errors

Single-layer programs are typically written in a single programming language and often the source code is available. Hence, static and dynamic program analysis techniques can be applied to obtain a mapping from configuration options to code regions. This information can be exploited for localizing the root cause behind configuration errors. Consequently, many approaches for diagnosing configuration errors in such programs have been proposed.

Linking configuration options and code regions. Approaches in this group attempt to find a correspondence between a configuration option and code regions impacted by this option. Frequently, such techniques exploit static [43] or dynamic program slicing [14]. In program slicing, one attempts to find the set of all code locations which might influence a target statement (so-called seed), or all code locations which might be influenced by a seed statement. Hence, these approaches are mainly applicable in the software configuration setting and may not be generalizable to deal with hardware configuration diagnosis.

ConfAnalyzer [29] builds a map from each program point to the options that might cause an error at that point by static data-flow analysis. For diagnosis, it treats a configuration option as the root cause if its value flows into the crashing point. The approach does not require users to install or use additional tools, but it can use logs and stack traces to reduce the rate of false positives.

ConfDiagnoser [57, 56] uses static analysis, dynamic profiling, and statistical analysis to link undesired behavior, represented by predicates, to configuration options. When these predicates indicate behavior deviating from the one known for correct profiles, ConfDiagnoser lists the relevant configuration options as suspects.

Work [58] presents a technique and a tool to troubleshoot configuration errors caused by software evolution. The approach uses dynamic profiling, execution trace comparison, and static analysis to link the undesired behavior to its root cause, namely a configuration option which needs to be changed in the new software version.

ConfDoctor [7] is an approach based on static analysis to diagnose configuration defects. It does not require users to execute an instrumented program or to reproduce errors, which is an essential advantage compared to previous approaches. The only run-time information required is the stack trace of a failure. An evaluation on JChord, Randoop, Hadoop, and Hbase shows that the approach could successfully diagnose 27 out of 29 errors, with 20 of them ranked first.

Authors of [25] propose a lightweight dynamic analysis technique that automatically discovers a program's interactions, i.e., logical formulae that give developers information about how a system's configuration option settings map to particular code coverage. It is evaluated on 29 programs spanning five languages and could find precise interactions based on a very small fraction of the number of possible configurations.

Data flow analysis. ConfAid [3] applies dynamic information flow analysis techniques to track tokens from specified "configuration sources" and analyze dependencies between the tokens and the error symptoms, pinpointing which tokens are root causes.

Sherlog [53] uses static analysis to infer control and data information in case of a failure. It analyses source code by exploiting information from run-time logs and computes what must or may have happened during the failed run. One deficiency of this tool is that it may require guidance from developers about which function should be symbolically executed.

Paper [17] introduces Lotrack, an extended static taint analysis approach and tool to automatically track configuration options. It derives a configuration map that explains for each code fragment under which configurations it may be executed.
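The data-flow idea described above, following values that originate from configuration options until they reach a failing operation, can be illustrated with a toy sketch. It is not the algorithm of ConfAid, Sherlog, or Lotrack; the class `Tainted`, the helper functions, and the option names (`cache_size`, `num_threads`, `batch_size`, `log_level`) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tainted:
    """A value plus the names of the configuration options that influenced it."""
    value: float
    sources: frozenset = field(default_factory=frozenset)

    def _combine(self, other, op):
        # A result inherits the taint labels of both operands.
        if isinstance(other, Tainted):
            return Tainted(op(self.value, other.value), self.sources | other.sources)
        return Tainted(op(self.value, other), self.sources)

    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

    def __truediv__(self, other):
        return self._combine(other, lambda a, b: a / b)


def load_config(raw):
    """Treat every option as a configuration source, labeled with its own name."""
    return {name: Tainted(value, frozenset({name})) for name, value in raw.items()}


def suspects_for_failure(raw_config):
    """Run a toy computation; on failure, return the options whose values reached it."""
    cfg = load_config(raw_config)
    buffer_total = cfg["cache_size"] * cfg["num_threads"]
    divisor = cfg["batch_size"]
    try:
        buffer_total / divisor  # raises ZeroDivisionError when batch_size is 0
    except ZeroDivisionError:
        # Hypothetical diagnosis step: report all options flowing into the failing operation.
        return sorted(buffer_total.sources | divisor.sources)
    return []
```

With `batch_size` set to 0, the sketch reports `batch_size`, `cache_size`, and `num_threads` as suspects and leaves `log_level` out, because its value never reaches the failing division. The result is an over-approximation: the tools surveyed above additionally narrow down or rank such candidates.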
Supervised learning approaches. Relatively few authors propose to use machine learning approaches based on supervised learning (i.e., mainly classification). This can be explained by the fact that it is difficult to obtain or generate training data with appropriate structure and in sufficient amount. Similarly to the challenges of mutation testing, if training samples are generated, faults injected in the configuration files might not trigger a failure or might have unrealistic properties. Also, since a configuration file might contain hundreds of options, a training set is likely to contain only few faulty cases per option, giving rise to the unbalanced class problem.

Authors of [41] use machine learning to predict whether a configuration error is responsible for a failure and, if so, the category of the error. To obtain training data, faults are injected into configuration files and the resulting error category is manually labeled.

Work [38] exploits statistical decision tree analysis to determine possible misconfigurations in data center environments. The authors further improve the accuracy of this approach via a pattern modification method.

Replay-based techniques. One category of well-known tools [44, 37, 20] is the replay-based diagnosis techniques. They treat the system as a black box and automatically run it with possible configuration values, without damaging the rest of the system, until the misconfiguration is fixed. This class of techniques relies on having a working configuration; otherwise, it cannot be applied. Besides, these techniques require users with more domain knowledge.

Signature-based approaches. Another family of tools mines a large amount of configuration data from different instances to infer rules about options and uses these rules to identify software misconfigurations.

EnCore [55] and CODE [54] belong to this category of work. EnCore takes into account the interaction between the configuration settings and the executing environment, as well as the correlations between configuration entries. It learns configuration rules from a given set of sample configurations and pinpoints configuration anomalies based on these rules.

Analogously, some tools such as Strider [42] or PeerPressure [40] adopt statistical techniques to compare values of configuration options in a problematic system with those in other systems to infer the root cause of a failure. All these techniques require substantial effort to collect the baseline data.

2.2 Diagnosing Cross-Stack Configuration Errors

Configuration options in multi-layer architectures (e.g., LAMP, J2EE, or MEAN "software stacks") might easily contradict each other or be hard to trace to each other. Therefore, configuration error diagnosis in such architectures is particularly challenging [51]. On the other hand, so far there are very few research approaches or tools targeting this scenario [33].

Sayagh and Adams [32] conducted an empirical study on multi-layer configuration options across Wordpress (WP) plugins, WP, and the PHP engine. They discover a large and increasing number of configuration options used by WP and its plugins. In addition, over 85% of these options are used by at least two plugins at the same time.

Sayagh et al. [33] perform a qualitative analysis of over 1,000 configuration errors to understand their impact and complexity. Based on this data they develop a slicing-based approach to identify error-inducing configuration options in heterogeneous software stacks. So far it is the only approach which attempts to provide a complete, end-to-end process for diagnosing cross-stack configuration errors.

Work [4] focuses on finding configuration inconsistencies between layers in complex, multi-component software. The proposed technique (based on static analysis) can handle software that is written in multiple programming languages and has a complex preference structure.

In [31] the authors target the identification of configuration dependencies in multi-tiered enterprise applications. They provide a method for analyzing existing deployments to infer the configuration dependencies in a probabilistic sense. This yields a rank-ordered list of dependencies so that administrators can consult it and systematically identify the true dependencies.

Authors of [12] attempt to quantify the challenges that configurability of complex, multi-component systems creates for software testing and debugging. They analyze a highly-configurable industrial application and two open source applications. They notice that all three applications consist of multiple programming languages, limiting the applicability of static analysis. Furthermore, they find that there are many access points and methods to modify configurations, and that the configuration state of an application on failure cannot be determined only from persistent data.

2.3 Rules, Constraints and Fixing their Violations

Once configuration knowledge can be described using constraints or rules, it can be used for diagnosis as well. The use of such knowledge is in general restricted neither to single-layer nor to cross-stack applications. Hence, methods and techniques based on rules and constraints, which can also be seen as models of the applications, would provide a more general approach to solving the software configuration error problem. In this section, we distinguish methods for learning knowledge, fixing violations, and inconsistency detection between different software artifacts.

Learning constraints and rules. Several existing approaches extract configuration models [42, 40, 54, 50, 55] and leverage them for configuration debugging, mainly via detecting value anomalies and rule violations.

The categories of extracted data constituting the models typically include the primitive and semantic data types of configuration options (e.g., integer, file path, port number, URL), the value ranges of options (minimum and maximum integer values or a list of acceptable values), the control dependencies (i.e., usage of parameter Q relies on the setting of another parameter P), and value relationships (e.g., value of parameter S should be greater than that of parameter T). EnCore [55] additionally considers the properties of the execution environment as a part of its models.

CODE [54] takes a unique approach and uses dynamic execution information as the model content, namely sequences of (Windows) registry accesses and derived rules. Using these rules for efficient filtering of even large lists of events, CODE can detect not only configuration errors but also deviant program executions. It requires no source code, application-specific semantics, or heavyweight program analysis.

SPEX [50] analyzes source code to infer configuration option constraints and uses these constraints to diagnose software misconfigurations, to expose misconfiguration vulnerabilities, and to detect error-prone configuration design and handling.

Build-time configuration settings. Another category of work addresses configurations and their constraints used at compilation and build time. Such configurations determine whether certain product features (e.g., logging, debugging) are activated, or even which software components are included in the shipped product. The latter aspect is relevant, e.g., for software product lines.

Works [22], [23] propose a static analysis approach to extract (build-time) configuration constraints from C code. Despite its simplicity, it has high precision (77% - 93% in the studied systems) and can recover 28% of existing constraints. A further study by the authors reveals that configuration constraints enforce correct runtime behavior, improve users' configuration experience, and prevent corner cases.

Fixing violations of configuration constraints. The problem of fixing a configuration that violates one or more constraints is addressed in [47, 48]. For this purpose, the authors introduce the concept of a range fix, which specifies the options to change and the ranges of values for these options. They also design and evaluate an algorithm that automatically generates range fixes for a violated constraint. Empirical studies show that the range fix approach provides mostly simple yet complete sets of fixes and has a moderate running time in the order of seconds.

Configurable software (e.g., Linux OS, eCos) can have a very high number of options (variables) and constraints. E.g., Linux has over 6,000 variables and 10,000 constraints; eCos has over 1,000 variables and 1,000 constraints. Such systems typically use variability modeling languages and configuration tools (called configurators). Examples of variability languages include Linux Kconfig, eCos CDL, and feature models. With variability modeling languages and configurators, errors can be detected early, but users still have to resolve the errors, which is also not an easy task: the constraints in variability models can be very complex and highly interconnected. Therefore, researchers have proposed automated approaches that suggest a list of fixes for an error. A fix is a set of changes that, when performed on the configuration, resolves the current error. However, the recommended fixes in these approaches are sometimes large in number and size. For example, fix lists for eCos configurations contain up to nine fixes, and some fixes change up to nine variables.

In this context, work [39] proposes a method to reduce the size and complexity of error fixes by introducing a concept of dynamic priorities. The basic idea is to first generate one fix and then to gradually reach the desirable state based on user feedback. To this end, the approach (1) automatically translates user feedback into a set of implicit priority levels on variables, using five priority assignment and adjustment strategies and (2) efficiently identifies potentially desirable fixes that change only the variables with low priorities.

Detecting inconsistencies between code, documentation, and configuration files. Configuration options are widely used for customizing the behavior and initial settings of software applications, server processes, and operating systems. Their distinctive property is that each option is processed, defined, and described in different parts of a software project, namely in code, in configuration files, and in documentation. This creates a challenge for maintaining project consistency as it evolves. It also promotes inconsistencies leading to misconfiguration issues in production scenarios.

Confalyzer [30] uses static analysis to extract a list of configuration options from source code and from associated options documentation. Confalyzer first marks configuration APIs in the configuration classes. Then it identifies calls to these APIs in the program by building a call graph and obtains option names by reading values of parameters of these calls.

PrefFinder [11], proposed by Jin et al., uses static analysis and dynamic analysis techniques to extract configuration options and stores them in a database for query and use.

The SCIC approach [4] exploits Confalyzer to implement the functionality of extracting configuration options in the key-value model and the tree-structured model.

Work [6] proposes an approach for detection of inconsistencies between source code and documentation based on static analysis. It identifies source code locations where options are read and for each such location retrieves the name of the option. Inconsistencies are then detected by comparing the results against the option names listed in documentation.

2.4 Other Aspects

There are other papers dealing with diagnosis of software configuration errors that do not fall into the previous categories, covering topics like testing, end-user support and performance optimization, which we discuss in this subsection.

Testing of highly configurable systems. Paper [18] presents an initial study on the potential of using statistical testing techniques for improving the efficiency of test selection for configurable software. The study aims to answer whether statistical testing can reduce the effort of localizing the most critical software faults, as seen from the user's perspective.

Authors of [19] analyze program traces to characterize and identify where interactions occur on control flow and data. They find that the essential configuration complexity of these programs is indeed much lower than the combinatorial explosion of the configuration space indicates.

Work [36] proposes S-SPLat, a technique that combines heuristic sampling with symbolic search to explore the enormous space of configurations for testing of software product lines.

A more general approach for testing configurable systems including software is combinatorial testing [15, 16]. There the underlying assumption is that it is not necessarily one configuration parameter that reveals a fault but a certain combination of parameters. Combinatorial testing ensures that, for every subset of k configuration parameters, all combinations of their values are covered. In the context of combinatorial testing, the resulting test suite is said to be of strength k. There are many algorithms and tools for combinatorial testing [13]. For a survey on combinatorial testing we refer the interested reader to [26].
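The notion of a strength-k test suite can be made concrete with a minimal sketch. It assumes a small, unconstrained configuration space that can be enumerated exhaustively and uses a simple greedy selection; this is an illustrative stand-in, not one of the algorithms or tools referenced in [13], and the function names are hypothetical.

```python
from itertools import combinations, product


def interactions(config, k):
    """All k-way (option, value) combinations contained in one full configuration."""
    return set(combinations(sorted(config.items()), k))


def greedy_covering_suite(domains, k=2):
    """Greedily pick configurations until every k-way value combination is covered.

    Enumerates the whole configuration space, so it only suits small examples.
    """
    names = sorted(domains)
    candidates = [dict(zip(names, values))
                  for values in product(*(domains[n] for n in names))]
    # Every k-way combination that appears in at least one configuration.
    uncovered = set().union(*(interactions(c, k) for c in candidates))
    suite = []
    while uncovered:
        # Choose the configuration that covers the most still-uncovered combinations.
        best = max(candidates, key=lambda c: len(interactions(c, k) & uncovered))
        suite.append(best)
        uncovered -= interactions(best, k)
    return suite
```

For four hypothetical options with 2, 3, 2, and 2 values, the full space has 24 configurations, while a strength-2 suite needs far fewer because each selected configuration covers several value pairs at once. Practical tools avoid the exhaustive enumeration used here and also handle constraints between options.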
Configuration and debugging support for end-users. A technique to detect inadequate (i.e., missing or ambiguous) diagnostic messages for configuration errors issued by a configurable software system is proposed in [59]. It injects configuration errors and uses natural language processing to analyze the resulting diagnostic messages. It then identifies messages which might be unhelpful in diagnosis or even negatively impact this process.

Authors of [49] study configuration settings of real-world users from multiple projects and reveal patterns of unnecessary complexity in configuration design. The authors also provide a few guidelines to reduce the configuration space. Finally, the existing configuration navigation methods are studied in terms of their effectiveness in dealing with the over-designed configuration.

Work [28] introduces ConfSeer, a system which recommends to users suitable knowledge base (KB) articles that are likely to describe the user's current configuration problem and its fix. To this end, ConfSeer takes the snapshots of configuration files from a user machine as input, then extracts the configuration parameter names and value settings from the snapshots and matches them against a large set of KB articles. If a match is found, ConfSeer pinpoints the configuration error with its matching KB article. The described system powers the recommendation engine behind Microsoft Operations Management Suite.

Optimizing performance via configuration settings. In [24], a rank-based approach to efficient creation of performance models is introduced. Such models can be exploited for finding an optimally performing configuration of a software system.

Authors of [10] conducted an empirical study on four popular software systems by varying software configurations and environmental conditions, to identify the key knowledge pieces that can be exploited for transfer learning for constructing performance models of configurable software systems.

Paper [35] proposes a multi-objective evolutionary algorithm to find the optimal solutions and addresses the configuration optimization problem for software product lines.

Finally, the work described in [27] employs random sampling and recursive search in a configuration space to find optimally performing configurations for an anticipated workload in software product lines.

2.5 Survey Summary

There are many papers dealing with configuration diagnosis of single-layer applications, often employing program analysis techniques but also making use of machine learning or replay methods. In the case of more complicated applications comprising interacting and configurable software components, there have been fewer papers dealing with concrete solutions. One approach that can be used in both cases is to make use of formalized knowledge about configurations, i.e., the configuration parameters, their domains, and rules specifying limitations and relationships among parameters. It would be interesting to investigate whether classical approaches to diagnosis of knowledge bases like [8, 45, 9, 34] can also be successfully applied for configuration diagnosis. Other aspects discussed in this section include testing configurations, end-user support, and performance optimization.

3 Challenges in Configuration Diagnosis

Based on the survey of papers presented in the previous section, we are able to identify several still open challenges. A general challenge that immediately arises is to distinguish whether an application failure is due to a fault in the configuration setup or a code defect in the program. This is a common problem when applying configuration debugging tools, which usually assume a certain cause. If we want to come up with a general approach for software configuration diagnosis, we have to adapt diagnosis to identify the underlying root cause.

A method that is able to separate these causes would take the current configuration, the program, the description of the execution environment, and the passing/failing tests as input. Based on these inputs the possible causes of a failure are provided as output. In order to come up with such an approach, it is necessary to have a close look at various configuration diagnosis problems, which consequently gives rise to another challenge, i.e., providing an open repository of various configuration diagnosis problems that can be accessed by researchers in this field.

Such a general repository for software configuration diagnosis should include a larger set of different programs from single-layer to cross-stack applications together with configuration errors coming from different sources, test suites, and ideally also configuration knowledge bases. The repository should cover programs of different sizes and from different domains capturing currently available software to allow comparing different configuration diagnosis methods and techniques.

Besides these two general challenges, there are other challenges that are more specific to the applications (single-layer versus cross-stack) or the tasks to be tackled (i.e., fault localization and repair versus fault detection). In the following, we illustrate some of these more specific challenges in detail.

Diagnosis of single-layer applications. Despite the fact that various methods have already been published in this domain, there are still some open issues.

• Transfer techniques from functional fault localization: In the case of software debugging, there are various methods available going beyond program analysis, including spectrum-based fault localization [1, 2] among others. In this approach, code regions are ranked (essentially) according to the number of times they are executed by passing or by failing tests (intuition: if a code line is executed primarily by failing tests, it is more likely to contribute to a failure). For a detailed look at current debugging techniques we refer the interested reader to Wong et al.'s survey [46]. In particular, spectrum-based fault localization offers superior performance compared to static and dynamic program analysis applied to debugging. The open research question is whether spectrum-based fault localization can be efficiently used for software configuration diagnosis as well.
• Study and exploit the trade-off between the type of data from users required for diagnosis (as well as the effort of obtaining this data, e.g., via instrumentation) and the achieved accuracy. The research goals that would go into this direction include:
  – For each type of diagnosis data (from static analysis to diagnosis data dynamically created from instrumentation and also for combinations) understand and quantify the degree of likely penalties (e.g., in terms of accuracy) of using only this data for diagnosis. Specifically, characterize error types which can be or cannot be diagnosed for each type of diagnosis data (when using state-of-the-art debugging approaches).
  – For each "class" of diagnosis data, attempt to improve the corresponding state-of-the-art diagnosis methods in terms of types of errors they are able to debug. This can be done, e.g., by an in-depth analysis of why they fail for some error types and by providing substitutes/replacements for the missing diagnosis data.

Diagnosing cross-stack configuration errors. In the case of cross-stack applications, there is not much work available. Important open research challenges include:

• Exploit work on consistency checking to detect potential inconsistencies between different stack layers.
• Leverage existing work on extraction of rules and constraints to model dependencies between layers. Then use the techniques for discovery and fixing of constraint violations to diagnose (and possibly repair) cross-stack configuration errors.
• As a further application of extracted rules, configurator-like tools (as used for configuring operating systems) could be used for safe configuration of cross-stack systems.
• Create models of expected behavior (given a current global configuration) of each layer from the perspective of each layer. Divergences in the behavior might indicate potential configuration inconsistencies or errors. For example, given the current configuration of a database layer (specifying n1 database connections), the PHP layer should also allow n1 database connections. However, if the expected behavior of the PHP layer, based on its own configuration, allows only n2 < n1 database connections, then an inconsistency between these two behavioral models is indicated.

It is worth noting that it is quite important which dependencies or interactions between layers can be observed or recorded. Moreover, in the context of these challenges the application of model-based approaches for diagnosing (configuration) knowledge bases, e.g., [8, 45, 9, 34], might be worth considering.

cation environment, but are probably more comprehensive if this is also taken into account.

Consequently, this discussion gives rise to the following goals:

• Attempt automated test generation that considers the state of the application environment and the configuration settings (maybe implicitly). Such tests would adapt to environment changes and target only the above-mentioned mismatch between environment and configuration. In order to avoid confusion with the meaning of traditional testing, we might call this a "configuration verification" step instead of testing.
• Generate tests that verify only the consistency of configurations between layers of a multi-stack system. In this case a test failure should indicate only an inconsistency, not a lack of adaptation to the production environment. For example, a test could only verify the consistency of configurations across layers, not execute the whole application.
• Generate tests which verify the correctness of the application's behavior independently of the configuration settings. For example, an application should produce the same behavior independently of the exact path to input/output/libraries, number of used threads (in some range), used compiler (or its flags) etc.
• Generate tests that improve the outcome of fault localization. There it would be necessary to identify those tests that can distinguish different computed root causes (see e.g., [21]).

4 Threats to Validity

Several threats to validity of this paper exist. The main one is the risk of omitting important contributions to this field. To mitigate this risk, we have created lists of relevant works using several processes described below. We then merged and pruned the results according to the rank of the publishing venue and originality (i.e., works proposing a novel or distinctive approach were included even if published in a workshop). In the first literature collection process, we searched for publications containing the word "configuration" that were published in selected high-quality venues (ICSE, ASE, ISSTA, FSE, ISSRE, ICSME, ICPC, IEEE Trans. Software Eng., and some others) in the last five years; for each found publication, we verified via abstract whether a publication indeed targets configuration error (diagnosis).
Testing-related challenges and goals In case of testing, we are In the second process, we read the related work sections of the pre- interested in detecting faults caused by configuration settings. There viously identified works, and created a list of papers discussed there, the motivation is to improve testing approaches specifically for de- which are of relevance (here, also less prestigious venues were con- tecting faults in system configurations ideally during software devel- sidered). Finally, we screened the survey [51] for checking that no opment. To clarify the meaning of “software testing” in context of important contribution was omitted. configuration (errors) we should consider that an application failure Another threat to validity is the possibility to misinterpret any of in this context does not necessarily imply that there is a defect in the discussed papers (e.g. due to different understanding of terms), code (as in traditional testing). Such a failure rather indicates that: and state here inaccurate claims. To reduce this risk, we have studied each described contribution in a depth sufficient to avoid a misinter- • There is a mismatch between the state of the application environ- pretation. Besides of this, information from related work section to ment (operating system, file system, hardware, location of input verify our interpretation was used where available. data, libraries, network properties, remote components, etc.) and the configuration settings. This implies that a test for this type of 5 Conclusion error must take into consideration the environment. • There is an inconsistency between configuration values, either In this paper, we presented a survey on methods and techniques used within a single layer or between layers in a multi-layer applica- for detecting, localizing, and correcting faults in the context of soft- tion. The corresponding tests might be independent of the appli- ware configurations. 
We distinguished the different cases of software configuration diagnosis for single-layer and cross-stack applications as well as methods used in case of available configuration knowledge and further aspects. From the survey we were able to identify several open challenges and research questions, including distinguishing different variants of potential root causes, the lack of repositories of application cases for validating and comparing research results, as well as the need for new fault localization and testing methods.

The motivation for this paper is to provide a solid basis for future research in this area and to identify some important challenges in software configuration diagnosis worth tackling. We also indicated some relationships with work on diagnosis of configuration knowledge bases and other approaches to software debugging that might stimulate this field. Because of the growing interest in providing programs comprising a stack of other programs that themselves can be configured, we see a growing need for research in this area.

REFERENCES

[1] Rui Abreu, Peter Zoeteweij, Rob Golsteijn, and Arjan J. C. van Gemund, ‘A practical evaluation of spectrum-based fault localization’, Journal of Systems and Software, 82(11), 1780–1792, (2009).
[2] Rui Abreu, Peter Zoeteweij, and Arjan J. C. van Gemund, ‘Spectrum-based multiple fault localization’, in ASE 2009, 24th IEEE/ACM International Conference on Automated Software Engineering, Auckland, New Zealand, November 16-20, 2009, pp. 88–99. IEEE Computer Society, (2009).
[3] Mona Attariyan and Jason Flinn, ‘Automating Configuration Troubleshooting with Dynamic Information Flow Analysis’, in 9th USENIX Conference on Operating Systems Design and Implementation, pp. 1–11, Vancouver, BC, Canada, (2010). USENIX Association.
[4] Farnaz Behrang, Myra B. Cohen, and Alessandro Orso, ‘Users Beware: Preference Inconsistencies Ahead’, in 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pp. 295–306, New York, NY, USA, (2015). ACM.
[5] Jon Brodkin. Why Gmail Went Down: Google Misconfigured Load Balancing Servers. https://goo.gl/Hdga7H. Accessed: 5 June 2018.
[6] Z. Dong, A. Andrzejak, D. Lo, and D. Costa, ‘ORPLocator: Identifying Read Points of Configuration Options via Static Analysis’, in 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), pp. 185–195, (October 2016).
[7] Z. Dong, A. Andrzejak, and K. Shao, ‘Practical and accurate pinpointing of configuration errors using static analysis’, in 2015 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 171–180, (September 2015).
[8] A. Felfernig, G. Friedrich, D. Jannach, and M. Stumptner, ‘Consistency-based diagnosis of configuration knowledge bases’, Artificial Intelligence, 152(2), 213–234, (2004).
[9] A. Felfernig, M. Schubert, and C. Zehentner, ‘An efficient diagnosis algorithm for inconsistent constraint sets’, Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 26(1), 53–62, (2 2012).
[10] Pooyan Jamshidi, Norbert Siegmund, Miguel Velez, Christian Kästner, Akshay Patel, and Yuvraj Agarwal, ‘Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis’, in 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, pp. 497–508, Piscataway, NJ, USA, (2017). IEEE Press.
[11] Dongpu Jin, Myra B. Cohen, Xiao Qu, and Brian Robinson, ‘PrefFinder: Getting the Right Preference in Configurable Software Systems’, in 29th ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, pp. 151–162, New York, NY, USA, (2014). ACM.
[12] Dongpu Jin, Xiao Qu, Myra B. Cohen, and Brian Robinson, ‘Configurations Everywhere: Implications for Testing and Debugging in Practice’, in Companion Proceedings of the 36th International Conference on Software Engineering, ICSE Companion 2014, pp. 215–224, New York, NY, USA, (2014). ACM.
[13] Sunint Kaur Khalsa and Yvan Labiche, ‘An orchestrated survey of available algorithms and tools for combinatorial testing’, in 25th International Symposium on Software Reliability Engineering, pp. 323–334, (2015).
[14] Bogdan Korel and Janusz Laski, ‘Dynamic Program Slicing’, Information Processing Letters, 29, 155–163, (1988).
[15] D. R. Kuhn, R. N. Kacker, and Y. Lei, ‘Combinatorial testing’, in Encyclopedia of Software Engineering, ed., Phillip A. Laplante, Taylor & Francis, (2012).
[16] D. Richard Kuhn, Renee Bryce, Feng Duan, Laleh Sh. Ghandehari, Yu Lei, and Raghu N. Kacker, ‘Combinatorial testing: Theory and practice’, in Advances in Computers, volume 99, 1–66, Elsevier, (2015).
[17] Max Lillack, Christian Kästner, and Eric Bodden, ‘Tracking Load-time Configuration Options’, in 29th ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, pp. 445–456, New York, NY, USA, (2014). ACM.
[18] Dusica Marijan, ‘Improving Configurable Software Testing with Statistical Test Selection’, in International Workshop on Formal Methods for Analysis of Business Systems, ForMABS 2016, pp. 5–8, New York, NY, USA, (2016). ACM.
[19] J. Meinicke, C. P. Wong, C. Kästner, T. Thüm, and G. Saake, ‘On essential configuration complexity: Measuring interactions in highly-configurable systems’, in 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 483–494, (September 2016).
[20] James Mickens, Martin Szummer, and Dushyanth Narayanan, ‘Snitch: Interactive Decision Trees for Troubleshooting Misconfigurations’, in 2nd USENIX Workshop on Tackling Computer Systems Problems with Machine Learning Techniques, pp. 8:1–8:6, Cambridge, MA, (2007). USENIX Association.
[21] Mihai Nica, Simona Nica, and Franz Wotawa, ‘On the use of mutations and testing for debugging’, Software: Practice and Experience, 43(9), 1121–1142, (2013).
[22] S. Nadi, T. Berger, C. Kästner, and K. Czarnecki, ‘Where Do Configuration Constraints Stem From? An Extraction Approach and an Empirical Study’, IEEE Transactions on Software Engineering, 41(8), 820–841, (August 2015).
[23] Sarah Nadi, Thorsten Berger, Christian Kästner, and Krzysztof Czarnecki, ‘Mining Configuration Constraints: Static Analyses and Empirical Results’, in 36th International Conference on Software Engineering, ICSE 2014, pp. 140–151, New York, NY, USA, (2014). ACM.
[24] Vivek Nair, Tim Menzies, Norbert Siegmund, and Sven Apel, ‘Using Bad Learners to Find Good Configurations’, in 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pp. 257–267, New York, NY, USA, (2017). ACM.
[25] ThanhVu Nguyen, Ugur Koc, Javran Cheng, Jeffrey S. Foster, and Adam A. Porter, ‘iGen: Dynamic Interaction Inference for Configurable Software’, in 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2016, pp. 655–665, New York, NY, USA, (2016). ACM.
[26] Changhai Nie and Hareton Leung, ‘A survey of combinatorial testing’, ACM Computing Surveys, 43(2), (January 2011).
[27] Jeho Oh, Don Batory, Margaret Myers, and Norbert Siegmund, ‘Finding Near-optimal Configurations in Product Lines by Random Sampling’, in 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, pp. 61–71, New York, NY, USA, (2017). ACM.
[28] Rahul Potharaju, Joseph Chan, Luhui Hu, Cristina Nita-Rotaru, Mingshi Wang, Liyuan Zhang, and Navendu Jain, ‘ConfSeer: Leveraging Customer Support Knowledge Bases for Automated Misconfiguration Detection’, Proc. VLDB Endow., 8(12), 1828–1839, (August 2015).
[29] Ariel Rabkin and Randy Katz, ‘Precomputing Possible Configuration Error Diagnoses’, in 2011 26th IEEE/ACM International Conference on Automated Software Engineering, pp. 193–202, Washington, DC, USA, (2011). IEEE Computer Society.
[30] Ariel Rabkin and Randy Katz, ‘Static Extraction of Program Configuration Options’, in 33rd International Conference on Software Engineering, ICSE ’11, pp. 131–140, New York, NY, USA, (2011). ACM.
[31] Vinod Ramachandran, Manish Gupta, Manish Sethi, and Soudip Roy Chowdhury, ‘Determining Configuration Parameter Dependencies via Analysis of Configuration Data from Multi-tiered Enterprise Applications’, in 6th International Conference on Autonomic Computing, ICAC ’09, pp. 169–178, New York, NY, USA, (2009). ACM.
[32] M. Sayagh and B. Adams, ‘Multi-layer software configuration: Empirical study on wordpress’, in 2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 31–40, (September 2015).
[33] Mohammed Sayagh, Noureddine Kerzazi, and Bram Adams, ‘On Cross-stack Configuration Errors’, in 39th International Conference on Software Engineering, ICSE ’17, pp. 255–265, Piscataway, NJ, USA, (2017). IEEE Press.
[34] Kostyantyn Shchekotykhin, Gerhard Friedrich, Patrick Rodler, and Philipp Fleiss, ‘Sequential diagnosis of high cardinality faults in knowledge-bases by direct diagnosis generation’, in ECAI ’14, pp. 813–818, (2014).
[35] K. Shi, ‘Combining Evolutionary Algorithms with Constraint Solving for Configuration Optimization’, in 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), pp. 665–669, (September 2017).
[36] S. Souto, M. D’Amorim, and R. Gheyi, ‘Balancing Soundness and Efficiency for Practical Testing of Configurable Systems’, in 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE), pp. 632–642, (May 2017).
[37] Ya-Yunn Su, Mona Attariyan, and Jason Flinn, ‘AutoBash: Improving Configuration Management with Operating System Causality Analysis’, in Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, pp. 237–250, Stevenson, Washington, USA, (2007). ACM.
[38] T. Uchiumi, S. Kikuchi, and Y. Matsumoto, ‘Misconfiguration detection for cloud datacenters using decision tree analysis’, in Network Operations and Management Symposium (APNOMS), 2012 14th Asia-Pacific, pp. 1–4, (September 2012).
[39] Bo Wang, Leonardo Passos, Yingfei Xiong, Krzysztof Czarnecki, Haiyan Zhao, and Wei Zhang, ‘SmartFixer: Fixing Software Configurations Based on Dynamic Priorities’, in 17th International Software Product Line Conference, SPLC ’13, pp. 82–90, New York, NY, USA, (2013). ACM.
[40] Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, and Yi-min Wang, ‘Automatic Misconfiguration Troubleshooting with PeerPressure’, in OSDI, pp. 245–258, (2004).
[41] Mengliao Wang, Xiaoyu Shi, and K. Wong, ‘Capturing Expert Knowledge for Automated Configuration Fault Diagnosis’, in 2011 IEEE 19th International Conference on Program Comprehension (ICPC), pp. 205–208, (June 2011).
[42] Yi-min Wang, Chad Verbowski, John Dunagan, Yu Chen, Helen J. Wang, and Chun Yuan, ‘STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support’, in Usenix LISA, pp. 159–172, (2003).
[43] Mark Weiser, ‘Program slicing’, IEEE Transactions on Software Engineering, 10(4), 352–357, (July 1984).
[44] Andrew Whitaker, Richard S. Cox, and Steven D. Gribble, ‘Configuration Debugging As Search: Finding the Needle in the Haystack’, in 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, pp. 6–6, San Francisco, CA, (2004). USENIX Association.
[45] Jules White, David Benavides, Douglas C. Schmidt, Pablo Trinidad, Brian Dougherty, and Antonio Ruiz Cortés, ‘Automated diagnosis of feature model configurations’, Journal of Systems and Software, 83(7), 1094–1107, (2010).
[46] W. Eric Wong, Ruizhi Gao, Yihao Li, Rui Abreu, and Franz Wotawa, ‘A survey on software fault localization’, IEEE Trans. Software Eng., 42(8), 707–740, (2016).
[47] Y. Xiong, H. Zhang, A. Hubaux, S. She, J. Wang, and K. Czarnecki, ‘Range Fixes: Interactive Error Resolution for Software Configuration’, IEEE Transactions on Software Engineering, 41(6), 603–619, (June 2015).
[48] Yingfei Xiong, Arnaud Hubaux, Steven She, and Krzysztof Czarnecki, ‘Generating Range Fixes for Software Configuration’, in 34th International Conference on Software Engineering, ICSE ’12, pp. 58–68, Piscataway, NJ, USA, (2012). IEEE Press.
[49] Tianyin Xu, Long Jin, Xuepeng Fan, Yuanyuan Zhou, Shankar Pasupathy, and Rukma Talwadker, ‘Hey, You Have Given Me Too Many Knobs!: Understanding and Dealing with Over-designed Configuration in System Software’, in 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, pp. 307–319, New York, NY, USA, (2015). ACM.
[50] Tianyin Xu, Jiaqi Zhang, Peng Huang, Jing Zheng, Tianwei Sheng, Ding Yuan, Yuanyuan Zhou, and Shankar Pasupathy, ‘Do Not Blame Users for Misconfigurations’, in Twenty-Fourth ACM Symposium on Operating Systems Principles, pp. 244–259, Farmington, Pennsylvania, (2013). ACM.
[51] Tianyin Xu and Yuanyuan Zhou, ‘Systems Approaches to Tackling Configuration Errors: A Survey’, ACM Comput. Surv., 47(4), 70:1–70:41, (July 2015).
[52] Zuoning Yin, Xiao Ma, Jing Zheng, Yuanyuan Zhou, Lakshmi N. Bairavasundaram, and Shankar Pasupathy, ‘An Empirical Study on Configuration Errors in Commercial and Open Source Systems’, in Twenty-Third ACM Symposium on Operating Systems Principles, SOSP ’11, pp. 159–172, New York, NY, USA, (2011). ACM.
[53] Ding Yuan, Haohui Mai, Weiwei Xiong, Lin Tan, Yuanyuan Zhou, and Shankar Pasupathy, ‘SherLog: Error Diagnosis by Connecting Clues from Run-time Logs’, in Fifteenth Edition of ASPLOS on Architectural Support for Programming Languages and Operating Systems, ASPLOS XV, pp. 143–154, New York, NY, USA, (2010). ACM.
[54] Ding Yuan, Yinglian Xie, Rina Panigrahy, Junfeng Yang, Chad Verbowski, and Arunvijay Kumar, ‘Context-based Online Configuration-error Detection’, in 2011 USENIX Conference on USENIX Annual Technical Conference, pp. 28–28, Portland, OR, (2011). USENIX Association.
[55] Jiaqi Zhang, Lakshminarayanan Renganarayana, Xiaolan Zhang, Niyu Ge, Vasanth Bala, Tianyin Xu, and Yuanyuan Zhou, ‘EnCore: Exploiting System Environment and Correlation Information for Misconfiguration Detection’, in 19th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 687–700, Salt Lake City, Utah, USA, (2014). ACM.
[56] Sai Zhang, ‘ConfDiagnoser: An Automated Configuration Error Diagnosis Tool for Java Software’, in 2013 International Conference on Software Engineering, ICSE ’13, pp. 1438–1440, Piscataway, NJ, USA, (2013). IEEE Press.
[57] Sai Zhang and Michael D. Ernst, ‘Automated diagnosis of software configuration errors’, in ICSE’13, 34th International Conference on Software Engineering, San Francisco, CA, USA, (May 2013).
[58] Sai Zhang and Michael D. Ernst, ‘Which Configuration Option Should I Change?’, in 36th International Conference on Software Engineering, ICSE 2014, pp. 152–163, New York, NY, USA, (2014). ACM.
[59] Sai Zhang and Michael D. Ernst, ‘Proactive Detection of Inadequate Diagnostic Messages for Software Configuration Errors’, in Int. Symp. on Software Testing and Analysis (ISSTA), pp. 12–23, NY, USA, (2015). ACM.
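
As an illustration of the spectrum-based ranking discussed among the single-layer diagnosis challenges, the following minimal Python sketch ranks configuration options by how strongly they are associated with failing tests. It is only a sketch under stated assumptions: the Ochiai coefficient is one commonly used spectrum-based suspiciousness metric and is our choice here, not something prescribed by the surveyed works; treating configuration options (rather than code regions) as the ranked elements is likewise an assumption; and all option names, test names, and outcomes are hypothetical. Whether such a ranking is effective for configuration diagnosis is exactly the open question stated in the text.

```python
from math import sqrt


def ochiai_ranking(coverage, outcomes):
    """Rank elements by Ochiai suspiciousness (highest first).

    coverage: dict mapping a test name to the set of elements
              (here: configuration options) the test exercised.
    outcomes: dict mapping a test name to True (passed) or False (failed).
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    elements = set().union(*coverage.values())
    scores = {}
    for element in elements:
        # Number of failing / passing tests that exercised this element.
        failed_hits = sum(
            1 for test, used in coverage.items()
            if element in used and not outcomes[test]
        )
        passed_hits = sum(
            1 for test, used in coverage.items()
            if element in used and outcomes[test]
        )
        denominator = sqrt(total_failed * (failed_hits + passed_hits))
        scores[element] = failed_hits / denominator if denominator else 0.0
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical data: which options each test read, and whether it passed.
coverage = {
    "t1": {"cache_size", "log_level"},
    "t2": {"cache_size", "db_path"},
    "t3": {"log_level"},
    "t4": {"db_path", "log_level"},
}
outcomes = {"t1": True, "t2": False, "t3": True, "t4": False}

ranking = ochiai_ranking(coverage, outcomes)
for option, score in ranking:
    print(f"{option}: {score:.3f}")
```

In this hypothetical data set, the option read only by failing tests ends up at the top of the ranking, which mirrors the intuition given for code lines: an element exercised primarily by failing tests is more likely to contribute to the failure.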