Towards a Statistical System Analysis

Bernd Heidergott
Professor of Stochastic Optimization
Department of Econometrics and Operations Research
VU Amsterdam University, the Netherlands

Academic applied probability/operations research is mainly focused on the mathematical analysis of models that find their motivation in the outside (read, non-academic) world. In preparing a real-life problem for mathematical analysis, a "model" has to be distilled, and once this is done, reality is replaced by this model, which is subsequently analyzed with much energy and analytical rigor. However, hardly ever are the exact model specifications known, and the defining parameters of the model under consideration, such as arrival rates in queueing networks, failure rates of servers in reliability models, or demand rates in inventory systems, are only revealed to the analyst through statistics. The classical approach to dealing with such parameter insecurity is to integrate out the system performance with respect to the assumed/estimated distribution of the unknown parameter.

We believe that in order to achieve a better understanding of model/parameter insecurity, a closer look at the way "randomness" is used in the analysis of a given model is important. Randomness is a ubiquitous phenomenon. Without going into too much detail on the philosophical aspects of the concepts of randomness and probability, one can loosely state that randomness is encountered as (1) lack of knowledge, or (2) variability in repeated realizations of a phenomenon. For example, (1) covers so-called parameter insecurity and/or model insecurity. Indeed, often either the true distribution of a random variable used in a model is not known (= model insecurity), or the distributional parameters, such as the mean, the variance, etc., are not known (= parameter insecurity). Statistics can then be used to narrow down the possible range of distribution models or the range of parameter values, but reaching certainty is epistemologically impossible. This is in contrast to (2), where in principle laws of large numbers and ergodic theorems are available that allow one to produce reliable measurements for which mathematically supported quality assessments are possible. The concept described in (1) relates to subjective probabilities, whereas (2) relates to the frequentist interpretation of probability.

Consider, for the sake of exposition, the following simple problem. Let Xθ be a random variable with cumulative distribution function Fθ, where θ denotes a parameter of the distribution, for example, the mean or the variance. Suppose we are interested in estimating the mean value of Xθ, denoted by µθ, and we perform a computer simulation to sample n independent and identically distributed realizations of Xθ, denoted by Xθ(i), 1 ≤ i ≤ n. Then, the sample average

\[
\bar{X}_{\theta} = \frac{1}{n} \sum_{i=1}^{n} X_{\theta}(i)
\]

is a natural estimator for the mean. The above estimator is, however, deceptively simple, as it assumes that θ is known, i.e., that we know the correct value of θ. Now assume that we do not know the exact value of θ. To formalize this, let θ0 denote the true value of θ and suppose that for the simulation we use, due to lack of better knowledge, θ1. Then, the error we make in estimating µθ0 is

\[
\mu_{\theta_0} - \bar{X}_{\theta_1}
  = \underbrace{\mu_{\theta_0} - \mu_{\theta_1}}_{(1)}
  + \underbrace{\mu_{\theta_1} - \bar{X}_{\theta_1}}_{(2)},
\]

where the first error is due to our lack of knowledge and the second error can be controlled through the sample size. Much research in applied probability and statistics is targeted at reducing the second error.
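As a small illustration of this decomposition, the following Python sketch (not part of the lecture material) simulates under an assumed setting: Xθ is taken to be exponential with mean θ, and the values θ0 = 2.0, θ1 = 1.8, and n = 10,000 are hypothetical choices made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Assumed illustrative setting: X_theta is exponential with mean theta.
theta_0 = 2.0   # true (unknown) parameter value
theta_1 = 1.8   # value used in the simulation due to lack of better knowledge
n = 10_000      # simulation sample size

# Simulate with the misspecified parameter theta_1.
sample = rng.exponential(scale=theta_1, size=n)
x_bar = sample.mean()

mu_theta_0 = theta_0   # for the exponential distribution, the mean equals theta
mu_theta_1 = theta_1

error_knowledge = mu_theta_0 - mu_theta_1   # type (1): unaffected by simulation effort
error_sampling  = mu_theta_1 - x_bar        # type (2): shrinks at rate O(1/sqrt(n))

print(f"total error    : {mu_theta_0 - x_bar:+.4f}")
print(f"type (1) error : {error_knowledge:+.4f}")
print(f"type (2) error : {error_sampling:+.4f}")
```

Increasing n drives the type (2) term toward zero, while the type (1) term is left untouched by any amount of additional simulation.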
Although the presence of the type (1) error is acknowledged in the literature, how to deal with it is still an open question. The area of performability analysis is devoted to finding models for the lack of knowledge based on entropy. The starting point is expert knowledge about the typical behavior of θ; taking the distributional model that maximizes the entropy with respect to the predefined characteristics then provides a distributional model for θ, i.e., θ is now considered a random variable. Alternatively, a statistical estimator for θ may be available. Then, the sampling distribution of the estimator can be used as a distributional model for θ. Consider, for example, the case where θ is estimated through a sample mean of independent and identically distributed random variables, and assume that the sample size is large enough for the central limit theorem to provide a good approximation. Then we can model θ as θ(ω) = θ̄ + N(ω), where θ̄ is the sample average, i.e., the point estimator, and N(ω) is a zero-mean normal random variable whose variance is the (approximate) sampling variance of the estimator. Put differently, statistics allows us to build a distributional model for θ.

This lecture will elaborate on the above distributional model for parameter insecurity and is aimed at stimulating a discussion on the relation between statistics and applied probability/operations research. The lecture will advocate supporting the analyst by studying the risk incurred by parameter insecurity. Rather than taking an entirely statistical point of view and dismissing "model building" altogether, we want to integrate the data-driven statistical nature of model building into the analytical treatment. We will discuss an analytical framework for doing so that allows for separating (i) the (analytical) analysis of the system from (ii) the statistical model for the parameter insecurity. We present a series of numerical examples illustrating our approach.
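To make the separation between (i) the analytical system analysis and (ii) the statistical model for the parameter concrete, here is a minimal Python sketch. It is an editor's illustration, not the framework presented in the lecture: the interarrival-time data, the delta-method standard error for the estimated arrival rate, and the choice of the M/M/1 mean waiting time W(λ) = λ/(µ(µ − λ)) as the "system" are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# --- Step 1: statistics. Hypothetical data for the arrival rate of a queue. ---
# Assume we observe n i.i.d. exponential interarrival times with unknown rate.
true_rate = 0.8
n = 200
interarrivals = rng.exponential(scale=1 / true_rate, size=n)

lam_hat = 1 / interarrivals.mean()     # point estimator theta_bar
se_hat = lam_hat / np.sqrt(n)          # approximate standard error (delta method)

# --- Step 2: distributional model for the parameter. ---
# theta(omega) = theta_bar + N(omega), with N zero-mean normal and the
# estimator's approximate sampling variance.
m = 10_000
lam_samples = lam_hat + se_hat * rng.standard_normal(m)

# --- Step 3: analytical system analysis, kept separate from step 2. ---
# Stand-in system model: M/M/1 mean waiting time W(lam) = lam / (mu * (mu - lam)),
# valid for service rate mu > lam (an assumed example, not the lecture's model).
mu = 1.0
stable = lam_samples < mu                       # discard unstable parameter draws
lam_ok = lam_samples[stable]
waits = lam_ok / (mu * (mu - lam_ok))

print(f"estimated arrival rate : {lam_hat:.3f} (true {true_rate})")
print(f"plug-in waiting time   : {lam_hat / (mu * (mu - lam_hat)):.3f}")
print(f"mean over theta-model  : {waits.mean():.3f}")
print(f"95% range              : [{np.quantile(waits, 0.025):.3f}, "
      f"{np.quantile(waits, 0.975):.3f}]")
```

The spread of the waiting-time values over the parameter draws, rather than the single plug-in number, is one simple way to express the risk incurred by parameter insecurity.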