Bayesian Predictive Modelling: Application to Aircraft Short-Term Conflict Alert System V. Schetinin L. Jakaite W. Krzanowski Computer Science Dept. Computer Science Dept. College of Engineering, University of Bedfordshire University of Bedfordshire Mathematics and Physical Sciences Luton, LU1 3JU, UK Luton, LU1 3JU, UK University of Exeter Exeter, EX4 4QF, UK Abstract models of probabilistic inference and evaluate factors that cause uncertainty in predictions. DTs are hierarchical Bayesian Model Averaging (BMA), computa- structures of splitting and terminal nodes that recursively tionally feasible using Markov Chain Monte split data. The size of a DT model is determined by the Carlo (MCMC), is a well-known method for re- number of its terminal nodes (Chipman, 1998; Denison, liable estimation of predictive distributions. The 2002). use of decision tree (DT) models for the averag- There are two phases during MCMC approximation. At the ing enables experts not only to estimate a pre- first, so-called burn-in, phase the MCMC generates the pa- dictive posterior but also to interpret models of rameters of a DT in order to explore areas of its maximal interest and estimate the importance of predictor likelihood on the given set of observed data. At the second, factors that are assumed to contribute to the pre- so-called post burn-in phase, samples of a DT model are diction. The MCMC method generates parame- collected for averaging according to the Bayesian method- ters of DT models in order to explore their pos- ology. It has also been shown that the most accurate results terior distributions and to draw samples from the of BMA are achieved when prior information on DT mod- models. However, these samples can often over- els is available for the MCMC approximation (Chipman, represent DT models of an excessive size, which 1998; Denison, 2002). in cases of real-world applications affects the re- sults of BMA. When this happens, it is unlikely For interpretation purposes, a single DT which provides the for a DT model that provides Maximum a Poste- Maximum a Posteriori probability (MAP) could be selected riori probability to explain the observed data with from a set of DT models that were accepted during the post high accuracy. We propose a new technology in burn-in phase (Domingos, 1998). The other approach to order to estimate and interpret predictive posteri- finding a single explanatory model is based on the idea of ors. In our experiments with aircraft short-term clustering DT models in a two-dimensional space that is conflict alerts, we show how this technology can represented by size and fitness of DT models (Chipman, be used for analysing uncertainties in detections 1998). of conflicts. According to the Bayesian methodology, samples collected during the post burn-in phase have to be diverse in order to achieve the best accuracy of approximation of predictive 1 INTRODUCTION density. However, in practice the desired diversity of DT models cannot be achieved in reasonable computing time In many cases of engineering applications, such as air- when prior information on the models is absent or incom- traffic control, estimation of uncertainty in predictions is plete (Domingos, 2000; Denison, 2002). of crucial importance, e.g. (Majeske, 2012; Ayusoa, 2012). For such applications, the methodology of Bayesian Model Possible reasons of this are as follows. First, the likelihood Averaging (BMA) has been shown to provide the most ac- distribution could be multimodal, which limits MCMC in curate estimates of uncertainty. The BMA methodology exploring the full posterior distribution (Robert, 2009). has been made computationally feasible with the use of Second, MCMC is limited in exploring all possible DT Markov Chain Monte Carlo (MCMC) approximation, e.g. structures because of the hierarchical structure of DT mod- (Green, 1995; Robert, 2009). els (Denison, 2002). A side effect of this results in sam- pling DT models that contain an excessive number of The use of decision trees (DTs) models within BMA is nodes. Consequently, the ensemble of DT models collected preferable for applications when experts aim to interpret 54 during the MCMC sampling, as well as any single DT mined by the airport radar. Their negative values specify model that is selected for interpretation of the ensemble, the radar position on the lateral plane. The third coordinate will underperform. To mitigate the negative effect, a tech- Z is height in feet. The alert cycles here are marked by the nique has been suggested for selecting a single DT model filled (red) circle, while the normal cycles are shown by the which has been tested in a clinical application (Schetinin, unfilled circles. In the lateral plane X-Y , the aircraft start 2007). their flights at positions indicated here by 1 and 2. In this paper, we explore the potential of the Bayesian ap- This figure shows that after the 18th radar cycle the sys- proach for an air-traffic control problem known as Short- tem detects a series of 5 alarm cycles during which both Term Conflict Alert (STCA) detection, where it is critically aircraft pilots, being warned by the operator, attempt to re- important to analyse uncertainty intervals in the detection solve the conflict. The distance between the aircraft criti- of conflicts. The approach is verified on real data that have cally decreases from 2100 to 1200. The following 5 cycles been made available by the UK National Air Traffic Ser- are false negative errors as the distance keeps decreasing to vices, (NATS, 2002). First we show that Bayesian mod- 900, and the system is expected to continue detecting the elling of the STCA system can explain 89% of decisions alarm. The system triggers the alarm only at the 28th cy- that the STCA system has made on these data. We demon- cle when the distance decreases to a minimum of 500. In strate that the Bayesian approach allows us to estimate un- this case the series of 5 false negative error cycles cannot certainty in detection of conflicts, which is necessary for be explained without analysis of factors of uncertainty. specifying possible areas of improvement of the STCA sys- tem. The use of DT models allows us to estimate the impor- tance of predictor variables in terms of their contributions to the conflict detection. Finally we show how DT mod- els can be used to find conditions under which the STCA system makes false detections. To achieve this goal we pro- pose a technique for selecting a single DT model from an ensemble of models collected during BMA. The rest of the paper is organized as follows. Section 2 in- troduces the STCA problem and describes the data that are used in our experiments. Section 3 briefly introduces the methodology of BMA and MCMC approximation with DT models. The details of the proposed technique and experi- ments are described in Sections 4 and 5. Finally Section 6 concludes the paper. 2 PROBLEM OF SHORT-TERM CONFLICT ALERT STCA systems are used in airports to warn dispatchers when the distance between two aircraft, landing or taking Figure 1: Example of Conflict Prediction Problem off, is critically short in a given alert zone (Prandini, 2000; Brooker 2005). The STCA system is therefore expected to In this paper we aim to model the STCA system in order to detect conflicts as accurately as possible in the presence of find possible solutions to the problem. For the modelling uncertainty in the data that are provided by the airport op- we use primary data about aircraft positions and velocities, eration service. In this context, it is of crucial importance which are received by the system as part of the flight infor- to estimate predictive posterior probability distributions of mation. All flight information is updated each radar cycle, decisions made by the STCA system. The availability of a in our case every 6 seconds. model that can accurately model the detection of conflicts In our research we use these data as follows. The posi- will allow experts to analyse factors of uncertainty in the tions are used for calculating the distances dx , dy , and dz detection of conflicts. between aircraft 1 and 2 along axes X, Y , and height axis The primary information about aircraft movements comes Z, respectively. Velocities Vx,∗ , Vy,∗ and Vz,∗ of the air- from airport radar. Fig. 1 shows the traces of two aircraft in craft are given on axes X, Y , and Z. We assume that the 3-dimension system of coordinates X, Y , and Z. The distance between aircraft 1 and 2 is important information first two coordinates define the position of an aircraft on for detecting conflicts in the airport environment when air- the X-Y lateral plane with a scale factor, s, that is deter- craft change positions in X, Y and Z during landing or 55 taking off. For this reasonqdistance d is calculated in a 3- of MCMC implementation of BMA over DT models. dimensional space as d = d2x + d2y + d2z . We assign here a scale factor s = 1ft, as s has not been specified for the 3.1 MCMC IMPLEMENTATION flight data available for our research. The secondary infor- mation about times T1 and T2 in the lateral plane for the Except for trivial cases the Bayesian methodology of aver- aircraft 1 and 2 could be also taken into account. aging over DTs can be feasibly implemented with MCMC approximation. For the approximation, the parameters, θ, The above assumptions allow us to generate the 12 input of a DT candidate are drawn from the given proposal dis- variables listed in Table 1. Here, negative values reflect the tributions. A candidate is accepted or rejected according to positions of aircraft in the radar coordinate system. the Bayes rule calculated on the given data D. For the m- In our research we use operational data about traces of air- dimensional input vector x, data D and parameters θ, the craft pairs. A trace is represented by a sequence of radar predictive posterior distribution p(y|x, D), y ∈ {1, . . . , C}, cycles as described above. Each cycle in the sequence rep- is resents the aircraft movements and is labelled as normal or Z alert. We aim to use these data for modelling the STCA p(y|x, D) = p(y|x, θ, D)p(θ|D)dθ system within the Bayesian framework in order to quanti- N (1) tatively estimate the uncertainty in detection of conflicts. 1 X ≈ p(y|x, θ(i) , D), We assume that this uncertainty is dependent first on the N i=1 flight parameters, such as aircraft distances and velocities, and second on the accuracy of the radar data. The use of where p(y|x, θ, D) is the posterior distribution given a DT models will allow us first to estimate the importance of model with parameters θ and data D; p(θ|D) is the pos- predictor variables and second to specify conditions under terior distribution of parameters θ conditioned on data D; which the system makes false decisions of conflicts. For N is the number of samples taken from the posterior distri- interpretation of the results of BMA, we will finally select bution, and C is the number of classes. a single DT model in order to find new insights into false In practice, DT models are learnt from data and so their detections. dimensionality (or number of nodes) is variable. The Re- versible Jump (RJ) extension of MCMC makes possible the Table 1: Predictor Variables approximation over such models (Green, 1995). Given pri- VARIABLE NOTATION MIN MAX ors and a sufficient number of samples, the RJ MCMC tech- nique explores the posterior distribution and takes samples x1 dx -48.11 51.70 of model parameters. x2 dy -52.93 35.78 The exploration of DT models of variable size has been x3 dz -10297 8760 efficiently made by using the following moves (Denison, x4 d 2.40 10297.06 2002): x5 Vx,1 -691 584 x6 Vy,1 -806 473 Birth moves randomly split the data points falling in one of x7 Vz,1 -83.10 96.66 the terminal nodes by a new splitting node with a variable x8 Vx,2 -444 599 and rule drawn from the corresponding priors. x9 Vy,2 -527 426 Death moves randomly pick a splitting node with two ter- x10 Vz,2 -95.05 11.53 minal nodes and assign it as a single terminal with the x11 T1 0 9 united data points. x12 T2 0 9 Change-split moves randomly pick a splitting node and as- sign it a new splitting variable and rule drawn from the cor- responding priors. 3 BAYESIAN MODEL AVERAGING Change-rule moves randomly pick a splitting node and as- OVER DECISION TREES sign it a new rule drawn from a given prior. The first two moves lead to a change in the dimensional- DTs are known as hierarchical models consisting of split- ity of parameters. The other moves explore the distribution ting and terminal nodes. The DT models are said to be within the current dimensionality. In particular, the change- binary if the splitting nodes divide data points into two dis- split move makes “large” jumps which potentially increase joint subsets. The terminal node assigns a data point to one the chance of sampling from a maximal posterior. By con- of the possible classes, the probability of which is dominant trast, the change-rule move makes “local” jumps in order (Breiman, 1984). This section is mainly focused on details to explore the details of an area of interest. 56 As the birth and death moves change the dimensionality, range of values within which the size of a new data parti- the Bayesian rule includes a ratio R to achieve the condi- tion will exceed 2pmin , where pmin is the minimal number tion for reversibility of Markov Chain. For the birth moves, of data points allowed in a partition. This prior is adapted R is written as follows: to the range of a data partition. The new splitting threshold qj 0 proposed for variable j and partition i is drawn from a q(θ|θ0)p(θ0) R= , (2) uniform distribution: qj 0 ∼ U (xi,j i,j min , xmax ). q(θ0|θ)p(θ) When the change move is applied to a node that is close where q(θ|θ0) and q(θ0|θ) are the proposal distributions, θ0 to the DT root, distributions of data points in its terminal and θ are (k + 1) and k-dimensional vectors of DT param- nodes can be greatly changed, and one or more terminal eters, respectively, and p(θ) and p(θ0) are the probabilities nodes can contain fewer data points than pmin . If there of the DT with parameters θ and θ0, respectively. is one such node, this node is swept from the DT and the The above probability p(θ) is defined by a DT structure as move is counted as a death move. In cases when there is follows (Denison, 2002): more than one such node, the move is deemed unavailable. k−1 ! Y 1 1 k 1 4 SELECTION OF A SINGLE DECISION p(θ) = var , (3) i=1 N (si ) m Sk K TREE MODEL where N (svar i ) is the number of possible values of si var As discussed in the Introduction, experts need to interpret that can be assigned as a new splitting rule, Sk is the num- an ensemble of DT models collected during MCMC sam- ber of possible structures of a DT with k terminal nodes, pling as a single DT. Although such a model will likely ex- and K is the maximal number of terminal nodes. plain the observed data less accurately, experts will have an The proposal distribution is defined as follows: opportunity to look at new insights into data. For selection of a single DT model from an ensemble, the MAP and the dk+1 Maximum a Posterior Weight (MAPW) techniques have q(θ|θ0) = , (4) DQ1 been proposed as described in (Domingos, 1998; Chip- man, 1998). A drawback of these techniques is that a where DQ1 = DQ + 1 is the number of splitting nodes DT model can be selected from any oversized DT mod- whose both branches are terminal nodes. els which are present in the ensemble and as a result this The MCMC sampler will accept birth and death moves model will under-perform. The idea of a new approach is with rates Rb and Rd as follows: based on quantitative estimates of classification confidence as described in (Krzanowski et al, 2006). Classifiers that dk+1 k Sk were included in the ensemble produce different outputs Rb = , (5) bk DQ1 Sk+1 for a given input, and each of them is considered as having voted for positive or negative output. The counts over all bk DQ Sk Rd = . (6) votes will therefore reflect the difficulty (or confidence) of dk−1 (k − 1) Sk−1 assigning a given input to a class of interest. If the prior on the number of splitting nodes is given prop- Within this approach, we can define an ensemble of N DT erly, most samples are expected to be drawn from the pos- models and then count the number Ni of the classifiers that terior that is related to areas of interests. If such a prior is assign a given input to classes i, i = 1, . . . , C. Therefore unavailable, a DT model will grow excessively and most for a given class i and a given input, the consistency of the of the samples will be drawn from posterior distributions ensemble is calculated as a ratio γ = N N . Its value has a i that are calculated for oversized DT models. As a result, maximum of 1.0, when all the classifiers assign a given in- the estimates of the predictive distribution will be biased put to one class. The minimum value of confidence is 1/C, (Denison, 2002). when the classifiers assign the input to the all C classes with an equal probability. So for a given input the classifi- 3.2 SWEEPING STRATEGY OF MCMC cation confidence of the ensemble is estimated by the ratio γ whose value is proportional to the accuracy of classifica- In practice, priors on DT structures are often unavail- tion. able, and the MCMC sampler cannot efficiently control DT We can then define a threshold confidence ratio γ0 : 1/C ≤ structures, which leads to poor mixing. However, the DT γ0 < 1, for which the cost of misclassifications is consid- structure can be better controlled with a sweeping strategy ered acceptably small on the given labelled data. The out- of the MCMC approximation as proposed in (Schetinin, come of the ensemble is said to be confident if γ ≥ γ0 . 2007). The main idea behind this strategy is to assign the prior probability of splitting DT nodes dependent on the Having counted the number of confident and correct out- 57 comes on the observed labelled data set, we can select a off at the Heathrow, June 1998. The traces were selected single DT that covers the maximal number of the labelled with high alertness. The number of cycles in a trace was de- data instances that were classified as confident and correct pendent on the aircraft velocities, and their average number while the number of misclassifications of the remaining ex- was around 40. Each trace was split into two parts, training amples is kept minimal. Then the DTs with a maximal cov- and testing, to evaluate the performance within the repeated erage are selected from the ensemble, and finally a single random sub-sampling validation over 5 runs. DT model that has a minimal number of splitting nodes is chosen. The main steps of the selection technique are as follows. 5.2 MCMC IMPLEMENTATION 1. Given an ensemble of DT models, select a set of DT The BMA was run with a uniform prior on DT models as models, S1, that cover a maximal number of the data there was no information about possible DT structures. The instances classified as confident and correct with a minimal number pmin was set equal to 5. The proposal given confidence level γ0 . probabilities for the death, birth, change-split and change- rules were set to 0.1, 0.1, 0.2, and 0.6, respectively. The 2. Find the instances that were correctly classified by the numbers of burn-in and post burn-in samples were set to DT ensemble and denote these instances as D1. 100,000 and 10,000, respectively. The sampling rate was 3. Among the set S1 find DT models that provide a min- set equal to 7. The proposal variance was set to 4.0 in or- imal misclassification rate on the data D1. Denote the der to achieve an acceptance rate of updating the Markov found set as S2. chain around 0.52, which indicates an efficient MCMC im- plementation. With these settings, the BMA performance 4. Among the set S2, find DT models whose size is min- within the random sub-sampling validation is 88.6 ± 1.3%. imal. A set of such DT models, S3, includes at least Fig. 2 depicts samples of log likelihood values (upper one DT model. plots), the numbers of DT nodes (middle plots) and the 5. Randomly select a DT model from the set S3. distributions of DT nodes for the burn-in (left) and post burn-in (right side) phases. The above procedure finds a single DT model of interest that covers a maximal number of the data instances clas- sified as confident and correct with a given confident level γ0 . The resultant model is selected to be of minimal size, which reduces the risk of overfitting unlike existing tech- niques. 5 EXPERIMENTS In this section we describe experimental results obtained with the proposed BMA technology on real STCA data. First we show that using the BMA technology we can achieve an accuracy of modelling the STCA system around 89%. Second we estimate the importance of predictor vari- ables that are used for modelling the system. Third we demonstrate the proposed technique for selecting a single DT model that is required for interpenetration and finding conditions under which the STCA system can improve ac- curacy of detections. Finally we show an example of esti- Figure 2: Samples of Log Likelihood and DT Sizes mating uncertainties in detection of conflicts, which allows us to demonstrate the ability of the proposed technology to identify areas of possible improvement of the STCA sys- tem. We can see that in the burn-in phase the Markov chain started with log likelihood value around 1000 converges to 5.1 STCA DATA the stationary value that oscillates around 175. In the post burn-in phase the log likelihood continues to oscillate be- In our experiments we used 2,526 radar cycles that repre- tween 200 and 150. The lower plots show that the average sent traces of 66 aircraft pairs that were landing or taking number of DT nodes was around 46. 58 5.3 FEATURE IMPORTANCE During the post burn-in phase DT parameters are changed within the given priors on the proposal distribution, and as a result the accepted DT models include different predictor variables. The frequencies of use of these variables reflect the information about their importance - we assume that a variable with a greater frequency makes more important contribution to the classification. The frequencies were calculated within the random sub- sampling validation and are shown for all 12 variables in Fig. 3. Table 2 lists these variables in the order of their im- portance. In this table we see that the three most important features are x8 , the speed of the second aircraft on the X- axis, x1 , the distance between aircraft pair on the X-axis, and x9 , the speed of the second aircraft on the Y -axis. Figure 3: Feature Importance within Random Sub- Sampling Validation By contrast, the variables x11 and x12 , which give the times T1 and T2 since the last correlated plot in the lateral plane for the aircraft, are used with a much lower frequency, and terminals of interest can be converted into a set of n rules we conclude that they make the smallest contribution. in the form if xi ≥ qi then . . . , i = 1, . . . , n, which is tractable by experts. Table 2: Importance of Variables in Terms of Frequency of The desired model can be found by applying the technique Use described in Section 4 as a Sure Correct (SC) DT model. VARIABLE FREQUENCY This model is compared with two other DT models that were selected by the existing techniques, MAP and MAPW, x8 0.168 discussed in Section 4. The comparison is made in terms x1 0.137 of misclassification rate within the random sub-sampling x9 0.120 validation and shown in Fig. 4. We see that the SC DT x6 0.110 model more often outperforms the other two models. The x4 0.095 average accuracy is 87.6. x5 0.090 x3 0.078 x2 0.061 x10 0.050 x7 0.042 x11 0.001 x12 0.008 5.4 SINGLE DECISION TREE MODELS The total number of DT models that were collected dur- ing the MCMC post burn-in phase was 10,000. In theory, Bayesian averaging over an ensemble of models should outperform any single model that is taken for interpreta- tion purposes. In our case, we expect to find a single DT model whose performance is maximally close to that ob- tained with the ensemble average. Such a DT model is re- quired for interpretation purposes and for specifying con- ditions under which the STCA system makes wrong deci- sions. Figure 4: Test errors of SC, MAP, and MAPW Decistion Having identified mistaken decisions made by the system Tree Models within Random Sub-Sampling Validation on the given data, we can use the selected DT model to specify terminal nodes into which these decisions fall. The For comparison we also used the CART technique and 59 found that the average accuracy was 87.0, which is compet- a Bayesian framework. The use of DT models was intro- itive with the above SC DT model. The CART technique duced in order to provide a possible interpretation of fac- has been run with the Gini diversity index as splitting cri- tors that can affect the reliability of STCA decisions. In terion, using the same number (pmin = 5) of data points these experiments, no prior information about possible DT allowed in terminals. structures was available. A single DT model was selected from the ensemble of DT 5.5 ESTIMATING UNCERTAINTIES IN models that were collected during MCMC approximation. DETECTION OF CONFLICTS A DT model can be selected as one providing the Max- imum a Posteriori probability. However, we have shown Fig. 5 shows the uncertainty intervals estimated by the pro- that such a DT model tends to be over-sized and so can posed BMA technology for the aircraft pair whose traces underperform. A new technique that is based on estimat- are plotted in Fig. 1. The upper plot shows the distance ing the consistency of DT models included in an ensemble d over radar cycles. The alert cycles here are marked by was implemented and tested on the STCA data. The ex- the plus sign. We see that the STCA system missed de- periments show that this approach outperforms the existing tection of 5 alerts between 23d and 27th cycles. Further- techniques in terms of predictive accuracy. more, the aircraft positions between 38th and 40th cycles become closer and remain within the distance that triggered Thus we can conclude that the proposed Bayesian technol- the alert at the 18th cycle. This probably means that the ogy can be used to find possible ways of improving accu- system missed detection of new alerts. racy of STCA detection. In a more general context, the proposed technology is capable of providing experts with the full probabilistic information that is required for inter- pretation of decision making where safety is of crucial im- portance. Figure 5: Esimates of Uncertainty Intervals for an Aircraft Pair The lower plot shows the estimates of uncertainties in de- cisions made by the proposed Bayesian technology. Boxes here show the summary of predictive posterior probability distributions of alerts. The median probability values ex- ceed the threshold 0.5 between the 16th and 24th cycles. The following 3 cycles are detected with large uncertainty, which indicates a high risk of making wrong decisions. Be- tween the 33rd and 37th cycles the aircraft move away from each other and the probability of conflict decreases. How- ever, between the 38th and 40th cycles they move closer again and we can observe that the uncertainty intervals be- come larger. This example demonstrates the ability of the proposed technology to provide essential information about risks of making wrong decisions. 6 CONCLUSION The MCMC technique proposed for Bayesian averaging over DT models was applied to the STCA problem. In this work we aimed at modelling the STCA system within 60 Acknowledgements C. Robert, and G. Casella (2009). Introducing Monte Carlo methods with R. Springer. The authors are grateful to the anonymous reviewers for useful and constructive comments on the paper. This re- V. Schetinin et al. (2007). Confident Interpretation of search was partly supported by the Engineering and Physi- Bayesian decision trees for clinical applications. IEEE cal Sciences Research Council (EPSRC), GR/R24357/01. Transaction on IT in Biomedicine 11(3) 312-319. References A. Ayusoa, L. Escuderoa, and F. Martn-Campo (2012) A mixed 0-1 nonlinear optimization model and algorith- mic approach for the collision avoidance in ATM: velocity changes through a time horizon. Computers and Opera- tions Research 39(12) 3136-3146. L. Brieman, J. Friedman, R. Olshen, and C. Stone (1984). Classification and Regression Trees, Belmont, CA, Wadsworth. P. Brooker (2005). Airborne collision avoidance systems and air traffic management safety. Journal of Navigation 1 1-16. H. Chipman, E. George, and R. McCullock (1998). Bayesian CART model search, Journal of American Statis- tics 93 935-960. H. Chipman, E. George, and R. McCulloch (1998). Making sense of a forest of trees. In S. Weisberg, (ed.), Symposium on the Interface. Interface Foundation of North America. D. Denison, C. Holmes, B. Malick, and A. Smith (2002). Bayesian Methods for Nonlinear Classification and Re- gression. Wiley. P. Domingos (1998). Knowledge discovery via multiple models. Intelligent Data Analysis 2 187-202. P. Domingos (2000). Bayesian Averaging of classifiers and the overfitting problem, International Conference on Machine Learning, 223-230. Stanford, Morgan Kaufmann. P. Green (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika 82 711-732. W. Krzanowski, et al. (2006). Confidence in classification: A Bayesian approach. Journal of Classification 23(2) 199- 220. K. Majeske, and T. Lauer (2012). Optimizing airline pas- senger prescreening systems with Bayesian decision mod- els, Computers and Operations Research 39(8) 1827-1836. NATS. London Area Control Centre (2002). Short Term Conflict Alerts. Controller briefing note. National Air Traffic Services, London M. Prandini, J. Hu, J. Lygeros, and S. Sastry (2000). A probabilistic framework for aircraft conict detection. IEEE Transactions on Intelligent Transportation Systems 1(4) 199-220. 61