Bayesian Predictive Modelling: Application to Aircraft Short-Term Conflict
                                  Alert System


               V. Schetinin                             L. Jakaite                          W. Krzanowski
         Computer Science Dept.                   Computer Science Dept.                 College of Engineering,
        University of Bedfordshire               University of Bedfordshire         Mathematics and Physical Sciences
          Luton, LU1 3JU, UK                       Luton, LU1 3JU, UK                      University of Exeter
                                                                                          Exeter, EX4 4QF, UK

                        Abstract                                models of probabilistic inference and evaluate factors that
                                                                cause uncertainty in predictions. DTs are hierarchical
     Bayesian Model Averaging (BMA), computa-                   structures of splitting and terminal nodes that recursively
     tionally feasible using Markov Chain Monte                 split data. The size of a DT model is determined by the
     Carlo (MCMC), is a well-known method for re-               number of its terminal nodes (Chipman, 1998; Denison,
     liable estimation of predictive distributions. The         2002).
     use of decision tree (DT) models for the averag-           There are two phases during MCMC approximation. At the
     ing enables experts not only to estimate a pre-            first, so-called burn-in, phase the MCMC generates the pa-
     dictive posterior but also to interpret models of          rameters of a DT in order to explore areas of its maximal
     interest and estimate the importance of predictor          likelihood on the given set of observed data. At the second,
     factors that are assumed to contribute to the pre-         so-called post burn-in phase, samples of a DT model are
     diction. The MCMC method generates parame-                 collected for averaging according to the Bayesian method-
     ters of DT models in order to explore their pos-           ology. It has also been shown that the most accurate results
     terior distributions and to draw samples from the          of BMA are achieved when prior information on DT mod-
     models. However, these samples can often over-             els is available for the MCMC approximation (Chipman,
     represent DT models of an excessive size, which            1998; Denison, 2002).
     in cases of real-world applications affects the re-
     sults of BMA. When this happens, it is unlikely            For interpretation purposes, a single DT which provides the
     for a DT model that provides Maximum a Poste-              Maximum a Posteriori probability (MAP) could be selected
     riori probability to explain the observed data with        from a set of DT models that were accepted during the post
     high accuracy. We propose a new technology in              burn-in phase (Domingos, 1998). The other approach to
     order to estimate and interpret predictive posteri-        finding a single explanatory model is based on the idea of
     ors. In our experiments with aircraft short-term           clustering DT models in a two-dimensional space that is
     conflict alerts, we show how this technology can           represented by size and fitness of DT models (Chipman,
     be used for analysing uncertainties in detections          1998).
     of conflicts.                                              According to the Bayesian methodology, samples collected
                                                                during the post burn-in phase have to be diverse in order to
                                                                achieve the best accuracy of approximation of predictive
1    INTRODUCTION                                               density. However, in practice the desired diversity of DT
                                                                models cannot be achieved in reasonable computing time
In many cases of engineering applications, such as air-
                                                                when prior information on the models is absent or incom-
traffic control, estimation of uncertainty in predictions is
                                                                plete (Domingos, 2000; Denison, 2002).
of crucial importance, e.g. (Majeske, 2012; Ayusoa, 2012).
For such applications, the methodology of Bayesian Model        Possible reasons of this are as follows. First, the likelihood
Averaging (BMA) has been shown to provide the most ac-          distribution could be multimodal, which limits MCMC in
curate estimates of uncertainty. The BMA methodology            exploring the full posterior distribution (Robert, 2009).
has been made computationally feasible with the use of          Second, MCMC is limited in exploring all possible DT
Markov Chain Monte Carlo (MCMC) approximation, e.g.             structures because of the hierarchical structure of DT mod-
(Green, 1995; Robert, 2009).                                    els (Denison, 2002). A side effect of this results in sam-
                                                                pling DT models that contain an excessive number of
The use of decision trees (DTs) models within BMA is
                                                                nodes. Consequently, the ensemble of DT models collected
preferable for applications when experts aim to interpret


                                                           54
during the MCMC sampling, as well as any single DT                mined by the airport radar. Their negative values specify
model that is selected for interpretation of the ensemble,        the radar position on the lateral plane. The third coordinate
will underperform. To mitigate the negative effect, a tech-       Z is height in feet. The alert cycles here are marked by the
nique has been suggested for selecting a single DT model          filled (red) circle, while the normal cycles are shown by the
which has been tested in a clinical application (Schetinin,       unfilled circles. In the lateral plane X-Y , the aircraft start
2007).                                                            their flights at positions indicated here by 1 and 2.
In this paper, we explore the potential of the Bayesian ap-       This figure shows that after the 18th radar cycle the sys-
proach for an air-traffic control problem known as Short-         tem detects a series of 5 alarm cycles during which both
Term Conflict Alert (STCA) detection, where it is critically      aircraft pilots, being warned by the operator, attempt to re-
important to analyse uncertainty intervals in the detection       solve the conflict. The distance between the aircraft criti-
of conflicts. The approach is verified on real data that have     cally decreases from 2100 to 1200. The following 5 cycles
been made available by the UK National Air Traffic Ser-           are false negative errors as the distance keeps decreasing to
vices, (NATS, 2002). First we show that Bayesian mod-             900, and the system is expected to continue detecting the
elling of the STCA system can explain 89% of decisions            alarm. The system triggers the alarm only at the 28th cy-
that the STCA system has made on these data. We demon-            cle when the distance decreases to a minimum of 500. In
strate that the Bayesian approach allows us to estimate un-       this case the series of 5 false negative error cycles cannot
certainty in detection of conflicts, which is necessary for       be explained without analysis of factors of uncertainty.
specifying possible areas of improvement of the STCA sys-
tem. The use of DT models allows us to estimate the impor-
tance of predictor variables in terms of their contributions
to the conflict detection. Finally we show how DT mod-
els can be used to find conditions under which the STCA
system makes false detections. To achieve this goal we pro-
pose a technique for selecting a single DT model from an
ensemble of models collected during BMA.
The rest of the paper is organized as follows. Section 2 in-
troduces the STCA problem and describes the data that are
used in our experiments. Section 3 briefly introduces the
methodology of BMA and MCMC approximation with DT
models. The details of the proposed technique and experi-
ments are described in Sections 4 and 5. Finally Section 6
concludes the paper.


2   PROBLEM OF SHORT-TERM
    CONFLICT ALERT

STCA systems are used in airports to warn dispatchers
when the distance between two aircraft, landing or taking             Figure 1: Example of Conflict Prediction Problem
off, is critically short in a given alert zone (Prandini, 2000;
Brooker 2005). The STCA system is therefore expected to           In this paper we aim to model the STCA system in order to
detect conflicts as accurately as possible in the presence of     find possible solutions to the problem. For the modelling
uncertainty in the data that are provided by the airport op-      we use primary data about aircraft positions and velocities,
eration service. In this context, it is of crucial importance     which are received by the system as part of the flight infor-
to estimate predictive posterior probability distributions of     mation. All flight information is updated each radar cycle,
decisions made by the STCA system. The availability of a          in our case every 6 seconds.
model that can accurately model the detection of conflicts
                                                                  In our research we use these data as follows. The posi-
will allow experts to analyse factors of uncertainty in the
                                                                  tions are used for calculating the distances dx , dy , and dz
detection of conflicts.
                                                                  between aircraft 1 and 2 along axes X, Y , and height axis
The primary information about aircraft movements comes            Z, respectively. Velocities Vx,∗ , Vy,∗ and Vz,∗ of the air-
from airport radar. Fig. 1 shows the traces of two aircraft in    craft are given on axes X, Y , and Z. We assume that
the 3-dimension system of coordinates X, Y , and Z. The           distance between aircraft 1 and 2 is important information
first two coordinates define the position of an aircraft on       for detecting conflicts in the airport environment when air-
the X-Y lateral plane with a scale factor, s, that is deter-      craft change positions in X, Y and Z during landing or


                                                            55
taking off. For this reasonqdistance d is calculated in a 3-     of MCMC implementation of BMA over DT models.
dimensional space as d = d2x + d2y + d2z . We assign here
a scale factor s = 1ft, as s has not been specified for the      3.1   MCMC IMPLEMENTATION
flight data available for our research. The secondary infor-
mation about times T1 and T2 in the lateral plane for the        Except for trivial cases the Bayesian methodology of aver-
aircraft 1 and 2 could be also taken into account.               aging over DTs can be feasibly implemented with MCMC
                                                                 approximation. For the approximation, the parameters, θ,
The above assumptions allow us to generate the 12 input          of a DT candidate are drawn from the given proposal dis-
variables listed in Table 1. Here, negative values reflect the   tributions. A candidate is accepted or rejected according to
positions of aircraft in the radar coordinate system.            the Bayes rule calculated on the given data D. For the m-
In our research we use operational data about traces of air-     dimensional input vector x, data D and parameters θ, the
craft pairs. A trace is represented by a sequence of radar       predictive posterior distribution p(y|x, D), y ∈ {1, . . . , C},
cycles as described above. Each cycle in the sequence rep-       is
resents the aircraft movements and is labelled as normal or
                                                                                          Z
alert. We aim to use these data for modelling the STCA                       p(y|x, D) = p(y|x, θ, D)p(θ|D)dθ
system within the Bayesian framework in order to quanti-                                       N                             (1)
tatively estimate the uncertainty in detection of conflicts.                               1 X
                                                                                       ≈         p(y|x, θ(i) , D),
We assume that this uncertainty is dependent first on the                                  N i=1
flight parameters, such as aircraft distances and velocities,
and second on the accuracy of the radar data. The use of         where p(y|x, θ, D) is the posterior distribution given a
DT models will allow us first to estimate the importance of      model with parameters θ and data D; p(θ|D) is the pos-
predictor variables and second to specify conditions under       terior distribution of parameters θ conditioned on data D;
which the system makes false decisions of conflicts. For         N is the number of samples taken from the posterior distri-
interpretation of the results of BMA, we will finally select     bution, and C is the number of classes.
a single DT model in order to find new insights into false
                                                                 In practice, DT models are learnt from data and so their
detections.
                                                                 dimensionality (or number of nodes) is variable. The Re-
                                                                 versible Jump (RJ) extension of MCMC makes possible the
           Table 1: Predictor Variables                          approximation over such models (Green, 1995). Given pri-
    VARIABLE NOTATION           MIN               MAX            ors and a sufficient number of samples, the RJ MCMC tech-
                                                                 nique explores the posterior distribution and takes samples
    x1              dx                -48.11       51.70         of model parameters.
    x2              dy                -52.93       35.78         The exploration of DT models of variable size has been
    x3              dz               -10297         8760         efficiently made by using the following moves (Denison,
    x4              d                   2.40    10297.06         2002):
    x5              Vx,1                -691         584
    x6              Vy,1                -806         473         Birth moves randomly split the data points falling in one of
    x7              Vz,1              -83.10       96.66         the terminal nodes by a new splitting node with a variable
    x8              Vx,2                -444         599         and rule drawn from the corresponding priors.
    x9              Vy,2                -527         426         Death moves randomly pick a splitting node with two ter-
    x10             Vz,2              -95.05       11.53         minal nodes and assign it as a single terminal with the
    x11             T1                     0           9         united data points.
    x12             T2                     0           9
                                                                 Change-split moves randomly pick a splitting node and as-
                                                                 sign it a new splitting variable and rule drawn from the cor-
                                                                 responding priors.

3   BAYESIAN MODEL AVERAGING                                     Change-rule moves randomly pick a splitting node and as-
    OVER DECISION TREES                                          sign it a new rule drawn from a given prior.
                                                                 The first two moves lead to a change in the dimensional-
DTs are known as hierarchical models consisting of split-        ity of parameters. The other moves explore the distribution
ting and terminal nodes. The DT models are said to be            within the current dimensionality. In particular, the change-
binary if the splitting nodes divide data points into two dis-   split move makes “large” jumps which potentially increase
joint subsets. The terminal node assigns a data point to one     the chance of sampling from a maximal posterior. By con-
of the possible classes, the probability of which is dominant    trast, the change-rule move makes “local” jumps in order
(Breiman, 1984). This section is mainly focused on details       to explore the details of an area of interest.


                                                           56
As the birth and death moves change the dimensionality,            range of values within which the size of a new data parti-
the Bayesian rule includes a ratio R to achieve the condi-         tion will exceed 2pmin , where pmin is the minimal number
tion for reversibility of Markov Chain. For the birth moves,       of data points allowed in a partition. This prior is adapted
R is written as follows:                                           to the range of a data partition. The new splitting threshold
                                                                   qj 0 proposed for variable j and partition i is drawn from a
                           q(θ|θ0)p(θ0)
                     R=                 ,                   (2)    uniform distribution: qj 0 ∼ U (xi,j      i,j
                                                                                                      min , xmax ).
                           q(θ0|θ)p(θ)
                                                                   When the change move is applied to a node that is close
where q(θ|θ0) and q(θ0|θ) are the proposal distributions, θ0       to the DT root, distributions of data points in its terminal
and θ are (k + 1) and k-dimensional vectors of DT param-           nodes can be greatly changed, and one or more terminal
eters, respectively, and p(θ) and p(θ0) are the probabilities      nodes can contain fewer data points than pmin . If there
of the DT with parameters θ and θ0, respectively.                  is one such node, this node is swept from the DT and the
The above probability p(θ) is defined by a DT structure as         move is counted as a death move. In cases when there is
follows (Denison, 2002):                                           more than one such node, the move is deemed unavailable.

                     k−1
                                     !
                     Y       1     1    k 1                        4   SELECTION OF A SINGLE DECISION
           p(θ) =             var            ,         (3)
                     i=1
                         N (si ) m Sk K                                TREE MODEL
where N (svar
            i ) is the number of possible values of si
                                                          var
                                                                   As discussed in the Introduction, experts need to interpret
that can be assigned as a new splitting rule, Sk is the num-       an ensemble of DT models collected during MCMC sam-
ber of possible structures of a DT with k terminal nodes,          pling as a single DT. Although such a model will likely ex-
and K is the maximal number of terminal nodes.                     plain the observed data less accurately, experts will have an
The proposal distribution is defined as follows:                   opportunity to look at new insights into data. For selection
                                                                   of a single DT model from an ensemble, the MAP and the
                                  dk+1                             Maximum a Posterior Weight (MAPW) techniques have
                      q(θ|θ0) =        ,                    (4)
                                  DQ1                              been proposed as described in (Domingos, 1998; Chip-
                                                                   man, 1998). A drawback of these techniques is that a
where DQ1 = DQ + 1 is the number of splitting nodes                DT model can be selected from any oversized DT mod-
whose both branches are terminal nodes.                            els which are present in the ensemble and as a result this
The MCMC sampler will accept birth and death moves                 model will under-perform. The idea of a new approach is
with rates Rb and Rd as follows:                                   based on quantitative estimates of classification confidence
                                                                   as described in (Krzanowski et al, 2006). Classifiers that
                          dk+1 k Sk                                were included in the ensemble produce different outputs
                   Rb =                ,                    (5)
                           bk DQ1 Sk+1                             for a given input, and each of them is considered as having
                                                                   voted for positive or negative output. The counts over all
                          bk   DQ     Sk
                 Rd =                     .                 (6)    votes will therefore reflect the difficulty (or confidence) of
                        dk−1 (k − 1) Sk−1                          assigning a given input to a class of interest.
If the prior on the number of splitting nodes is given prop-       Within this approach, we can define an ensemble of N DT
erly, most samples are expected to be drawn from the pos-          models and then count the number Ni of the classifiers that
terior that is related to areas of interests. If such a prior is   assign a given input to classes i, i = 1, . . . , C. Therefore
unavailable, a DT model will grow excessively and most             for a given class i and a given input, the consistency of the
of the samples will be drawn from posterior distributions          ensemble is calculated as a ratio γ = N   N . Its value has a
                                                                                                              i


that are calculated for oversized DT models. As a result,          maximum of 1.0, when all the classifiers assign a given in-
the estimates of the predictive distribution will be biased        put to one class. The minimum value of confidence is 1/C,
(Denison, 2002).                                                   when the classifiers assign the input to the all C classes
                                                                   with an equal probability. So for a given input the classifi-
3.2   SWEEPING STRATEGY OF MCMC                                    cation confidence of the ensemble is estimated by the ratio
                                                                   γ whose value is proportional to the accuracy of classifica-
In practice, priors on DT structures are often unavail-            tion.
able, and the MCMC sampler cannot efficiently control DT
                                                                   We can then define a threshold confidence ratio γ0 : 1/C ≤
structures, which leads to poor mixing. However, the DT
                                                                   γ0 < 1, for which the cost of misclassifications is consid-
structure can be better controlled with a sweeping strategy
                                                                   ered acceptably small on the given labelled data. The out-
of the MCMC approximation as proposed in (Schetinin,
                                                                   come of the ensemble is said to be confident if γ ≥ γ0 .
2007). The main idea behind this strategy is to assign the
prior probability of splitting DT nodes dependent on the           Having counted the number of confident and correct out-


                                                             57
comes on the observed labelled data set, we can select a          off at the Heathrow, June 1998. The traces were selected
single DT that covers the maximal number of the labelled          with high alertness. The number of cycles in a trace was de-
data instances that were classified as confident and correct      pendent on the aircraft velocities, and their average number
while the number of misclassifications of the remaining ex-       was around 40. Each trace was split into two parts, training
amples is kept minimal. Then the DTs with a maximal cov-          and testing, to evaluate the performance within the repeated
erage are selected from the ensemble, and finally a single        random sub-sampling validation over 5 runs.
DT model that has a minimal number of splitting nodes is
chosen.
The main steps of the selection technique are as follows.         5.2    MCMC IMPLEMENTATION

    1. Given an ensemble of DT models, select a set of DT         The BMA was run with a uniform prior on DT models as
       models, S1, that cover a maximal number of the data        there was no information about possible DT structures. The
       instances classified as confident and correct with a       minimal number pmin was set equal to 5. The proposal
       given confidence level γ0 .                                probabilities for the death, birth, change-split and change-
                                                                  rules were set to 0.1, 0.1, 0.2, and 0.6, respectively. The
    2. Find the instances that were correctly classified by the   numbers of burn-in and post burn-in samples were set to
       DT ensemble and denote these instances as D1.              100,000 and 10,000, respectively. The sampling rate was
    3. Among the set S1 find DT models that provide a min-        set equal to 7. The proposal variance was set to 4.0 in or-
       imal misclassification rate on the data D1. Denote the     der to achieve an acceptance rate of updating the Markov
       found set as S2.                                           chain around 0.52, which indicates an efficient MCMC im-
                                                                  plementation. With these settings, the BMA performance
    4. Among the set S2, find DT models whose size is min-        within the random sub-sampling validation is 88.6 ± 1.3%.
       imal. A set of such DT models, S3, includes at least
                                                                  Fig. 2 depicts samples of log likelihood values (upper
       one DT model.
                                                                  plots), the numbers of DT nodes (middle plots) and the
    5. Randomly select a DT model from the set S3.                distributions of DT nodes for the burn-in (left) and post
                                                                  burn-in (right side) phases.
The above procedure finds a single DT model of interest
that covers a maximal number of the data instances clas-
sified as confident and correct with a given confident level
γ0 . The resultant model is selected to be of minimal size,
which reduces the risk of overfitting unlike existing tech-
niques.

5     EXPERIMENTS

In this section we describe experimental results obtained
with the proposed BMA technology on real STCA data.
First we show that using the BMA technology we can
achieve an accuracy of modelling the STCA system around
89%. Second we estimate the importance of predictor vari-
ables that are used for modelling the system. Third we
demonstrate the proposed technique for selecting a single
DT model that is required for interpenetration and finding
conditions under which the STCA system can improve ac-
curacy of detections. Finally we show an example of esti-               Figure 2: Samples of Log Likelihood and DT Sizes
mating uncertainties in detection of conflicts, which allows
us to demonstrate the ability of the proposed technology to
identify areas of possible improvement of the STCA sys-
tem.                                                              We can see that in the burn-in phase the Markov chain
                                                                  started with log likelihood value around 1000 converges to
5.1    STCA DATA                                                  the stationary value that oscillates around 175. In the post
                                                                  burn-in phase the log likelihood continues to oscillate be-
In our experiments we used 2,526 radar cycles that repre-         tween 200 and 150. The lower plots show that the average
sent traces of 66 aircraft pairs that were landing or taking      number of DT nodes was around 46.


                                                            58
5.3   FEATURE IMPORTANCE

During the post burn-in phase DT parameters are changed
within the given priors on the proposal distribution, and as
a result the accepted DT models include different predictor
variables. The frequencies of use of these variables reflect
the information about their importance - we assume that
a variable with a greater frequency makes more important
contribution to the classification.
The frequencies were calculated within the random sub-
sampling validation and are shown for all 12 variables in
Fig. 3. Table 2 lists these variables in the order of their im-
portance. In this table we see that the three most important
features are x8 , the speed of the second aircraft on the X-
axis, x1 , the distance between aircraft pair on the X-axis,
and x9 , the speed of the second aircraft on the Y -axis.         Figure 3: Feature Importance within Random Sub-
                                                                  Sampling Validation
By contrast, the variables x11 and x12 , which give the times
T1 and T2 since the last correlated plot in the lateral plane
for the aircraft, are used with a much lower frequency, and       terminals of interest can be converted into a set of n rules
we conclude that they make the smallest contribution.             in the form if xi ≥ qi then . . . , i = 1, . . . , n, which is
                                                                  tractable by experts.
Table 2: Importance of Variables in Terms of Frequency of         The desired model can be found by applying the technique
Use                                                               described in Section 4 as a Sure Correct (SC) DT model.
             VARIABLE FREQUENCY                                   This model is compared with two other DT models that
                                                                  were selected by the existing techniques, MAP and MAPW,
              x8               0.168                              discussed in Section 4. The comparison is made in terms
              x1               0.137                              of misclassification rate within the random sub-sampling
              x9               0.120                              validation and shown in Fig. 4. We see that the SC DT
              x6               0.110                              model more often outperforms the other two models. The
              x4               0.095                              average accuracy is 87.6.
              x5               0.090
              x3               0.078
              x2               0.061
              x10              0.050
              x7               0.042
              x11              0.001
              x12              0.008


5.4   SINGLE DECISION TREE MODELS

The total number of DT models that were collected dur-
ing the MCMC post burn-in phase was 10,000. In theory,
Bayesian averaging over an ensemble of models should
outperform any single model that is taken for interpreta-
tion purposes. In our case, we expect to find a single DT
model whose performance is maximally close to that ob-
tained with the ensemble average. Such a DT model is re-
quired for interpretation purposes and for specifying con-
ditions under which the STCA system makes wrong deci-
sions.                                                            Figure 4: Test errors of SC, MAP, and MAPW Decistion
Having identified mistaken decisions made by the system           Tree Models within Random Sub-Sampling Validation
on the given data, we can use the selected DT model to
specify terminal nodes into which these decisions fall. The       For comparison we also used the CART technique and


                                                            59
found that the average accuracy was 87.0, which is compet-     a Bayesian framework. The use of DT models was intro-
itive with the above SC DT model. The CART technique           duced in order to provide a possible interpretation of fac-
has been run with the Gini diversity index as splitting cri-   tors that can affect the reliability of STCA decisions. In
terion, using the same number (pmin = 5) of data points        these experiments, no prior information about possible DT
allowed in terminals.                                          structures was available.
                                                               A single DT model was selected from the ensemble of DT
5.5   ESTIMATING UNCERTAINTIES IN                              models that were collected during MCMC approximation.
      DETECTION OF CONFLICTS                                   A DT model can be selected as one providing the Max-
                                                               imum a Posteriori probability. However, we have shown
Fig. 5 shows the uncertainty intervals estimated by the pro-   that such a DT model tends to be over-sized and so can
posed BMA technology for the aircraft pair whose traces        underperform. A new technique that is based on estimat-
are plotted in Fig. 1. The upper plot shows the distance       ing the consistency of DT models included in an ensemble
d over radar cycles. The alert cycles here are marked by       was implemented and tested on the STCA data. The ex-
the plus sign. We see that the STCA system missed de-          periments show that this approach outperforms the existing
tection of 5 alerts between 23d and 27th cycles. Further-      techniques in terms of predictive accuracy.
more, the aircraft positions between 38th and 40th cycles
become closer and remain within the distance that triggered    Thus we can conclude that the proposed Bayesian technol-
the alert at the 18th cycle. This probably means that the      ogy can be used to find possible ways of improving accu-
system missed detection of new alerts.                         racy of STCA detection. In a more general context, the
                                                               proposed technology is capable of providing experts with
                                                               the full probabilistic information that is required for inter-
                                                               pretation of decision making where safety is of crucial im-
                                                               portance.


Figure 5: Esimates of Uncertainty Intervals for an Aircraft
Pair

The lower plot shows the estimates of uncertainties in de-
cisions made by the proposed Bayesian technology. Boxes
here show the summary of predictive posterior probability
distributions of alerts. The median probability values ex-
ceed the threshold 0.5 between the 16th and 24th cycles.
The following 3 cycles are detected with large uncertainty,
which indicates a high risk of making wrong decisions. Be-
tween the 33rd and 37th cycles the aircraft move away from
each other and the probability of conflict decreases. How-
ever, between the 38th and 40th cycles they move closer
again and we can observe that the uncertainty intervals be-
come larger. This example demonstrates the ability of the
proposed technology to provide essential information about
risks of making wrong decisions.


6     CONCLUSION

The MCMC technique proposed for Bayesian averaging
over DT models was applied to the STCA problem. In
this work we aimed at modelling the STCA system within


                                                         60
Acknowledgements                                               C. Robert, and G. Casella (2009). Introducing Monte Carlo
                                                               methods with R. Springer.
The authors are grateful to the anonymous reviewers for
useful and constructive comments on the paper. This re-        V. Schetinin et al. (2007). Confident Interpretation of
search was partly supported by the Engineering and Physi-      Bayesian decision trees for clinical applications. IEEE
cal Sciences Research Council (EPSRC), GR/R24357/01.           Transaction on IT in Biomedicine 11(3) 312-319.


References

A. Ayusoa, L. Escuderoa, and F. Martn-Campo (2012)
A mixed 0-1 nonlinear optimization model and algorith-
mic approach for the collision avoidance in ATM: velocity
changes through a time horizon. Computers and Opera-
tions Research 39(12) 3136-3146.
L. Brieman, J. Friedman, R. Olshen, and C. Stone
(1984). Classification and Regression Trees, Belmont, CA,
Wadsworth.
P. Brooker (2005). Airborne collision avoidance systems
and air traffic management safety. Journal of Navigation 1
1-16.
H. Chipman, E. George, and R. McCullock (1998).
Bayesian CART model search, Journal of American Statis-
tics 93 935-960.
H. Chipman, E. George, and R. McCulloch (1998). Making
sense of a forest of trees. In S. Weisberg, (ed.), Symposium
on the Interface. Interface Foundation of North America.
D. Denison, C. Holmes, B. Malick, and A. Smith (2002).
Bayesian Methods for Nonlinear Classification and Re-
gression. Wiley.
P. Domingos (1998). Knowledge discovery via multiple
models. Intelligent Data Analysis 2 187-202.
P. Domingos (2000). Bayesian Averaging of classifiers
and the overfitting problem, International Conference on
Machine Learning, 223-230. Stanford, Morgan Kaufmann.
P. Green (1995). Reversible jump Markov chain Monte
Carlo computation and Bayesian model determination,
Biometrika 82 711-732.
W. Krzanowski, et al. (2006). Confidence in classification:
A Bayesian approach. Journal of Classification 23(2) 199-
220.
K. Majeske, and T. Lauer (2012). Optimizing airline pas-
senger prescreening systems with Bayesian decision mod-
els, Computers and Operations Research 39(8) 1827-1836.
NATS. London Area Control Centre (2002). Short Term
Conflict Alerts. Controller briefing note. National Air
Traffic Services, London
M. Prandini, J. Hu, J. Lygeros, and S. Sastry (2000). A
probabilistic framework for aircraft conict detection. IEEE
Transactions on Intelligent Transportation Systems 1(4)
199-220.


                                                         61