=Paper= {{Paper |id=Vol-2600/paper8 |storemode=property |title=GiCoMAF: An Artificial Intelligence Algorithm to Utilize Maps for Operators of Unmanned Aerial Vehicles |pdfUrl=https://ceur-ws.org/Vol-2600/paper8.pdf |volume=Vol-2600 |authors=Yuval Zak,Yisrael Parmet,Tal Oron-Gilad |dblpUrl=https://dblp.org/rec/conf/aaaiss/ZakPO20 }} ==GiCoMAF: An Artificial Intelligence Algorithm to Utilize Maps for Operators of Unmanned Aerial Vehicles== https://ceur-ws.org/Vol-2600/paper8.pdf
 GiCoMAF: An Artificial Intelligence Algorithm to Utilize Maps for Operators of
                          Unmanned Aerial Vehicles
                                     Yuval Zak, Yisrael Parmet and Tal Oron-Gilad
                                               Ben-Gurion University of the Negev
                                                         Beersheva, Israel
                                    {zaky@post.bgu.ac.il, iparmet@bgu.ac.il, orontal@bgu.ac.il}



                            Abstract                                    The use of UAVs in the military domain is increasing, due
                                                                     to their ability to perform missions without risking human
  Unmanned Aerial Vehicle (UAV) operators must maintain              operators (Izzetoglu et al. 2015). UAV operators monitor the
  high levels of situation awareness on their area of operation.     payload, often a camera sending video feed, in various mis-
  To achieve this, they use the command and control (C2) map,
  which is shared among forces and is regularly overloaded
                                                                     sions (e.g., reconnaissance, guidance of forces; Marusich et
  with data that is irrelevant to their mission. UAV operators’      al. 2016), in addition to multiple tasks (e.g., navigation and
  missions require distilled information at the right timing. Yet,   orientation, flying the vehicle, radio communication; Ever-
  the existing filtering mechanisms of C2 maps are layer-based       aerts 2008). A command and control (C2) map, often in a
  and insufficient. We propose a new approach to automatically       different display, is used for orienting and making sense of
  and dynamically filter information items on the map based          the payload’s outputs. The C2 map shows mission critical
  on environmental and mission context. To achieve this, we          information and intelligence-related information items such
  introduce a three-tier artificial intelligence (AI)-based algo-    as markings of potential targets, location of allied forces and
  rithm (GiCoMAF), where we delineate the use of machine             so forth. A cognitive work analysis of UAV operators work-
  learning (ML) models to support UAV missions. For the Gi-          flows, emphasizes frequent and continuous use of the C2
  CoMAF development, tagged data was collected in simulated
  experimental runs with professional UAS operators. Differ-
                                                                     map during missions (Back et al. 2019). The C2 map, how-
  ent types of ML models were evaluated and fitted into the          ever, being shared among military elements, is showcasing
  algorithm. The models achieved a relatively high accuracy          information that is irrelevant to the UAV operators. It has
  at modeling human preference and area of interest. The approach    been indicated (Endsley 2000; Sandom 2000) that information
  presented in this study can be further implemented to              overload is a contributor to poor Situation Awareness,
  support other operators in time-critical spatial-temporal prob-    high workload and low overall performance. According to
  lems.                                                              Back et al. (2019), the information clutter in C2 maps may
                                                                     often lead operators to neglect the map, and rely solely on
                                                                     the payload’s feed, and may also lead to fatal results such as the
                     1     Introduction                              tragic incident of February 2010. Answering Adams’ (2015)
February 2010, Afghanistan, an American helicopter fired             call for incorporating human factors limitations in the design
on three suspected trucks, killing 23 innocent civilians, and        of UAVs, it serves as an incentive to address the information
wounding 12. The attack was approved based on informa-               overload problem. Some advanced solutions for improving
tion provided by operators of an unmanned aerial vehicle             UAV operators’ SA had been suggested, e.g. using synthetic
(UAV) who did not report the presence of civilians in the            vision (Calhoun et al. 2005). Such solutions, however, may
trucks (Filkins 2010). In a later news report, Shanker and           require costly adjustments of the vehicle’s payload.
Richtel (2011) cite Army officials claiming that the leading            For decluttering, most C2 maps are based on information
cause for the tragic incident was information overload that          layers. Layers can manually or automatically (via a set of
led to poor Situation Awareness (SA). SA can be defined              rules) be hidden, shown or dimmed. The layer mechanism
as one’s perception of the environment around him or her at          reduces the information overload problem by hiding or dim-
any given point in time (Endsley 1988). Thus, although there         ming layers, yet, at the same time, it may cause information
was evidence of children in the trucks, the UAV operators            deprivation due to an inherent tradeoff; any action performed
“did not adequately focus on them amid the swirl of data”.           on a layer affects it entirely. Thus, it is impossible to hide
                                                                     irrelevant information items in a layer, while showing rele-
Copyright © 2020 held by the author(s). In A. Martin, K. Hinkelmann,
                                                                     vant ones of the same layer. Therefore, aspiring to solve the
H.-G. Fill, A. Gerber, D. Lenat, R. Stolle, F. van Harmelen
(Eds.), Proceedings of the AAAI 2020 Spring Symposium on Com-        C2 map information overload tradeoff, Zak, Oron-Gilad, and
bining Machine Learning and Knowledge Engineering in Practice        Parmet (2018) called for ’breaking’ the layer mechanism
(AAAI-MAKE 2020). Stanford University, Palo Alto, California,        and addressing the information at the information items’
USA, March 23-25, 2020. Use permitted under Creative Commons         level. Thus, instead of filtering layers of information items,
License Attribution 4.0 International (CC BY 4.0).                   the filter will handle each information item individually (as
Figure 1: Left. An illustration of a layer-based filter, where entire layers can be turned on/off. Right. An information-item level
based filter, where each single information item can be considered as a separate layer and handled individually.


illustrated in Figure 1). This can be achieved by an algorithm       els have been used in UAV, GIS and C2 domains (e.g. Azak
that dynamically and automatically filters information items         and Bayrak 2008; Bao 2016; Choi and Cha 2019; Dzieci-
on the C2 map by their importance and relevance to UAV               uch et al. 2017; Noh and Jeong 2010; Rapaport 2015), but
operator’s current mission. Automating this process, how-            to our knowledge, not as the interface between the C2 map
ever, is challenging as it can inadvertently reduce operators’       and the operators. For our research, we looked for models
SA and performance, especially regarding items chosen not            that can be used for classification (to classify information
to be shown (Endsley and Kiris 1995).                                item’s importance) or logistic regression (for defining the
   Two guidelines led the development of the automatic and           field of relevance, as detailed later). Table 1 describes four
dynamic algorithm. First, working at an information item             ML techniques, suitable for classification and regression and
level requires advanced techniques. Those techniques arise           therefore applicable towards GiCoMAF development.
from the artificial intelligence (AI) domain. Second, as the            There are other ML models that can be used for the pur-
problem is both spatial and temporal, the solution should            pose of this study. For example, Long Short-Term Mem-
adopt perceptual concepts inspired by Gibson and Crooks’             ory (LSTM), is a common deep learning technique for time
(1938) field of safe travel. The field of safe travel, was de-       series data, that can be used for multi-stream data as well
fined as the spatial field where it was safe to steer a car. It      (Behera, Keidel, and Debnath 2019; Bouaziz et al. 2017).
was dependent on driver state, vehicle and environment, and          Given that each information item in the C2 map can be per-
it was constantly changing as the vehicle moved and the con-         ceived as a standalone time series, this model was consid-
text of driving changed. Gibson and Crooks referred to the           ered for this study too. However, the exact number of par-
field’s intuitiveness as affordance. In Section 2 we use these       allel streams (i.e. the number of the information items) is
guidelines for the development of the GiCoMAF (Gibsonian             constantly changing as more information items are added to
Command and Control Map AI Filter algorithm) but first,              the map. Therefore, the LSTM model was not pursued.
in Sections 1.1 and 1.2, we review Machine Learning (ML)
models that were considered for the AI implementation and            1.2   Research Goals
delineate the research goals.
                                                                     Acknowledging UAV operators’ C2 map needs, this study
                                                                     aims to introduce the GiCoMAF solution for distilling the
1.1   Machine-Learning (ML) Models for C2                            information presented on the map to show mission relevant
      Systems                                                        important information items, while minimizing or hiding
The military domain is seeking for techniques to incorpo-            less important or distracting items. The first goal is to de-
rate AI into C2 systems and Machine Learning (ML) mod-               velop an algorithm that automatically and dynamically fil-
         Table 1: A short description of machine learning models suitable for classification and regression modeling.
 Model | Description
 Lasso Regression (LR) | In Generalized Linear Models (GLM), the relationship between the predicting variables and the dependent variable is a linear function. GLMs are mostly used to predict a single continuous numeric value, but variations allow them to be used for classification (multinomial linear regression; Greene 2012) or for predicting a probability (logistic regression; Walker and Duncan 1967). Lasso is a method of performing feature selection alongside GLM by introducing a penalty on the size of the coefficients as a constraint. The effect of the penalty on the regression is controlled by a user-defined λ parameter (Tibshirani 1996; Tibshirani et al. 2012).
 Neural Networks (NN) | An NN is a composite of input, output, and middle layers (a.k.a. ‘hidden layers’), intended to mimic the work of neurons in the human brain. Each layer consists of an undefined number of neurons, which sum the input received from the previous layer using an activation function f, and ‘fire’ the result to the next layer. NNs can handle non-linear problems (Kanevski et al. 2004).
 Random Forest (RF) | A decision tree is a prediction model for non-linear problems, where each node of the graph represents a predicting variable. The returned value is the final node the decisions led to (Quinlan 1986). The number of possible results is limited by the number of possible values of the label, and by the depth of the tree (the maximum number of nodes a branch can have). The number of possible results can be increased by increasing the tree’s depth, but this may result in overfitting. Random Forest overcomes this limitation by producing multiple trees, each trained on a randomly selected subset of the same dataset. The returned value is the average of the predictions of all trees for a regression-type model, and the most frequent value for a classification-type model (Breiman 2001).
 XGBoost | Gradient boosting is an iterative ensemble-of-trees model, where in each iteration a decision tree is learned using the residuals of the previous iteration (Friedman 2002). XGBoost is an open source package implementing an efficient, scalable tree boosting system, incorporating a regularized model to prevent overfitting (Chen and Guestrin 2016).
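
The aggregation step described for Random Forest in Table 1 can be illustrated with a minimal sketch. It is not the implementation used in this study; the function names and the example predictions are hypothetical, and the individual trees are assumed to have already produced their predictions.

```python
from collections import Counter
from statistics import mean


def aggregate_regression(tree_predictions):
    """Regression-type forest: average the numeric predictions of all trees."""
    return mean(tree_predictions)


def aggregate_classification(tree_predictions):
    """Classification-type forest: return the most frequent predicted label.

    Ties are resolved in favor of the label that was encountered first,
    which is an arbitrary choice made for this sketch.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of five trees for one information item.
votes = [2, 1, 2, 0, 2]
print(aggregate_classification(votes))
print(aggregate_regression([0.2, 0.4, 0.6]))
```

Averaging and voting over many trees is what lets the forest smooth out the overfitting of any single deep tree.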


ters the information items on the C2 map. The automatic part        operator’s area of interest be modeled on the map? Tier III
of the algorithm, addresses the filtering at the information        aims at the dynamicity property of the algorithm, and an-
items’ level. The dynamic part of the algorithm addresses           swers the question how often should the automatic filter be
the evolving environmental context of the area of operation.        updated?
The second goal is to delineate the use of ML models in                The construction of the GiCoMAF is illustrated in Fig-
the construction of the algorithm by demonstrating how its          ure 2. The distinction between Tiers I and II is impor-
construction can be achieved using ML models. This goal is          tant. Tier I predicts each information item’s importance. The
attained using importance and relevance labelled data col-          scale is inclusive, i.e., it provides an indication of how im-
lected empirically from UAV operators. Their inputs enable          portant it is to show the information item on the map, and an
the ML models to learn the operators’ contextual informa-           indication of how important it is not to show the item. The
tion needs, and to showcase the algorithm’s feasibility. This       incentive behind this logic is that some non-important infor-
paper describes the process of developing the GiCoMAF al-           mation items may be harmless and operators will be oblivi-
gorithm using ML models. A third goal, not described in this        ous to their presence, while others may be disturbing. More-
paper, is to evaluate the update rate of the GiCoMAF empir-         over, predicted information item’s importance should not be
ically, exploring its effect on operators’ mental workload,         handled in the same way throughout the map space, and this
situation awareness and perception of the experience.               is where Tier II comes into play. Consider an information
   In the following Sections the definition of the GiCoMAF          item with a neutral predicted importance, i.e., not important
algorithm is first outlined. Then, the process of acquiring the     but not disturbing. While the item per se is not considered
ML models constructing the algorithm’s tiers is detailed, in-       disturbing, possibly, if it is within the operators’ area of in-
cluding the data collection, manipulation, and models eval-         terest, they may be more sensitive to disruptions, and the
uation. Lastly, the discussion Section discusses how to com-        item can inadvertently cause clutter or quickly become dis-
bine all tiers into an operating filter algorithm.                  turbing. To avoid such cases, it may be better to filter out the
                                                                    item. Outside the area of interest, however, leaving a neutral
              2   Developing GiCoMAF                                item may be a good strategy, as its importance may rise as
The GiCoMAF (Gibsonian Command and Control Map AI                   the mission evolves. Hence, items that are non-important in the
Filter) algorithm consists of three tiers, each answering a         immediate context may be valuable for foreseeing the future evolution
different research question. The integration of the tiers creates   of the situation and preparing for it. Therefore, Tier I of
the filter rule, and incorporates the outcomes of these ques-       predicting the information items’ importance is not enough,
tions into the workflow of the operators. Tier I aims at the        and the algorithm should model the operators’ area of inter-
information item level, and answers the question what is the        est as derived in Tier II. The final decision rule, hence, is
perceived importance of each information item? Tier II aims         a combination of these two tiers. Tier I and II of the algo-
at the map as a whole, and answers the question how can the         rithm are based on ML models, thus, by learning examples
from UAV operators, the algorithm predicts and executes an         of interest’, where each area has a different level of interest.
automatic filter for new operators and in new unknown sce-         For example, Figure 3 illustrates multilevel areas of interest,
narios. ML models require tagged examples to be learned            where in the center of the polygons occurs the mission, and
from. Therefore, an experiment which emulated the work             therefore the smallest area around this focus has the highest
of UAV operators in operational scenarios was designed and         interest. The surroundings still hold interest for the operator,
conducted to collect tagged data from UAV operators (Zak,          since they may affect the mission’s center. Their relevance
Parmet, and Oron-Gilad 2019a, 2019b). Tier III defines the         decreases as they get farther from the mission’s center.
rate at which the filter rule of Tiers I and II should be ap-      The ’area of interest’ is modeled using the concept of field
plied. At this stage of development Tier III cannot be based       of relevance, an adaptation of Gibson and Crooks (1938).
on ML models, and represents a purely cognitive issue. Since       The field of relevance depicts operators’ areas of interest
the scope of this study was to delineate the construction of       based on the environment, mission, operators’ behavior, etc.
the GiCoMAF algorithm using ML models, the process of              Moreover, it corresponds with the affordance property as the
studying the cognitive effect of various filter update rates       field of relevance highlights areas that are intuitively more
and setting the optimal rate is left to future research.           focused upon by the operators. Due to the probabilistic characteristic
                                                                   of ML models, it was decided to adopt the quality
2.1   Tier I – Information Item Importance                         map approach of Morse, Engh, and Goodrich (2010), and to
Tier I aims to model information items’ importance as per-         model the field of relevance as a pseudo-Gaussian heatmap,
ceived by the operators. An information item’s importance          where each spatial element on the map gets a value between
may vary based on environmental context, mission, and              0 (no relevance) and 1 (high relevance) corresponding to the
characteristics. Generally, operators want important infor-        probability of that spatial point to be in the operator’s area
mation items to be shown on the map. If an information             of interest.
item is not perceived as important, its presence can be either        Similar to Tier I, prediction in this tier is done using an ML
disturbing, in which case operators would prefer that it not be    model, deducing from insightful environmental and mission
shown, or not disturbing, in which case operators may be           related measures that describe the context. In this tier those
impartial to its presence on the map. Therefore, Tier I attempts   measures are in respect to a spatial element (e.g., a square
to predict and classify information items’ importance level        of 10 m²), without relating to any particular information
into a four-level scale, based on the context: Positive importance item within the area of interest. For example, a mission
represents important (1) and very important (2) information        related measure can be the average time a spatial element
items. Zero importance represents non-important information        is in the UAV payload’s field of view, assuming a higher average
items, for which operators have no preference as to whether        time may indicate higher relevance of that element.
they should be shown or not. Negative importance represents        An environmental measure can be the density of information
information items that disturb and distract operators from         items around a certain element, assuming that a higher density
their mission context.                                             around a spatial element indicates a higher probability
   The prediction is done using an ML model. The model,            of some operational event happening at that location, and in
once trained, has the ability to deduce item importance from       turn higher relevance. Similar to Tier I, the examples of rela-
insightful environmental and mission related measures that         tions are for simplifying purposes, and the exact ML model
describe the context. For example, the average distance of         (Table 1) was determined through an empirical process de-
an information item from the UAV payload may describe              tailed in Section 3.
its mission related context. The density of information items
around an information item, for example, may reflect the en-                 3    Constructing the GiCoMAF
vironmental context. Higher density raises the probability of      In the construction of the filter, an experiment emulating
some operational event happening at that location. A ML            the work of UAV operators in the military domain was ex-
model can find the relationship between an information item        ecuted. Participants, professional military UAV operators,
and the route, and then predict the importance of the infor-       were asked to perform a mission of supporting a ground bat-
mation item; and it can determine that as the environment is       talion in urban battlefield scenarios. The data collected using
denser, the probability of showing non-important items increases.  the feedback they provided during and after the experiment
These examples of relations between derived measures               was used to construct the ML models of Tiers I and II. The
like distance and density and the predicted value are              process is illustrated in Figure 2. The research was approved
given for simplicity, and the real relationships that              by the Institutional Review Board at Ben Gurion University.
emerge from the ML model may be more complicated. Furthermore,     Informed consent was obtained from each participant.
the most suitable ML model (Table 1)
was selected empirically, as detailed in Section 3.                3.1   Data Tagging Experiment
                                                                   Data for Tiers I and II were collected in a set of experimen-
2.2   Tier II – Operator’s Field of Relevance                      tal runs, detailed in Zak, Parmet, and Oron-Gilad 2019a and
Tier II of the algorithm addresses the areas of interest for the   2019b. The experiment aimed to emulate the work of UAV
operators in the environment. It is essentially a spatial problem  operators in the military domain. Using a designated system
that can be represented using geospatial measures on the           developed for this task (UCES – UAV Command and
map (e.g. polygons, heatmaps, etc.). The ’area of interest’ is     Control Experiment System), a battlefield scenario was de-
not necessarily a singular area, and can be phrased as ’areas      veloped by subject matter experts with a UAV mission to
Figure 2: The construction process of the GiCoMAF. Tier I and II provide the decision rule for what should be presented on
the C2 map. Tier III is related to the update rate of the filter and how it affects operators’ mission performance, workload and
experience.
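
The interplay between Tiers I and II described in Section 2 can be sketched as a simple rule. This is only an illustration under stated assumptions, not the decision rule of the GiCoMAF: the function name `filter_decision`, the encoding of negative importance as -1, and the relevance threshold of 0.5 are all hypothetical.

```python
def filter_decision(importance, relevance, relevance_threshold=0.5):
    """Decide whether to show or hide one information item.

    importance: Tier I prediction, assumed to be encoded as -1 (disturbing),
                0 (neutral), 1 (important) or 2 (very important).
    relevance:  Tier II field-of-relevance value in [0, 1] at the item's
                location.
    relevance_threshold: hypothetical cut-off above which a location is
                treated as being inside the operator's area of interest.
    """
    if importance > 0:
        return "show"   # important items are shown
    if importance < 0:
        return "hide"   # disturbing items are filtered out
    # Neutral items: filtered inside the area of interest, where they may
    # cause clutter, and kept outside it, where they may become useful.
    return "hide" if relevance >= relevance_threshold else "show"


print(filter_decision(0, 0.8))
print(filter_decision(0, 0.2))
```

The sketch shows why Tier I alone is not enough: the same neutral item yields different outcomes depending on where it lies in the field of relevance.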


                                                                   assist a ground battalion in conquering an urban neighbor-
                                                                   hood. The ‘five paragraph order’, a common military stan-
                                                                   dard of writing a fight plan of the mission, was outlined on
                                                                   the C2 map (Figure 4) and programmed in the VR-Forces
                                                                   simulation engine. The mission was 12 minutes long, rep-
                                                                   resenting a sequence of events that in real life settings may
                                                                   take several hours. Thirteen professional military UAV op-
                                                                   erators performed a reconnaissance mission as if they were
                                                                   acting in a real-world battlefield. The UCES allowed them
                                                                   to control the vehicle’s payload, observe the battlefield from
                                                                   an aerial point of view, and get real-time information from a
                                                                   C2 map. Occasionally, at specific points, the scenario paused
                                                                   and a scoring session started. Participants were asked to label
                                                                   two types of information using the UCES map: tagging in-
                                                                   dividual information items’ importance; and drawing their
                                                                   current contextual area of interest as polygons on the map.
                                                                   There were 88 scoring sessions in total, an average of 6.5
                                                                   sessions for each experimental run. The data collected from
                                                                   the runs was put together into two datasets. The dataset con-
                                                                   taining the information item’s importance tags was the in-
                                                                   put for Tier I, and the dataset containing the area of interest
                                                                   polygons was the input for Tier II. Then, ML models for the
Figure 3: An illustration of multilevel areas of interest, as      two tiers were developed. The processes and outcomes are
derived by operators in an operational context. Each polygon       detailed in Sections 3.2 and 3.3.
represents a level of interest, increasing from the outside-in.
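
The heatmap representation of the field of relevance (Section 2.2) can be illustrated with a minimal sketch. In the GiCoMAF the values are predicted by an ML model; here, purely for illustration, they are assumed to follow an isotropic Gaussian around a single mission center. The names `relevance` and `relevance_grid`, the cell size, and the spread `sigma` are hypothetical.

```python
import math


def relevance(x, y, center, sigma):
    """Return a value in (0, 1]: 1 at the mission center, decaying with distance."""
    dx, dy = x - center[0], y - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))


def relevance_grid(width, height, cell_size, center, sigma):
    """Evaluate the relevance at the midpoint of every spatial element."""
    grid = []
    for row in range(height):
        y = (row + 0.5) * cell_size
        grid.append([
            relevance((col + 0.5) * cell_size, y, center, sigma)
            for col in range(width)
        ])
    return grid


# A 4 x 4 grid of 10 m cells with the mission center in the middle of the map.
heatmap = relevance_grid(4, 4, 10.0, center=(20.0, 20.0), sigma=15.0)
for row in heatmap:
    print(" ".join(f"{value:.2f}" for value in row))
```

Cells closer to the center receive values nearer to 1, mirroring the nested polygons of Figure 3, in which interest increases from the outside in.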
                                                                   3.2   Tier I ML Model – Information Items’
                                                                         Importance
                                                                   The data collected in the experiment was divided into two
                                                                   tables. The first table was an event diary, where each row
         Table 2: Entropy analysis for each model.
 Model | Negative | 0 | 1 | 2   (columns: predicted value)
 LR | NA | NA | NA | 0.921
 NN | NA | NA | 0.891 | 0.917
 RF | 0.716 | 0.875 | 0.739 | 0.661
 XGBoost | 0.999 | 0.999 | 0.999 | 1
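
The entropy measure used in this section can be sketched as follows. The sketch assumes that a model's output for one information item is a list of probabilities, one per importance level, and it takes the logarithm base to be the number of levels, so that the result lies between 0 and 1. The function name `normalized_entropy` is hypothetical, and the sketch does not reproduce how the values in Table 2 were aggregated.

```python
import math


def normalized_entropy(probabilities):
    """Entropy of one prediction, with log base a = number of possible values."""
    a = len(probabilities)
    if a < 2:
        raise ValueError("at least two possible values are required")
    # Terms with p = 0 contribute nothing (the limit of p * log p is 0).
    return -sum(p * math.log(p, a) for p in probabilities if p > 0)


print(normalized_entropy([0.97, 0.01, 0.01, 0.01]))  # decisive prediction
print(normalized_entropy([0.25, 0.25, 0.25, 0.25]))  # undecided prediction
```

A prediction that puts nearly all probability on one level gives an entropy near 0, while a uniform prediction gives the maximum of 1.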




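
The sign error reported in Figure 5 can be sketched as below. Two assumptions are made that the text does not confirm: negative importance is encoded as -1, and the percentage is taken over all predictions. The function name `sign_error` is hypothetical.

```python
def sign_error(true_levels, predicted_levels):
    """Percent of items whose predicted importance has the opposite sign.

    Counts important or very important items (1, 2) predicted as disturbing
    (negative), and disturbing items predicted as important or very important.
    Confusions that involve the neutral level 0 are not counted.
    """
    if len(true_levels) != len(predicted_levels):
        raise ValueError("both sequences must have the same length")
    if not true_levels:
        raise ValueError("at least one prediction is required")
    flips = sum(
        1
        for t, p in zip(true_levels, predicted_levels)
        if (t > 0 and p < 0) or (t < 0 and p > 0)
    )
    return 100.0 * flips / len(true_levels)


print(sign_error([2, 1, -1, 0], [-1, 1, 2, -1]))
```

Unlike the weighted error, this measure ignores how far apart two levels are and isolates the most harmful confusions.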
                                                                    ing set and the 3 remaining as the validation set. The final
                                                                    model for each technique was built using the derived opti-
                                                                    mal parameters set. Figure 5 illustrates the results for two
                                                                    types of errors: (a) weighted error, the average k-fold result
                                                                    for the optimal parameters set, the training set, and the val-
                                                                    idation set; and (b) sign error (percent of misclassifications
                                                                    of important/very important as disturbing, and vice versa),
                                                                    for all three sets. From Figure 5 it is evident that RF shows
                                                                    patterns of overfitting, and multinomial LR and NN perform
                                                                    better than XGBoost in terms of weighted error. However,
Figure 4: A mockup of the C2 map, including the fight plan          in terms of sign error, XGBoost outperforms the others. Figure 6
for the simulation’s scenario (in dark blue). The symbols are       delves into the differences among the models by illustrating
standard NATO symbols, where light blue represents allied           multiclass receiver operating characteristic (ROC)
forces, red represents enemy forces, and green represents           graphs (Hand and Till 2001) on the validation set. In this
neutral entities.                                                   figure, each line represents the ROC of the comparison of
                                                                    two possible values, and the area under the curve (AUC) is
                                                                    the average of all AUCs. It can explain the differences be-
                                                                    tween weighted error and sign error patterns of the NN and
represents one event in one experimental run (e.g., UAV             LR. As seen in the plot, NN and LR have almost no sepa-
movement, forces movement, cross-fire event, etc.). This ta-        ration between the predicted values, probably because mis-
ble contained approximately 90,000 rows. The second table           classifying very important information items as disturbing
is a tagging table, where each row represents an information        had the highest penalty. Thus, both models tend to classify
item in a scoring session in an experimental run and its per-       all information items as ‘very important’. Moreover, accord-
ceived importance as was reported by the participant. This          ing to Figure 6, RF had slightly better AUC, although Fig-
table contained 1,203 rows, where each information item has         ure 5 suggests it was overfitting. A third analysis to evaluate
inherent attributes like its location, type, the time it was last   the models’ performance was to look at how certain each
modified, etc. However, item attributes are insufficient for        model was in its predictions on the validation set. In the pre-
introducing mission context to the ML models. Therefore,            diction process, each model provides a probability for each
the two tables had to be joined into one training dataset. For      level, where the predicted value is set by choosing the level
that, the event diary had to be manipulated into insightful         with the highest probability. It is expected that a more ‘deci-
variables. Thirty-five derived measures were calculated from        sive’ model would provide relatively high probability for the
the event diary to describe the environment and the mission.        predicted value and relatively low probabilities for the other
The objective of the ML model was to classify information           values. The entropy for that prediction, defined by the for-
items’ importance. Misclassifying a non-important informa-          mula $-\sum_i p_i \log_a p_i$ (where $p_i$ is the probability
tion item as important does not have the same implication as        of predicting a given value $i$, and $a$ is the number of
misclassifying an important information item as disturbing.         possible values), would then be close to zero. The results of
Therefore, the error function could not be merely classifica-       the entropy analysis, given in Table 2, show the average
tion accuracy, and weights were given as penalty for mis-           entropy for each model and predicted value. It is evident
classifications based on the sensitivity and specificity of the     from Table 2 that LR assigns only the value ‘2’, and NN only
misclassification.                                                  the values ‘1’ and ‘2’. All four models have relatively high
                                                                    entropy scores,
   All four techniques described in Table 1 were tested in          sometimes close to one, indicating that the models’ classifi-
a classification configuration. Model technique parameters          cation decisions, based on the levels’ probabilities, hinge
were optimized using the k-fold cross validation technique,         on fluctuations of an ε; i.e., all four models are not
where k = 10. During this process, 22,945 models were               very decisive. The only exception is the RF model, which
built in total for all four techniques. Then, data were divided     had a relatively lower entropy, although still closer to one
by participant; 10 randomly chosen participants as the train-       than zero.
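The entropy measure used in the decisiveness analysis can be illustrated with a minimal sketch. It assumes the formula stated in the text (logarithm base a equal to the number of possible values), so that the score ranges from 0 (fully decisive) to 1 (uniform probabilities). The function name and the four-level example probabilities are hypothetical and are not taken from the study's data.

```python
import math


def normalized_entropy(probs):
    """Entropy of one prediction, with log base a = number of possible values.

    Returns 0.0 when all probability mass is on one level and 1.0 when the
    levels are equally likely.
    """
    a = len(probs)
    if a < 2:
        raise ValueError("need at least two possible values")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Levels with p = 0 contribute nothing (p * log p tends to 0 as p -> 0).
    return -sum(p * math.log(p, a) for p in probs if p > 0)


# Hypothetical predictions over four importance levels (illustration only).
decisive = normalized_entropy([0.97, 0.01, 0.01, 0.01])
undecided = normalized_entropy([0.25, 0.25, 0.25, 0.25])
```

A model whose average score stays close to 1 is choosing its predicted level on the basis of very small probability differences, which is the behaviour the text describes for most of the tested models.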
Figure 5: ML model performance result for Tier I. Left. The average weighted error of the k-fold result of the optimal parameters
set (red), the weighted error of the training (green) and the validation (blue) sets. Right. The corresponding sign error of the
k-fold and the sets. One can see that the LR and NN models yield a lower weighted error than RF and XGBoost, but a
higher sign error.
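The two error measures compared in Figure 5 can be sketched as follows. The text does not give the label coding or the penalty weights, so both are assumptions here: levels are coded -1 (disturbing), 0 (not important), 1 (important) and 2 (very important), and the `penalty` function is a hypothetical example that only respects the stated property that classifying a very important item as disturbing costs the most. The sign error is computed here as a share of all records, which is one possible reading of the definition.

```python
def penalty(true_level, predicted_level):
    """Hypothetical misclassification penalty (not the study's weights)."""
    distance = abs(true_level - predicted_level)
    # Assumption: hiding an important item costs twice as much as the
    # reverse mistake of showing a disturbing one.
    if true_level > 0 and predicted_level < 0:
        return 2.0 * distance
    return float(distance)


def weighted_error(y_true, y_pred):
    """Average penalty over all records."""
    return sum(penalty(t, p) for t, p in zip(y_true, y_pred)) / len(y_true)


def sign_error(y_true, y_pred):
    """Share of records where important/very important items are predicted
    as disturbing, or vice versa (the labels have opposite signs)."""
    flips = sum(1 for t, p in zip(y_true, y_pred) if t * p < 0)
    return flips / len(y_true)


# Hypothetical labels, for illustration only.
y_true = [2, 1, -1, 0, 2, -1]
y_pred = [2, 2, 1, 0, -1, -1]
example_weighted = weighted_error(y_true, y_pred)
example_sign = sign_error(y_true, y_pred)
```

Keeping the two measures separate makes it possible for a model to score well on one and poorly on the other, as the caption notes for LR and NN.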


3.3   Tier II ML Model – Field of Relevance                        value was logistic (a number in [0,1]). Model performance
                                                                   was measured using root mean square error (RMSE). Due to
The objective of the model was to define the field of rele-        the large scale of data and computing resource limitations,
vance by predicting the probability of each point on the map       408 models were built in total in four k-fold cross valida-
to be in the area of interest. To facilitate the model construc-   tion processes, where k = 3. Figure 7 illustrates the aver-
tion, the C2 map was divided into a grid of cells, where each      age k-fold RMSE results for the optimal parameters set, and
cell represents an area of 25x25 meters on the ground, 9,016       the RMSE of the training and validation sets. While logis-
cells in total. Each row in the dataset represents one cell in     tic LR and NN have better results on the validation set in
a scoring session of an experimental run, with the predicted       terms of RMSE, both XGBoost and RF perform better on
label of (1) if the cell was in the drawn area of interest in      both the training set and k-fold results. Figure 8 delves into
that scoring session, and (0) otherwise. The derived mea-          the results and illustrates a ‘rotated confusion matrix’. The
sures were calculated with respect to each cell (e.g. the av-      x-axis represents the predicted value, binned in intervals of
erage distance of a cell from the UAV route, etc.).                0.01. The y-axis represents the average original value of the
   Similar to Tier I, the data was divided into two tables: an     rows corresponding to the bins of the x-axis. The size of the
event diary and a table with the reported feedback from the        points represents the relative number of records in each bin.
participants. In this tier, each row in the reported feedback      A perfect model would provide points aligning with the di-
table represents the coordinates of the area of interest’s poly-   agonal of the plot. According to Figure 8, both LR and NN
gon in a scoring session of an experimental run. Each scor-        collapse and do not provide good differentiation. RF pro-
ing session consisted of one polygon. Excluding two cases          vides something close to binary results, which may conflict
where the participant forgot to draw an area of interest, this     with the field of relevance concept. Therefore, although not
table had 86 rows. The event diary was also transformed            perfectly aligning with the diagonal, XGBoost performs bet-
into 20 derived measures.                                          ter than the other three models. An illustration of XGBoost’s
                                                                   performance relative to the runner-up RF model, in two scor-
                                                                   ing sessions of two different participants, is given in Figure 9.
                                                                   Figure 9 demonstrates how each operator marked the field
                                                                   of relevance at the same stage, showing that the structure
                                                                   of the field of relevance depends on the environment, the
                                                                   mission, and operators’ individual characteristics and pref-
                                                                   erences. The XGBoost model seems to handle these charac-
                                                                   teristics well.

                                                                                         4   Discussion
   The same four ML techniques that were used in Tier I            Operators of UAVs work in uncertain and dynamic envi-
were used in the regression configuration, but the predicted       ronments. Their main focus is on managing their vehicle’s
                                                                  Figure 7: ML model performance result for Tier II. The av-
                                                                  erage RMSE of the k-fold result of the optimal parameters
                                                                  set, and the RMSE of the training and the validation sets. LR
Figure 6: Multiclass ROC curves for the four ML models            and NN have better results on the validation set, whereas XG-
on the validation set, with the average AUC for each model.       Boost and RF perform better on the training set and k-fold
X-axis – False Positive Rate (1 – Specificity). Y-axis – True     results.
Positive Rate (Sensitivity). The width of each line represents
the number of observations used for the development of the
line in the curve.                                                However, since the RF had extremely low training error, for
                                                                  both error types, it may be overfitted. Further study on the ef-
                                                                  fects of each model on UAV operators’ mission performance
payload (e.g., camera video feed) while they are required to      is needed to examine whether the effect of overfitting is negligi-
maintain orientation and awareness of their current location      ble. Therefore, to be cautious, the XGBoost model was cho-
and its surroundings, and to plan ahead. In order to develop      sen for further development. The XGBoost model, however,
the essential SA, they continuously interact                      had a sign error of 25.9%, and its decisions
with a C2 map (see Back et al. (2019) for a Cognitive Work        hinged on fluctuations of an ε (Table 2). This magnitude of
Analysis). The map is often overloaded with data that is          error is reasonable when modeling human preferences
irrelevant to their mission and, like the operational environ-    and performance (as seen in previous studies, e.g., Agichtein
ment, its content changes rapidly. The existing layer-based map   et al. 2016; Guimerà et al. 2012; Liu, Bian, and Agichtein
filter mechanism causes a tradeoff dilemma for the opera-         2008), especially with a limited number of observa-
tor: showing an entire layer with its relevant and irrelevant     tions. The XGBoost results of Tier II were visually evaluated
information, or hiding an entire layer and losing important       and provided the best and most accurate pseudo-Gaussian
information. Furthermore, as noted in Back et al. (2019), the existing    heatmap for the field of relevance, as illustrated in Figures 8
filtering mechanisms require operators to manually interact       and 9. It should be noted that accuracy may be further im-
with the interface, highlighting or hiding layers of information at    proved by testing other ML models and parameters. Indeed,
the busiest times of their mission. Thus, currently, obtaining    due to resource limitations, not all suitable models could be
information from C2 maps poses high demands on operators,         tested. Future studies can attempt to improve these results
and therefore they often cope with impaired SA. This study        by applying additional ML models and techniques.
proposes a solution that adjusts the C2 map display                  The two tiers delineate the use of ML models in the GiCo-
to operators’ needs by introducing an AI-based dynamic            MAF algorithm. Yet, it is not clear-cut how to combine Tiers
and automatic algorithm that filters the                          I and II into the GiCoMAF algorithm. Several options for
information on the C2 map at the individual item level (as        the tier combination protocol are discussed in
opposed to layers), as described in Section 2, and lays the       the following section.
foundation for the algorithm by delineating the use of ML
models in its construction, as detailed in Figure 2.
                                                                  4.1   Putting It All Together – Combining Tiers I
    The data tagging experiment was designed to collect la-
                                                                        and II
belled data towards the construction of the ML models of
Tiers I and II; predicting information items’ importance and      The exact combination of Tiers I and II into the GiCoMAF
predicting the field of relevance, respectively. Using the col-   algorithm may depend on the mission, the environment, and
lected data, four different models were optimized and devel-      even the organization that the operators are part of. This sec-
oped for each tier. In Tier I, both XGBoost and RF had good       tion suggests three possible combination approaches; how-
results on the validation set, with RF having the upper hand.     ever, the final setup is not limited to these approaches and
                                                                   at points with high relevance (implying more information
                                                                   items would be shown), and rather high at points with low
                                                                   relevance (only very important information items would be
                                                                   shown).
                                                                       Negative Correlation – In contrast to the positive corre-
                                                                   lation approach, this approach assumes that there should be
                                                                   a negative correlation between the field of relevance and in-
                                                                   formation density. I.e., the more relevant a spatial element
                                                                   is, the more attention operators direct to it, and therefore
                                                                   fewer information items should be shown to avoid disrup-
                                                                   tion. Thus, as illustrated in Figure 10b, there should be a
                                                                   positive relation between the field of relevance and the
                                                                   importance threshold for information items, the opposite of
                                                                   the relation described in the positive correlation approach.
Figure 8: Predicted values binned into intervals of 0.01 (X-           Binary Relevance Decision – This approach assumes that
axis), and the average original values (Y-axis) of correspond-     only important information items should be shown on the
ing records. A perfectly fit model should be aligned with the      map, and only in areas which are currently most relevant to
diagonal line. The XGBoost model seems to be the better            the operators. Therefore, there are only two constant thresh-
predictor of the four techniques.                                  olds in the algorithm – one to decide the minimum impor-
                                                                   tance an information item should have in order to be shown,
                                                                   and one to decide the minimum relative relevance a spatial
                                                                   point should have in order to set the importance threshold
                                                                   into motion. Figure 10c illustrates the approach.
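The three combination approaches described in this section (positive correlation, negative correlation, and binary relevance decision) can be read as show/no-show rules. The sketch below is only one possible reading: it assumes a Tier II relevance value in [0, 1], a numeric Tier I importance score where 1 means important and 2 means very important, and a linear change of the importance threshold with relevance. The function names, the linear form, and the default threshold values are all hypothetical and would have to be set through the experiments proposed in this paper.

```python
def importance_threshold(relevance, approach, low=1.0, high=2.0):
    """Minimum importance an item needs in order to be shown (assumed linear)."""
    if not 0.0 <= relevance <= 1.0:
        raise ValueError("relevance must be in [0, 1]")
    if approach == "positive":
        # High relevance -> low threshold -> more items shown.
        return high - (high - low) * relevance
    if approach == "negative":
        # High relevance -> high threshold -> fewer items shown.
        return low + (high - low) * relevance
    raise ValueError("unknown approach: " + approach)


def show_item(importance, relevance, approach,
              low=1.0, high=2.0, min_relevance=0.5):
    """Show/no-show decision for one information item at one map point."""
    if approach == "binary":
        # Two constant thresholds: one on relevance, one on importance.
        return relevance >= min_relevance and importance >= low
    return importance >= importance_threshold(relevance, approach, low, high)


# An item rated 'important' (1), at a highly relevant and at an irrelevant point.
shown_positive_near = show_item(1, 1.0, "positive")
shown_positive_far = show_item(1, 0.0, "positive")
shown_negative_near = show_item(1, 1.0, "negative")
shown_binary_far = show_item(1, 0.4, "binary")
```

Under these assumptions, the same item is shown at a highly relevant point in the positive correlation approach and hidden there in the negative correlation approach, while a very important item (2) passes the threshold in both.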
                                                                       For a further study, we propose to examine these three
                                                                   approaches and choose the most suitable one for the context
                                                                   that was addressed in this paper (i.e., the same mission type
                                                                   and profile of participants). For that, a new multistage sce-
                                                                   nario, similar in nature to the experimental scenarios detailed
                                                                   in this paper, is required. Using the ML models developed
                                                                   in Tiers I and II, each scenario can be run with one of the
                                                                   three combination approaches, or with no
                                                                   filter at all. In the first experiment, participants would be intro-
                                                                   duced to all of the approaches, and different threshold setups
Figure 9: The field of relevance simulated in one scoring ses-     would be tested. Then, using the optimal threshold for each
sion in the validation set. The predicted field of relevance is    approach, new operators would participate in a two-stage
illustrated in blue and the operator’s reported area of interest   crossover experiment with eight possible treatments – the
is shown in red. Subfigures (a)-(b) and (c)-(d) represent two      three combination approaches and no filter at all.
different participants, in the same part of the scenario. (a)-
(c) are the predictions of the XGBoost model, (b)-(d) are of                             5    Conclusions
RF model.                                                          This study aimed to introduce a solution to the information
                                                                   overload of C2 maps used by UAV operators. By solving op-
                                                                   erators’ need for distilled information at the right time and in
each organization/user/system case may dictate the devel-          the right place, it is expected that operators will benefit more
opment of a tailored solution. These approaches provide a          from the C2 map with less effort, and overall mission per-
show/no-show decision rule based on relevance and impor-           formance will improve. The study met its goals. First,
tance thresholds. The optimal approach used in the algo-           a solution was achieved by introducing the three-tier Gibso-
rithm, as well as its thresholds, should be continuously eval-     nian Command and Control Map AI Filter algorithm (GiCo-
uated through a process of experiments.                            MAF). Then, ML models were developed and evaluated us-
   Positive Correlation – This approach assumes that there         ing tagged data collected in an experiment. The results of the
should be a positive correlation between the field of rele-        experiment are encouraging. The ML models that emerged
vance and the information density. Thus, the more relevant         from the first experiment were satisfactory, and indicated high
a spatial element is, the more information should be pre-          accuracy and usability of the algorithm, a step towards a so-
sented around it. To avoid distraction, the farther away from      lution to the information overload problem and the high work-
the spatial element, the less information should be shown.         load of UAV operators. The combination of the tiers into a
Figure 10a illustrates this approach, in which there is a negative single algorithm is yet to be fully defined, and should be de-
relation between the field of relevance and the importance         termined by an additional set of experiments we recommend
threshold for information items. In order to use this approach, an adap-    performing in future studies.
tive show/no-show threshold should be set based on infor-             The contribution of this study lies in several aspects.
mation items’ importance, where the threshold is rather low        First, this study targets a long-neglected field of improving
Figure 10: An illustration of three possible algorithm combination approaches for Tier I and II. The heatmap represents the field
of relevance, dimmed information items represent items that would not be shown to the operator.


the use of C2 maps by operators. Second, by using ML mod-          — Big Data Analytics For Military Intelligence. Pointer,
els in the construction of the GiCoMAF algorithm, this study       Journal of the Singapore Armed Forces 42(1):51–65.
utilizes AI concepts and techniques to lay the foundation of       Behera, A.; Keidel, A.; and Debnath, B. 2019. Context-
an algorithm that is targeted for improving human perfor-          driven Multi-stream LSTM (M-LSTM) for Recognizing Fine-
mance and human cognitive aspects of workload and SA.              Grained Activity of Drivers, volume 11269 LNCS. Springer
And third, successful construction of the algorithm can im-        International Publishing.
prove mission performance and enhance the cognitive abil-
                                                                   Bouaziz, M.; Morchid, M.; Dufour, R.; Linarès, G.; and
ities of operators to perform spatial-temporal tasks beyond
                                                                   De Mori, R. 2017. Parallel Long Short-Term Memory for
the UAV domain. Algorithms of this form can be further im-
                                                                   multi-stream classification. 2016 IEEE Workshop on Spoken
plemented in any domain where overloaded spatial and
                                                                   Language Technology, SLT 2016 - Proceedings 218–223.
temporal information has to be filtered, e.g., emergency dis-
patching systems, information-guided surgeries, search and         Breiman, L. 2001. Random Forests. Machine Learning
rescue missions, and air traffic control.                          45(1):5–32.
                                                                   Calhoun, G. L.; Draper, M. H.; Abernathy, M. F.; Patzek,
                6   Acknowledgments                                M.; and Delgado, F. 2005. Synthetic vision system for im-
                                                                   proving unmanned aerial vehicle operator. In Proc. SPIE
This research is partially funded by the “Negev” scholar-          5802, Enhanced and Synthetic Vision 2005, volume 5802,
ship, and by the George Shrut Chair in Human Performance           219–230.
Management at Ben-Gurion University of the Negev. Corre-
sponding author – Yuval Zak                                        Chen, T., and Guestrin, C. 2016. XGBoost: A Scalable Tree
                                                                   Boosting System. In KDD ’16 Proceedings of the 22nd ACM
                                                                   SIGKDD International Conference on Knowledge Discov-
                       References                                  ery and Data Mining, 785–794.
Adams, J. A. 2015. Cognitive Task Analysis for Unmanned            Choi, S. Y., and Cha, D. 2019. Unmanned aerial vehicles
Aerial System Design. In Handbook of Unmanned Aerial               using machine learning for autonomous flight; state-of-the-
Vehicles. Dordrecht: Springer Netherlands. 2425–2441.              art. Advanced Robotics 33(6):265–277.
Agichtein, E.; Brill, E.; Dumais, S.; and Ragno, R. 2016.          Dzieciuch, I.; Reeder, J.; Gutzwiller, R.; Gustafson, E.;
Learning User Interaction Models for Predicting Web                Coronado, B.; Martinez, L.; Croft, B.; and Lange, D. S.
Search Result Preferences. In Proceedings of the 29th an-          2017. Amplifying Human Ability through Autonomics and
nual international ACM SIGIR conference on Research and            Machine Learning in IMPACT. In Proc. SPIE 10194, Micro-
development in information retrieval - SIGIR ’06. Seattle,         and Nanotechnology Sensors, Systems, and Applications IX.
Washington, USA: ACM.                                              Endsley, M. R., and Kiris, E. O. 1995. The Out-of-the-Loop
Azak, M., and Bayrak, A. E. 2008. A new approach for               Performance Problem and Level of Control in Automation.
Threat Evaluation and Weapon Assignment problem, hybrid            Human Factors 37(2):381–394.
learning with multi-agent coordination. In 2008 23rd In-           Endsley, M. R. 1988. Situation awareness global assessment
ternational Symposium on Computer and Information Sci-             technique (SAGAT). Aerospace and Electronics Confer-
ences, ISCIS 2008.                                                 ence, 1988. NAECON 1988., Proceedings of the IEEE 1988
Back, Y.; Zak, Y.; Parmet, Y.; and Oron-Gilad, T. 2019. Us-        National 789–795.
ing Cognitive Work Analysis to Understand UAS Operators’           Endsley, M. R. 2000. Theoretical Underpinnings of Situa-
Map Display Needs. Submitted to Requirements Engineer-             tion Awareness: A Critical Review. In Situation Awareness
ing, April 2019.                                                   Analysis and Measurement. Lawrence Erlbaum Associates.
Bao, T. 2016. Swimming In Sensors, Drowning In Data                3–32.
Everaerts, J. 2008. The Use of Unmanned Aerial Vehi-            Sandom, C. 2000. Operator Situational Awareness and Sys-
cles (UAVs) for Remote Sensing and Mapping. The Inter-          tem Safety. In IEE One Day Seminar on Systems Depen-
national Archives of the Photogrammetry, Remote Sensing         dency on Humans, volume 2000, 5–5. IEE.
and Spatial Information Sciences XXXVII(Part B1):1187–          Shanker, T., and Richtel, M. 2011. In New Military, Data
1192.                                                           Overload Can Be Deadly.
Filkins, D. 2010. Operators of Drones Are Faulted in            Tibshirani, R.; Bien, J.; Friedman, J.; Hastie, T.; Simon,
Afghan Deaths.                                                  N.; Taylor, J.; and Tibshirani, R. J. 2012. Strong Rules
Friedman, J. H. 2002. Stochastic gradient boosting. Com-        for Discarding Predictors in Lasso-type Problems. Journal
putational Statistics and Data Analysis 38(4):367–378.          of the Royal Statistical Society. Series B (Methodological)
                                                                74(2):245–266.
Gibson, J. J., and Crooks, L. E. 1938. A Theoretical Field-
Analysis of Automobile-Driving. The American journal of         Tibshirani, R. 1996. Regression Shrinkage and Selection via
psychology 51(3):453–471.                                       the Lasso. Journal of the Royal Statistical Society. Series B
                                                                (Methodological) 58(1):267–288.
Greene, W. H. 2012. Econometric Analysis (Seventh ed.).
                                                                Walker, S. H., and Duncan, D. B. 1967. Estimation of the
Boston: Pearson Education.
                                                                probability of an event as a function of several variables in-
Guimerà, R.; Llorente, A.; Moro, E.; and Sales-Pardo, M.       dependent. Biometrika 54(1 and 2):167–179.
2012. Predicting Human Preferences Using the Block Struc-       Zak, Y.; Oron-Gilad, T.; and Parmet, Y. 2018. Operator
ture of Complex Social Networks. PLoS ONE 7(9):3–9.             Workload Reduced in Unmanned Aerial Vehicles: Making
Hand, D. J., and Till, R. J. 2001. A Simple Generalisation      Command and Control (C2) Maps More Useful. In Pro-
of the Area Under the ROC Curve for Multiple Class Clas-        ceedings of the Human Factors and Ergonomics Society An-
sification Problems. Machine Learning 45(2):171–186.            nual Meeting.
Izzetoglu, K.; Ayaz, H.; Hing, J. T.; Shewokis, P. A.; Bunce,   Zak, Y.; Parmet, Y.; and Oron-Gilad, T. 2019a. Making
S. C.; Oh, P. Y.; and Onaral, B. 2015. UAV Operators            Command and Control Maps More Useful for Operators of
Workload Assessment by Optical Brain Imaging Technol-           Unmanned Aerial Systems: A Preliminary Model and Em-
ogy (fNIR). In Handbook of Unmanned Aerial Vehicles.            pirical Approach. In HSI2019.
Dordrecht: Springer Netherlands. 2475–2500.                     Zak, Y.; Parmet, Y.; and Oron-Gilad, T. 2019b. Towards the
Kanevski, M.; Parkin, R.; Pozdnukhov, A.; Timonin, V.;          Development of a Display Filter Algorithm for Command
Maignan, M.; Demyanov, V.; and Canu, S. 2004. Environ-          and Control (C2) Maps for Operators of Unmanned Aerial
mental data mining and modeling based on machine learning       Systems. In Extended Abstracts of the 2019 CHI Confer-
algorithms and geostatistics. Environmental Modelling and       ence on Human Factors in Computing Systems, CHI EA ’19.
Software 19(9):845–855.                                         Glasgow, UK: ACM.
Liu, Y.; Bian, J.; and Agichtein, E. 2008. Predicting In-
formation Seeker Satisfaction in Community Question An-
swering. Proceedings of the 31st annual international ACM
SIGIR conference on Research and development in informa-
tion retrieval - SIGIR ’08 483.
Marusich, L. R.; Bakdash, J. Z.; Onal, E.; Yu, M. S.;
Schaffer, J.; O’Donovan, J.; Höllerer, T.; Buchler, N.; and
Gonzalez, C. 2016. Effects of Information Availabil-
ity on Command-and-Control Decision Making: Perfor-
mance, Trust, and Situation Awareness. Human Factors:
The Journal of the Human Factors and Ergonomics Society
58(2):301–321.
Morse, B. S.; Engh, C.; and Goodrich, M. A. 2010. UAV
video coverage quality maps and prioritized indexing for
wilderness search and rescue. In 2010 5th ACM/IEEE In-
ternational Conference on Human-Robot Interaction (HRI),
227–234. Osaka: IEEE.
Noh, S., and Jeong, U. 2010. Intelligent Command and
Control Agent in Electronic Warfare Settings. International
Journal of intelligent Systems 25:514–528.
Quinlan, J. R. 1986. Induction of Decision Trees. Machine
Learning 1(1):81–106.
Rapaport, A. 2015. “Quite a few Terrorists lost their lives
owing to Big Data”.