=Paper= {{Paper |id=Vol-2819/session3paper1 |storemode=property |title=On Interactive Machine Learning and the Potential of Cognitive Feedback (https://youtu.be/g3ypFFLX8jo) |pdfUrl=https://ceur-ws.org/Vol-2819/session3paper1.pdf |volume=Vol-2819 |authors=Chris Michael,Dina Acklin,Jaelle Scheuerman }} ==On Interactive Machine Learning and the Potential of Cognitive Feedback (https://youtu.be/g3ypFFLX8jo)== https://ceur-ws.org/Vol-2819/session3paper1.pdf
       On Interactive Machine Learning and the Potential of Cognitive Feedback

                                 Chris J. Michael, Dina Acklin, and Jaelle Scheuerman
                                                U.S. Naval Research Laboratory
                                                  1005 Balch Blvd, Code 7343
                                        Stennis Space Center, Mississippi 39529. U.S.A.
                 chris.michael@nrlssc.navy.mil, dina.acklin@nrlssc.navy.mil, jaelle.scheuerman@nrlssc.navy.mil



                              Abstract

   In order to increase productivity, capability, and data exploitation, numerous defense applications are experiencing an integration of state-of-the-art machine learning and AI into their architectures. Particular to defense applications, having a human analyst in the loop is of high interest due to quality control, accountability, and complex subject matter expertise not readily automated or replicated by AI. However, many applications are suffering from a very slow transition. This may be in large part due to lack of trust, usability, and productivity, especially when adapting to unforeseen classes and changes in mission context. Interactive machine learning is a newly emerging field in which machine learning implementations are trained, optimized, evaluated, and exploited through an intuitive human-computer interface. In this paper, we introduce interactive machine learning and explain its advantages and limitations within the context of defense applications. Furthermore, we address several of the shortcomings of interactive machine learning by discussing how cognitive feedback may inform features, data, and results in the state of the art. We define the three techniques by which cognitive feedback may be employed: self reporting, implicit cognitive feedback, and modeled cognitive feedback. The advantages and disadvantages of each technique are discussed.

This will certify that all author(s) of the above article/paper are employees of the U.S. Government and performed this work as part of their employment, and that the article/paper is therefore not subject to U.S. copyright protection. No copyright. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: Proceedings of AAAI Symposium on the 2nd Workshop on Deep Models and Artificial Intelligence for Defense Applications: Potentials, Theories, Practices, Tools, and Risks, November 11-12, 2020, Virtual, published at http://ceur-ws.org

            The Emergence of Interactive Machine Learning

The vast majority of modern-day research in machine learning presents algorithms and implementations that do not consider human interaction. For example, the flourishing field of deep learning research is evaluated mainly by classification accuracy over large curated datasets and generative models. This approach, referred to as Automatic Machine Learning (AML) or sometimes conventional machine learning, forgoes the integration of dynamic human feedback into the system. Though undoubtedly useful for commercial big-data problems, there are many scenarios – especially in defense – where applying AML falls short in practice. For instance, applications at the tactical edge may suffer from smaller quantities of labeled examples for training. Moreover, classifiers may struggle to adapt to changes in data context quickly enough to be considered viable by an analyst, particularly in scenarios where the mission demands quick turn-around time. Many of these issues may be mitigated by emerging implementations of the interactive machine learning (IML) paradigm, which capitalizes on human input in order to improve machine learning implementations (Fails and Olsen Jr 2003). Unlike approaches that leverage AML, IML implementations allow classifiers to very quickly train on and apply newly discovered information with the help of a human subject-matter expert, whom we refer to in this article as the analyst.

   In general, IML may be described as a machine learning implementation where one or more analysts iteratively improve a model for automation by manipulating an interface that is tightly coupled to the desired task at hand. There are four main components to any IML implementation. The first component is the data associated with the task. Examples of such data include remotely sensed imagery, textual information such as reports, and spatiotemporal tracks of moving objects. The second component, referred to in this study as the machine, is the mathematical model that tries to estimate or automate the desired task. Ostensibly, this can be seen as a black box, but we will discuss the properties of a successful IML classifier later in the article. The third component of IML is the Human-Computer Interface (HCI). The HCI may be as conventional as software receiving input through a keyboard and mouse, which is what we assume in this article, or as specialized as vehicle controls, immersive environments, and brain interfaces. The application is designed to allow immediate and intuitive presentation of the machine's classification on a manageable set of data. This data is then either confirmed or manipulated to be correct by the analyst, who is the last but most important component of an IML system. In this article, we discuss IML within the context of improving productivity and decision making for an analyst with a very specific task that requires subject-matter expertise. Though, as exemplified above, IML may be deployed in a wide variety of ways, we feel that deployment in this context
has the greatest potential for impact in defense applications. There are several studies that provide excellent perspectives of the current state of the art in IML outside of this scope (Dudley and Kristensson 2018; Wu, Weld, and Heer 2019; Robert et al. 2016).

   A common architecture for IML implementations is shown in Figure 1. The data on which the analyst must perform a task may either be completely available in a database or sequentially available as a stream. Active learning may be used to pull the most effective data points from this database for labeling, as will be discussed in the next section. A machine for predicting the data is then used to present guesses for the task at hand to the analyst. The analyst must verify each of these guesses and correct any mistakes via the HCI. Once the verification step is completed for the current iteration, the machine will immediately learn from the corrections and/or confirmations. The process will then repeat by the machine gathering data examples and presenting guesses to the user once again. When the time comes for the analyst to leave their duty station, the machine model may optimize on the data that has been labeled in order to maximize its accuracy. This way, the most effective machine will be available once the analyst returns to duty. It is important to note that the machine may be deployed as a centralized general-purpose classifier that combines the work done by multiple analysts, or it may be deployed locally to be customized towards the individual analyst.

   The focus of this article is to introduce IML within the context of analyst-driven applications relevant to defense while highlighting research gaps, the most important of which involves incorporation of cognitive feedback. We choose not to discuss manual model interactions such as feature selection (Raghavan, Madani, and Jones 2006) or model selection (Talbot et al. 2009), which are processes whereby analysts directly optimize machine models. Rather, we choose to present implementations that can be used effectively by an analyst who is a subject matter expert for the task at hand and not knowledgeable in machine learning or statistical theory. Defense analysts hold invaluable subject-matter expertise for the mission, and it is unreasonable to assume that they must learn or worry about data-scientific concepts. Because intuitive HCIs may be designed to be congruent to their task, IML has great potential to leverage the power of modern-day ML while not burdening the analyst with parameter tuning, data curation, or any of the other burdens implicit to AML.

   The next section will describe three examples of IML implementations that highlight the current state of the art. The section that follows will iterate through several advantages, shortcomings, and gaps in the state of the art. In the penultimate section, we specify the ways in which cognitive feedback may be used to address the shortcomings and gaps of IML with respect to defense applications. Finally, we conclude with commentary on prospects for future research.

             Interactive Machine Learning in Action

In order to frame a more detailed discussion of IML, we now describe several IML implementations that have been presented in peer-reviewed literature. We specifically choose three application areas that are analyst-driven: region digitization, textual translation, and video annotation. These examples demonstrate the potential for IML to improve both the machine performance and the user experience with autonomy.

   Geographic region digitization is a highly demanded yet arduous task whereby regions such as bodies of water and other land cover are digitized from remotely sensed images, usually within a Geographic Information System (Hossain and Chen 2019). Once digitized, regions may be represented in mapping products for geospatial situational awareness, climate-level studies, land surveys, and many other applications. Although numerous AML approaches to region digitization have been presented in the literature, they are not widely adopted in practice. This is most likely due to the all-or-nothing yield of AML approaches: if the machine incorrectly digitizes a region, it may be more burdensome for an analyst to correct than to start from scratch. Therefore, an analyst may prefer to digitize manually to circumvent frustration and presumably lower their workload. In order to address these shortcomings, an IML implementation for region digitization, named the Geospatial Region Application Interface Toolkit (GRAIT), is presented as a human-machine team application (Michael et al. 2019). The authors address the all-or-nothing approach to region digitization with an IML implementation where a region is digitized iteratively. In each iteration, the machine guesses the placement of a certain number of vertices of the contour and presents them to the analyst for verification. For each vertex presented, the analyst may either correct its placement by clicking and dragging it to an appropriate location or simply confirm its correct placement by not interacting with it. The analyst indicates via button press when all the vertices of the current iteration are corrected or confirmed. The machine will then train on the finalized vertex locations, and the process will continue until the region is completely digitized. In order to prevent inducing too high a cognitive load on the analyst, an uncertainty model is used to estimate the probability of incorrect vertex placement and limit each iteration to around 2 incorrectly placed vertices. Results show that with no prior training data, the IML implementation places 84% of vertices correctly in 4 separate image sets of 4 images each.

   Another area where IML approaches show promise is that of textual language translation, commonly referred to as machine translation. While bodies of work in this field attempt to replace human translators with machine models, many of which are AML implementations (Koehn 2009), the current state of the art is far from perfect. As with region digitization, fully-automatic approaches may hinder rather than help the performance of a translator at times when too many mistranslated words may induce excessive cognitive load. Because of this, many approaches to machine translation are realized through a human-machine team. An IML approach to machine translation aims to remedy these issues by implementing iterative learning and modeling the informativeness of each machine translation at a fine-grained level (González-Rubio and Casacuberta 2014). In this approach, an initial guess of a sentence translation is given to the user
[Figure 1 diagram: Data feeds the Machine via Active Learning; the Machine presents a Guess through the Interface (HCI), which the Analyst Verifies and Corrects; corrections Refine the Machine through Online Learning, with periodic Optimization over the labeled data.]

                      Figure 1: A common architecture for interactive machine learning implementations.
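The loop in Figure 1 can be sketched in a few lines. The names below (`iml_session`, `ask_analyst`) are illustrative assumptions rather than any cited system's API; the machine follows scikit-learn's `partial_fit` convention for online learning, with uncertainty-based active learning and an offline optimization pass at the end of the session.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def iml_session(pool_X, ask_analyst, n_rounds=3, batch_size=5):
    """One duty session of the Figure 1 loop: guess -> verify/correct -> update."""
    rng = np.random.default_rng(0)
    machine = SGDClassifier(random_state=0)   # supports online updates via partial_fit
    classes = np.array([0, 1])
    labeled_X, labeled_y, fitted = [], [], False
    for _ in range(n_rounds):
        if not fitted:
            # Cold start: no labels exist yet, so sample an initial batch at random.
            batch = rng.choice(len(pool_X), size=batch_size, replace=False)
            guesses = np.zeros(batch_size, dtype=int)
        else:
            # Active learning: present the guesses the machine is least certain of.
            margin = np.abs(machine.decision_function(pool_X))
            batch = np.argsort(margin)[:batch_size]
            guesses = machine.predict(pool_X[batch])
        # The analyst confirms or corrects each guess through the HCI.
        truth = np.array([ask_analyst(x, g) for x, g in zip(pool_X[batch], guesses)])
        # Online learning: train immediately on the verified labels.
        machine.partial_fit(pool_X[batch], truth, classes=classes)
        fitted = True
        labeled_X.append(pool_X[batch])
        labeled_y.append(truth)
        pool_X = np.delete(pool_X, batch, axis=0)
    # While the analyst is off duty, optimize over everything labeled so far.
    y_all = np.concatenate(labeled_y)
    if len(np.unique(y_all)) > 1:             # refit only if both classes were seen
        machine.fit(np.vstack(labeled_X), y_all)
    return machine
```

Here `ask_analyst` stands in for the HCI verification step; in a real deployment it would render each guess and capture the analyst's confirmation or correction.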


based on a metric of informativeness. The user will then make corrections to the guess by changing the first incorrect letter of the translation. The machine in turn suggests a new translation under this assumption. This process continues, with the machine immediately training on corrected data for future translations. Results show that this IML-based method yields twice the translation quality per user interaction, as measured by a metric specific to machine translation, compared to AML approaches.

   Lastly, IML implementations have emerged for the difficult task of video annotation, where the amount of data generated per day has far surpassed the ability of analysts to inspect it. When successful, annotated video allows for critical advantages such as the ability to search for events, quantify behavioral analytics, and study natural phenomena. Though many AML approaches to video analytics exist, they are typically tied to certain features of interest within some constrained context (Ananthanarayanan et al. 2017). In cases where context changes or the features of interest are unknown beforehand, AML implementations for automatic video annotation may be rendered incorrect or infeasible. An IML implementation of video annotation named Janelia Automatic Animal Behavior Annotator (JAABA) demonstrates a semi-automatic approach to assess animal behavior (Kabra et al. 2013). JAABA allows a user to annotate a video frame with an arbitrary label, for instance jump. Then, using trajectory information extracted from the video, the machine trains on the given label and presents classification results both at the level of the current video and a database of numerous animal videos. The machine also provides confidence levels for each classification to guide further labeling by the user. This process is repeated iteratively until an ideal classifier is attained. JAABA was used to create the first ML-driven behavior classifier over a diverse set of animals.

   With these three examples in mind, a more detailed explanation of the advantages, limitations, and gaps of IML will follow.

              Advantages, Shortcomings, and Gaps

Advantages

The advantages of IML approaches directly address many of the shortcomings that defense applications exhibit when utilizing ML. Numerous defense applications suffer from a shortage of labeled training examples due to a lack of crowd sourcing and the ever-changing state of platform technologies, among other reasons. As such, deep models relying on large amounts of labeled examples cannot be adequately trained. IML addresses the shortage of training data by providing an interface that allows for incorrect classifications to be immediately corrected and integrated into the machine model. In fact, several IML implementations may work well with no prior labeled data, a situation usually referred to as the cold start problem (Lika, Kolomvatsos, and Hadjiefthymiades 2014). Additionally, the HCI allows for correction through an intuitive interface that potentially reduces the burden of data labeling. This allows an analyst to leverage their current subject-matter expertise – that of the application and data context – and circumvents the need to play the role of data scientist.

   Defense problems must be very adaptable to context changes from one region of interest to the next. In order to accommodate this, any autonomy must immediately adapt to such changes at the pace of the analyst. Therefore, IML implementations typically apply active learning and online learning techniques in order to improve effectiveness. Active learning research entails the study of uncertainty or similarity metrics in order to develop a mathematical understanding of the likelihood that a machine will classify future data points correctly (Quiñonero-Candela et al. 2009). The field of online machine learning involves models that may train in stride to adapt to new situations quickly while optimizing exploration vs. exploitation (Bottou 1998).

   Problems related to defense must sometimes be deployed at the tactical edge. In such situations, computational resources and downtime may be scarce. IML directly addresses this problem, since most IML implementations are meant to be deployed on desktop computers. In all three examples of IML presented in the previous section, online and active learning strategies are employed to iteratively build high-performance classifiers. Active learning is also used to gauge the load of examples presented to the user, both by correlating uncertainty to the probability of an incorrect classification and by providing a priority for the analyst to manage their own work flow. Both GRAIT and JAABA support cold-start cases.
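The "train in stride" behavior of online learning described above can be illustrated with a minimal sketch: a logistic model updated one example at a time recovers after an abrupt change in data context. This is purely illustrative; none of the cited systems is implemented this way.

```python
import numpy as np

# Minimal online learner (logistic regression via per-example SGD) showing
# adaptation after an abrupt context change. Illustrative sketch only.

class OnlineLogistic:
    def __init__(self, n_features, lr=0.5):
        self.w = np.zeros(n_features)
        self.lr = lr

    def predict(self, x):
        return int(self.w @ x >= 0.0)

    def update(self, x, y):
        p = 1.0 / (1.0 + np.exp(-(self.w @ x)))   # predicted P(y = 1)
        self.w += self.lr * (y - p) * x           # one SGD step on the log loss

rng = np.random.default_rng(0)
model = OnlineLogistic(n_features=2)
correct_after_drift = 0
for t in range(400):
    x = rng.normal(size=2)
    y = int(x[0] > 0) if t < 200 else int(x[0] <= 0)  # context flips at t = 200
    if t >= 300:                                      # accuracy after re-adapting
        correct_after_drift += int(model.predict(x) == y)
    model.update(x, y)
```

After the labeling rule flips, the per-example updates drive the decision boundary back toward the new rule within a few dozen corrections, which is the property IML implementations rely on when mission context changes.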
Shortcomings

Perhaps the most obvious shortcoming of IML is that the HCI and machine implementation must be tightly coupled to a specific application. This entails much more effort in the development of applications, since they must be built and studied uniquely towards an explicit work flow. This differs greatly from AML approaches, where for the most part implementations are general-purpose and specificity is implied through parameterization and classes for labeling. Studies define a general-purpose methodology for HCI, but this research is young and remains mostly theoretical in nature (Meza Martínez and Maedche 2019).

   Deep models of machine learning exhibit very impressive results relating to throughput of data and classification times. IML implementations currently lag behind in these results. This is in part due to the nature of online machine learning; namely, the need to have tight classification and training cycles. However, research is trending more towards online and active learning problems, and IML-inspired classifiers with competent performance are emerging (Langford, Li, and Strehl 2007; Lu, Shi, and Jia 2013).

   A further issue with IML is that overfitting may occur more frequently since data is generally labeled iteratively. Overfitting occurs when prior training data causes the model to correlate too tightly to features that do not justify the desired outcome. For example, one of the geographic sites in the GRAIT study is Johnson Lake, WA. The first three images show the shoreline in roughly the same location. The fourth image shows the lake with a receded shoreline. Though the shoreline may be spotted clearly by an analyst in the fourth image, the classifier overfit to spatial features and thus incorrectly identified the shoreline. This also caused the uncertainty calculations for the image to be undershot. AML approaches to overfitting typically require optimizing machine parameters or adding diversity to datasets, both of which typically require large amounts of computation and thus long turnaround times not conducive to successful IML implementations. Therefore, reinforcement meta-learning, whereby active learning implementations are informed by corrections via specialized ML implementations, may be employed to adapt quickly to situations where overfitting is inevitable (Bachman, Sordoni, and Trischler 2017).

The Cognitive Gap

Although frequently mentioned as a future direction of study, perhaps the largest identified gap in IML research is the lack of formalization and quantification of cognitive implications from the analyst. For instance, the IML machine translation study (González-Rubio and Casacuberta 2014) mentions specifically that the applied technique lessens the cognitive load of the translator by utilizing cost-sensitive metrics such as informativeness. However, the study does not perform any human-factors research to support this claim, though it is mentioned as future work. As another example, the study presenting GRAIT uses mathematically modeled uncertainty calculations to meter the workload at each iteration. Though it is shown statistically that these uncertainty calculations correlate to the probability a vertex is placed correctly, results focus more on vertex placement accuracy and do not consider multiple load levels (e.g., the number of expected incorrect vertices is set to two for the entire study). Human-factors research is also slated as future work. Both of these studies appreciate that there must be thresholds of cognitive load taken into account by the IML system for a successful implementation, but it is apparent that human-factors research is inevitable.

            The Implications of Cognitive Feedback

Due to its interactive nature, IML most certainly is a human-in-the-loop endeavor. Several studies have highlighted difficulties that may arise from trust, safety, and quality (Dudley and Kristensson 2018; Groce et al. 2013; Gillies et al. 2016; Turchetta, Berkenkamp, and Krause 2019). This section is devoted to discussing the potential of researching and integrating models of human cognition as feedback for IML, which is not often mentioned in the state of the art. We also make the argument that cognitive feedback directly addresses the shortcomings of IML. The topic of cognitive feedback is especially useful for defense-related problems, where trust, safety, and quality of ML implementations are a prerequisite for adoption. Without analyst-driven cognitive feedback, an IML system can very quickly fall flat, as illustrated in the following region digitization example.

   Consider the analyst using GRAIT to digitize the fourth image of Johnson Lake as explained in the previous section. Recall that the machine is overfit, and thus its model for uncertainty is undershot. Because of this, the machine places 10 vertices, 8 of which are incorrectly placed. If the analyst continues, they will spend more time correcting the misplaced vertices than manually digitizing the lake without the help of the machine.

   This example is simple, but it highlights one of the detrimental problems of IML implementations: overfitting is inevitable, and it can induce, rather than relieve, cognitive load. As mentioned previously, reinforcement learning may be used to augment the uncertainty or similarity model based on the number of corrections the user has to make in any iteration. However, in this example, convergence of such a technique would involve the user making excessive corrections in order to inform the model. Unlike AML, the uncertainty and workload involved with IML data must be somehow informed by the analyst.

   Figure 2 shows several situations exemplifying various levels of cognitive load when an analyst uses GRAIT to annotate some region of interest. In the first example, the machine is very accurate but offers too few vertices for the analyst to verify. In this situation, the analyst is impeded by an overshot cognitive load. The analyst must work at the slow pace of the IML implementation, which not only reduces their productivity but may also reduce their attention and engagement. The second example shows the ideal situation where GRAIT correctly manages the cognitive load of the analyst. The analyst is expected to be engaged and productive. The last situation shows an example of the IML implementation undershooting the cognitive load. This causes the analyst to become overwhelmed and possibly confused, slowing their productivity and causing frustration.
[Figure 2: three panels of an analyst at a workstation, from disengaged ("ZzZ") to engaged to overwhelmed ("?!?").]

Figure 2: Various degrees of engagement with IML region digitization. In the first image, the machine has overshot cognitive load and thus the analyst's productivity is hampered. In the second image, the analyst is engaged in the task and the machine is helping their productivity. In the last image, the machine has undershot the cognitive load and thus the analyst is overwhelmed and will most likely abandon the IML implementation for the task.
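One way to operationalize this load management is to let an uncertainty model gate how many guesses reach the analyst, and to nudge the target using the corrections actually observed, in the spirit of GRAIT's limit of roughly two expected incorrect vertices and the correction-driven model augmentation mentioned earlier. The function names and the adaptation rule below are our illustrative assumptions, not GRAIT's actual implementation.

```python
# Illustrative sketch: present only as many guesses as keeps the expected
# number of corrections near a target, then adapt that target from experience.

def plan_iteration(error_probs, target_errors=2.0):
    """Greedily add the most-certain guesses until expected errors hit the target.

    error_probs: estimated probability that each candidate guess is wrong
    (assumed to come from the machine's uncertainty model).
    """
    expected_errors, batch = 0.0, []
    for idx in sorted(range(len(error_probs)), key=lambda i: error_probs[i]):
        if expected_errors + error_probs[idx] > target_errors and batch:
            break
        batch.append(idx)
        expected_errors += error_probs[idx]
    return batch

def adapt_target(target_errors, observed_corrections, expected_errors, rate=0.2):
    """Shrink the target when the analyst corrected more than predicted
    (undershot load); grow it when they corrected less (overshot load)."""
    return max(0.5, target_errors + rate * (expected_errors - observed_corrections))
```

For example, with per-guess error probabilities `[0.05, 0.9, 0.1, 0.4, 0.2]` and a target of 0.5 expected errors, `plan_iteration` presents only the three most certain guesses (indices 0, 2, and 4), holding back the ones likely to need correction.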


   Incorporation of cognitive load is necessary to avoid the pitfall of bad cognitive load estimation based on analysis of data alone. For instance, consider an augmentation to the third GRAIT example in the figure by providing the user with a survey at each iteration. The survey will occur before correction and simply ask, "Is this workload too little, too much, or fine?" In this particular situation, the analyst will inform the machine that the workload is too much to handle, and the machine may modify its uncertainty model accordingly (e.g., by adjusting weighting or performing best-fit optimization to prior iterations). This very simple solution illustrates how cognitive feedback may enable better IML for many applications, but this concept may be taken further. In order to promote discussion and research of the possibilities and implications of this concept, we now present a taxonomy for cognitive feedback to inform IML.

   Self-reported cognitive feedback is gathered by surveys eliciting cognitive feedback from the user. An example of such a survey is the standard NASA-TLX, which allows a user to report on the general experienced workload of a particular task (Hart and Staveland 1988). This could be gathered offline during human factors evaluation or online through an interface for self reporting within the HCI. The main advantage of online self reporting of cognitive load is the simplicity of collecting feedback within the HCI. Implementing simple interventions, such as providing buttons for when a workload is too heavy or too light, is trivial. However, this approach may be imprecise in complex user environments because sub-components of a task may differentially contribute to workload. In these situations, interventions may be too simplistic or induce load on an analyst.

   Until now, we've discussed the implications of self reporting on cognitive load, but this technique may provide insight into more than just the analyst's ideal workload. The field of explainable artificial intelligence involves expressing the machine's decision making to a human user (Gunning and Aha 2019). If a model for explainability is feasible, then the

color," and the machine may then optimize its classifier and uncertainty calculation based on this statement.

   As opposed to surveying a user, implicit cognitive feedback may be collected in real time while analysts interact with the HCI during closed experimentation. Implicit cognitive feedback involves collecting physiological data in order to infer cognitive states in a manner that is continuous, objective, and occurs in real time. For example, because pupillary responses are reflective of nervous activity, pupil dilation may act as a proxy for measuring task-induced cognitive processes. As such, increases in pupil diameter may be indicative of high cognitive load, attentional processing, and decision making (Hess and Polt 1964; Kahneman 1973; Kahneman and Beatty 1967) whereas decreases may reflect fatigue (Lowenstein, Feinberg, and Loewenfeld 1963). This data may then be correlated with self-reporting to define various states of cognitive load. Examples of such biofeedback include readings of skin conductance, heart rate, pupillometry, and electroencephalogram (EEG). Often, multiple physiological measures will be assessed to determine workload and inform adaptive algorithms, in essence creating user models that dynamically adjust to support user needs. For example, such physiological elements were examined to monitor the workload of operators while performing UAV piloting tasks of different levels (Wilson and Russell 2007). The physiological signals were used as features to train a neural network to classify workload. Another approach to implicit cognitive feedback is to incorporate cognitive cues as features in the machine learning algorithm (Rosenfeld et al. 2012). For example, in a recent choice competition, researchers incorporated cognitive features derived from behavior into a random forest algorithm. They found that this approach significantly outperformed other ML approaches that did not incorporate cognitive features (Plonsky et al. 2017). A recent study has explored how collecting and applying cognitive cues as features improves reinforcement learning algorithms for playing video games (Zhang
user may communicate cognitive information relating to fea-         et al. 2019). In summary, implicit cognitive feedback has
tures as feedback to the model (Teso and Kersting 2019). Re-        the potential to improve IML implementations by gathering
lating back to the example above, the machine may explain           data in closed experimentation to inform cognitive load, un-
its decisions by stating “I believe that historic position of the   certainty/similarity measurements, and inform the machine
shoreline is very important.” The user may then augment the         with features of interest related to a specific task.
belief by stating “The historic position is not as important as         Implicit cognitive feedback may provide invaluable in-
                          Table 1: Taxonomy of Cognitive Feedback for Interactive Machine Learning
         Term              Definition                                Examples
         Self Reporting    Gathered by surveying the analyst.        Online: Buttons in HCI.
                                                                     Offline: Human-factors surveys.
         Implicit          Collection and evaluation of biofeedback Cognitive load of correction via HCI.
                           via closed experimentation.               Load as a function of correction count.
                                                                     Use of cognitive cues as ML features.
         Modeled           Utilization of a cognitive model in       Feedback model of user interaction with HCI.
                           the loop.


sight to IML implementations, but the disadvantage lies in         implementation, though this may take high levels of time
the fact that closed experimentation is often necessary to col-    and effort (Groce et al. 2013; Gillies et al. 2016).
lect biofeedback, control levels of tasking, and survey users
of the HCI with respect to a particular application. Addition-         A Future Driven by Cognitive Feedback
ally, the cognitive state of the user may be more dynamic for
some applications than others. In these situations, modeled        We have presented a summary of interactive machine learn-
cognitive feedback may provide cognitive feedback based            ing along with several examples informing the state of the
on models of user interaction with the HCI. For example,           art. After discussing the advantages of IML, the major short-
simulating human behavior using a computational cognitive          comings and gaps were delineated. Finally, the implications
model is another potential method to provide feedback to an        of cognitive feedback for IML implementations were dis-
IML system. Models of cognition and decision making have           cussed to address the gaps. Though it may seem trivial to
been used to simulate human interactions with interfaces in        study cognitive feedback as it relates to data science for
military contexts (Blasch et al. 2011). Cognitive architec-        human-in-the-loop applications, there is a general lack of
tures represent a modeling paradigm that computationally           such studies in the literature, especially for defense applica-
defines the relationship between underlying biological and         tions. We hope this article will encourage research and de-
cognitive mechanisms to emerging behavior. Architectures,          velopment in more IML for defense applications and more
such as ACT-R (Anderson et al. 2004) and SOAR (Laird,              research in how cognitive feedback may inform IML imple-
Newell, and Rosenbloom 1987), have long been a part of             mentations.
HCI research to simulate users interacting with an interface.
For example, ACT-R models are used for usability testing                                   References
of menus (Byrne 2001), modeling how users detect phish-
ing websites (Williams and Li 2017), and detecting situ-           Alves, F.; Szpak, K. S.; Gonçalves, J. L.; Sekino, K.;
ations with high cognitive load when using a smartphone            Aquino, M.; e Castro, R. A.; Koglin, A.; de Lima Fonseca,
(Wirzberger and Russwinkel 2015). Cognitive architectures          N. B.; and Mesa-Lao, B. 2016. Investigating cognitive effort
have been used with physiological data, such as eye tracking       in post-editing: A relevance-theoretical approach. Eyetrack-
information and fMRI, to map observed behavior the under-          ing and Applied Linguistics 2:109.
lying mental states and brain regions (Tamborello and Byrne        Ananthanarayanan, G.; Bahl, P.; Bodı́k, P.; Chintalapudi, K.;
2007; Borst and Anderson 2015). Cognitive models, com-             Philipose, M.; Ravindranath, L.; and Sinha, S. 2017. Real-
bined with self-reported data from surveys and physiolog-          time video analytics: The killer app for edge computing.
ical data, can provide a starting point for IML systems to         computer 50(10):58–67.
optimize their suggestions for the overall performance of a
                                                                   Anderson, J. R.; Bothell, D.; Byrne, M. D.; Douglass, S.;
human-machine team.
                                                                   Lebiere, C.; and Qin, Y. 2004. An integrated theory of the
   These three different categories of cognitive feedback          mind. Psychological review 111(4):1036–1060.
– self reporting, implicit cognitive feedback, and modeled
cognitive feedback – delineate the possible ways in which          Bachman, P.; Sordoni, A.; and Trischler, A. 2017. Learning
IML implementations may be centered around the analyst.            algorithms for active learning. In Proceedings of the 34th
The categories are summarized in Table 1.                          International Conference on Machine Learning-Volume 70,
                                                                   301–310. JMLR. org.
   Once cognitive feedback has been integrated into IML,
more conventional results such as classification accuracy          Blasch, E. P.; Breton, R.; Valin, P.; and Bosse, E. 2011. User
and overall corrections may be used to evaluate approaches         information fusion decision making analysis with the c-ooda
against their non-cognitive baseline. However, these re-           model. In 14th International Conference on Information Fu-
sults may lack true insight into the purpose of the human-         sion, 1–8. IEEE.
machine team. Measuring the cognitive load on human sub-           Borst, J. P., and Anderson, J. R. 2015. Using the ACT-R
jects with more objective metrics of productivity would pro-       Cognitive Architecture in Combination With fMRI Data. In
vide more insight into the effectiveness of IML implementa-        Forstmann, B. U., and Wagenmakers, E.-J., eds., An Intro-
tions (Alves et al. 2016). Additionally, it is the analyst them-   duction to Model-Based Cognitive Neuroscience. New York,
selves who must also evaluate the effectiveness of an IML          NY: Springer. 339–352.
Bottou, L. 1998. Online learning and stochastic approximations. On-Line Learning in Neural Networks 17(9):142.

Byrne, M. D. 2001. ACT-R/PM and menu selection: applying a cognitive architecture to HCI. International Journal of Human-Computer Studies 55(1):41-84.

Dudley, J. J., and Kristensson, P. O. 2018. A review of user interface design for interactive machine learning. ACM Transactions on Interactive Intelligent Systems 8(2):8:1-8:37.

Fails, J. A., and Olsen Jr., D. R. 2003. Interactive machine learning. In Proceedings of the 8th International Conference on Intelligent User Interfaces, 39-45. ACM.

Gillies, M.; Fiebrink, R.; Tanaka, A.; Garcia, J.; Bevilacqua, F.; Heloir, A.; Nunnari, F.; Mackay, W.; Amershi, S.; Lee, B.; et al. 2016. Human-centred machine learning. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, 3558-3565. ACM.

González-Rubio, J., and Casacuberta, F. 2014. Cost-sensitive active learning for computer-assisted translation. Pattern Recognition Letters 37:124-134.

Groce, A.; Kulesza, T.; Zhang, C.; Shamasunder, S.; Burnett, M.; Wong, W.-K.; Stumpf, S.; Das, S.; Shinsel, A.; Bice, F.; et al. 2013. You are the only possible oracle: Effective test selection for end users of interactive machine learning systems. IEEE Transactions on Software Engineering 40(3):307-323.

Gunning, D., and Aha, D. W. 2019. DARPA's explainable artificial intelligence program. AI Magazine 40(2):44-58.

Hahnemann, D., and Beatty, J. 1967. Pupillary responses in a pitch-discrimination task. Perception & Psychophysics 2(3):101-105.

Hart, S. G., and Staveland, L. E. 1988. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Advances in Psychology, volume 52. Elsevier. 139-183.

Hess, E. H., and Polt, J. M. 1964. Pupil size in relation to mental activity during simple problem-solving. Science 143(3611):1190-1192.

Hossain, M. D., and Chen, D. 2019. Segmentation for object-based image analysis (OBIA): A review of algorithms and challenges from remote sensing perspective. ISPRS Journal of Photogrammetry and Remote Sensing 150:115-134.

Kabra, M.; Robie, A. A.; Rivera-Alba, M.; Branson, S.; and Branson, K. 2013. JAABA: interactive machine learning for automatic annotation of animal behavior. Nature Methods 10(1):64.

Kahneman, D. 1973. Attention and Effort. Englewood Cliffs, NJ: Prentice-Hall.

Koehn, P. 2009. Statistical Machine Translation. Cambridge University Press.

Laird, J. E.; Newell, A.; and Rosenbloom, P. S. 1987. SOAR: An architecture for general intelligence. Artificial Intelligence 33(1):1-64.

Langford, J.; Li, L.; and Strehl, A. 2007. Vowpal Wabbit online learning project.

Lika, B.; Kolomvatsos, K.; and Hadjiefthymiades, S. 2014. Facing the cold start problem in recommender systems. Expert Systems with Applications 41(4):2065-2073.

Lowenstein, O.; Feinberg, R.; and Loewenfeld, I. E. 1963. Pupillary movements during acute and chronic fatigue: A new test for the objective evaluation of tiredness. Investigative Ophthalmology & Visual Science 2(2):138-157.

Lu, C.; Shi, J.; and Jia, J. 2013. Online robust dictionary learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 415-422.

Meza Martínez, M. A.; Nadj, M.; and Maedche, A. 2019. Towards an integrative theoretical framework of interactive machine learning systems. In Proceedings of the 27th European Conference on Information Systems (ECIS).

Michael, C. J.; Dennis, S. M.; Maryan, C.; Irving, S.; and Palmsten, M. L. 2019. A general framework for human-machine digitization of geographic regions from remotely sensed imagery. In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL '19, 259-268. New York, NY, USA: ACM.

Plonsky, O.; Erev, I.; Hazan, T.; and Tennenholtz, M. 2017. Psychological forest: Predicting human behavior. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence.

Quiñonero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; and Lawrence, N. D. 2009. Dataset Shift in Machine Learning. The MIT Press.

Raghavan, H.; Madani, O.; and Jones, R. 2006. Active learning with feedback on features and instances. Journal of Machine Learning Research 7(Aug):1655-1686.

Robert, S.; Büttner, S.; Röcker, C.; and Holzinger, A. 2016. Reasoning under uncertainty: Towards collaborative interactive machine learning. In Machine Learning for Health Informatics. Springer. 357-376.

Rosenfeld, A.; Zuckerman, I.; Azaria, A.; and Kraus, S. 2012. Combining psychological models with machine learning to better predict people's decisions. Synthese 189(1):81-93.

Talbot, J.; Lee, B.; Kapoor, A.; and Tan, D. S. 2009. EnsembleMatrix: interactive visualization to support machine learning with multiple classifiers. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1283-1292. ACM.

Tamborello, F. P., and Byrne, M. D. 2007. Adaptive but non-optimal visual search behavior with highlighted displays. Cognitive Systems Research 8(3):182-191.

Teso, S., and Kersting, K. 2019. Explanatory interactive machine learning. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES). AAAI.

Turchetta, M.; Berkenkamp, F.; and Krause, A. 2019. Safe exploration for interactive machine learning. In Advances in Neural Information Processing Systems, 2887-2897.
Williams, N., and Li, S. 2017. Simulating human detection of phishing websites: An investigation into the applicability of the ACT-R cognitive behaviour architecture model. In 2017 3rd IEEE International Conference on Cybernetics (CYBCONF), 1-8. IEEE.

Wilson, G. F., and Russell, C. A. 2007. Performance enhancement in an uninhabited air vehicle task using psychophysiologically determined adaptive aiding. Human Factors 49(6):1005-1018.

Wirzberger, M., and Russwinkel, N. 2015. Modeling interruption and resumption in a smartphone task: An ACT-R approach. Journal of Interactive Media 14(2):147-154.

Wu, T.; Weld, D. S.; and Heer, J. 2019. Local decision pitfalls in interactive machine learning: An investigation into feature selection in sentiment analysis. ACM Transactions on Computer-Human Interaction (TOCHI) 26(4):24.

Zhang, R.; Liu, Z.; Guan, L.; Zhang, L.; Hayhoe, M. M.; and Ballard, D. H. 2019. Atari-HEAD: Atari human eye-tracking and demonstration dataset. arXiv preprint arXiv:1903.06754.