On Interactive Machine Learning and the Potential of Cognitive Feedback

Chris J. Michael, Dina Acklin, and Jaelle Scheuerman
U.S. Naval Research Laboratory
1005 Balch Blvd, Code 7343
Stennis Space Center, Mississippi 39529, U.S.A.
chris.michael@nrlssc.navy.mil, dina.acklin@nrlssc.navy.mil, jaelle.scheuerman@nrlssc.navy.mil

Abstract

In order to increase productivity, capability, and data exploitation, numerous defense applications are experiencing an integration of state-of-the-art machine learning and AI into their architectures. Particular to defense applications, having a human analyst in the loop is of high interest due to quality control, accountability, and complex subject-matter expertise not readily automated or replicated by AI. However, many applications are suffering from a very slow transition, in large part due to lack of trust, usability, and productivity, especially when adapting to unforeseen classes and changes in mission context. Interactive machine learning is a newly emerging field in which machine learning implementations are trained, optimized, evaluated, and exploited through an intuitive human-computer interface. In this paper, we introduce interactive machine learning and explain its advantages and limitations within the context of defense applications. Furthermore, we address several of the shortcomings of interactive machine learning by discussing how cognitive feedback may inform features, data, and results in the state of the art. We define the three techniques by which cognitive feedback may be employed: self reporting, implicit cognitive feedback, and modeled cognitive feedback. The advantages and disadvantages of each technique are discussed.

This will certify that all author(s) of the above article/paper are employees of the U.S. Government and performed this work as part of their employment, and that the article/paper is therefore not subject to U.S. copyright protection. No copyright. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: Proceedings of AAAI Symposium on the 2nd Workshop on Deep Models and Artificial Intelligence for Defense Applications: Potentials, Theories, Practices, Tools, and Risks, November 11-12, 2020, Virtual, published at http://ceur-ws.org

The Emergence of Interactive Machine Learning

The vast majority of modern-day research in machine learning presents algorithms and implementations that do not consider human interaction. For example, the flourishing field of deep learning research is evaluated mainly by classification accuracy over large curated datasets and generative models. This approach, referred to as automatic machine learning (AML) or sometimes conventional machine learning, forgoes the integration of dynamic human feedback into the system. Though undoubtedly useful for commercial big-data problems, there are many scenarios – especially in defense – where applying AML falls short in practice. For instance, applications at the tactical edge may suffer from smaller quantities of labeled examples for training. Moreover, classifiers may struggle to adapt to changes in data context quickly enough to be considered viable by an analyst, particularly in scenarios where the mission demands quick turn-around time. Many of these issues may be mitigated by emerging implementations of the interactive machine learning (IML) paradigm, which capitalizes on human input in order to improve machine learning implementations (Fails and Olsen Jr 2003). Unlike approaches that leverage AML, IML implementations allow classifiers to very quickly train on and apply newly discovered information with the help of a human subject-matter expert, whom we refer to in this article as the analyst.

In general, IML may be described as a machine learning implementation where one or more analysts iteratively improve a model for automation by manipulating an interface that is tightly coupled to the desired task at hand. There are four main components to any IML implementation. The first component is the data associated with the task. Examples of such data include remotely sensed imagery, textual information such as reports, and spatiotemporal tracks of moving objects. The second component, referred to in this study as the machine, is the mathematical model that tries to estimate or automate the desired task. Ostensibly, this can be seen as a black box, but we will discuss the properties of a successful IML classifier later in the article. The third component of IML is the human-computer interface (HCI). The HCI may be as conventional as software receiving input through a keyboard and mouse, which is what we assume in this article, or as specialized as vehicle controls, immersive environments, and brain interfaces. The application is designed to allow immediate and intuitive presentation of the machine's classification on a manageable set of data. This data is then either confirmed or corrected by the analyst, who is the last but most important component of an IML system. In this article, we discuss IML within the context of improving productivity and decision making for an analyst with a very specific task that requires subject-matter expertise. Though, as exemplified above, IML may be deployed in a wide variety of ways, we feel that deployment in this context has the greatest potential for impact in defense applications. Several studies provide excellent perspectives on the current state of the art in IML outside of this scope (Dudley and Kristensson 2018; Wu, Weld, and Heer 2019; Robert et al. 2016).

A common architecture for IML implementations is shown in Figure 1. The data on which the analyst must perform a task may either be completely available in a database or sequentially available as a stream. Active learning may be used to pull the most effective data points from this database for labeling, as will be discussed in the next section. A machine for predicting the data is then used to present guesses for the task at hand to the analyst. The analyst must verify each of these guesses and correct any mistakes via the HCI. Once the verification step is completed for the current iteration, the machine will immediately learn from the corrections and/or confirmations. The process will then repeat, with the machine gathering data examples and presenting guesses to the user once again. When the time comes for the analyst to leave their duty station, the machine model may optimize on the data that has been labeled in order to maximize its accuracy. This way, the most effective machine will be available once the analyst returns to duty. It is important to note that the machine may be deployed as a centralized general-purpose classifier that combines the work done by multiple analysts, or it may be deployed locally to be customized towards the individual analyst.

Figure 1: A common architecture for interactive machine learning implementations. (Diagram: data, machine, interface (HCI), and analyst connected in a guess-verify-correct-refine loop, with active learning, online learning, and optimization.)

The focus of this article is to introduce IML within the context of analyst-driven applications relevant to defense while highlighting research gaps, the most important of which involves incorporation of cognitive feedback. We choose not to discuss manual model interactions such as feature selection (Raghavan, Madani, and Jones 2006) or model selection (Talbot et al. 2009), which are processes whereby analysts directly optimize machine models. Rather, we choose to present implementations that can be used effectively by an analyst who is a subject-matter expert for the task at hand and not knowledgeable in machine learning or statistical theory. Defense analysts hold invaluable subject-matter expertise for the mission, and it is unreasonable to assume that they must learn or worry about data-scientific concepts. Because intuitive HCIs may be designed to be congruent to their task, IML has great potential to leverage the power of modern-day ML while not burdening the analyst with parameter tuning, data curation, or any of the other burdens implicit to AML.

The next section will describe three examples of IML implementations that highlight the current state of the art. The section that follows will iterate through several advantages, shortcomings, and gaps in the state of the art. In the penultimate section, we specify the ways in which cognitive feedback may be used to address the shortcomings and gaps of IML with respect to defense applications. Finally, we conclude with commentary on prospects for future research.

Interactive Machine Learning in Action

In order to frame a more detailed discussion of IML, we now describe several IML implementations that have been presented in peer-reviewed literature. We specifically choose three application areas that are analyst-driven: region digitization, textual translation, and video annotation. These examples demonstrate the potential for IML to improve both the machine performance and the user experience with autonomy.

Geographic region digitization is a highly demanded yet arduous task whereby regions such as bodies of water and other land cover are digitized from remotely sensed images, usually within a Geographic Information System (Hossain and Chen 2019). Once digitized, regions may be represented in mapping products for geospatial situational awareness, climate-level studies, land surveys, and many other applications. Although numerous AML approaches to region digitization have been presented in the literature, they are not widely adopted in practice. This is most likely due to the all-or-nothing yield of AML approaches: if the machine incorrectly digitizes a region, it may be more burdensome for an analyst to correct than to start from scratch. Therefore, an analyst may prefer to digitize manually to circumvent frustration and presumably lower their workload. In order to address these shortcomings, an IML implementation for region digitization, named the Geospatial Region Application Interface Toolkit (GRAIT), is presented as a human-machine team application (Michael et al. 2019). The authors address the all-or-nothing approach to region digitization with an IML implementation where a region is digitized iteratively. In each iteration, the machine guesses the placement of a certain number of vertices of the contour and presents them to the analyst for verification. For each vertex presented, the analyst may either correct its placement by clicking and dragging it to an appropriate location or simply confirm its correct placement by not interacting with it. The analyst indicates via button press when all the vertices of the current iteration are corrected or confirmed. The machine will then train on the finalized vertex locations, and the process will continue until the region is completely digitized. In order to prevent inducing too high of a cognitive load on the analyst, an uncertainty model is used to estimate the probability of incorrect vertex placement and limit each iteration to around two incorrectly placed vertices. Results show that with no prior training data, the IML implementation correctly places 84% of vertices across 4 separate image sets of 4 images each.

Another area where IML approaches show promise is textual language translation, commonly referred to as machine translation. While bodies of work in this field attempt to replace human translators with machine models, many of which are AML implementations (Koehn 2009), the current state of the art is far from perfect. As with region digitization, fully automatic approaches may hinder rather than help the performance of a translator at times when too many mistranslated words induce excessive cognitive load. Because of this, many approaches to machine translation are realized through a human-machine team. An IML approach to machine translation aims to remedy these issues by implementing iterative learning and modeling the informativeness of each machine translation at a fine-grained level (González-Rubio and Casacuberta 2014). In this approach, an initial guess of a sentence translation is given to the user based on a metric of informativeness. The user then makes corrections to the guess by changing the first incorrect letter of the translation. The machine in turn suggests a new translation under this assumption. This process continues, with the machine immediately training on corrected data for future translations. Results show that this IML-based method yields twice the translation quality (a metric specific to machine translation) per user interaction over AML approaches.

Lastly, IML implementations have emerged for the difficult task of video annotation, where the amount of data generated per day has far surpassed the ability of analysts to inspect it. When successful, annotated video allows for critical advantages such as the ability to search for events, quantify behavioral analytics, and study natural phenomena. Though many AML approaches to video analytics exist, they are typically tied to certain features of interest within some constrained context (Ananthanarayanan et al. 2017). In cases where context may change or the features of interest are unknown beforehand, AML implementations for automatic video annotation may be rendered incorrect or infeasible. An IML implementation of video annotation named the Janelia Automatic Animal Behavior Annotator (JAABA) demonstrates a semi-automatic approach to assessing animal behavior (Kabra et al. 2013). JAABA allows a user to annotate a video frame with an arbitrary label, for instance jump. Then, using trajectory information extracted from the video, the machine trains on the given label and presents classification results both at the level of the current video and of a database of numerous animal videos. The machine also provides confidence levels for each classification to guide further labeling by the user. This process is repeated iteratively until an ideal classifier is attained. JAABA was used to create the first ML-driven behavior classifier over a diverse set of animals.

With these three examples in mind, a more detailed explanation of the advantages, limitations, and gaps of IML follows.

Advantages, Shortcomings, and Gaps

Advantages

The advantages of IML approaches directly address many of the shortcomings that defense applications exhibit when utilizing ML. Numerous defense applications suffer from a shortage of labeled training examples due to a lack of crowd sourcing and the ever-changing state of platform technologies, among other reasons. As such, deep models relying on large amounts of labeled examples cannot be adequately trained. IML addresses the shortage of training data by providing an interface that allows incorrect classifications to be immediately corrected and integrated into the machine model. In fact, several IML implementations may work well with no prior labeled data, a situation usually referred to as the cold-start problem (Lika, Kolomvatsos, and Hadjiefthymiades 2014). Additionally, the HCI allows for correction through an intuitive interface that potentially reduces the burden of data labeling. This allows an analyst to leverage their current subject-matter expertise – that of the application and data context – and circumvents the need to play the role of data scientist.

Defense problems must be very adaptable to context changes from one region of interest to the next. In order to accommodate this, any autonomy must immediately adapt to such changes at the pace of the analyst. Therefore, IML implementations typically apply active learning and online learning techniques in order to improve effectiveness. Active learning research entails the study of uncertainty or similarity metrics in order to develop a mathematical understanding of the likelihood that a machine will classify future data points correctly (Quionero-Candela et al. 2009). The field of online machine learning involves models that may train in stride to adapt to new situations quickly while optimizing exploration vs. exploitation (Bottou 1998).

Problems related to defense must sometimes be deployed at the tactical edge. In such situations, computational resources and downtime may be scarce. IML directly addresses this problem, since most IML implementations are meant to be deployed on desktop computers. In all three examples of IML presented in the previous section, online and active learning strategies are employed to iteratively build high-performance classifiers. Active learning is also used to gauge the load of examples presented to the user, both by correlating uncertainty to the probability of an incorrect classification and by providing a priority for the analyst to manage their own workflow. Both GRAIT and JAABA support cold-start cases.
Shortcomings

Perhaps the most obvious shortcoming of IML is that the HCI and machine implementation must be tightly coupled to a specific application. This entails much more effort in the development of applications, since they must be built and studied uniquely towards an explicit workflow. This differs greatly from AML approaches, where for the most part implementations are general-purpose and specificity is implied through parameterization and classes for labeling. Studies define a general-purpose methodology for HCI, but this research is young and remains mostly theoretical in nature (Meza Martínez and Maedche 2019).

Deep models of machine learning exhibit very impressive results relating to throughput of data and classification times. IML implementations currently lag behind in these results. This is in part due to the nature of online machine learning; namely, the need to have tight classification and training cycles. However, research is trending more towards online and active learning problems, and IML-inspired classifiers with competent performance are emerging (Langford, Li, and Strehl 2007; Lu, Shi, and Jia 2013).

A further issue with IML is that overfitting may occur more frequently since data is generally labeled iteratively. Overfitting occurs when prior training data causes the model to correlate too tightly to features that do not justify the desired outcome. For example, one of the geographic sites in the GRAIT study is Johnson Lake, WA. The first three images show the shoreline in roughly the same location. The fourth image shows the lake with a receded shoreline. Though the shoreline may be spotted clearly by an analyst in the fourth image, the classifier overfit to spatial features and thus incorrectly identified the shoreline. This also caused the uncertainty calculations for the image to be undershot. AML approaches to overfitting typically require optimizing machine parameters or adding diversity to datasets, both of which typically require large amounts of computation and thus long turnaround times not conducive to successful IML implementations. Therefore, reinforcement meta-learning, whereby active learning implementations are informed by corrections via specialized ML implementations, may be employed to adapt quickly to situations where overfitting is inevitable (Bachman, Sordoni, and Trischler 2017).

The Cognitive Gap

Although frequently mentioned as a future direction of study, perhaps the largest identified gap in IML research is the lack of formalization and quantification of cognitive implications for the analyst. For instance, the IML machine translation study (González-Rubio and Casacuberta 2014) mentions specifically that the applied technique lessens the cognitive load of the translator by utilizing cost-sensitive metrics such as informativeness. However, the study does not perform any human-factors research to support this claim, though such research is mentioned as future work. As another example, the study presenting GRAIT uses mathematically modeled uncertainty calculations to meter the workload at each iteration. Though it is shown statistically that these uncertainty calculations correlate to the probability that a vertex is placed correctly, results focus more on vertex placement accuracy and do not consider multiple load levels (e.g., the number of expected incorrect vertices is set to two for the entire study). Human-factors research is also slated as future work. Both of these studies appreciate that thresholds of cognitive load must be taken into account by the IML system for a successful implementation, and it is apparent that human-factors research is inevitable.

The Implications of Cognitive Feedback

Due to its interactive nature, IML most certainly is a human-in-the-loop endeavor. Several studies have highlighted difficulties that may arise from trust, safety, and quality (Dudley and Kristensson 2018; Groce et al. 2013; Gillies et al. 2016; Turchetta, Berkenkamp, and Krause 2019). This section is devoted to discussing the potential of researching and integrating models of human cognition as feedback for IML, which is not often mentioned in the state of the art. We also make the argument that cognitive feedback directly addresses the shortcomings of IML. The topic of cognitive feedback is especially useful for defense-related problems, where trust, safety, and quality of ML implementations are a prerequisite for adoption. Without analyst-driven cognitive feedback, an IML system can very quickly fall flat, as illustrated in the following region digitization example.

Consider the analyst using GRAIT to digitize the fourth image of Johnson Lake as explained in the previous section. Recall that the machine is overfit, and thus its model for uncertainty is undershot. Because of this, the machine places 10 vertices, 8 of which are incorrectly placed. If the analyst continues, they will spend more time correcting the misplaced vertices than manually digitizing the lake without the help of the machine.

This example is simple, but it highlights one of the detrimental problems of IML implementations: overfitting is inevitable, and it can induce, rather than relieve, cognitive load. As mentioned previously, reinforcement learning may be used to augment the uncertainty or similarity model based on the number of corrections the user has to make in any iteration. However, convergence of such a technique would involve the user making excessive corrections in order to inform the model in this example. Unlike AML, the uncertainty and workload involved with IML data must be somehow informed by the analyst.

Figure 2 shows several situations exemplifying various levels of cognitive load when an analyst uses GRAIT to annotate some region of interest. In the first example, the machine is very accurate but offers too few vertices for the analyst to verify. In this situation, the analyst is impeded by an overshot cognitive load. The analyst must work at the slow pace of the IML implementation, which not only reduces their productivity but may also reduce their attention and engagement. The second example shows the ideal situation where GRAIT correctly manages the cognitive load of the analyst. The analyst is expected to be engaged and productive. The last situation shows an example of the IML implementation undershooting the cognitive load. This causes the analyst to become overwhelmed and possibly confused, slowing their productivity and causing frustration.

Figure 2: Various degrees of engagement with IML region digitization. In the first image, the machine has overshot cognitive load and thus the analyst's productivity is hampered. In the second image, the analyst is engaged in the task and the machine is helping their productivity. In the last image, the machine has undershot the cognitive load and thus the analyst is overwhelmed and will most likely abandon the IML implementation for the task.

Incorporation of cognitive load is necessary to avoid the pitfall of bad cognitive-load estimation based on analysis of data alone. For instance, consider an augmentation to the third GRAIT example in the figure that provides the user with a survey at each iteration. The survey occurs before correction and simply asks, "Is this workload too little, too much, or fine?" In this particular situation, the analyst will inform the machine that the workload is too much to handle, and the machine may modify its uncertainty model accordingly (e.g., by adjusting weighting or performing best-fit optimization to prior iterations). This very simple solution illustrates how cognitive feedback may enable better IML for many applications, but the concept may be taken further. In order to promote discussion and research of the possibilities and implications of this concept, we now present a taxonomy for cognitive feedback to inform IML.

Self-reported cognitive feedback is gathered by surveys eliciting cognitive feedback from the user. An example of such a survey is the standard NASA-TLX, which allows a user to report on the general experienced workload of a particular task (Hart and Staveland 1988). This could be gathered offline during human-factors evaluation or online through an interface for self reporting within the HCI. The main advantage of online self reporting of cognitive load is the simplicity of collecting feedback within the HCI. Implementations of simple interventions, such as providing buttons for when a workload is too heavy or too light, are trivial. However, this approach may be imprecise in complex user environments because sub-components of a task may differentially contribute to workload. In these situations, interventions may be too simplistic or may induce load on an analyst.

Until now, we have discussed the implications of self reporting on cognitive load, but this technique may provide insight into more than just the analyst's ideal workload. The field of explainable artificial intelligence involves expressing the machine's decision making to a human user (Gunning and Aha 2019). If a model for explainability is feasible, then the user may communicate cognitive information relating to features as feedback to the model (Teso and Kersting 2019). Relating back to the example above, the machine may explain its decisions by stating "I believe that the historic position of the shoreline is very important." The user may then augment the belief by stating "The historic position is not as important as color," and the machine may then optimize its classifier and uncertainty calculation based on this statement.

As opposed to surveying a user, implicit cognitive feedback may be collected in real time while analysts interact with the HCI during closed experimentation. Implicit cognitive feedback involves collecting physiological data in order to infer cognitive states in a manner that is continuous, objective, and occurs in real time. For example, because pupillary responses are reflective of nervous activity, pupil dilation may act as a proxy for measuring task-induced cognitive processes. As such, increases in pupil diameter may be indicative of high cognitive load, attentional processing, and decision making (Hess and Polt 1964; Kahneman 1973; Kahneman and Beatty 1967), whereas decreases may reflect fatigue (Lowenstein, Feinberg, and Loewenfeld 1963). This data may then be correlated with self-reporting to define various states of cognitive load. Examples of such biofeedback include readings of skin conductance, heart rate, pupillometry, and electroencephalogram (EEG). Often, multiple physiological measures will be assessed to determine workload and inform adaptive algorithms, in essence creating user models that dynamically adjust to support user needs. For example, such physiological elements were examined to monitor the workload of operators while performing UAV piloting tasks of different levels (Wilson and Russell 2007). The physiological signals were used as features to train a neural network to classify workload. Another approach to implicit cognitive feedback is to incorporate cognitive cues as features in the machine learning algorithm (Rosenfeld et al. 2012). For example, in a recent choice competition, researchers incorporated cognitive features derived from behavior into a random forest algorithm. They found that this approach significantly outperformed other ML approaches that did not incorporate cognitive features (Plonsky et al. 2017). A recent study has explored how collecting and applying cognitive cues as features improves reinforcement learning algorithms for playing video games (Zhang et al. 2019). In summary, implicit cognitive feedback has the potential to improve IML implementations by gathering data in closed experimentation to inform cognitive load and uncertainty/similarity measurements, and to inform the machine with features of interest related to a specific task.

Implicit cognitive feedback may provide invaluable insight to IML implementations, but the disadvantage lies in the fact that closed experimentation is often necessary to collect biofeedback, control levels of tasking, and survey users of the HCI with respect to a particular application. Additionally, the cognitive state of the user may be more dynamic for some applications than others. In these situations, modeled cognitive feedback may provide cognitive feedback based on models of user interaction with the HCI. For example, simulating human behavior using a computational cognitive model is another potential method to provide feedback to an IML system. Models of cognition and decision making have been used to simulate human interactions with interfaces in military contexts (Blasch et al. 2011). Cognitive architectures represent a modeling paradigm that computationally defines the relationship between underlying biological and cognitive mechanisms and emerging behavior. Architectures such as ACT-R (Anderson et al. 2004) and SOAR (Laird, Newell, and Rosenbloom 1987) have long been a part of HCI research to simulate users interacting with an interface. For example, ACT-R models have been used for usability testing of menus (Byrne 2001), modeling how users detect phishing websites (Williams and Li 2017), and detecting situations with high cognitive load when using a smartphone (Wirzberger and Russwinkel 2015). Cognitive architectures have been used with physiological data, such as eye-tracking information and fMRI, to map observed behavior to the underlying mental states and brain regions (Tamborello and Byrne 2007; Borst and Anderson 2015). Cognitive models, combined with self-reported data from surveys and physiological data, can provide a starting point for IML systems to optimize their suggestions for the overall performance of a human-machine team.

These three categories of cognitive feedback – self reporting, implicit cognitive feedback, and modeled cognitive feedback – delineate the possible ways in which IML implementations may be centered around the analyst. The categories are summarized in Table 1.

Table 1: Taxonomy of Cognitive Feedback for Interactive Machine Learning

Term            Definition                                   Examples
Self Reporting  Gathered by surveying the analyst.           Online: buttons in the HCI.
                                                             Offline: human-factors surveys.
Implicit        Collection and evaluation of biofeedback     Cognitive load of correction via the HCI.
                via closed experimentation.                  Load as a function of correction count.
                                                             Use of cognitive cues as ML features.
Modeled         Utilization of a cognitive model in          Feedback model of user interaction with the HCI.
                the loop.

Once cognitive feedback has been integrated into IML, more conventional results such as classification accuracy and overall corrections may be used to evaluate approaches against their non-cognitive baseline. However, these results may lack true insight into the purpose of the human-machine team. Measuring the cognitive load on human subjects alongside more objective metrics of productivity would provide more insight into the effectiveness of IML implementations (Alves et al. 2016). Additionally, it is the analysts themselves who must also evaluate the effectiveness of an IML implementation, though this may take high levels of time and effort (Groce et al. 2013; Gillies et al. 2016).

A Future Driven by Cognitive Feedback

We have presented a summary of interactive machine learning along with several examples informing the state of the art. After discussing the advantages of IML, the major shortcomings and gaps were delineated. Finally, the implications of cognitive feedback for IML implementations were discussed to address the gaps. Though it may seem trivial to study cognitive feedback as it relates to data science for human-in-the-loop applications, there is a general lack of such studies in the literature, especially for defense applications. We hope this article will encourage research and development in more IML for defense applications and more research into how cognitive feedback may inform IML implementations.

References

Alves, F.; Szpak, K. S.; Gonçalves, J. L.; Sekino, K.; Aquino, M.; e Castro, R. A.; Koglin, A.; de Lima Fonseca, N. B.; and Mesa-Lao, B. 2016. Investigating cognitive effort in post-editing: A relevance-theoretical approach. Eyetracking and Applied Linguistics 2:109.

Ananthanarayanan, G.; Bahl, P.; Bodík, P.; Chintalapudi, K.; Philipose, M.; Ravindranath, L.; and Sinha, S. 2017. Real-time video analytics: The killer app for edge computing. Computer 50(10):58–67.

Anderson, J. R.; Bothell, D.; Byrne, M. D.; Douglass, S.; Lebiere, C.; and Qin, Y. 2004. An integrated theory of the mind. Psychological Review 111(4):1036–1060.

Bachman, P.; Sordoni, A.; and Trischler, A. 2017. Learning algorithms for active learning. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, 301–310. JMLR.org.

Blasch, E. P.; Breton, R.; Valin, P.; and Bosse, E. 2011. User information fusion decision making analysis with the C-OODA model. In 14th International Conference on Information Fusion, 1–8. IEEE.

Borst, J. P., and Anderson, J. R. 2015. Using the ACT-R cognitive architecture in combination with fMRI data. In Forstmann, B. U., and Wagenmakers, E.-J., eds., An Introduction to Model-Based Cognitive Neuroscience. New York, NY: Springer. 339–352.

Bottou, L. 1998. Online learning and stochastic approximations. On-line Learning in Neural Networks 17(9):142.

Byrne, M. D. 2001. ACT-R/PM and menu selection: Applying a cognitive architecture to HCI. International Journal of Human-Computer Studies 55(1):41–84.

Dudley, J. J., and Kristensson, P. O. 2018. A review of user interface design for interactive machine learning. ACM Transactions on Interactive Intelligent Systems 8(2):8:1–8:37.

Fails, J. A., and Olsen Jr, D. R. 2003. Interactive machine learning. In Proceedings of the 8th International Conference on Intelligent User Interfaces, 39–45. ACM.

Gillies, M.; Fiebrink, R.; Tanaka, A.; Garcia, J.; Bevilacqua, F.; Heloir, A.; Nunnari, F.; Mackay, W.; Amershi, S.; Lee, B.; et al. 2016. Human-centred machine learning. In Pro-

Langford, J.; Li, L.; and Strehl, A. 2007. Vowpal Wabbit online learning project.

Lika, B.; Kolomvatsos, K.; and Hadjiefthymiades, S. 2014. Facing the cold start problem in recommender systems. Expert Systems with Applications 41(4):2065–2073.

Lowenstein, O.; Feinberg, R.; and Loewenfeld, I. E. 1963. Pupillary movements during acute and chronic fatigue: A new test for the objective evaluation of tiredness. Investigative Ophthalmology & Visual Science 2(2):138–157.

Lu, C.; Shi, J.; and Jia, J. 2013. Online robust dictionary learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 415–422.

Meza Martínez, M. A.; Nadj, M.; and Maedche, A. 2019.
Towards an integrative theoretical framework of inter- ceedings of the 2016 CHI Conference Extended Abstracts on active machine learning systems. In Proceedings of the 27th Human Factors in Computing Systems, 3558–3565. ACM. European Conference on Information Systems(ECIS). González-Rubio, J., and Casacuberta, F. 2014. Cost- Michael, C. J.; Dennis, S. M.; Maryan, C.; Irving, S.; and sensitive active learning for computer-assisted translation. Palmsten, M. L. 2019. A general framework for human- Pattern Recognition Letters 37:124 – 134. Partially Super- machine digitization of geographic regions from remotely vised Learning for Pattern Recognition. sensed imagery. In Proceedings of the 27th ACM SIGSPA- TIAL International Conference on Advances in Geographic Groce, A.; Kulesza, T.; Zhang, C.; Shamasunder, S.; Burnett, Information Systems, SIGSPATIAL ’19, 259–268. New M.; Wong, W.-K.; Stumpf, S.; Das, S.; Shinsel, A.; Bice, York, NY, USA: ACM. F.; et al. 2013. You are the only possible oracle: Effec- tive test selection for end users of interactive machine learn- Plonsky, O.; Erev, I.; Hazan, T.; and Tennenholtz, M. 2017. ing systems. IEEE Transactions on Software Engineering Psychological Forest: Predicting Human Behavior. In Pro- 40(3):307–323. ceedings of the Thirty-First AAAI Conference on Artificial Intelligence. Gunning, D., and Aha, D. W. 2019. Darpa’s explainable artificial intelligence program. AI Magazine 40(2):44–58. Quionero-Candela, J.; Sugiyama, M.; Schwaighofer, A.; and Lawrence, N. D. 2009. Dataset shift in machine learning. Hahnemann, D., and Beatty, J. 1967. Pupillary responses The MIT Press. in a pitch-discrimination task. Perception & Psychophysics 2(3):101–105. Raghavan, H.; Madani, O.; and Jones, R. 2006. Active learn- ing with feedback on features and instances. Journal of Ma- Hart, S. G., and Staveland, L. E. 1988. Development of nasa- chine Learning Research 7(Aug):1655–1686. 
tlx (task load index): Results of empirical and theoretical research. In Advances in psychology, volume 52. Elsevier. Robert, S.; Büttner, S.; Röcker, C.; and Holzinger, A. 2016. 139–183. Reasoning under uncertainty: Towards collaborative interac- tive machine learning. In Machine learning for health infor- Hess, E. H., and Polt, J. M. 1964. Pupil size in relation matics. Springer. 357–376. to mental activity during simple problem-solving. Science Rosenfeld, A.; Zuckerman, I.; Azaria, A.; and Kraus, S. 143(3611):1190–1192. 2012. Combining psychological models with machine learn- Hossain, M. D., and Chen, D. 2019. Segmentation for ing to better predict people’s decisions. Synthese 189(1):81– object-based image analysis (obia): A review of algorithms 93. and challenges from remote sensing perspective. ISPRS Talbot, J.; Lee, B.; Kapoor, A.; and Tan, D. S. 2009. En- Journal of Photogrammetry and Remote Sensing 150:115– semblematrix: interactive visualization to support machine 134. learning with multiple classifiers. In Proceedings of the Kabra, M.; Robie, A. A.; Rivera-Alba, M.; Branson, S.; and SIGCHI Conference on Human Factors in Computing Sys- Branson, K. 2013. Jaaba: interactive machine learning for tems, 1283–1292. ACM. automatic annotation of animal behavior. Nature methods Tamborello, F. P., and Byrne, M. D. 2007. Adaptive but non- 10(1):64. optimal visual search behavior with highlighted displays. Kahneman, D. 1973. Attention and effort, volume 1063. Cognitive Systems Research 8(3):182–191. Citeseer. Teso, S., and Kersting, K. 2019. Explanatory inter- Koehn, P. 2009. Statistical machine translation. Cambridge active machine learning. In http://www. aies-conference. University Press. com/accepted-papers/. AAAI. Laird, J. E.; Newell, A.; and Rosenbloom, P. S. 1987. Turchetta, M.; Berkenkamp, F.; and Krause, A. 2019. Safe SOAR: An architecture for general intelligence. Artificial exploration for interactive machine learning. 
In Advances in Intelligence 33(1):1–64. Neural Information Processing Systems, 2887–2897. Williams, N., and Li, S. 2017. Simulating Human Detection of Phishing Websites: An Investigation into the Applicabil- ity of the ACT-R Cognitive Behaviour Architecture Model. In 2017 3rd IEEE International Conference on Cybernetics (CYBCONF), 1–8. ISSN: null. Wilson, G. F., and Russell, C. A. 2007. Performance en- hancement in an uninhabited air vehicle task using psy- chophysiologically determined adaptive aiding. Human fac- tors 49(6):1005–1018. Wirzberger, M., and Russwinkel, N. 2015. Modeling Inter- ruption and Resumption in a Smartphone Task: An ACT-R Approach. Journal of Interactive Media 14(2):147–154. Wu, T.; Weld, D. S.; and Heer, J. 2019. Local decision pitfalls in interactive machine learning: An investigation into feature selection in sentiment analysis. ACM Transactions on Computer-Human Interaction (TOCHI) 26(4):24. Zhang, R.; Liu, Z.; Guan, L.; Zhang, L.; Hayhoe, M. M.; and Ballard, D. H. 2019. Atari-head: Atari human eye-tracking and demonstration dataset. arXiv preprint arXiv:1903.06754.
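To make the taxonomy of cognitive feedback more concrete, consider the implicit-feedback example of treating load as a function of correction count. The sketch below uses the analyst's correction rate over the last batch as a crude proxy for cognitive load and throttles how many machine suggestions are surfaced in the next round. Every name here, the linear throttling rule, and the batch-size bounds are illustrative assumptions, not an implementation from this article:

```python
def load_proxy(corrections: int, batch_size: int) -> float:
    """Fraction of the last batch the analyst had to correct (0.0 to 1.0)."""
    return corrections / batch_size if batch_size else 0.0


def next_batch_size(load: float, max_batch: int = 10, min_batch: int = 2) -> int:
    """Surface fewer machine suggestions per round as inferred load rises."""
    size = round(max_batch * (1.0 - load))
    return max(min_batch, min(max_batch, size))


# Example round: the analyst corrected 6 of 10 suggestions, so the
# inferred load is 0.6 and the next round surfaces only 4 suggestions.
load = load_proxy(corrections=6, batch_size=10)   # 0.6
batch = next_batch_size(load)                     # 4
```

A deployed system would presumably combine such a proxy with biofeedback gathered in closed experimentation, as discussed above, rather than relying on correction counts alone.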