Do ML Experts Discuss Explainability for AI Systems?
A discussion case in the industry for a domain-specific solution

Juliana Jansen Ferreira, IBM Research, Rio de Janeiro, Brazil, jjansen@br.ibm.com
Mateus de Souza Monteiro, IBM Research, Rio de Janeiro, Brazil, msmonteiro@ibm.com




ABSTRACT
The application of Artificial Intelligence (AI) tools in different domains is becoming mandatory for all companies wishing to excel in their industries. One major challenge for a successful application of AI is to combine machine learning (ML) expertise with domain knowledge to get the best results from AI tools. Domain specialists have an understanding of their data and of how it can impact their decisions. ML experts have the ability to use AI-based tools to deal with large amounts of data and generate insights for domain experts. But without a deep understanding of the data, ML experts are not able to tune their models to get optimal results for a specific domain. Therefore, domain experts are key users for ML tools, and the explainability of those AI tools becomes an essential feature in that context. There are many research efforts on AI explainability for different contexts, users, and goals. In this position paper, we discuss findings about how ML experts express concerns about AI explainability while defining the features of an ML tool to be developed for a specific domain. We analyze data from two brainstorm sessions held to discuss the functionalities of an ML tool that supports geoscientists (domain experts) in analyzing seismic data (domain-specific data) with ML resources.

CCS CONCEPTS
Human-centered computing → Empirical studies in HCI

KEYWORDS
Explainable AI, domain experts, ML experts, machine learning, AI development

ACM Reference format:
Juliana Jansen Ferreira and Mateus Monteiro. 2020. Do ML Experts Discuss Explainability for AI Systems? A case in the industry for a domain-specific solution. In Proceedings of the IUI workshop on Explainable Smart Systems and Algorithmic Transparency in Emerging Technologies (ExSS-ATEC'20). Cagliari, Italy, 7 pages.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction
In the digital transformation era, AI technology is mandatory for companies that want to stand out in their industries. To achieve that goal, companies must make the most of domain data, but also combine it with domain expertise. Machine learning (ML) techniques and methods are resourceful when dealing with a lot of data, but they need human input to add meaning and purpose to that data. AI technology must empower users [7]. In the first age of AI, research aimed to get away from studying human behavior and to consider the computer as a tool for solving certain classes of problems [19]. But now, the best results come from a partnership between AI and people in which the two are coupled very tightly, and this partnership presents new ways for the human brain to think and for computers to process data. The pairing, or the communication, of machines and people is the core material of Human-Computer Interaction (HCI) research. Recently, AI research has been recognizing the HCI view on its advances, since human behavior cannot be left out of the context if AI research is to have an impact on real problems [7][19][29].
    The explainability dimension of AI, eXplainable AI (XAI), gains even more importance once people are a component of successful AI applications. While researching explainable AI, we observed that different terms are often present in previous work, sometimes treated as synonyms of explainability or as dimensions necessary to enable it. Interpretability and transparency are terms constantly associated with XAI, and they are usually related to algorithms or ML models. Although these keywords help us search for relevant work in XAI, our goal was to verify whether the explanation of AI in the publications has a clear goal, rather than just presenting any explanation.
    AI shows great results dealing with problems that can be cast as classification problems, but it lacks the ability to explain its decisions in a way people can understand [21]. Most AI explainability research focuses on algorithmic explainability or transparency [1][7][30][34], aiming to make the algorithms more comprehensible. But this kind of explanation does not work for every person, purpose, or context. For those with expertise in ML, or maybe only in computer programming, this approach might be enough to build explanations, but not for people without that technical expertise, such as domain experts.

    There is much less XAI research considering usability, practical interpretability, and efficacy for real users [12][34]. The mediation of professionals like designers and HCI practitioners seems even more critical for XAI design [28]. The presence and participation of designers in the early stages of ML models' development presents an interesting approach for XAI. Since designers are the professionals responsible for building the bridge between technology and users, they need to understand their working material; in the case of XAI, ML models are an essential part of that design material [17]. HCI offers many methods and approaches that are flexible enough to deal with different design scenarios. The co-design technique is being applied with domain experts [8][32] and also with ML experts or data scientists as users [13][27] to explore explainability functionalities. The explanation challenges are also being tackled in broader aspects that impact society, such as trust (e.g., [1][15][30]) and ethical and legal aspects [16].
    It is a challenge to combine ML expertise with domain knowledge to tune ML models for a specific domain. Industries are bringing their own AI experts and data scientists in-house [33][35], which is an indicator of the importance of combining AI and domain expertise. There is a set of new roles that AI technology generates, and industries need to adapt and hire AI experts to keep their competitive edge. Some of those new roles created by AI are related to the ability to explain the AI technology in some manner, considering some dimension [14]. One common characteristic of all explanation skills is the contextualization of the AI technology in the business, relating it to the domain. For that, domain knowledge is the differentiating factor that makes general AI solutions tuned to a business's needs in the industry.
    Our research context is the oil & gas industry. An essential part of this industry's decision-making process relies on experts' prior knowledge and on experiences from previous cases and projects. Seismic data is an important data source that experts interpret by searching for visual indicators of relevant geological characteristics in the seismic image, which is a very time-consuming process. The application of ML to seismic data aims to augment experts' seismic interpretation abilities by processing large amounts of data and adding meaning to visual features in the seismic image. The ML tool, in our case, aims to be a sandbox of ML models that can handle seismic data in different ways for different tasks, enabling seismic interpretation experts to have meaningful insights during their work.
    In this position paper, we discuss some findings about how ML experts express their concerns about AI explainability while developing an ML tool to support seismic interpretation. We had the opportunity to observe and collect data from two brainstorm sessions where ML developers and ML researchers, some with domain knowledge, discussed features of an ML tool. Although explainability was not an explicit discussion topic, concerns about that dimension could be identified in portions of the participants' discourse throughout the sessions.

2 Related Work
There are many research efforts regarding explainable artificial intelligence (XAI) in the literature. For this paper, we looked over previously published work from different venues (e.g., IUI, CHI, DIS, AAAI) and databases (e.g., Scopus, Web of Science, Google Scholar) to identify which people are considered in XAI research. Our research examines two types of people: 1) ML experts, who are people capable of building, training, and testing machine learning models with different datasets from different domains, and 2) non-ML experts, who are people not skilled in ML concepts that, in some dimension, use ML tools to perform tasks in different domains.
    Considering ML experts, there is previous work on supporting the sense-making of the model and the data to enable explainability. These studies are often related to delivering explanations through images, by showing the relevant pixels (e.g., [22,24]) or regions of pixels (e.g., [24]) from the classifier result. Other works, such as the one presented by Hohman et al. [13], use a visual analytics interactive system, named GAMUT, to support data scientists with model interpretability. Similarly to Hohman et al. [13], Di Castro and Bertini [11] explore the use of visualization and model interpretability to promote model verification and debugging methods using a visual analytics system. Studies also highlight decision-making before the development process; one of the applications is to support the assertive choice of the machine learning model. In the work of Wang et al. [27], the authors offer a solution named ATMSeer. Given a dataset, the solution automatically tries different models and allows users to observe and analyze these models through interactive visualization. Lastly, concerning ML experts but with no visualization, Nguyen, Lease, and Wallace [4] present an approach to provide explanations regarding annotator mistakes in Mechanical Turk tasks.
    Concerning non-ML experts, Kizilcec [30] presented a study on a MOOC platform, showing research on how transparency affects trust in a learning system. According to [30], individuals whose expectations (on the grade) were met did not vary their trust when the "amount" of transparency changed. However, individuals whose expectations were violated trusted the system less, unless the grading algorithm was transparent. Another context-aware example is the work of Smith-Renner, Rua, and Colony [2], who present an explainable threat detection tool. Another work that supports decisions in high-risk, complex operating environments, such as the military, is the work from Clewley et al. [25]. In this context, such use improves the performance of trainees entering high-risk operations [25].
    Paudyal et al. [26], on the other hand, present a work in the context of Computer-Aided Language Learning, in which the explanation is used to provide feedback on location, movement, and hand shape to learners of American Sign Language. Lastly, in Escalante et al. [16], explanations happen in the area of human resources, in which decisions to evaluate candidates are routinely made by human resource departments. In ML, this task demands an explanation of the models as a means of identifying and understanding how they relate to the decisions suggested and of gaining insight into undesirable bias [16]. The authors [16] address this scenario by proposing a competition to reduce bias in this ML task.
    Works that present explanations for non-experts with no specific context are not unusual. For example, Cheng et al. [15] present a visual analytics system to improve users' trust in and comprehension of the model. Another non-context work is from Rotsidis, Theodorou, and Wortham [1], in which the authors show explainability for human-robot interaction: by being shown through augmented reality in real time, the decision process of the robot is exposed to the user in a debugging functionality. The majority of the ML techniques and tools presented in the literature are designed to support expert users like data scientists and ML practitioners [27], and visualization has been used widely to explain and visualize algorithms and models (e.g., [13,22,24,27]).
    However, the work of Kizilcec [30] shows the complexity of providing explanations or making the algorithm more transparent, especially for non-experts. This fact highlights that the transparency/explainability of models is not static. Instead, it requires a deep understanding of the end-user and the context [32]. Besides, an intelligent system's acceptance and effectiveness depend on its ability to support decisions and actions that are interpretable by its users and by those affected by them [23]. Recent evidence [32] shows that misleading explanations have, consequently, promoted conflicts in reasoning. An explanation design should, therefore, offer cognitive value to the user and communicate the nature of an explanation relevant to their context [17][32].
    Browne [17] presents a reinforcing alternative concerning the design of explainability. The author argues that designers should not only understand the end-user and the context but preferably also participate in the early conceptualization of the ML model. According to Browne [17], with this early participation, designers benefit from a deeper understanding of the models, which allows them to develop early prototypes of ML experiences, i.e., more controllability, testing of the model, and successful explanation strategies.
    Towards a user-centered explanation, co-designing the explainable interface appears to be a possible approach for both expert and non-expert end-users. For example, Wang et al. [9] developed a framework using a theory-driven approach; the explanations were focused on physicians with previous knowledge of a decision support system. Similarly, in the same context of healthcare, Kwon et al. [8] co-designed a visual analytics system.
    Stumpf [32], on the other hand, used co-design for a broader intelligent system, a smart heating system. In their findings [32], end-users voted for more explanation through more straightforward, textual explanations. Accordingly, Wang et al. [9] affirm that some explanation structures in specific contexts can be communicated with simpler structures, such as textual explanations or even simple lists. On the other hand, some well-structured and complex contexts ask for more elaborate explanation techniques (e.g., [8]), i.e., intelligibility queries about the system state (e.g., [21]) or even inference mechanisms (see [8][9]). Other techniques include XAI elements, such as the features that had a positive or negative influence on an outcome [9].
    One work that used co-design for a solution for experts is the work of Wang et al. [27]. In their work, ML experts participated in the process of elucidating how they choose machine learning models and what opportunities exist to improve the experience. Another expert-centered work is presented by Hohman et al. [13]: through an interactive design process with both machine learning researchers and practitioners, the authors derived a list of capabilities that an explainable machine learning interface should support for data scientists.
    Finally, Barria-Pineda and Brusilovsky [21] presented the explainability design of a recommender system in an educational scenario. After releasing the system for testing, the authors found that transparency seemed to influence the probability of the student opening and attempting the lesson. Another motivation for explainability in the learning context can be the learning itself (see [26]). Furthermore, the motivation tells a lot about the awareness of the work regarding the user and the context. Studies with perceived context-awareness presented a specific motivation for explaining, such as choosing appropriate models before development [27], improving workers' production [3], debugging models [1], or training military novices [25], among others. Other non-context research motivated the explanations with generic aspects such as trust (e.g., [1][15][30]) and ethical and legal aspects [16]. Chromik et al. [23], for example, affirm that companies whose only motivation is legal compliance will most likely not produce meaningful explanations for users. Legal compliance acknowledges user rights, but it is not enough for users nor for our HCI research community [23].

3 Our ML tool case
This paper's research was designed from the opportunity to observe and hear the discussions of a project development team regarding the features of an ML tool to be built. We observed and collected data from two brainstorm sessions where ML developers, ML researchers, and other stakeholders of the ML tool discussed features for that tool. The discussion did not have any orientation towards aspects of XAI or any particular feature. The sessions were proposed by the people involved in the project to get a better understanding of the ML tool's features.
    The ML tool project is developed in an industry R&D laboratory, and the tool is already being used by oil & gas companies in research projects. We believe it is essential for the research to explain our settings. There was a previous study with some of the participants in the same laboratory, where they were invited to reflect on and discuss some ML development challenges, such as XAI [28]. One of the authors of this paper participated in that previous study as an HCI researcher and saw the discussions of the ML tool as an opportunity to reflect on and discuss ML challenges in a real project context. Therefore, she participated in the sessions as an observer, without any intervention or mediation, and collected the data discussed in this paper.

3.1 Research Domain Context

The ML tool of our case aims to aid seismic interpretation, which is a central process in the oil and gas exploration industry. This practice's main goal is to support decision-making processes by reducing uncertainty. To achieve that goal, different people work alone and engage in multiple informal interactions and formal collaboration sessions, embedding biases, decisions, and reputation. Seismic interpretation is the process of inferring the geology of a region at some depth from a processed seismic survey (https://www.britannica.com/science/seismic-survey). Figure 1 shows an example of seismic data lines (or slices), which are portions of seismic data, with an interpretation of visual indicators of a possible geological structure called a salt diapir.

Figure 1: Seismic image example (Netherlands – Central Graben – inline 474)

    In the same industry R&D laboratory, ML experts and researchers are exploring the possibilities of applying ML to seismic data. It is important to say that seismic data are mainly examined visually. Other data commonly compose the seismic interpretation, but the domain expert analyzes and interprets the seismic images to identify significant geological characteristics. Therefore, there is research focusing on image analysis aspects rather than on geophysical or geological discussions [5][6]. In addition, there is research exploring texture features that are prominent in other domains but had not yet received attention in the seismic domain: the authors investigated the ability of Gabor filters and LBP (Local Binary Patterns) – the latter widely used for face recognition – to retrieve similar regions of seismic data [6]. Still exploring the visual aspects of seismic data, there is research on generating synthetic seismic data from sketches [31] and on using ML to improve seismic image resolution [10].
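
To make the texture-retrieval idea concrete, the minimal Python sketch below illustrates how LBP and Gabor descriptors can be combined to rank image regions by texture similarity. It is our illustration only, not the method or the parameters used in [6]: the LBP settings, Gabor frequencies, patch sizes, and distance metric are arbitrary choices, and scikit-image is assumed to be available.

# Minimal sketch: texture descriptors (LBP + Gabor) for comparing seismic regions.
# Parameters and the distance metric are illustrative, not the settings used in [6].
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.filters import gabor

def texture_descriptor(patch, lbp_points=8, lbp_radius=1, frequencies=(0.1, 0.2, 0.4)):
    """Concatenate an LBP histogram with mean Gabor magnitudes for one 2D patch."""
    lbp = local_binary_pattern(patch, lbp_points, lbp_radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=lbp_points + 2, range=(0, lbp_points + 2), density=True)
    gabor_means = []
    for freq in frequencies:
        real, imag = gabor(patch, frequency=freq)
        gabor_means.append(np.sqrt(real ** 2 + imag ** 2).mean())
    return np.concatenate([hist, gabor_means])

def rank_similar_regions(query_patch, candidate_patches):
    """Rank candidate regions by Euclidean distance to the query descriptor (closest first)."""
    query = texture_descriptor(query_patch)
    distances = [np.linalg.norm(query - texture_descriptor(p)) for p in candidate_patches]
    return np.argsort(distances)

# Usage with random stand-ins for seismic patches (real input would be amplitude slices).
rng = np.random.default_rng(0)
patches = [rng.normal(size=(64, 64)) for _ in range(5)]
print(rank_similar_regions(patches[0], patches[1:]))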

3.2 About the ML professionals
In total, there were eleven (11) ML professionals participating in the ML tool discussions: seven (7) ML developers, who took part in the ML tool's discussions and were directly involved in its development; two (2) ML researchers, who were involved in the discussion about the ML tool but not directly in its development; one (1) domain expert, who is a member of the technical team (not an expert from the industry) but has a deep understanding of the domain data and of the domain practice with that data; and one (1) facilitator, who facilitated the brainstorm sessions without influencing the discussion content.
    As aforementioned, four (4) of the participants had already collaborated in a previous study [28]. Three (3) of them have more than seven years of experience with ML development and research, and they have been working in the oil & gas industry for more than one year (one of them for more than four years). Those participants have been working with the domain data in question (seismic data) for a while and have been exploring different aspects of it with ML technology [5][6][10][31]. The other participants are also experienced ML developers or experts, with at least three years of experience in the industry, plus academic experience.

3.3 Brainstorm sessions
The data we collected for this paper's analysis was produced during two brainstorm sessions for the development of a domain-specific ML tool. Participants consented to the data collection before the sessions, and they were aware that the data was going to be used for a research publication.
    The ML tool under development is an asset of a larger project with industry clients; therefore, its development aims to support real domain practices. The brainstorm sessions were organized by the ML tool's development team from the laboratory. They were not scheduled to produce data for our study in particular, but they presented an enriching opportunity to investigate if and how ML experts discuss AI explainability while they are building an ML tool.

Figure 2. Brainstorm Sessions plan

    There were two brainstorm sessions organized to discuss the ML tool's features. The facilitator organized activities to support individual inputs and collaborative discussions (Figure 2). Between the sessions, there was a voting activity to prioritize the discussion for the second session. The sessions were performed in an online collaboration tool (https://mural.co/). The content of the collaboration tool was discarded as study data because one participant modified it without the facilitator's orientation. Therefore, this study's data consists of the videos of the sessions. Some of the participants were not physically present, participating through a videoconferencing system and the online tool.
4 Data Analysis
As aforementioned, we used the sessions' videos as our study data. We transcribed the audio from both videos (session 1: 2h; session 2: 1.5h) and tagged the quotes of every participant of the sessions. We wanted to identify the XAI aspects of the discourse and relate them to the participant who brought them to the discussion. We considered the data from both sessions as one dataset because we wanted to analyze the discourse of participants throughout the whole discussion about the ML tool's features.
    For the data analysis, we used a qualitative approach, since we are still framing concerns about XAI in ML tools' development. Our goal was to identify the critical ideas that repeatedly arise during the ML professionals' discussion of an ML tool's features. We used the discourse analysis method, which considers written or spoken language in relation to its social context [18] ([20], p. 221). We started with some content analysis ([20], p. 301) to verify the frequency of terms, co-occurrences, and other structural markers. But since the topic of the discussion was broader – the ML tool's features – this did not provide relevant findings. Therefore, we changed to discourse analysis, which goes beyond looking at words and contents to examine the structure of the conversation, in search of cues that might provide further understanding ([20], p. 221).
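
As an illustration of the content-analysis step we tried first, the sketch below counts term frequencies and within-window co-occurrences over transcribed quotes. It is a minimal example of this kind of counting, not our actual analysis script; the tokenizer, stop-word list, window size, and sample quotes are illustrative assumptions.

# Minimal sketch of the content-analysis step: term frequencies and co-occurrences
# from transcribed quotes. Tokenization, stop words, and window size are illustrative.
from collections import Counter
from itertools import combinations
import re

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "i", "you", "that", "is", "it"}

def tokenize(text):
    """Lowercase, keep word-like tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOP_WORDS]

def term_and_cooccurrence_counts(quotes, window=5):
    """Count term frequencies and co-occurrences of term pairs within a sliding window."""
    terms, pairs = Counter(), Counter()
    for quote in quotes:
        tokens = tokenize(quote)
        terms.update(tokens)
        for start in range(len(tokens)):
            for a, b in combinations(sorted(set(tokens[start:start + window])), 2):
                pairs[(a, b)] += 1
    return terms, pairs

# Usage with two stand-in quotes (the real input was the session transcripts).
quotes = [
    "I want to see the trained model and visualize the data",
    "you need a grid to compare the seismic data",
]
terms, pairs = term_and_cooccurrence_counts(quotes)
print(terms.most_common(5))
print(pairs.most_common(5))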
5 Discussion about AI Explainability
We started our data analysis trying to tag the participants' quotes with the codes "aid-XAI" or "harm-XAI" (aid or harm eXplainable AI). We then noticed that any such categorization of the data was not possible without further feedback from the person who said the quote. Therefore, we decided to tag the quotes whose discourse contained features or concerns related to AI explainability. We selected a total of 25 quotes from approximately 3.5h of audio transcriptions. Considering that the brainstorm sessions had the broader goal of discussing the ML tool's features, we believe those quotes point to an exciting direction for our research question, "Do ML experts discuss explainability for AI systems?". The discussion did not have any intervention or bias towards explainability concerns, which allowed us to see if and how AI explainability would be included in their development discussion.
    Of the 25 quotes, 13 were from the three ML professionals who have more experience with ML development and also experience working with the domain data (seismic data). We learned that professionals who combine ML and domain knowledge might be more capable of having an overall vision of how the AI system will impact the domain and its experts. The quotes indicate concerns about XAI without any mention of the specific topic. The theme was a genuine concern of those professionals, and it was present in their discourse while developing an AI system for geoscientists. In this position paper, we selected a few of those quotes to discuss the concerns ML developers express about AI explainability while thinking about features for an ML tool.
    The discussion about the ML tool was sometimes conflicting about who the user (or users) of that ML tool would be. In the quote below, one participant was considering two users: an ML expert and a data scientist. His discourse is aligned with previous research about ML models' interpretability [11][13] and about understanding the data that ML models handle [22,24]. The visualization of the trained model and the visualization of the data with its metrics could be a way to frame an XAI scenario for ML experts and data scientists. This kind of feature could be a pointer to further discussions on XAI:

    […] a visualization, feature "I'm a machine learning guy and I want to see the trained model"; "I'm the data guy and I want to see the data […] I want to correctly visualize the data […] how is this data spatially distributed […] visualize the metrics. […].

    In the next quote, a participant comments on a new trend in oil & gas companies of training geoscientists on machine learning. This trend aims to combine the ML tools' potential to handle a lot of data with the domain expert's tacit knowledge and experience, tuning the pair model-data to get the best results with ML – not only quantitative results (the best ML model accuracy) but also qualitative results, when the domain expert with ML knowledge can make the best of model and data by understanding the meaning of the results. There are new roles of "explainers" in AI [34] that will make the technology fit the domain in which it is applied. By understanding both the model and the domain data, they are equipped to define the necessary explanations in a domain:

    […] what happens in these companies now is that they are hiring geophysicists and giving a machine learning course, and I also think the same guy may be acting depending on the role he's playing at that time […].

    The understanding of the algorithms and the ML workflows has been the focus of most XAI research [1][7][31][35]. The trails of what data went into which model and what the output result was can support the decision about how to fit model and data for a particular case. In the next quote, a participant raises a concern about the timeline and resolution of the seismic data. Those are parameters of the seismic data that could help build a better ML tool. A comparison feature could be considered a way to explain what is available, what was, in fact, used by the ML tool, and why:

    […] you have to imagine that you have seismic data from 20 years ago, as usual, and you have a new seismic data that has a different resolution […] For you to be able to compare things, you need to have a grid there and start comparing things. All the information that goes in there needs to be useful […]
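
The comparison concern in the quote above can be illustrated with a small sketch: before seismic sections of different vintages and resolutions can be compared, they have to be brought onto a common grid. The example below is our illustration, not a feature of the ML tool; the interpolation order, the difference metric, and the random stand-in sections are arbitrary choices, and SciPy is assumed to be available.

# Minimal sketch: resample two seismic sections of different resolutions onto a
# common grid before comparing them. Interpolation order and metric are illustrative.
import numpy as np
from scipy.ndimage import zoom

def to_common_grid(section, target_shape, order=1):
    """Linearly resample a 2D section to target_shape."""
    factors = [t / s for t, s in zip(target_shape, section.shape)]
    return zoom(section, factors, order=order)

def mean_abs_difference(old_section, new_section):
    """Resample the older (coarser) section to the newer grid and compare amplitudes."""
    old_on_new_grid = to_common_grid(old_section, new_section.shape)
    return np.abs(old_on_new_grid - new_section).mean()

# Usage with random stand-ins: a legacy low-resolution section vs. a newer, denser one.
rng = np.random.default_rng(0)
legacy = rng.normal(size=(100, 80))    # e.g., a 20-year-old survey
recent = rng.normal(size=(400, 320))   # e.g., a reprocessed, higher-resolution survey
print(mean_abs_difference(legacy, recent))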
    The participants were mostly ML developers; therefore, they are used to handling ML models and data, much like one type of user considered for the ML tool under development. The quote below shows a participant proposing, for users like himself, a solution that he, as a user, considers good. This seems an interesting approach: to use existing tools that somehow explain the ML results and see if they work for other users. Combining this initial input with co-designing approaches [13][27], the investigation of what works as an explanation for each user could present promising research results:

    […] something like Jupyter does. You have a report that says, "For this data here I had this result," the views and the guy can follow more or less […]
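
The kind of "for this data I had this result" report the participant refers to can be sketched in a few lines. The example below is our illustration, not the ML tool's implementation: it uses scikit-learn with a synthetic dataset as a stand-in for labeled seismic patches and prints one trail line per run, recording which data went into which model and which result came out.

# Minimal sketch of a "for this data I had this result" report: train a couple of
# models on the same dataset and print a small trail of data, model, and metrics.
# The models, dataset, and metrics are illustrative stand-ins, not the ML tool's choices.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for labeled seismic patches (features + class labels).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # One report line per run: which data, which model, which result.
    print(f"data=synthetic(n={len(X)}), model={name}, "
          f"accuracy={accuracy_score(y_test, pred):.3f}, f1={f1_score(y_test, pred):.3f}")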

6 Final Remarks and future work
In this position paper, we used data collected from the brainstorm sessions of a real ML tool development project to discuss if and how ML experts express concerns about AI explainability while defining the features of an ML tool to be developed. It was not a controlled study with users. We analyzed data from two brainstorm sessions held to discuss the functionalities of an ML tool that supports geoscientists (domain experts) in analyzing seismic data (domain-specific data) with ML resources. It was serendipity that one of the authors became aware of the discussion and that all participants agreed that she could be present and collect the data for this research.
    The collected data was tough to transcribe because the brainstorm sessions were used to structure all the participants' understanding of the ML tool, its users, and its features. Therefore, sometimes participants did not produce complete sentences, or the sentences were incomprehensible. As mentioned in the Data Analysis section of this paper, we started the data analysis with content analysis [20] but changed to discourse analysis [18] to analyze the data. While analyzing word frequency, we generated the word cloud presented in Figure 3. The most frequent word was "you", which was used by participants to present their ideas.

Figure 3. Word cloud from transcripts

    Considering that ML professionals were one of the potential users for the ML tool, it is interesting that ML developers did not use the first person in their phrases, but the third person "you". An investigation path is to check with those ML professionals whether they thought of themselves as possible users of the ML tool and how that would affect the discussion about its features. Using design techniques, such as co-design [13][27], to explore those scenarios with ML professionals as users could open different discussion topics. Maybe concerns about explainability would appear more often once developers are in the users' place.
    In a previous study in the same R&D lab, mediation challenges were identified for the development of a deep learning model [28]. One exciting aspect of that earlier study was that once the ML professional considered his ML solution in a real context, new concerns about the impact on people and about explanations were identified. In this study, the ML professionals have a real context where their ML tool will be applied, but we believe they are still very distant from the consequences the ML tool might have on the users' decision-making. In the study reported in [28], the context and its impacts were easier to relate to (ML to support a hand-written voting process using the MNIST dataset). For the oil & gas domain, in contrast, the effect of a wrong decision cannot be so easily foreseen. This could be an approach for investigating the mediation challenges [28].
    Explanations are social; they are a transfer of knowledge, presented as part of a conversation or interaction, and are thus presented relative to the explainer's (explanation producer's) beliefs about the explainee's (explanation consumer's) beliefs [34]. XAI needs social mediation from technology builders to technology users and their practice [28]. We believe the explanation cannot be generic. The design of a "good" explanation needs to take into account who is receiving the explanation, what for, and in which context the explanation was requested.
    This initial study opened paths to many exciting kinds of research, not only associated with XAI. For the XAI research, as future work, we intend to investigate AI explanations considering those three dimensions (who + why + context). The investigation of XAI considering those dimensions shows promising paths for designing AI systems for different scenarios. Industries are training their domain experts on ML tools, but what about training ML experts on the data and the domain practice before they build ML solutions? It might enable the ML expert to design the solution aware of how it will impact the domain and the people involved.
    Another promising research path is to address the XAI topic explicitly with ML professionals, as part of the design material for developing AI systems. The mediation challenges identified by Brandão et al. [28] are an initial pointer for that XAI discussion with ML professionals. Following up on this first study, we plan to go back to the same group of participants and discuss AI explainability with them, to verify what kinds of features and concerns are raised once we point to the specific topic.

REFERENCES
[1] Alexandros Rotsidis, Andreas Theodorou, and Robert H. Wortham. 2019. Robots That Make Sense: Transparent Intelligence Through Augmented Reality. In Intelligent User Interfaces for Algorithmic Transparency in Emerging Technologies - IUIATEC (2019).
[2] Alison Smith-Renner, Rob Rua, and Mike Colony. 2019. Towards an Explainable Threat Detection Tool. Workshop on Explainable Smart Systems – ExSS (2019).
[3] Alison Smith-Renner, Rob Rua, and Mike Colony. 2019. Towards an Explainable Threat Detection Tool. Workshop on Explainable Smart Systems – ExSS (2019).
[4] An T. Nguyen, Matthew Lease, and Byron C. Wallace. 2019. Explainable modeling of annotations in crowdsourcing. In Proceedings of the 24th International Conference on Intelligent User Interfaces (IUI '19). ACM, New York, NY, USA, 575-579. DOI: https://doi.org/10.1145/3301275.3302276
[5] Andrea Britto Mattos, Rodrigo S. Ferreira, Reinaldo M. Da Gama e Silva, Mateus Riva, and Emilio Vital Brazil. 2017. Assessing texture descriptors for seismic image retrieval. 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), IEEE, 292–299.
[6] Andrea Britto Mattos, Rodrigo S. Ferreira, Reinaldo M. Da Gama e Silva, Mateus Riva, and Emilio Vital Brazil. 2017. Assessing Texture Descriptors for Seismic Image Retrieval. 2017 30th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), IEEE, 292–299.
[7] Ashraf Abdul, Jo Vermeulen, Danding Wang, Brian Y. Lim, and Mohan Kankanhalli. 2018. Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18), 582:1–582:18. https://doi.org/10.1145/3173574.3174156
[8] Bum Chul Kwon, Min-Je Choi, Joanne Taery Kim, Edward Choi, Young Bin Kim, and Soonwook Kwon. 2018. RetainVis: Visual Analytics with Interpretable and Interactive Recurrent Neural Networks on Electronic Medical Records. IEEE Transactions on Visualization and Computer Graphics 25, 1: 299–309.
[9] Danding Wang, Qian Yang, Ashraf Abdul, and Brian Y. Lim. 2019. Designing Theory-Driven User-Centric Explainable AI. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, 601:1–601:15.
[10] Dario A. B. Oliveira, Rodrigo S. Ferreira, Reinaldo Silva, and Emilio Vital Brazil. 2019. Improving Seismic Data Resolution With Deep Generative Networks. IEEE Geoscience and Remote Sensing Letters 16, 12: 1929–1933.
[11] Federica Di Castro and Enrico Bertini. 2019. Surrogate decision tree visualization: interpreting and visualizing black-box classification models with surrogate decision tree. Workshop on Explainable Smart Systems – ExSS (2019).
[12] Finale Doshi-Velez and Been Kim. 2017. Towards A Rigorous Science of Interpretable Machine Learning. arXiv:1702.08608 [cs, stat]. Retrieved December 18, 2019 from http://arxiv.org/abs/1702.08608
[13] Fred Hohman, Andrew Head, Rich Caruana, Robert DeLine, and Steven M. Drucker. 2019. Gamut: A Design Probe to Understand How Data Scientists Understand Machine Learning Models. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM, New York, NY, USA, Paper 579, 13 pages. DOI: https://doi.org/10.1145/3290605.3300809
[14] H. James Wilson, Paul Daugherty, and Nicola Bianzino. 2017. The jobs that artificial intelligence will create. MIT Sloan Management Review 58, 4: 14.
[15] Hao-Fei Cheng, Ruotong Wang, Zheng Zhang, Fiona O'Connell, Terrance Gray, F. Maxwell Harper, and Haiyi Zhu. 2019. Explaining Decision-Making Algorithms through UI: Strategies to Help Non-Expert Stakeholders. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM, New York, NY, USA, Paper 559, 12 pages. DOI: https://doi.org/10.1145/3290605.3300789
[16] Hugo Jair Escalante, Isabelle Guyon, Sergio Escalera, Julio Jacques, Meysam Madadi, Xavier Baró, Stephane Ayache, Evelyne Viegas, Yağmur Güçlütürk, Umut Güçlü, Marcel A. J. van Gerven, and Rob van Lier. 2017. Design of an explainable machine learning challenge for video interviews. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN'17), Anchorage, AK, 3688-3695. DOI: https://doi.org/10.1109/IJCNN.2017.7966320
[17] Jacob T. Browne. 2019. Wizard of Oz Prototyping for Machine Learning Experiences. Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, ACM, LBW2621:1–LBW2621:6.
[18] James Paul Gee. 2004. An introduction to discourse analysis: Theory and method. Routledge.
[19] Jonathan Grudin. 2009. AI and HCI: Two fields divided by a common focus. AI Magazine 30, 4: 48–48.
[20] Jonathan Lazar, Jinjuan Heidi Feng, and Harry Hochheiser. 2017. Research methods in human-computer interaction. Morgan Kaufmann.
[21] Jordan Barria-Pineda and Peter Brusilovsky. 2019. Making Educational Recommendations Transparent through a Fine-Grained Open Learner Model. IUI Workshops.
[22] Mandana Hamidi Haines, Zhongang Qi, Alan Fern, Fuxin Li, and Prasad Tadepalli. 2019. Interactive Naming for Explaining Deep Neural Networks: A Formative Study. Workshop on Explainable Smart Systems – ExSS (2019).
[23] Michael Chromik, Malin Eiband, Sarah Theres Völkel, and Daniel Buschek. 2019. Dark Patterns of Explainability, Transparency, and User Control for Intelligent Systems. Intelligent User Interfaces for Algorithmic Transparency in Emerging Technologies - IUIATEC (2019).
[24] Mukund Sundararajan, Jinhua Xu, Ankur Taly, Rory Sayres, and Amir Najmi. 2019. Exploring Principled Visualizations for Deep Network Attributions. Workshop on Explainable Smart Systems – ExSS (2019).
[25] Natalie Clewley, Lorraine Dodd, Victoria Smy, Annamaria Witheridge, and Panos Louvieris. 2019. Eliciting Expert Knowledge to Inform Training Design. In Proceedings of the 31st European Conference on Cognitive Ergonomics (ECCE 2019), Maurice Mulvenna and Raymond Bond (Eds.). ACM, New York, NY, USA, 138-143. DOI: https://doi.org/10.1145/3335082.3335091
[26] Prajwal Paudyal, Junghyo Lee, Azamat Kamzin, Mohamad Soudki, Ayan Banerjee, and Sandeep Gupta. 2019. Learn2Sign: Explainable AI for Sign Language Learning. Workshop on Explainable Smart Systems – ExSS (2019).
[27] Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, and Huamin Qu. 2019. ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19). ACM, New York, NY, USA, Paper 681, 12 pages. DOI: https://doi.org/10.1145/3290605.3300911
[28] Rafael Brandão, Joel Carbonera, Clarisse de Souza, Juliana Ferreira, Bernardo Gonçalves, and Carla Leitão. 2019. Mediation Challenges and Socio-Technical Gaps for Explainable Deep Learning Applications. arXiv:1907.07178 [cs]. Retrieved December 2, 2019 from http://arxiv.org/abs/1907.07178
[29] Randy Goebel, Ajay Chander, Katharina Holzinger, Freddy Lecue, Zeynep Akata, Simone Stumpf, Peter Kieseberg, and Andreas Holzinger. 2018. Explainable AI: The New 42? In Machine Learning and Knowledge Extraction, Andreas Holzinger, Peter Kieseberg, A Min Tjoa, and Edgar Weippl (eds.). Springer International Publishing, Cham, 295–303. https://doi.org/10.1007/978-3-319-99740-7_21
[30] René F. Kizilcec. 2016. How Much Information?: Effects of Transparency on Trust in an Algorithmic Interface. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (CHI '16). ACM, New York, NY, USA, 2390-2395. DOI: https://doi.org/10.1145/2858036.2858402
[31] Rodrigo S. Ferreira, Julia Noce, Dario A. B. Oliveira, and Emilio Vital Brazil. 2019. Generating Sketch-Based Synthetic Seismic Images With Generative Adversarial Networks. IEEE Geoscience and Remote Sensing Letters.
[32] Simone Stumpf. 2019. Horses For Courses: Making The Case For Persuasive Engagement In Smart Systems. Workshop on Explainable Smart Systems – ExSS (2019).
[33] Thomas H. Davenport and D. J. Patil. 2012. Data scientist. Harvard Business Review 90, 5: 70–76.
[34] Tim Miller. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267: 1–38. https://doi.org/10.1016/j.artint.2018.07.007
[35] Yogesh K. Dwivedi, Laurie Hughes, Elvira Ismagilova, et al. 2019. Artificial Intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. International Journal of Information Management: S026840121930917X.