<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>X-RAI: A Framework for the Transparent, Responsible, and Accurate Use of Machine Learning in the Public Sector</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Per Rådberg Nagbøl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oliver Müller</string-name>
          <email>oliver.mueller@uni-paderborn.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IT University of Copenhagen</institution>
          ,
          <country country="DK">Denmark</country>
        </aff>
      </contrib-group>
      <fpage>259</fpage>
      <lpage>267</lpage>
      <abstract>
        <p>This paper reports on an Action Design Research project at the Danish Business Authority that focuses on quality assurance and evaluation of machine learning models in production. The design artifact is a framework, X-RAI, whose name combines transparency (X-Ray), responsibility (R), and explainability (X-AI). X-RAI consists of four sub-frameworks: the Model Impact and Clarification Framework, the Evaluation Plan Framework, the Evaluation Support Framework, and the Retraining Execution Framework. The sub-frameworks build upon the theory of interpretable AI and on practical experience from tests on nine different machine learning models used by the Danish Business Authority.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning Evaluation</kwd>
        <kwd>Government</kwd>
        <kwd>Interpretability</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Recent years have seen breakthroughs in the field of AI, both in terms of basic research and
development as well as in applying AI to real-world tasks. The AI Index 2019 Annual Report of the
Stanford Institute for Human-Centered Artificial Intelligence
        <xref ref-type="bibr" rid="ref1">(Perrault et al., 2019)</xref>
        , which
summarizes the technical progress in specialized tasks across computer vision and natural language
processing, attests that AI is now on par or has even exceeded human performance in tasks such as
object classification, speech recognition, translation, and textual and visual question answering.
However, augmenting and automating tasks previously performed by humans can also lead to
serious problems. Research studies and real-world incidents have shown that AI systems, or more
precisely the machine learning models they are based on, can err, encode societal biases, and discriminate
against minorities. These issues are amplified by the fact that many modern machine learning
algorithms are complex black boxes whose behavior and predictions are almost impossible to
comprehend, even for experts. Hence, more and more researchers and politicians are calling for legal
and ethical frameworks for designing and auditing these systems
        <xref ref-type="bibr" rid="ref2">(Guszcza et al., 2018)</xref>
        . Against this
background, the government of Denmark released a national strategy for AI in 2019. The strategy
covers a broad array of initiatives related to AI in the private and public sectors, including an
initiative concerning the transparent application of AI in the public sector. As part of this initiative,
common guidelines and methods will be created to enforce the legislation's requirements for
transparency. As one of the first steps, the government launched a pilot project to develop and test
methods for ensuring a responsible and transparent use of AI for supporting decision-making
processes
        <xref ref-type="bibr" rid="ref3">(Regeringen, 2019)</xref>
        . The pilot project takes place at the Danish Business Authority (DBA)
in collaboration with the Danish Agency of Digitization. In this paper, we report on the first results
of an Action Design Research (ADR) project accompanying the pilot project. The overall ADR project
is driven by the following research question: How do we ensure that machine learning (ML) models
meet and maintain quality standards regarding interpretability and responsibility in a governmental
setting? To answer this question, the project draws on literature and theory on interpretability of
machine learning models and practical testing on machine learning models in the DBA.
      </p>
    </sec>
    <sec>
      <title>2. Explainable AI Through Interpretable Machine Learning Models</title>
      <p>
Modern machine learning algorithms, especially deep neural networks, possess remarkable
predictive power. However, they also have their limitations and drawbacks. One of the most
significant challenges is their lack of transparency. Complex neural networks are opaque functions
often containing tens of millions of parameters that jointly define how input data (e.g., a picture of
a person) is mapped into output data (e.g., the predicted gender or age of the person in the picture).
Hence, it is virtually impossible for end users, and even technical experts, to comprehend the general
logic of these models and explain how they make specific predictions. As long as one is only
interested in the predictions of a black box model and these predictions are correct, this lack of
transparency is not necessarily a problem. It becomes one when predictions must be justified or
audited, as in the public-sector setting described above. Broadly speaking, there are two alternative approaches
to open up the black box of modern machine learning models
        <xref ref-type="bibr" rid="ref10 ref4 ref6">(in the following see Lipton, 2018,
Molnar, 2019, Du et al., 2020)</xref>
        . First, instead of using black box deep learning models, one can use
less complex but transparent models, like rule-based systems or statistical learning models (e.g.
linear regression, decision trees). These systems are intrinsically interpretable, but the
interpretability often comes at the cost of sacrificing some predictive accuracy. The transparency of
these systems works on three levels. Simulatability concerns the entirety of the model and requires
models to be rather simple and ideally human computable. Decomposability addresses the
interpretability of the components of the model, such as inputs, parameters, and calculations.
      </p>
      <p>
        Consequently, decomposability requires interpretable model inputs and disallows highly
engineered or anonymous features. Algorithmic transparency concerns the training/learning
algorithm. A linear model's behavior on unseen data is provable, which is not the case with deep
learning methods with unclear inner workings. Second, instead of using transparent and inherently
interpretable models, one can develop a second model that tries to provide explanations for an
existing black box model. This strategy tries to combine the predictive accuracy of modern machine
learning algorithms with the interpretability of statistical models. These so-called post-hoc
examinability techniques can be further divided into techniques for local and global explanations.
Local explanations are explanations for particular predictions, while global explanations are
explanations that provide a global understanding of the input-output relationships learned by the
trained model. In other words, a local explanation would explain why a specific person in a picture
has been predicted to be female, while a global explanation would explain which general visual
features differentiate females from other genders. Different types of post-hoc explanations exist. Text
explanations mimic how humans explain choices: one model generates explanations
as a supplement to another model that delivers the predictions. Visualizations generate
explanations from a learned model through a qualitative assessment of the visualization.
Explanations by example let the model provide examples showing the decisions the model predicts
to be most similar
        <xref ref-type="bibr" rid="ref9">(Lipton, 2016)</xref>
        . Local explanations address particular predictions
        <xref ref-type="bibr" rid="ref5">(Doshi-Velez &amp; Kim,
2017)</xref>
        ; examples are Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro et al., 2016) and
SHAP for explaining feature importance
        <xref ref-type="bibr" rid="ref11">(Lundberg &amp; Lee, 2017)</xref>
        . Focusing on the local
dependence of a model is helpful when working with neural networks, whose full learned mapping is too
complex to explain satisfactorily
        <xref ref-type="bibr" rid="ref9">(Lipton, 2016)</xref>
        . When choosing which approach and
technique to use in order to create an explainable AI system, it is worth considering why there is a
need for explanation (e.g., to justify decisions, enhance trust, show correctness, ensure fairness, and
comply with ethical or legal standards), who the target audience is (e.g., a regular user, an expert
user, or an external entity), what interpretations are derivable to satisfy the need, when the
information is needed (before, during, or after the task), and how objective and subjective measures
can evaluate the system
        <xref ref-type="bibr" rid="ref7">(Rosenfeld &amp; Richardson, 2019)</xref>
        .
      </p>
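      <p>The distinction between global and local explanations can be illustrated with an intrinsically interpretable model. The following minimal Python sketch is not taken from the project: the feature names, weights, and example case are hypothetical. For a linear model, the weights give a global explanation, and the per-feature products of weight and value give an exact local explanation of a single prediction. Post-hoc techniques such as LIME and SHAP aim to provide comparable per-feature attributions for models that are not interpretable in this way.</p>
      <preformat>
```python
# Hypothetical linear scoring model: names and weights are illustrative only.
WEIGHTS = {"years_active": -0.4, "late_filings": 1.2, "address_changes": 0.7}
BIAS = -1.0


def predict_score(instance):
    """Return the linear score: bias plus the sum of weight * feature value."""
    return BIAS + sum(WEIGHTS[name] * instance[name] for name in WEIGHTS)


def global_explanation():
    """Rank features by absolute weight: how the model behaves in general."""
    return sorted(WEIGHTS, key=lambda name: abs(WEIGHTS[name]), reverse=True)


def local_explanation(instance):
    """Per-feature contribution (weight * value) to one specific prediction."""
    return {name: WEIGHTS[name] * instance[name] for name in WEIGHTS}


# One hypothetical case to be explained.
company = {"years_active": 5.0, "late_filings": 2.0, "address_changes": 0.0}
score = predict_score(company)
contributions = local_explanation(company)

print("global ranking:", global_explanation())
print("score:", round(score, 3))
for name, value in contributions.items():
    print(f"  {name}: {value:+.2f}")
```
      </preformat>
      <p>Because every contribution can be recomputed by hand, the sketch also illustrates simulatability and decomposability: each input and parameter has a readable meaning, and the bias plus the contributions add up to the score.</p>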
    </sec>
    <sec id="sec-2">
      <title>3. The X-RAI Framework as a Design Artifact</title>
      <p>
        The X-RAI framework is an ensemble consisting of four artifacts (Fig. 1). The first is the Model Impact and
Clarification (MIC) Framework, which ensures that an ML model fulfills requirements regarding
transparency and responsibility. The second is the Evaluation Plan (EP) Framework, which plans the resource
requirements and the evaluation of ML models. The third is the Evaluation Support (ES) Framework, which
facilitates the actual empirical evaluation of ML models and supports the decision whether an ML
model shall continue in production, be retrained, or be shut down. The fourth is the Retraining Execution
(RE) Framework, which initiates the process of sending an ML model back to the Machine Learning
Lab (ML Lab) for retraining.
The first two artifacts are part of the basis on which a steering committee decides whether to
launch the ML model into production (pre-production). The last two artifacts support the
continuous evaluation and improvement of the ML model after it goes live (post-production). The
design artifacts in ADR are solutions to problems experienced in practice, with theory ingrained.
The problems must be generalizable outside the context of the project
        <xref ref-type="bibr" rid="ref8">(Sein et al., 2011)</xref>
        . X-RAI is a
solution to problems experienced in the context of the Danish Business Authority, where government
officials are the intended end users. The government officials in our case are educated in
law, business, and politics, or are data scientists with diverse backgrounds. Their
expertise varies according to the governmental institution. X-RAI must be capable of involving and
utilizing stakeholders with varying expertise without excluding some of them by setting an unattainable
technological barrier to entry.
      </p>
      <sec id="sec-2-1">
        <title>3.1. Model Impact and Clarification Framework</title>
        <p>
          The MIC Framework has been applied and tested on four ML models: three times in its initial
version and once in its current version. The MIC is a questionnaire that enables the respondent
to describe and elaborate on issues related to transparency,
explainability, responsible conduct, business objectives, data, and technical issues. The primary
purpose of the MIC Framework is to improve, clarify, and guide communication between various
stakeholders, such as developers with technical expertise, caseworkers with expertise in the ML
model's decision space, and management. The idea of the MIC Framework derives from an analysis
of the Canadian Algorithmic Impact Assessment (AIA)<sup>1</sup> tool, which was found to have a strong link to
the Canadian directive on automated decision-making<sup>2</sup>. MIC differs from AIA in that it is grounded
in theory and business needs instead of legislation. Box 1 contains algorithmic
information about the ML model. Box 2 is filled out by the future owner of the system, enabling them
to state their needs concerning the use, explainability, transparency, users, and accountable actors.
Box 3 builds directly on Lipton's description of transparency with its three sub-levels:
simulatability, decomposability, and algorithmic transparency. In addition, it builds on the following types of
post-hoc interpretability: text explanations, visualization, local
explanations, and explanation by example
          <xref ref-type="bibr" rid="ref9">(Lipton, 2016)</xref>
          . These are supplemented with
concrete explainability methods, including Local Interpretable Model-agnostic Explanations (LIME) (Ribeiro
et al., 2016) and SHAP
          <xref ref-type="bibr" rid="ref11">(Lundberg &amp; Lee, 2017)</xref>
          . Output verification matters because
ML models in the DBA are decision-supportive, not decision-making: the need for
an explanation is reduced if the end user can instantly validate the truthfulness of the model output. Box 4
focuses on the data dimensions of the ML model including the relation to data sources and other
ML models. Box 5 explains every feature to avoid opaque ML models due to highly engineered or
anonymous features
          <xref ref-type="bibr" rid="ref9">(Lipton, 2016)</xref>
          and supplements methods such as SHAP
          <xref ref-type="bibr" rid="ref11">(Lundberg &amp; Lee, 2017)</xref>
          . Box 6 draws on the special categories from the European Union's 2016 General Data Protection
Regulation<sup>3</sup> and the 2018 Danish Data Protection Act<sup>4</sup>, repeating the questions on other data
categories to avoid discrimination. Box 7 focuses on the consequences of the output, mitigation of
consequences, and ensuring the responsible application of ML models. It takes inspiration from the
confusion matrix, which enables an easy estimate of the frequency of each outcome.
        </p>
        <p>1 See https://canada-ca.github.io/aia-eia-js/ and
https://github.com/canada-ca/digital-playbookguide-numerique/tree/master/en
2 See https://www.tbs-sct.gc.ca/pol/doc-eng.aspx?id=32592
3 See
https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:02016R067920160504&amp;from=EN
4 See https://www.retsinformation.dk/Forms/r0710.aspx?id=201319 (all links last checked 01/06/20)</p>
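        <p>The confusion-matrix idea behind Box 7 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the DBA's tooling: the function name and the example labels are hypothetical, and a binary decision (flag or no flag) is assumed. It counts the four outcomes and reports how often each occurs, which is the kind of frequency estimate that helps when weighing the consequences of each type of error.</p>
        <preformat>
```python
from collections import Counter


def outcome_frequencies(actual, predicted):
    """Return the relative frequency of each of the four confusion-matrix outcomes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one labeled case is required")
    # Keys are (actual, predicted) pairs.
    names = {
        (True, True): "true_positive",
        (False, True): "false_positive",
        (True, False): "false_negative",
        (False, False): "true_negative",
    }
    counts = Counter(names[(bool(a), bool(p))] for a, p in zip(actual, predicted))
    total = len(actual)
    return {name: counts[name] / total for name in names.values()}


# Hypothetical data: expert judgments (actual) and model flags (predicted).
actual = [True, True, False, False, False, True, False, False]
predicted = [True, False, False, True, False, True, False, False]
frequencies = outcome_frequencies(actual, predicted)

for outcome, share in frequencies.items():
    print(f"{outcome}: {share:.3f}")
```
        </preformat>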
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Evaluation Plan</title>
        <p>
          The Evaluation Plan (EP) was applied and tested on eight ML models in three incrementally
different versions. The EP structures the ongoing evaluation of an ML model throughout its lifetime
and thereby reveals the resources needed for maintenance. The EP clarifies
open questions such as the time and frequency of the evaluation meetings, the involved actors including their roles
and obligations, the data foundation, and meeting preparation. The goal is to ensure that all ML models
fulfill the defined quality requirements from cradle to grave. Theory is ingrained
indirectly in the EP through the MIC framework. The choices made when using the MIC framework
influence how the ML model can be evaluated, and the ML model's degrees of transparency and
explainability influence the possibilities of the evaluations. The evaluation detects data drift in a
procedure similar to application-grounded evaluation, where the ML model is evaluated
according to domain experts' performance on the task
          <xref ref-type="bibr" rid="ref5">(Doshi-Velez &amp; Kim, 2017)</xref>
          . The EP
encourages the first evaluation to take place as early as possible because the behavior of complex
methods such as neural networks on unseen data is difficult to predict
          <xref ref-type="bibr" rid="ref9">(Lipton, 2016)</xref>
          .
        </p>
        <p>(1) The name of the model and version number</p>
        <p>(2) Participants, for example the application manager, caseworkers, ML Lab, etc.</p>
        <p>(3) When is the first evaluation meeting?</p>
        <p>(4) Expected evaluation meeting frequency (How often are we expected to meet? Are there peak
periods that we need to take into consideration?)</p>
        <p>(5) Foundation for evaluation, for example logging data or annotated data (annotated data is here
data where the domain expert's classification is compared to the machine's)</p>
        <p>(6) Resources (Who can create the evaluation/training data? Internal vs. external creation of training
data; what quantity is needed for evaluation; time/money)</p>
        <p>(7) Estimated resource requirement for training, training frequency, and degree of complication
(procedure regarding regular bad performance)</p>
        <p>(8) The role of the model: Is it visible or invisible to external users?</p>
        <p>(9) Is the model's output an input for another model, or is the model's input an output from another model?</p>
        <p>(10) What are the criteria of success and failure? (When does a model perform well or badly? What
percentage?)</p>
        <p>(11) Is there future legislation that will impact the model performance? (Including bias, introduction of
new requirements/legal claims, abolition of requirements/legal claims, etc.)</p>
        <p>(12) When does the model need to be retrained?</p>
        <p>(13) When should the model be muted?</p>
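        <p>The EP questions lend themselves to a structured record that can be checked before a model goes into production. The Python sketch below is one possible encoding and not the form used at the DBA: the class name, the field names, the subset of questions covered, and the numeric thresholds are all assumptions made for illustration.</p>
        <preformat>
```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EvaluationPlan:
    """Hypothetical record mirroring a subset of the EP questions."""

    model_name: str             # (1) name of the model
    model_version: str          # (1) version number
    participants: list          # (2) e.g. application manager, caseworkers, ML Lab
    first_evaluation: date      # (3) date of the first evaluation meeting
    meetings_per_year: int      # (4) expected meeting frequency
    evaluation_data: str        # (5) e.g. "logging data" or "annotated data"
    success_threshold: float    # (10) assumed: minimum share of correct outputs
    mute_threshold: float       # (13) assumed: below this share the model is muted
    visible_to_external_users: bool = False               # (8) role of the model
    upstream_models: list = field(default_factory=list)   # (9) models feeding this one

    def validate(self):
        """Return a list of problems; an empty list means the plan is consistent."""
        problems = []
        if not self.participants:
            problems.append("no participants named")
        if 1 > self.meetings_per_year:
            problems.append("at least one evaluation meeting per year is required")
        if self.mute_threshold > self.success_threshold:
            problems.append("mute threshold must not exceed success threshold")
        return problems


# Hypothetical plan for an example model.
plan = EvaluationPlan(
    model_name="example-classifier",
    model_version="1.0",
    participants=["application manager", "caseworker", "ML Lab"],
    first_evaluation=date(2020, 9, 1),
    meetings_per_year=4,
    evaluation_data="annotated data",
    success_threshold=0.90,
    mute_threshold=0.60,
)
print(plan.validate())
```
        </preformat>
        <p>Keeping the plan as data rather than free text makes it possible to flag incomplete or contradictory answers, while the remaining questions (resources, legislation, retraining criteria) can stay as prose fields.</p>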
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Evaluation Support</title>
        <p>The Evaluation Support (ES) framework was applied five times on three different ML models in
three incrementally changed editions.</p>
        <p>
          A fourth edition is ready for testing. The ES facilitates the evaluation of the ML model at the
evaluation meetings. The domain specialist responsible for the ML model answers relevant fields in
the framework before the meeting. The stakeholders complete the remaining framework
collaboratively at the meeting and decide if the ML model shall continue in production, be retrained,
or shut down. The ES strives to evaluate the ML model according to the task, as described in
application-grounded evaluation
          <xref ref-type="bibr" rid="ref5">(Doshi-Velez &amp; Kim, 2017)</xref>
          . In our case, we let the caseworker who
would normally perform the task of the ML model evaluate the classifications and report the results in the ES
framework. The ES primarily focuses on the fulfillment of performance requirements, while
transparency and explainability serve as subcomponents for interpreting the reasons behind ML model
performance. These reasons are important if the model needs retraining.
        </p>
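        <p>The comparison between caseworker judgments and model output can be sketched as follows. This Python example is illustrative only: the labels, the function names, and the two thresholds are hypothetical, and the returned recommendation is meant as input to the evaluation meeting, where the stakeholders make the actual decision.</p>
        <preformat>
```python
def agreement_rate(caseworker_labels, model_labels):
    """Share of cases where the model output matches the caseworker's judgment."""
    if len(caseworker_labels) != len(model_labels):
        raise ValueError("both label lists must have the same length")
    if not caseworker_labels:
        raise ValueError("at least one evaluated case is required")
    matches = sum(1 for c, m in zip(caseworker_labels, model_labels) if c == m)
    return matches / len(caseworker_labels)


def recommend(rate, success_threshold=0.90, mute_threshold=0.60):
    """Map an agreement rate to one of the three ES outcomes (thresholds are assumed)."""
    if rate >= success_threshold:
        return "continue in production"
    if rate >= mute_threshold:
        return "retrain"
    return "shut down"


# Hypothetical evaluation sample: ten cases judged by a caseworker and by the model.
caseworker = ["flag", "pass", "pass", "flag", "pass", "pass", "pass", "flag", "pass", "pass"]
model = ["flag", "pass", "flag", "flag", "pass", "pass", "pass", "pass", "pass", "pass"]

rate = agreement_rate(caseworker, model)
recommendation = recommend(rate)
print(f"agreement: {rate:.2f}, recommendation: {recommendation}")
```
        </preformat>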
      </sec>
      <sec id="sec-2-4">
        <title>3.4. Retraining Execution Framework</title>
        <p>The Retraining Execution (RE) Framework was applied and tested two times on two different ML
models in two incrementally changed versions. The RE initiates the process of sending an ML model
back to the ML Lab for retraining. Retraining occurs when the ML model needs to
improve its performance and is expected to continue to provide value. The RE framework focuses on the
reusability of evaluation data and old training data for retraining, the emergence of new
technological possibilities, the detection and elimination of bias, changes in data types and
legislation, the urgency of retraining, and whether the input and output are related to other models.
Transparency and explainability of the ML model become relevant when explaining the root cause of
the need for retraining.</p>
        <p>[...] all stakeholders agreed that the model has to be retrained?)</p>
        <p>Data distribution becomes relevant if the data are skewed: skew slows down, and thereby increases
the cost of, a data annotation process that focuses on providing training examples for the minority
class. The use of the RE framework restarts the X-RAI process by leading back to the
MIC framework.</p>
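        <p>The effect of skewed data on annotation cost can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes that cases are drawn at random for annotation, so the expected number of cases needed to find a given number of minority examples is that number divided by the minority share. The function names and all figures are hypothetical.</p>
        <preformat>
```python
def expected_annotations(minority_needed, minority_share):
    """Expected number of randomly drawn cases to annotate (assumes random sampling)."""
    if 0 > minority_needed:
        raise ValueError("minority_needed must not be negative")
    if not (minority_share > 0 and 1 >= minority_share):
        raise ValueError("minority_share must be in the interval (0, 1]")
    return minority_needed / minority_share


def expected_hours(minority_needed, minority_share, minutes_per_case):
    """Expected annotation effort in hours for the given time per case."""
    return expected_annotations(minority_needed, minority_share) * minutes_per_case / 60


# Hypothetical scenario: 200 minority-class examples are needed for retraining.
balanced = expected_annotations(200, 0.5)
skewed = expected_annotations(200, 0.02)
print(f"balanced data: {balanced:.0f} cases, skewed data: {skewed:.0f} cases")
print(f"skewed effort at 3 minutes per case: {expected_hours(200, 0.02, 3):.0f} hours")
```
        </preformat>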
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion and Outlook</title>
      <p>
        The X-RAI framework was successfully developed, applied, and tested on nine different ML models
used in the Danish Business Authority according to the ADR principle of authentic and concurrent
evaluation
        <xref ref-type="bibr" rid="ref8">(Sein et al., 2011)</xref>
        . The iterations have led to incremental changes in the frameworks. The
frameworks are currently standard procedures and mandatory for all ML models developed by the
ML Lab in the Danish Business Authority, which we regard as a success in terms of the
organizational adoption of artifacts and procedures. According to ADR, artifacts must have theory
ingrained
        <xref ref-type="bibr" rid="ref8">(Sein et al., 2011)</xref>
        . Interpretability theory, including the subcategories of
transparency and explanation, is ingrained into the frameworks. This theoretical lens provides a strong
foundation for informing how the ML models work. Future work will focus on analyzing the
evaluation data and using it to design IT artifacts and integrate them into the Danish Business
Authority's IT-ecosystem. An additional theoretical lens will be ingrained in the artifacts to create a
theoretical foundation for responsible conduct in the design.
      </p>
      <sec id="sec-3-1">
        <title>About the Authors</title>
        <p>Per Rådberg Nagbøl
Per Rådberg Nagbøl is a Ph.D. fellow at the IT University of Copenhagen, pursuing the
Ph.D. in collaboration with the Danish Business Authority.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Perrault</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Shoham</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Brynjolfsson</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Etchemendy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Grosz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lyons</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Manyika</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Niebles</surname>
            ,
            <given-names>J. C.</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <source>The AI Index 2019 Annual Report</source>
          . AI Index Steering Committee, Human-Centered AI Institute, Stanford University.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Guszcza</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Rahwan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Bible</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Cebrian</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Katyal</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          (
          <year>2018</year>
          )
          <article-title>Why We Need to Audit Algorithms</article-title>
          . https://hbr.org/2018/11/why-we-need-to-audit-algorithms
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Regeringen</surname>
          </string-name>
          (
          <year>2019</year>
          )
          <article-title>Finansministeriet og Erhvervsministeriet: National strategi for kunstig intelligens</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Molnar</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2020</year>
          )
          <article-title>Interpretable Machine Learning: A Guide for Making Black Box Models Explainable</article-title>
          . https://christophm.github.io/interpretable-ml-book/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Doshi-Velez</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2017</year>
          )
          <article-title>Towards a rigorous science of interpretable machine learning</article-title>
          . https://arxiv.org/abs/1702.08608v2
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <article-title>Techniques for Interpretable Machine Learning</article-title>
          .
          <source>Communications of the ACM</source>
          . Volume
          <volume>63</volume>
          . Issue 1. https://dl.acm.org/doi/10.1145/3359786
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Rosenfeld</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Richardson</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Explainability in Human-Agent Systems</article-title>
          . arXiv:1904.08123v1
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Sein</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Henfridsson</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Purao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Rossi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lindgren</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2011</year>
          )
          <article-title>Action Design Research</article-title>
          .
          <source>MIS Quarterly</source>
          , Volume
          <volume>35</volume>
          , Issue
          <issue>1</issue>
          , pages
          <fpage>37</fpage>
          -
          <lpage>56</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Lipton</surname>
            ,
            <given-names>Z</given-names>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>The Mythos of Model Interpretability</article-title>
          .
          <source>Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016)</source>
          , New York, NY. Last revised 6 Mar 2017. arXiv:1606.03490v3
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Lipton</surname>
            ,
            <given-names>Z</given-names>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>The Mythos of Model Interpretability</article-title>
          .
          <source>ACM QUEUE</source>
          . Volume
          <volume>16</volume>
          , Issue 3. https://queue.acm.org/detail.cfm?id=3241340
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Lundberg</surname>
            ,
            <given-names>S</given-names>
          </string-name>
          &amp;
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>A Unified Approach to Interpreting Model Predictions</article-title>
          .
          <source>31st Conference on Neural Information Processing Systems (NIPS 2017)</source>
          , Long Beach, CA, USA.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>