Social Aspects of Machine Learning Model Evaluation:
Model Interpretation and Justification from ML-practitioners’
Perspective
Victoria Zakharova a and Alena Suvorova a
a HSE University, 3A Kantemirovskaya Street, St Petersburg, 194100, Russia

                 Abstract
                 Machine Learning (ML) is now widely applied in various spheres of life. Experts from different domains are increasingly involved in decision-making based on complex machine learning models, which has led to growing interest in research on model explainability. However, little is known about how ML practitioners describe and justify their models to others. This work aims to fill the research gap in understanding how data specialists evaluate machine learning models and how they communicate the results to third parties. To explore this, a qualitative research design is proposed and semi-structured interviews with ML practitioners are conducted. The decision-making process will be explored from a sociological perspective, according to which data specialists are considered actors who tend to construct knowledge rather than passively receive it. The potential result of this work is to reveal the role of data specialists in model explanation and justification and to describe the methods they could use to explain complex models to domain experts with non-technical backgrounds.

                 Keywords
                 Machine Learning, Algorithm Evaluation, Knowledge Sharing

1. Introduction
    Digitalization promotes innovation and facilitates the process of globalization. At the same time, ongoing digital transformation gives rise to new tasks together with new methods for solving them, which are rarely clear to a wide audience but are accepted because they provide solutions to urgent issues [1]. This tendency is noticeable in applied domains, where medium-sized companies, large corporations, and small start-ups turn to novel digital solutions to present the unique value of their work, strengthen their competitiveness, and stand out among other market players. Data-driven approaches have gained recognition in customer-oriented settings and are thought to affect society and its characteristics in far-reaching ways [2]. For example, banking has changed with the introduction of chatbots based on machine learning algorithms that answer clients faster or send personalized notifications; similar solutions are already used in industries such as retail [3], healthcare [4], and insurance [5].
    As one consequence, motivated by the growing demand for analytical expertise in the labour market, some people follow these trends and take on the role of problem-solvers dealing with present-day challenges [6]. As knowledge expands and more people dive into the data science sphere, such roles become diverse and are rarely clearly defined. Moreover, specialists have to collaborate with each other to reach commonly established goals, such as releasing new digital products or upgrading existing infrastructure with advanced algorithms. Simplifying the concepts, three roles can be distinguished: model builders, model breakers, and consumers [7]. Considering the early stage of this technological development and of knowledge formation about it, the first two roles are taken by actors who have an initial interest in facilitating innovation.

IMS 2021 - International Conference "Internet and Modern Society", June 24-26, 2021, St. Petersburg, Russia
EMAIL: vv.vict26@gmail.com (A. 1); suvalv@gmail.com (A. 2)
ORCID: 0000-0003-3641-6326 (A. 1); 0000-0002-5392-4683 (A. 2)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)



These actors make decisions based on data, introducing state-of-the-art project results to a broad audience. Specifically, data specialists not only have to develop and evaluate statistical models such as machine learning ones, but also have to reach agreement with stakeholders who are far from direct work with mathematical algorithms yet are the ones who can ensure the promotion and support of further project realization.
   The motivation for conducting this research is based on the recent increase in scientific papers that emphasize the importance of machine learning practitioners’ expertise in dealing with algorithms, promote transparency analysis as an integral part of the work with algorithms, and thereby draw attention to the need to clarify data- and machine-learning-based solutions so that all involved parties can understand them [7, 8, 9, 10]. As for its potential contribution, this work attempts to present a theoretical account of the model evaluation process, with consideration of practices of model interpretation and of sharing knowledge with the other involved actors, supported with qualitative data provided by machine learning practitioners who see the case from their own perspective.

2. Related Research and Problem Statement
    Presently, there are plenty of research papers demonstrating scientific interest in data-driven innovations from managerial, economic, and social perspectives. One research theme concerns the working process of data practitioners. In particular, these studies focus on data-oriented skill sets [11], the division of data science roles [12, 13, 14], team collaboration [15, 16], and the tools and practices used within the workflow in practical settings [17, 18, 19]. In addition, other studies examine the role of explanation in decision-making, revealing that data specialists tend to trust algorithms too much and make decisions in a biased manner [20, 21]. However, little is known about how data specialists who implement complex models (i.e., machine learning ones) evaluate those models in non-academic settings, and how they communicate the obtained information to others involved, directly or indirectly, in their work. Several studies related to this issue have investigated practitioners’ work, but they are far more empirical than theoretically grounded [7], and experts’ needs and opinions about interpretability are rarely reported.
    Thus, this research aims to study the practices (i.e., practical actions based on real-life working experience) of data science specialists, with a focus on the model evaluation stage and on communicating knowledge about model quality and other characteristics to third parties. The following research questions are proposed: How do specialists perform model evaluation, and what do they pay attention to? How do data practitioners explain complex statistical models to people without a deep understanding of data science principles and ideas? The relevance of studying this issue stems from the idea that data experts are the first to interact with algorithms, have the specific knowledge needed to understand them, and make the initial decisions that promote the use of algorithms in production, which might have a significant impact on society over time [22, 23].

3. Research Design

    Within the framework of this research, the description of specialists’ practical work is planned to be supported with empirical data collected via semi-structured interviews with practitioners working on data-intensive applications in different business spheres. An interview-based approach is used to understand the experience, positions, attitudes, and opinions of industry practitioners, who are direct guides to the world of technology [24]. A variability sample, in other words, interviewing practitioners from various domains, is considered suitable for identifying common (domain-independent) patterns and discrepancies and for explaining the performed actions and formed viewpoints through shared real-life experience. As for the sampling technique, convenience and snowball sampling were used, and, as a result, 16 interviews (with 11 men and 5 women) have been conducted. The main criteria for recruiting participants were at least one year of practical experience in industry and the use of machine learning algorithms for problem-solving in their work.
    The obtained results will be analyzed by means of thematic qualitative analysis in order to explore the general case from the perspective of the applied theoretical framework. In particular, this work is planned to be theory-based in order to justify its results by a grounded interpretation of the empirical data.
As for the theory, the concept of “worlds” introduced by Boltanski and Thévenot [25] is chosen for the elaboration of data practitioners’ work. According to this framework, there are several “worlds”, or ways of thinking, related to how people and objects dwell together, each guided by its own interests, intentions, and perception of particular issues. These “worlds”, which are prone to conflict, reach compromises, and collaborate on justification, are the following: the inspired, fame, domestic, civic, market, and industrial worlds. Given that data scientists generally work in a business environment, it seems straightforward that these practitioners have to work not only with each other but also with managers and stakeholders, who are more likely to belong to other “worlds”, especially the market one.

4. Plans and Preliminary Results

    Further plans for this research mainly focus on data analysis to obtain justified answers to the research questions. Preliminary findings emphasize the difficulty of bringing several “worlds” into contact. Specifically, data practitioners evaluate models with the help of mathematical metrics that are understandable to them, and then they have to consider the interests of others, such as managers, who are more likely to be concerned about financial payoffs, and stakeholders, who decide whether or not to invest in an ML-based project. The situation becomes more complicated when the models have to be reviewed in a social context (e.g., whether obscene content that was unblocked by mistake causes moral harm to users). In addition, data practitioners support the idea that one of the managerial purposes is to sell projects by winning over superiors. Moreover, managers sometimes attempt to take part in market tenders by offering technical solutions that can hardly be realized by the data specialists. In general, these insights strengthen the idea that there is a strong need to build effective communication between the “worlds” and to inform each party about the others’ capabilities, in particular by converting mathematical metrics into business metrics to demonstrate efficiency and potential benefits in a justified way.
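    As a minimal illustration of such a conversion (a sketch only: the client-targeting scenario and all monetary figures below are hypothetical assumptions, not data from the interviews), a classification model’s confusion matrix can be translated into an expected business value by attaching a cost or benefit to each outcome:

    # Minimal sketch: translating confusion-matrix counts into an expected business value.
    # All monetary figures are hypothetical assumptions used for illustration only.
    def expected_value(tp, fp, fn, tn,
                       benefit_tp=100.0,  # assumed revenue from a correctly targeted client
                       cost_fp=-20.0,     # assumed cost of contacting an uninterested client
                       cost_fn=-50.0,     # assumed missed revenue per overlooked client
                       cost_tn=0.0):      # no action taken, no cost incurred
        """Return total and per-client expected value for the given counts."""
        total = tp * benefit_tp + fp * cost_fp + fn * cost_fn + tn * cost_tn
        return total, total / (tp + fp + fn + tn)

    # Counts from a hypothetical validation set of 1,000 clients.
    total, per_client = expected_value(tp=120, fp=80, fn=40, tn=760)
    print(f"Estimated campaign value: {total:.0f} units ({per_client:.2f} per client)")

    Reported in this form, competing models can be compared on expected payoff rather than on accuracy alone, which is closer to the language of the market “world”.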
    As for interpretable machine learning methods (which appear to be among the most hotly debated topics in data science communities), several practitioners mentioned, with a certain degree of confidence, the usefulness of such tools for revealing model transparency: there were cases when these tools helped to determine which model was better in terms of its algorithm, or to walk through a project case step by step, making the work easier to present to experts from the other “worlds”. Others pointed out that they did not use interpretable machine learning methods in their project workflow because they are not worth it: stakeholders require strict explanations, but producing them is computationally expensive and time-consuming.
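    For readers unfamiliar with this class of tools, the sketch below shows one widely used model-agnostic technique, permutation feature importance, applied to a synthetic dataset with scikit-learn; it is an illustrative example of the kind of tool practitioners referred to, not a method reported by any particular interviewee:

    # Sketch of a model-agnostic interpretability tool: permutation feature importance.
    # The synthetic data and the choice of model are illustrative assumptions.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, n_features=8, n_informative=3, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    # Shuffle each feature in turn and measure the drop in test accuracy;
    # the resulting ranking is easier to discuss with non-technical stakeholders
    # than the internals of the model itself.
    result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
    for i in result.importances_mean.argsort()[::-1]:
        print(f"feature {i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")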

5. Acknowledgements

      The work is supported by the Russian Science Foundation grant (project No. 19-71-00064).

6. References

[1] J.J. Kassema, Products and Services Improvement through Innovation and Creativity: Case of IT
     Business Sector. Social Science Research Network, Rochester, NY, 2019.
     https://doi.org/10.2139/ssrn.3485811.
[2] A. Mugrauer, J. Pers, Marketing managers in the age of AI: A multiple-case study of B2C firms,
    2019.
[3] T. Calle-Jimenez, B. Orellana-Alvear, R. Prado-Imbacuan, GIS and User Experience in Decision
     Support for Retail Type Organizations. In: 2019 International Conference on Information Systems
     and Software Technologies (ICI2ST), 2019, pp. 156–161.
     https://doi.org/10.1109/ICI2ST.2019.00029.
[4] L. Syed, S. Jabeen, S. Manimala, H.A. Elsayed, Data Science Algorithms and Techniques for
    Smart Healthcare Using IoT and Big Data Analytics. In: Mishra, M.K., Mishra, B.S.P., Patel, Y.S.,
    and Misra, R. (eds.) Smart Techniques for a Smarter Planet: Towards Smarter Algorithms,
     Springer International Publishing, Cham, 2019, pp. 211–241.
     https://doi.org/10.1007/978-3-030-03131-2_11.
[5] A. Singh, K. Ramasubramanian, S. Shivam, Building an Enterprise Chatbot: Work with Protected
     Enterprise Data Using Open Source Frameworks. Apress, 2019.
[6] S. Kampakis, Problem Solving. In: S. Kampakis (ed.) The Decision Maker’s Handbook to Data
     Science: A Guide for Non-Technical Executives, Managers, and Founders. Apress, Berkeley, CA,
     2020, pp. 89–95.
[7] S.R. Hong, J. Hullman, E. Bertini, Human Factors in Model Interpretability: Industry Practices,
     Challenges, and Needs. Proc. ACM Hum.-Comput. Interact., 4, 2020, pp. 1–26.
     https://doi.org/10.1145/3392878.
[8] C. Molnar, G. Casalicchio, B. Bischl, Interpretable Machine Learning – A Brief History, State-of-
     the-Art and Challenges. arXiv:2010.09337 [cs, stat], 2020.
[9] W.J. Murdoch, C. Singh, K. Kumbier, R. Abbasi-Asl, B. Yu, Interpretable machine learning:
     definitions, methods, and applications. Proc Natl Acad Sci USA, 116, 2019, pp. 22071–22080.
     https://doi.org/10.1073/pnas.1900654116.
[10] H. Suresh, S.R. Gomez, K.K. Nam, A. Satyanarayan, Beyond Expertise and Roles: A Framework
     to Characterize the Stakeholders of Interpretable Machine Learning and their Needs.
     arXiv:2101.09824 [cs], 2021. https://doi.org/10.1145/3411764.3445088.
[11] T. Stadelmann, K. Stockinger, G. Heinatz Bürki, M. Braschler, Data Scientists. In: Braschler, M.,
     Stadelmann, T., and Stockinger, K. (eds.) Applied Data Science: Lessons Learned for the Data-
     Driven Business, Springer International Publishing, Cham, 2019, pp. 31–45.
     https://doi.org/10.1007/978-3-030-11821-1_3.
[12] S. Baškarada, A. Koronios, Unicorn data scientist: the rarest of breeds. Program, 51, 2017, pp.
     65–74. https://doi.org/10.1108/PROG-07-2016-0053.
[13] M. Kim, T. Zimmermann, R. DeLine, A. Begel, Data Scientists in Software Teams: State of the
     Art and Challenges. IEEE Transactions on Software Engineering, 44, 2018, pp. 1024–1038.
     https://doi.org/10.1109/TSE.2017.2754374.
[14] J.S. Saltz, N.W. Grady, The ambiguity of data science team roles and the need for a data science
     workforce framework. In: 2017 IEEE International Conference on Big Data (Big Data), 2017, pp.
     2355–2361. https://doi.org/10.1109/BigData.2017.8258190.
[15] A.Y. Wang, A. Mittal, C. Brooks, S. Oney, How Data Scientists Use Computational Notebooks
     for Real-Time Collaboration. Proc. ACM Hum.-Comput. Interact., 3, 2019, 39:1–39:30.
     https://doi.org/10.1145/3359141.
[16] A.X. Zhang, M. Muller, D. Wang, How do Data Science Workers Collaborate? Roles, Workflows,
     and Tools. Proc. ACM Hum.-Comput. Interact., 4, 2020, 022:1–022:23.
     https://doi.org/10.1145/3392826.
[17] N. Boukhelifa, M.-E. Perrin, S. Huron, J. Eagan, How Data Workers Cope with Uncertainty: A
     Task Characterisation Study. In: Proceedings of the 2017 CHI Conference on Human Factors in
     Computing Systems, Association for Computing Machinery, New York, NY, USA, 2017,
     pp. 3645–3656. https://doi.org/10.1145/3025453.3025738.
[18] A. Crisan, B. Fiore-Gartland, M. Tory, Passing the Data Baton: A Retrospective Analysis on Data
     Science Work and Workers. IEEE Transactions on Visualization and Computer Graphics, 27,
     2021, pp. 1860–1870. https://doi.org/10.1109/TVCG.2020.3030340.
[19] P. Pereira, J. Cunha, J.P. Fernandes, On Understanding Data Scientists. In: 2020 IEEE Symposium
     on Visual Languages and Human-Centric Computing (VL/HCC), 2020, pp. 1–5.
     https://doi.org/10.1109/VL/HCC50065.2020.9127269.
[20] D.-A. Ho, O. Beyan, Biases in Data Science Lifecycle. arXiv:2009.09795 [cs], 2020.
[21] H. Kaur, H. Nori, S. Jenkins, R. Caruana, H. Wallach, J. Wortman Vaughan, Interpreting
     Interpretability: Understanding Data Scientists’ Use of Interpretability Tools for Machine
     Learning. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems,
     Association for Computing Machinery, New York, NY, USA, 2020, pp. 1–14.
     https://doi.org/10.1145/3313831.3376219.
[22] U. Garzcarek, D. Steuer, Approaching Ethical Guidelines for Data Scientists. In: Bauer, N.,
     Ickstadt, K., Lübke, K., Szepannek, G., Trautmann, H., and Vichi, M. (eds.) Applications in
     Statistical Computing: From Music Data Analysis to Industrial Quality Improvement, Springer
     International Publishing, Cham, 2019, pp. 151–169.
     https://doi.org/10.1007/978-3-030-25147-5_10.
[23] S. Passi, S.J. Jackson, Trust in Data Science: Collaboration, Translation, and Accountability in
     Corporate Data Science Projects. Proc. ACM Hum.-Comput. Interact., 2, 2018, pp. 1–28.
     https://doi.org/10.1145/3274405.
[24] N. Seaver, Algorithms as culture: Some tactics for the ethnography of algorithmic systems. Big
     Data & Society, 4, 2053951717738104, 2017. https://doi.org/10.1177/2053951717738104.
[25] L. Boltanski, L. Thévenot, On Justification: Economies of Worth. Princeton University Press,
     Princeton, 2006.