=Paper=
{{Paper
|id=Vol-3144/RP-paper3
|storemode=property
|title=DOGO4ML: Development, Operation and Data Governance for ML-based Software Systems
|pdfUrl=https://ceur-ws.org/Vol-3144/RP-paper3.pdf
|volume=Vol-3144
|authors=Claudia Ayala,Besim Bilalli,Cristina Gomez,Silverio Martinez-Fernandez
|dblpUrl=https://dblp.org/rec/conf/rcis/AyalaBGM22
}}
==DOGO4ML: Development, Operation and Data Governance for ML-based Software Systems==
Claudia Ayala, Besim Bilalli, Cristina Gómez and Silverio Martínez-Fernández
Universitat Politècnica de Catalunya, Barcelona, Catalonia, Spain

Abstract
Machine Learning based Software Systems (MLSS) are becoming increasingly pervasive in today's society and can be found in virtually every domain. Building MLSS is challenging due to their interdisciplinary nature: MLSS engineering encompasses multiple disciplines, of which Data Engineering and Software Engineering appear as the most relevant. The DOGO4ML project aims at reconciling these two disciplines to provide a holistic end-to-end framework to develop, operate and govern MLSS and their data. It proposes to combine and intertwine two software cycles: the DataOps and the DevOps lifecycles. The DataOps lifecycle manages the complexity of dealing with the big data needed by ML models, while the DevOps lifecycle is in charge of building the system that embeds these models. In this paper, we present the main vision and goals of the project as well as its expected contributions and outcomes. Although the project is in its initial stage, the progress of the research undertaken so far is detailed.

Keywords
DevOps, Machine Learning, DataOps, Data and Software Engineering, ML-based software systems

1. Introduction

The European Political Strategy Centre stated that "Data is rapidly becoming the lifeblood of the global economy. It represents a key new type of economic asset. Those that know how to use it have a decisive competitive advantage in this interconnected world, through raising performance, offering more user-centric products and services, fostering innovation—often leaving decades-old competitors behind." (https://ec.europa.eu/epsc/sites/epsc/files/epsc_strategic_note_issue30_trategic_autonomy.pdf). It becomes necessary for companies to master the development, operation, and governance of software systems that embed advanced statistical models exploiting data for different purposes. Such models are typically generated using Machine Learning (ML), i.e., "the study of computer algorithms that improve automatically through experience", which relies on available sample data to learn models and "make predictions or decisions without being explicitly programmed to do so" [1]. We call ML-based software systems (MLSS) those software systems whose behavior is greatly determined by the ML models embedded therein. MLSS are becoming increasingly pervasive in today's society and are present in virtually every domain: from smart mobility (autonomous driving) and Industry 4.0 (factory robots) to smart health (diagnostic systems) and smart infrastructures (cloud-based services).

Processes for building MLSS tend to be complex, inherently iterative and difficult to manage and govern. One of the reasons for this complexity is that they encompass multiple disciplines, of which Data Engineering (DE) and Software Engineering (SE) appear as the most relevant. In setting up MLSS, data and software engineers often face several challenges that make their development and operation even more complicated: (i) the lack of a well-established set of good practices to design, manage and govern such software systems in a systematic manner;
(ii) the increasingly common Big Data characteristics of the data involved; and (iii) the lack of definition of specific indicators and quality requirements for MLSS (e.g., related to trustworthiness or ethics), and of tool support for validating them.

This paper presents the DOGO4ML project, an acronym for Development, Operation and Data Governance for MLSS. DOGO4ML (https://dogo4ml.upc.edu/en) is a 4-year project that started in September 2021 and is funded by the Spanish research agency under the National Spanish Program for Research Aimed at the Challenges of Society 2020 (RETOS 2020). DOGO4ML is run by the integrated Software, Services, Information and Data Engineering research group (inSSIDE, https://insside.upc.edu/) at the Universitat Politècnica de Catalunya (UPC). inSSIDE is composed of two subgroups: (i) the Software and Service Engineering group (GESSI, https://gessi.upc.edu/en) and (ii) the Database Technologies and Information Management group (DTIM, https://www.essi.upc.edu/dtim/). These two subgroups together cover the relevant aspects of SE and DE that lay the foundations for DOGO4ML.

The rest of the paper is organized as follows. Section 2 and Section 3 present the conceptualization of DOGO4ML and its objectives, respectively. The expected outcomes of the project are detailed in Section 4. Section 5 sketches the relevance of the project for the ML field. Then, Section 6 summarizes the initial results of DOGO4ML. Finally, Section 7 presents the conclusions.

2. DOGO4ML Conceptualization

The main objective of the project is to provide a holistic approach to MLSS engineering, aligning its DE needs with SE practices. DOGO4ML proposes a holistic end-to-end framework to develop, operate and govern MLSS and their data. This framework revolves around a new proposal we call the DevDataOps lifecycle, which unifies two software lifecycles: the DevOps lifecycle and the DataOps lifecycle. The DevOps cycle aims to transform the requirements of an MLSS into deployed code (Dev) and to get feedback as soon as possible from the end-users (Ops). This feedback can be used to evolve the requirements (including those that apply to the ML models). The DataOps cycle provides support to the data management and analysis processes that characterize MLSS. The DataOps processes are inter-related with those in the Dev phase of the DevOps software cycle, since they produce the required ML models (created through several iterations in the DataOps lifecycle) to be embedded into the ML software components of the MLSS. Further, the DataOps cycle aims to get feedback from the data analysts to continuously improve the data management and analysis processes. A detailed explanation of the conceptualization of both cycles follows.

2.1. The DevOps software cycle

DevOps is a software development and delivery process that produces software from its conceptualization, as well as from the feedback provided by monitors when the software system is in an operational environment. This feedback is then used to maintain and evolve the system. The specificity of MLSS requires continuous context-aware delivery, and feedback to adjust and refine their embedded ML components. Fig. 1 presents the resulting DevOps cycle.

In the Dev phase, a typical requirements engineering process applies both at the system and the ML component levels. At the system level, the requirements include quality requirements specific to the MLSS (e.g., trustworthiness and ethics), extracted from a requirement patterns catalogue based on [2] to be built during the project. At the level of the ML components, requirements also include quality requirements of ML models (e.g., model accuracy and low latency), which are key to identifying and processing the relevant data for ML model construction, validation, and operation (see the description of the DataOps software cycle in Section 2.2). Continuing with the Dev phase, agile practices will be adopted to continuously deliver high-quality MLSS. The definition of reference architectures and best practices (e.g., iterative integration of ML models provided by the DataOps software cycle into ML components), driven by the MLSS quality requirements, will enable rapid MLSS implementation and deployment in small iterations. Automated integration and testing of those systems will reconcile the particularities of both types of components, ML and non-ML (e.g., in terms of uncertainty in the functional validation). Once validated, the MLSS will be deployed in its contextual operational environment (e.g., in a type of system with high decisional capabilities such as a smart vehicle).
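To make the role of these ML-model quality requirements in automated validation more concrete, the following minimal sketch shows how an accuracy and a latency requirement could be checked as a deployment gate, using a simple statistical margin to account for the uncertainty inherent in validating ML components. The names, thresholds and toy model are our own illustrative assumptions, not assets of the project.

```python
"""Minimal sketch of an automated quality gate for an ML component.

Hypothetical example: names, thresholds and the toy model are invented;
it only illustrates how ML-model quality requirements elicited in the Dev
phase (accuracy, latency) could be checked automatically, with a statistical
margin to account for the uncertainty of validating ML components.
"""
import math
import time


def validate_ml_component(model, X_val, y_val,
                          min_accuracy=0.90, max_p95_latency_ms=50.0):
    """Return (passed, report) for any object exposing predict(x) -> label."""
    latencies_ms, correct = [], 0
    for x, y in zip(X_val, y_val):
        start = time.perf_counter()
        pred = model.predict(x)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
        correct += int(pred == y)

    n = len(latencies_ms)
    accuracy = correct / n
    # One-sided 95% margin on the accuracy estimate: the requirement must
    # hold even at the pessimistic end of the confidence interval.
    margin = 1.96 * math.sqrt(max(accuracy * (1.0 - accuracy), 1e-9) / n)
    p95_latency = sorted(latencies_ms)[int(0.95 * (n - 1))]

    passed = (accuracy - margin >= min_accuracy
              and p95_latency <= max_p95_latency_ms)
    return passed, {"accuracy": accuracy, "margin": margin,
                    "p95_latency_ms": p95_latency}


if __name__ == "__main__":
    class ParityModel:                     # trivial stand-in for a trained model
        def predict(self, x):
            return x % 2                   # toy task: parity classification

    X = list(range(200))
    y = [x % 2 for x in X]
    print(validate_ml_component(ParityModel(), X, y))
```

A gate of this kind would naturally run as part of the automated integration and testing step described above, alongside conventional deterministic tests for the non-ML components.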
Figure 1: The DevOps cycle for MLSS projects proposed by the DOGO4ML project. A zoom-in of the DataOps cycle is in Figure 2.

During the Ops phase, the MLSS in production interacts with both the user and the environment. For example, an MLSS for a smart vehicle will receive input from the user (e.g., through voice) and a continuous stream of sensed data (e.g., data indicating people crossing the street). Through these interactions and inputs, the ML models deployed inside the system are able to make predictions (e.g., that there is an increased risk of accident). While the system is in execution, it generates runtime data, mainly in the form of measurements of the system behavior (through monitors) and log files that contain the sequence of time-stamped interactions. These data will be gathered by a module able to analyze them and assess a set of high-level indicators that may refer to MLSS quality requirements (e.g., runtime efficiency, trustworthiness of the system) or to other more general aspects (e.g., users' ethical behavior). In this respect, we plan to adapt our previous results on 1) self-adaptive systems monitoring to the area of MLSS [3] and 2) visualization of high-level indicators and quality requirements in the form of a dashboard [4]. This dashboard, which we call the DevOps dashboard, is an essential element of the DevOps lifecycle: it generates the feedback needed to impact the Dev cycle and to close the continuous DevOps loop. Feedback enables the evolution of the MLSS (including the ML components, by evolving the quality requirements of ML models). Then, the approach starts over again, and the Dev phase uses the feedback to evolve the MLSS. Note that the data gathered in operation may thus require revisiting the DataOps lifecycle in a subsequent Dev phase.
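As an illustration of how such runtime measurements and time-stamped logs could be condensed into high-level indicators for the DevOps dashboard, the hypothetical sketch below aggregates interaction records into a few example indicators. The record fields and indicator definitions are invented for illustration and are not the project's monitoring module.

```python
"""Hypothetical aggregation of Ops-phase runtime logs into dashboard indicators.

The record fields and indicator definitions are illustrative assumptions,
not the monitoring infrastructure of DOGO4ML.
"""
from dataclasses import dataclass
from statistics import mean


@dataclass
class InteractionRecord:
    timestamp: float        # seconds since epoch, from the time-stamped log
    latency_ms: float       # end-to-end response time of the MLSS
    confidence: float       # confidence reported by the embedded ML model
    user_overrode: bool     # did the user reject or correct the prediction?


def compute_indicators(records, latency_budget_ms=100.0):
    """Condense raw runtime records into indicators a DevOps dashboard could plot."""
    if not records:
        return {}
    within_budget = [r for r in records if r.latency_ms <= latency_budget_ms]
    return {
        # share of interactions served within the latency budget
        "runtime_efficiency": len(within_budget) / len(records),
        # a low override rate is used here as a crude proxy for user trust
        "trust_proxy": 1.0 - mean(int(r.user_overrode) for r in records),
        "mean_model_confidence": mean(r.confidence for r in records),
    }


if __name__ == "__main__":
    logs = [InteractionRecord(1650000000.0 + i, 80.0 + i, 0.9, i % 10 == 0)
            for i in range(50)]
    print(compute_indicators(logs))
```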
2.2. The DataOps software cycle

DataOps defines the lifecycle of the data management and analysis processes, characteristic aspects of MLSS related to DE (see Fig. 2). The complexity and iterative nature of these processes require their own software cycle specific to data-related aspects. Additionally, these processes are interdependent with the DevOps activities undertaken in the Dev phase. It is thus one of the objectives of the DOGO4ML project to identify and operationalize such dependencies.

Data management processes are responsible for ingesting, storing, processing and preparing data according to the requirements gathered. These processes, common to the whole organization, are carried out by the data management backbone system, which serves the data in the form of data views (i.e., datasets generated from the wealth of data ingested, ready to be consumed). Then, each project, during its requirements engineering process conducted in the DevOps Dev phase, decides the specific subset of data assets (i.e., data views) required. These data views are the main driver enabling the data analysis processes.

Figure 2: The DataOps cycle for MLSS projects.

Data analysis processes include data discovery (i.e., finding the relevant data assets and requesting the needed data views), feature engineering, data preparation and model learning. These processes, specific to each project, are carried out by the analysis backbone system, which is responsible for learning the models that will eventually be deployed in the Dev phase. Some works frame the data analysis processes into their own lifecycle (e.g., under the concept of MLOps). However, many authors argue that the complete data lifecycle (management and analysis) should be jointly governed within a single unified view [5]. In DOGO4ML we follow this approach, and the tasks identified by MLOps are considered part of our DataOps lifecycle.

The complexity of the data management and analysis processes requires dedicated data and model governance, embedded in the data governance subsystem. Governance can be achieved by gathering the metadata required to automate, trace, monitor and assess specific requirements for the data management and analysis backbone systems. Quality requirements of ML models elicited during the Dev phase (e.g., model accuracy), indicators related to the learned models (generated during the data analysis processes, such as model appropriateness) and indicators related to the data (generated during the data management processes, such as quantified data bias or the query time when accessing the data views) must be monitored during the operation of the DataOps cycle and visualized through the DataOps dashboard. These indicators provide feedback that is key to closing the loop of the data cycle. For example, the feedback obtained from monitoring a generated ML model (e.g., its poor accuracy) may require considering features from another data view, learning new models or even ingesting a new external data source.
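The hypothetical sketch below illustrates this feedback loop: metadata gathered by the data governance subsystem about a data view and the model learned from it are checked against example quality requirements, and the resulting actions are the kind of feedback a DataOps dashboard could surface. All names and thresholds are invented for illustration and are not assets of the project.

```python
"""Hypothetical DataOps governance check closing the data-cycle feedback loop.

The metadata fields, thresholds and suggested actions are invented for
illustration; they are not assets of the DOGO4ML project.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class DataViewMetadata:
    name: str
    query_time_ms: float      # observed time to serve the data view
    class_balance: float      # 0.5 = balanced labels, towards 0 or 1 = biased


@dataclass
class ModelMetadata:
    trained_on: str           # name of the data view used for learning
    accuracy: float           # measured on held-out data


def dataops_feedback(view: DataViewMetadata, model: ModelMetadata,
                     min_accuracy: float = 0.85,
                     max_query_time_ms: float = 500.0,
                     max_imbalance: float = 0.2) -> List[str]:
    """Return feedback actions that a DataOps dashboard could surface."""
    actions = []
    if model.accuracy < min_accuracy:
        actions.append(f"accuracy {model.accuracy:.2f} below target: consider "
                       f"features from another data view or a new data source")
    if abs(view.class_balance - 0.5) > max_imbalance:
        actions.append(f"data view '{view.name}' looks biased: revisit data preparation")
    if view.query_time_ms > max_query_time_ms:
        actions.append(f"serving '{view.name}' is slow: revisit the data management backbone")
    return actions


if __name__ == "__main__":
    view = DataViewMetadata("customer_view_v3", query_time_ms=620.0, class_balance=0.8)
    model = ModelMetadata(trained_on="customer_view_v3", accuracy=0.79)
    for action in dataops_feedback(view, model):
        print("-", action)
```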
2.3. The Holistic software cycle

While the DevOps and DataOps cycles raise significant challenges by themselves, the emerging grand challenge is their combination into an overarching cycle that smoothly integrates their different process elements (activities, roles, etc.) into a unique holistic process. We have already made a first approximation to this problem in the context of trustworthy autonomous systems [6]. Overall, we envisage three major determinants:

Inter-dependency. Both lifecycles generate a number of inter-dependencies which, due to the iterative nature of the problem, are not easy to identify, formalize and generalize so as to guarantee adaptability to different scenarios.

Context-awareness. We do not aim at defining a universal holistic MLSS lifecycle. Instead, we recognize that different organizations, projects and teams may respond to different context characteristics (e.g., data quality, available human skills, problem size, etc.), and that the MLSS lifecycle needs to be flexible enough to apply to all of them.

Systematization. To assist software and data engineers in customizing the lifecycle according to context, DOGO4ML proposes a systematic, tool-supported, knowledge-based approach that assists them in: (i) defining parameterized process fragments (possibly inter-dependent with others) that describe activities that may take part in the holistic cycle; (ii) selecting the most appropriate process fragments in a particular context, respecting their inter-dependencies; and (iii) combining them into the holistic process.

Given these determinants, the project will use situational method engineering (SME) [7] as the conceptual framework for defining MLSS lifecycles. In SME, we can define a library of process fragments ("chunks") classified according to some context criteria. We will use our knowledge of context ontologies [8] to define the relevant context criteria in the scope of MLSS. SME supports the composition of such chunks (although the current state of the art does not handle the problem of inter-dependencies), as we have done in previous work (e.g., in the field of software evolution [9]). Tool support will take the form of a web application that establishes a conversation with the engineers to proceed with the context criteria elicitation, chunk selection and final composition.
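To give a flavour of what such a chunk library and its context-driven composition could look like, the sketch below defines a toy library of process fragments tagged with context criteria, selects the fragments matching a given project context and orders them by their inter-dependencies. It is a deliberately simplified illustration, not the SME tooling the project will build; the fragments, tags and composition rule are invented.

```python
"""Toy illustration of SME-style chunk selection and composition.

The fragment library, context tags and composition rule are invented;
this is not the knowledge base or tooling that DOGO4ML will build.
"""
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ProcessFragment:
    name: str
    activities: List[str]
    context_tags: Set[str]                        # contexts in which the chunk applies
    depends_on: Set[str] = field(default_factory=set)


LIBRARY = [
    ProcessFragment("data_discovery", ["find data assets", "request data views"],
                    {"big_data", "small_team"}),
    ProcessFragment("feature_engineering", ["derive features"],
                    {"big_data"}, {"data_discovery"}),
    ProcessFragment("model_learning", ["train model", "validate model"],
                    {"big_data", "small_team"}, {"feature_engineering"}),
    ProcessFragment("ml_component_integration", ["wrap model", "run integration tests"],
                    {"safety_critical", "small_team"}, {"model_learning"}),
]


def compose_lifecycle(context: Set[str]) -> List[str]:
    """Select the chunks applicable to a context and order them by dependencies."""
    selected = {f.name: f for f in LIBRARY if f.context_tags & context}
    ordered: List[str] = []
    placed: Set[str] = set()
    while len(placed) < len(selected):
        progress = False
        for frag in selected.values():
            deps = frag.depends_on & set(selected)   # ignore deps on unselected chunks
            if frag.name not in placed and deps <= placed:
                ordered.append(frag.name)
                placed.add(frag.name)
                progress = True
        if not progress:
            raise ValueError("cyclic or unsatisfiable fragment inter-dependencies")
    return ordered


if __name__ == "__main__":
    print(compose_lifecycle({"big_data", "small_team"}))
    # ['data_discovery', 'feature_engineering', 'model_learning', 'ml_component_integration']
```

In the envisioned tooling, the context criteria would come from a context ontology such as [8] and the composition would also address the inter-dependency problem discussed above; the sketch only shows the shape of the selection and ordering step.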
3. Objectives

Based on the aforementioned DOGO4ML conceptualization, we break down the main objective of the project into four objectives:

1. Specify, design and implement a holistic and configurable end-to-end lifecycle for MLSS aligning SE and DE development and operational processes.
2. Specify, design and implement the data-driven Dev phase for MLSS considering quality requirements and architectural aspects.
3. Specify, design and implement the Ops phase increasing users' trust in MLSS by transparently monitoring quality requirements in near real-time.
4. Specify, design, implement and govern the data management and analysis processes for MLSS in the form of a DataOps lifecycle.

4. Expected outcomes

In line with its objectives, the project aims to contribute scientifically to advancing the state of the art on the effective and efficient production and continuous evolution of ML models integrated into MLSS. Although ML models and intelligent systems have existed for a long time, the tight integration between models and software from the different perspectives of development, operation, governance and evolution makes this proposal highly innovative. In particular, this project leverages the interdisciplinary combination of DE and SE to propose the foundation of a new future technology, setting up a baseline (in the form of a proof-of-concept) ready to be matured and transferred to interested actors. The main assets to be produced in the project that can be individually transferred are:

(1) Process support.
(1.a) Catalog of SE and DE combinable and customizable process fragments
(1.b) Catalog of customizable MLSS holistic processes
(1.c) Tool support to build the appropriate MLSS process using (1.a) and (1.b)
(2) Development support.
(2.a) Quality model for MLSS with associated catalogs of requirement patterns
(2.b) Quality model for ML models
(2.c) Software reference architecture (SRA) for MLSS
(2.d) Tool support to instantiate the SRA to a given context considering domain requirements
(2.e) Tool support for integration testing including ML and non-ML components
(3) Operations support.
(3.a) Set of tools for the deployment of MLSS with models customized to context
(3.b) Data ingestion infrastructure for monitoring MLSS-specific quality requirements
(3.c) Set of strategic indicators related to users' trust in MLSS
(3.d) Strategic dashboard to visualize trust-related indicators
(4) DataOps support.
(4.a) Set of tools to govern the complete data lifecycle in MLSS
(4.b) Set of tools to semi-automatically manage data quality aspects
(4.c) Set of tools to support the automation of the data analysis tasks
(5) Complete platform. Integrates all the assets above into a single platform.

To promote the dissemination, exploitation and technology transfer of these assets, initial plans have been designed and will be further elaborated as the project progresses. To foster the dissemination of scientific contributions, the plan targets: (i) industrial dissemination, through participation in industry-oriented meetings and dedicated meetings; and (ii) educational dissemination, incorporating consolidated results at the end of the project into MSc and PhD courses. Regarding technology transfer and exploitation, the aim is to promote and maximize industry collaborations, the capitalization of knowledge and assets in future projects, and network growth into new domains. Therefore, an initial business plan has been designed. This business plan follows the business model canvas approach [10] and includes the following drivers:

- Create awareness in the partners' ecosystems and beyond (e.g., local networks on the topics of DE and SE standardization through bodies like IREB);
- Deploy the project ideas into the health, insurance and finance, and open publications domains. Regarding these domains, we plan to conduct empirical studies in companies that have expressed their interest in this project to validate its results;
- Get advice from world-leading experts in specific areas to enable the encapsulation of meaningful transferable parts of the project, which will ease the transferability of the project results and the elaboration of specific supporting tools that will also be integrated to offer an end-to-end support tool. The resulting tools will be offered as open source in a public GitHub repository with the appropriate licenses and communicated through adequate channels, such as the ReachOut Platform (https://www.reachout-project.eu), aimed at connecting research projects with beta testers and early users on the market;
- Offer a catalog of services for all exploitable resources from the project to foster and facilitate their adoption (installation, maintenance, etc.).

All in all, based on the economic impact of adopting AI/ML predicted by diverse organizations, we expect that the results of the project will contribute to such impact. On the one hand, the World Economic Forum stated that AI/ML are expected to create 133 million new jobs globally by 2022. IDC reported that AI/ML technology spending in Europe for 2019 increased by 49% over the 2018 figure, reaching $5.2 billion. According to a recent survey, AI/ML tools globally are expected to reach US$119 billion by 2025 (https://medium.com/@RWW/how-ai-is-transforming-software-development-ba705e799ca4).
Yet, according to market analysis firm McKinsey, "Most companies are capturing only a fraction of the potential value from data and analytics. [...] manufacturing, the public sector, and health care have captured less than 30 percent of the potential value we highlighted five years ago" (https://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world). This is particularly true in Europe, which is "lagging behind in embracing the digital and data revolution" (https://aiindex.stanford.edu/report). Consequently, any significant advance in the field will have a positive impact not only in economic terms but also socially.

5. Relevance to information science

Interest in the development of artificial intelligence, with a particular focus on ML, has exploded in the past decade (https://aiindex.stanford.edu/report). This has drawn a lot of research activity and investment, enabling ML models to continuously make gains in image recognition, language translation, object recognition, and other applications. The latter has raised the need to incorporate ML models inside conventional software systems, requiring a paradigm shift in terms of how these systems are developed and maintained. The DOGO4ML project aims to set the foundations for developing software systems that embrace these new possibilities provided by ML. Consequently, the project is set to address the different challenges that appear as a result of intertwining ML and non-ML components.

In particular, since ML models are fed by data, one of the main challenges, of clear relevance to the information science discipline, is related to data and information management. In this regard, DOGO4ML aims to develop an end-to-end operational data governance framework, identifying and operationalizing the data management lifecycle and the analysis processes. Data management processes are responsible for ingesting, storing, processing and preparing data according to the requirements gathered, and the resulting ready-to-consume data are the main driver enabling the data analysis processes. The complexity and iterative nature of these processes require dedicated data and model governance that produces and gathers the metadata needed to automate, trace, monitor and assess specific quality requirements. The latter can be related to (i) the learned models (e.g., model appropriateness or training time), and (ii) the data (e.g., quantified data bias, data quality or the query time when accessing the data views). The former are interpreted and validated by domain experts assessing the quality of the current model, while the latter are interpreted and analyzed by the data and software engineers. Finally, once this loop of the data cycle is closed, the resulting learned models are embedded and integrated into the MLSS in the form of an ML component during the Dev phase, as described in Section 2.

6. Current results

Although the DOGO4ML project is at an initial stage, we can already report some results. To gain a deep insight into the state of the art related to the project, [11] conducted a systematic mapping study about SE approaches for building, operating, and maintaining AI-based systems. This mapping study provides a consolidated background to tackle the project tasks.

Furthermore, [12] developed and scrutinized a generic method to generate pre-processing pipelines, as a step towards automating data preparation for ML, which in turn is a critical step for the data analysis part.
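As a flavour of what generating such pipelines involves, the toy sketch below (written with scikit-learn, which we assume to be available; it is not the method of [12]) compares a few hand-crafted candidate pre-processing pipelines by cross-validation and keeps the best-scoring one.

```python
"""Toy comparison of candidate pre-processing pipelines (not the method of [12]).

Assumes scikit-learn is available; the candidate pipelines are hand-crafted
only to illustrate the idea of automatically picking a data-preparation
pipeline for an ML model.
"""
from sklearn.datasets import load_breast_cancer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Candidate pre-processing prototypes to instantiate and compare.
candidates = {
    "impute+standardize": [("impute", SimpleImputer(strategy="mean")),
                           ("scale", StandardScaler())],
    "impute+minmax": [("impute", SimpleImputer(strategy="median")),
                      ("scale", MinMaxScaler())],
    "impute_only": [("impute", SimpleImputer(strategy="mean"))],
}

best_name, best_score = None, -1.0
for name, prep_steps in candidates.items():
    pipe = Pipeline(prep_steps + [("clf", LogisticRegression(max_iter=5000))])
    score = cross_val_score(pipe, X, y, cv=5).mean()   # 5-fold cross-validation
    if score > best_score:
        best_name, best_score = name, score

print(f"best pre-processing pipeline: {best_name} (CV accuracy {best_score:.3f})")
```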
DOGO4ML will use SME as the conceptual framework for defining MLSS lifecycles (see Section 2.3). As a first result towards elaborating this framework, [13] proposed a holistic method, applying SME, to consider data as a new source in requirements elicitation for data-driven systems, as is the case of MLSS.

Regarding the expected development-support outcomes, DOGO4ML aims at providing a software reference architecture for MLSS and tool support to instantiate this architecture to a given context considering domain requirements. In this sense, an analysis of the impact of design decisions on the achievement of high accuracy and low resource consumption in the context of AI mobile applications is provided in [14]. Additionally, in the sentiment analysis domain and as a proof of concept, [15] and [16] proposed an architecture able to monitor and analyze the sentiment of tweets shared by end-users.

7. Conclusions

In this paper we have presented the goals and vision of the DOGO4ML project, and detailed its expected outcomes and initial results. So far, all the project tasks are being developed as expected and the first results confirm that the project is progressing in the right direction. More information is available on the project website, https://dogo4ml.upc.edu/en.

Acknowledgements

This paper has been funded by the Spanish Ministerio de Ciencia e Innovación under project / funding scheme PID2020-117191RB-I00 / AEI/10.13039/501100011033.

References

[1] A.L. Samuel, Some Studies in Machine Learning Using the Game of Checkers, IBM Journal of Research and Development 3 (1959) 210–229. doi: 10.1147/rd.33.0210.
[2] M. Oriol, S. Martínez-Fernández, W. Behutiye, C. Farré, R. Kozik, P. Seppänen, A.M. Vollmer, P. Rodríguez, X. Franch, S. Aaramaa, A. Abhervé, M. Choraś, J. Partanen, Data-driven and Tool-supported Elicitation of Quality Requirements in Agile Companies, Software Quality Journal 28 (2020) 931-963. doi: 10.1007/s11219-020-09509-y.
[3] E. Zavala, X. Franch, J. Marco, Adaptive Monitoring: A Systematic Mapping, Information and Software Technology 105 (2019) 161-189. doi: 10.1016/j.infsof.2018.08.013.
[4] L. López, M. Manzano, C. Gómez, M. Oriol, C. Farré, X. Franch, S. Martínez-Fernández, A.M. Vollmer, QaSD: A Quality-aware Strategic Dashboard for supporting decision makers in Agile Software Development, Science of Computer Programming 202 (2021). doi: 10.1016/j.scico.2020.102568.
[5] V. Khatri, C.V. Brown, Designing Data Governance, Communications of the ACM 53 (2010) 148-152. doi: 10.1145/1629175.1629210.
[6] S. Martínez-Fernández, X. Franch, A. Jedlitschka, M. Oriol, A. Trendowicz, Developing and Operating Artificial Intelligence Models in Trustworthy Autonomous Systems, in: S. Cherfi, A. Perini, S. Nurcan (Eds.), Research Challenges in Information Science (RCIS 2021), Lecture Notes in Business Information Processing, vol. 415, Springer, Cham, 2021. doi: 10.1007/978-3-030-75018-3_14.
[7] B. Henderson-Sellers, J. Ralyté, P. Ågerfalk, M. Rossi, Situational Method Engineering, Springer, 2014.
[8] O. Cabrera, X. Franch, J. Marco, 3LConOnt: A Three-level Ontology for Context Modelling in Context-aware Computing, Software and Systems Modeling 18 (2019) 1345–1378. doi: 10.1007/s10270-017-0611-z.
[9] X. Franch, J. Ralyté, A. Perini, A. Abelló, D. Ameller, J. Gorroñogoitia, S. Nadal, M. Oriol, N. Seyff, A. Siena, A. Susi, Situational Approach for the Definition and Tailoring of a Data-Driven Software Evolution Method, in: Proceedings of Advanced Information Systems Engineering (CAiSE'18), LNCS 10816, Springer, 2018, pp. 603-618. doi: 10.1007/978-3-319-91563-0_37.
[10] A. Osterwalder, Y. Pigneur, Business Model Generation: A Handbook for Visionaries, Game Changers, and Challengers, Wiley, 2010.
[11] S. Martínez-Fernández, J. Bogner, X. Franch, M. Oriol, J. Siebert, A. Trendowicz, A.M. Vollmer, S. Wagner, Software Engineering for AI-Based Systems: A Survey, ACM Transactions on Software Engineering and Methodology 31 (2) (2022), Article 37e, 59 pages. doi: 10.1145/3487043.
[12] J. Giovanelli, B. Bilalli, A. Abelló, Data pre-processing pipeline generation for AutoETL, Information Systems, in press, 2021. doi: 10.1016/j.is.2021.101957.
[13] X. Franch, A. Henriksson, J. Ralyté, J. Zdravkovic, Data-Driven Agile Requirements Elicitation through the Lenses of Situational Method Engineering, in: Proceedings of the 29th IEEE International Requirements Engineering Conference (RE'21), RE@Next! Track, 2021, pp. 402-407. doi: 10.1109/RE51729.2021.00045.
[14] R. Creus, S. Martínez-Fernández, X. Franch, Which Design Decisions in AI-enabled Mobile Applications Contribute to Greener AI?, ESEM 2021. URL: https://arxiv.org/abs/2109.15284.
[15] A. de Arriba, M. Oriol, X. Franch, Merging Datasets for Emotion Analysis: An Approach using BETO on Spanish Tweets, in: 2nd International Workshop on Software Engineering Automation: A Natural Language Perspective (NLP-SEA@ASE'21), 2021. doi: 10.1109/ASEW52652.2021.00051.
[16] A. de Arriba, M. Oriol, X. Franch, Applying Transfer Learning to Sentiment Analysis in Social Media, in: Proceedings of the 5th International Workshop on Crowd-Based Requirements Engineering (CrowdRE'21), 2021, pp. 342-348. doi: 10.1109/REW53955.2021.00060.