Questionnaire Development for a Scientifically Founded Agile Assessment Model

Doruk Tuncel1, Christian Körner1 and Reinhold Plösch2
1 Siemens AG, Otto-Hahn-Ring 6, Munich, 81739, Germany
2 Department of Business Informatics - Software Engineering, Johannes Kepler University Linz, Altenberger Str. 69, Linz, 4040, Austria

IWSM Mensura'22: International Workshop on Software Measurement and the International Conference on Software Process and Product Measurement, September 28-30, 2022, Cesme, Izmir
doruk.tuncel@siemens.com (D. Tuncel); christian.koerner@siemens.com (C. Körner); reinhold.ploesch@jku.at (R. Plösch)

Abstract
Agile software development methodologies have been a focal point of industry and academia over the past two decades. Organizations are interested in extracting the value of agile methodologies and increasing their business success; however, it is evident that merely applying agile practices does not necessarily make organizations agile. We are building a context-agnostic agile assessment model based on agile values and principles to enable contextually appropriate self-assessments and to support organizations' agile transformation endeavors. In this paper, we share the results of our expert interviews, focusing on two of the model's principles: Human Centricity and Technical Excellence. The results of this study show that the proposed assessment questionnaire appropriately addresses highly relevant aspects of agility. The experts found it practically useful, yet the coverage of the questions targeting human centricity can be improved.

Keywords
agile, maturity assessment, process improvement

1. Introduction
More than two decades after the publication of the Agile Manifesto [1], agile software development methodologies have been investigated by academics and advocated by many of the industry-leading organizations [2]. Whether named agile transition or transformation, programs towards becoming an organization that can cope more easily with changing market demands, customer needs and technology stacks have been frequently promoted. Some of these programs are also called digital transformation programs, which, in essence, highlight the relevance and importance of the notion of change and transformation for surviving in the highly competitive volatile, uncertain, complex and ambiguous (VUCA) business landscape. While these programs motivate organizational units to adopt certain agile frameworks or to adhere to certain best practices of agility, it is evident that focusing merely on practices misses the essence behind agile adoption journeys. Therefore, we focus on the values and principles of the Agile Manifesto in building an agile assessment model (AAM). With this agile assessment model, we aim to support organizations in their transformation journeys and help them identify how agility is lived within their units, irrespective of any agile framework. This model, in our opinion, has the potential to support organizations in conducting contextually appropriate (self-)assessments, judging their own capabilities with respect to their own targets, and identifying areas of improvement that are not pre-imposed as global optima by any of the agile frameworks.
During our previous research [3], we identified a comprehensive set of agile assessment models and noticed that the models with scientific foundations do not address some aspects that are very important and relevant for real-life application scenarios, such as large-scale organizations, industry standards required for safety-critical systems, and application domains where hardware development is an inseparable element of value delivery. While many contributions provide valuable insights into which agile practices are useful for an organization to adopt, the underlying meta-model structures of the proposed agile assessment models tend to overlook foundational aspects of agility, as most of them do not explicitly reflect on the values and principles of the Agile Manifesto. Although the industrial relevance of the problem is clear, there have been only a few attempts to practically validate the existing models. Given the additional limitations of the existing models we discussed in [3], we decided to develop a model that is scientifically founded, yet capable of providing practical use to the industry in varying contexts [4]. It is also important to mention that agile at large scale was listed among the top 10 burning research questions [5]. For taxonomic clarity throughout the paper, we keep our terminology aligned with [6]. To continue our endeavor of further validating our model elements, we focus on the following research questions (RQ) in this paper:
• RQ1: Do the questions belonging to the Human Centricity and Technical Excellence principles cover the concepts related to the respective principles they represent?
• RQ2: Is the proposed mechanism for addressing large-scale product development applicable from a questionnaire perspective?
The main contribution of this research is the result of the expert interviews we conducted across different business units of Siemens towards the validation of the questions for the Human Centricity and Technical Excellence principles. We selected these two principles because, if we can successfully design a robust question development mechanism that serves the needs of these two principles with their two distinct stances, this will guide us for the remaining three principles, whose stances fall in between human centricity and technical excellence. The validation focuses on (1) the understandability and feasibility of the questions for these principles and (2) the completeness of the questions with respect to the principles. This validation is necessary and valuable for our future research, as it allows us to adjust our self-defined questionnaire development guidelines and to incorporate the feedback into the development of the questions for the remaining principles.

2. A Novel AAM
The structural components of the proposed agile assessment model have been discussed in our previous paper [4]. In order to increase the understandability of this paper's contribution, we present the underlying vocabulary in Section 2.1. Section 2.1.3 describes the questionnaire that is built on the assessment model. Finally, Section 2.2 elaborates on the novel aspects of both the assessment model and the method.

2.1. Model Structure
Our model consists of two main structural components: Principles and Clusters. These structural components correspond to the static aspect of the overall assessment system. Principles are the abstract components that define the areas under which agility should be evaluated.
After an extensive literature review, the principles were derived in particular from the model proposed by Sidky et al. [7]. Clusters, on the other hand, represent one lower level of abstraction, containing groups of certain practices of agile software development methodology. We are reusing the principles and clusters as published and validated at XP'21 [4].

2.1.1. Principles
The proposed model consists of five principles: (1) Embrace change to deliver customer value, (2) Plan and deliver software frequently, (3) Human centricity, (4) Technical excellence and (5) Customer collaboration, in no particular order of importance. Principles provide the frame of the assessment model and are populated with clusters.

2.1.2. Clusters
The proposed model consists of seventeen clusters. As these numbers suggest, clusters are not enforced to be equally distributed across the principles. The clusters emerged from semantically grouping the agile software development practices identified in our literature study [3]. Due to space limitations, the cluster names are excluded from this section; however, the clusters particularly relevant for this research, along with their codings, can be found under Section 5.

2.1.3. Questionnaire
The questionnaire is the backbone of the model, as it defines the core of the evaluation. For each cluster there exist multiple questions, and the questions of all clusters together form the entire questionnaire. In order to systematically develop questions and answer option statements for each capability level-question pair, we set up a meta-structure as a guideline. Questions therefore address specific aspects of artifacts or methods. We also defined how, and at which capability level of a question, indicators, metrics or KPIs are brought in. We additionally decided to differentiate from SPICE [8] and CMMI [9] by working with specific practices on all capability levels, and therefore to avoid generic practices at higher capability levels. The developed questionnaire contains the questions and the capability levels along with their underlying definitions, as well as the answer option statements dedicated to each capability level-question pair. The overall capability level definitions enable systematically defining the descriptions for each capability level per question. This additional layer of granularity is an important differentiation point of our design from the existing models: while other models evaluate whether certain agile practices are performed in the organization, we go beyond that and evaluate the degree to which those practices are performed and the quality of the produced artifacts. As our model consists of a total of seventeen clusters, any evaluation may easily expand to an impractical number of questions per cluster, making this a minimum spanning problem. Therefore, in order to increase the practical applicability of the proposed assessment model, our aim is to reach maximum coverage of agility by means of the minimum set of questions, without sacrificing quality. Due to space limitations, the questions of both the Technical Excellence and Human Centricity principles are provided at https://bit.ly/3NG8glr. Figure 1 visualizes a question from a cluster under Technical Excellence. In total, the validation covers eight different clusters consisting of a total of thirty-two questions. For each question, the provided answer option statements per capability level bring another level of complexity to the overall design, with a total of one hundred and sixty statements.
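To make this meta-structure more tangible, the following minimal sketch shows one possible way to represent principles, clusters, questions and their capability-level statements in code. It is purely illustrative: the class and field names are our own assumptions and are not part of the published model.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Capability levels used throughout the questionnaire (L1-L5).
CAPABILITY_LEVELS = ["L1", "L2", "L3", "L4", "L5"]

@dataclass
class Question:
    text: str                         # the question itself
    artifact: str                     # artifact or method the question targets, e.g. "Test Strategy"
    level_statements: Dict[str, str]  # one answer option statement per capability level
    large_scale_only: bool = False    # context label used by the modularity mechanism (Section 2.2.3)

@dataclass
class Cluster:
    code: str                         # e.g. "TE2" for Testing
    name: str
    questions: List[Question] = field(default_factory=list)

@dataclass
class Principle:
    name: str                         # e.g. "Technical Excellence"
    clusters: List[Cluster] = field(default_factory=list)
```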
The remaining elements of the overall assessment method (e.g., assessment guidelines, aggregation and evaluation mechanisms) are mentioned under Section 8.2.

Figure 1: Sample Question

2.2. Novelty
Our previous study [3] revealed that fundamental aspects are not present in the existing models. Therefore, we introduced a new model in [4] instead of substantially reworking existing models, which are fundamentally different by design. The novelty of our proposal resides in both the static and the dynamic components. Features such as having theoretical and practical validation, providing guidance for improvement, showing an explicit awareness of the agile values and principles of the Agile Manifesto, or even paying attention to human-centric aspects of agility are not very common in the existing assessment models.

2.2.1. Artifact Orientation
Artifact orientation is one of the core guiding aspects of our model development approach. It is important because constructing an assessment model on artifacts contributes to the objectivity of the assessment results. In our model, the artifact orientation is reflected in the questionnaire. Some artifact examples of the Technical Excellence principle's Testing cluster are "Test Strategy", "Test Plan" and "Unit Test". Taking the test strategy as an example: while the mere existence of a test strategy may hold value to an extent, the provided leveling mechanism allows identifying a range of states, from acknowledging the importance of having a test strategy, through having a test strategy in use, up to the point where an existing strategy is reduced to its essentials and new, contextually supportive strategies are introduced. Evaluating the state of an artifact not only supports the endeavor of conducting objective evaluations, but also provides a concrete path of improvement to the assessed entity.

2.2.2. Enabling Self-Assessments
Enabling self-assessments is another guiding aspect of our model development endeavor. This aspect is important mainly for two reasons. The first one is the practical applicability of the assessment model. The questions are designed and validated during the design iterations to ensure that they are understandable. In industry, assessments hold great potential for providing units with a realistic image of their current state and actionable insights into improvement opportunities. With our model, we encourage units to conduct their own assessments and identify improvement areas with respect to their own targets. To realize that, we designed the questions of the questionnaire without any framework-specific terminology and provided guiding examples for each question. Second, the leveling mechanism has its own underlying descriptions to ease the assessment process during self-assessments. The aforementioned artifact orientation for objectivity is also a contributing factor to this aspect.

2.2.3. Modularity
Modularity corresponds to the aspect of the assessment model that enables contextually appropriate assessments. To achieve modularity, we separate the concerns of different context-defining elements in our design. For example, questions under each cluster that are relevant only for large-scale organizations are categorically labeled as such. This gives our model the flexibility to discard contextually inappropriate questions and to adapt the questionnaire size. A similar mechanism applies to other context-defining elements, such as having hardware development concerns or having geographically distributed units. Having such a modular structure enables the questionnaire to be tailored specifically to changing organizational contexts.
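As an illustration of how such context labels could drive the tailoring, the sketch below filters a question set down to what fits a given assessment context. The question identifiers, labels and helper function are hypothetical and only convey the idea.

```python
from typing import Dict, List, Set

# Hypothetical question set: each question carries context labels such as
# "large_scale", "hardware" or "distributed"; an empty set means the question
# is relevant in every context.
QUESTIONS: List[Dict] = [
    {"id": "TE2-1", "contexts": set()},            # always asked
    {"id": "TE2-4", "contexts": {"large_scale"}},  # only for large-scale organizations
    {"id": "TE1-3", "contexts": {"hardware"}},     # only when hardware development is involved
]

def tailor(questions: List[Dict], unit_contexts: Set[str]) -> List[Dict]:
    """Keep a question if it is context-free or all its labels apply to the unit."""
    return [q for q in questions if q["contexts"] <= unit_contexts]

# A co-located, software-only team: scale- and hardware-specific questions are dropped.
print([q["id"] for q in tailor(QUESTIONS, set())])                        # ['TE2-1']
# A large-scale organization with hardware development concerns keeps everything.
print([q["id"] for q in tailor(QUESTIONS, {"large_scale", "hardware"})])
```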
3. Methodology
In our journey towards establishing a practically applicable agile assessment model that is both academically and practically validated, we designed our research following the design science research methodology proposed by Hevner et al. [10]. In order to ensure the soundness of the model and questionnaire development activities, we structured our own research in the form of an agile process, in which we conduct interview studies with experts in our organization and integrate the feedback into the method development process. While the design science research methodology provides the overall frame of the research, in order to integrate the learning outcomes of the academic and industrial validation steps, we iteratively realize design science research in alignment with action design research as defined by Sein et al. [11]. By doing so, feedback is systematically collected from the potential consumers of the design artifact.

4. Related Work
Sidky et al. [7] propose one of the early models for guiding organizations in their agile adoption journey. The authors define an agile adoption framework with two core elements: an agile measurement index and a four-stage process. These two elements support organizations in adopting agile practices. The agile measurement index consists of agile levels, principles, practices and indicators. The five agile principles of the measurement index are derived from the Agile Manifesto, and the practices populate the evaluation matrix. The four-stage process utilizes the described agile measurement index. The model explicitly highlights the importance of the tailorability of the five levels by describing the challenges behind reaching a consensus on the assignment of practices to the levels.
Qumer and Henderson-Sellers [12] provide an agile solution framework for software. The model is built upon a conceptual agile model that is accompanied by an analytical tool consisting of four dimensions and an agile toolkit. The core of the method comprises the following five aspects: Agility, People, Process, Product and Tools, with an additional Abstraction aspect to reflect the agile software development methodology. While the agile toolkit consists of seven main components, the provided analytical tool focuses on the following four dimensions: Method scope, Agility characterization, Agile value characterization and Software process characterization. In order to support these two framework components, the authors propose the Agile Adoption and Improvement Model (AAIM) that is built on three agile blocks (Prompt, Crux and Apex) and six agile levels. The study highlights the importance of supporting situation-specific scenarios in software engineering.
Fontana et al. [13] suggest an agile maturity framework based on complex adaptive systems theory. The authors emphasize ambidexterity as a fundamental attribute of maturity, and the described framework focuses on outcomes rather than agile practices.
The fundamental positioning of people in the software development process is made explicit, while the trade-off between exploitation and exploration is mapped to the trade-off between pursuing specific outcomes and adopting new practices. As a result of their cross-case analysis, the following six outcomes are provided: Practices, Team, Deliveries, Requirements, Product and Customer. The study concludes by underlining the importance of allowing context-specific practices in the maturing process while considering the values of the Agile Manifesto.

5. Evaluation Results
This section discusses the results of the expert interview study conducted with 7 experts. The demographics of the experts can be found in Figure 2. Each interview was designed to last 45-60 minutes per principle. When the area of expertise fell more towards ethics, communication and collaboration, and not necessarily towards software engineering on the implementation level, only the human centricity principle was evaluated. The data was collected both in the form of Likert-scale questions and as open text for further descriptions. The Likert-scale questions, along with their codings as described below, represent the quantitative results. These are provided in the form of heatmaps in Figures 3-6. The open text questions provide the qualitative results in accordance with [14] and are either part of Section 5 or form the basis of Section 6 and Section 8.

Figure 2: Demographics

In order to deliver the results in a coherent way, the following subsections share the results regarding the questionnaire, the human centricity principle and the technical excellence principle under Section 5.1, Section 5.2 and Section 5.3 respectively. The coding of the results in the following figures is as follows: Strongly Agree: 5, Agree: 4, Neither/Nor: 3, Disagree: 2, Strongly Disagree: 1, while E1 to E7 refer to the interviewed experts. For the cluster mappings of the Human Centricity Principle, we use Psychological Safety (HC1), Effective Communication (HC2), Unit Empowerment and Autonomy (HC3), Unit Collaboration (HC4) and Personal Growth (HC5). The same mechanism applies to the Technical Excellence Principle: Design and Coding Practices (TE1), Testing (TE2) and Data Driven DevOps (TE3).

5.1. Generic Results
The overall results show that the relevance of the artifacts used in the questionnaire and the coverage of the questions with respect to the clusters they are asked under do differ. As can be observed in Figure 3, all but one of the experts involved in software development at the implementation level agree, if not strongly, that the provided artifacts are relevant for representing their associated clusters. On the other hand, as can be seen in Figure 4, the coverage of the questions with respect to the cluster they are associated with shows that the questions of the Psychological Safety (HC1) cluster lack the expected coverage. Promising results with respect to coverage show that the rules we used for developing the questions, as described under Section 2.1.3, were reliable reference points, as judging the coverage of a cluster requires a full understanding of the questions and their formulations. Another encouraging result regards the evolutionary leveling mechanism, where all 7 experts agreed that the leveling mechanism is suitable for differentiating between the capabilities of organizations with respect to agile software development methodology.
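For illustration only, the sketch below shows how Likert responses coded with the scheme above could be turned into the per-expert, per-cluster matrix that underlies such heatmaps. The response values are invented and the helper is not part of the assessment tooling.

```python
# Coding scheme from Section 5: Strongly Agree = 5 ... Strongly Disagree = 1.
LIKERT = {"Strongly Agree": 5, "Agree": 4, "Neither/Nor": 3,
          "Disagree": 2, "Strongly Disagree": 1}

# Invented example responses: expert -> cluster -> answer.
responses = {
    "E1": {"HC1": "Agree", "HC2": "Strongly Agree"},
    "E2": {"HC1": "Neither/Nor", "HC2": "Agree"},
}

def to_matrix(responses: dict, clusters: list) -> dict:
    """Return one row of coded values per expert, in the given cluster order."""
    return {expert: [LIKERT[answers[c]] for c in clusters]
            for expert, answers in responses.items()}

print(to_matrix(responses, ["HC1", "HC2"]))  # {'E1': [4, 5], 'E2': [3, 4]}
```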
For the interpretation of Figures 5-6, the following question mappings apply:
Q1: I think the set of clusters sufficiently reflects the underlying principle in an agile organizational unit.
Q2: I think this mechanism of adding a question for larger scales is suitable for this principle.
Q3: I think an evaluation based on this questionnaire would help my unit to identify improvement areas.
Q4: I can imagine myself using this model for a self-assessment of my unit.
Q5: This model would be useful for capturing the state of agility of my unit.

Figure 3: Relevance of the artifacts used in the questionnaire, per cluster

The results of our interviews show that not only the structural components but also the provided questions for both principles were separately evaluated and agreed to provide sufficient coverage for a practical evaluation as well. It is important to mention again that the evaluation covers not only the questions, but also the statements we provide for each capability level (L1-L5), as depicted in Figure 1. Overall, only one expert (E4) was unwilling to use the Human Centricity questionnaire. 5 out of 7 experts agreed that they would be willing to use one of the questionnaires in real life, while 4 out of 7 stated the same for both of the questionnaires.

Figure 4: Coverage of the questions used in the questionnaire, per cluster

5.2. Human Centricity Specific Results
As can be observed from Figure 5, the questionnaire belonging to this principle received, with the exception of one expert, particularly positive feedback with respect to its industrial applicability potential. In addition, Figure 5 highlights that the coverage of the Human Centricity Principle by the proposed five clusters was very well received by the experts. It can also be observed in the same figure that, on the level of the principle, the mechanism for addressing the large-scale development aspects with respect to human centricity was found suitable by most of the experts. On the level of clusters, addressing the scaled aspects was evaluated as sufficient by three experts for Psychological Safety and Effective Communication. For Unit Autonomy and Empowerment, and for Personal Growth, the decision not to include an additional question for addressing large scale was also found appropriate. Further, combining the autonomy and empowerment concepts under one cluster instead of addressing them separately received positive feedback. As a side note, although keywords such as purpose, meaning, engagement or valuing someone's work were articulated during the discussions on Unit Autonomy and Empowerment, the current placement of these keywords under Psychological Safety was found appropriate.

Figure 5: Representativeness of human centricity clusters and addressing human centricity at scale

Regarding the Human Centricity Principle, "failure" was highlighted as an important missing aspect of Psychological Safety by one of the experts. Moreover, it was suggested for large-scale product development organizations that, if there exists a misalignment between teams with respect to the artifacts and methods associated with psychological safety, the questionnaire should be able to address it and make it visible to the members of the organization. Another remark with respect to Psychological Safety was the importance of capturing abstract notions such as trust, integrity and transparency.
In terms of Effective Communication, the domain of communication engineering, along with its features (e.g., structure, construct, time, type), was mentioned as a useful component that could additionally be considered. The importance of in-person gatherings was also discussed in relation to COVID-19-related developments and their consequences. It was also reported that, while tool support is very essential, the utilization of tools may differ among different scales of an organization. A critical view on Unit Autonomy and Empowerment was that the existence of organizational structures supporting autonomy does not necessarily mean that the organization can operate autonomously, as this relies on people and their intrinsic motivation to own certain tasks and work in an autonomous way. Additionally, one of the experts commented that, even when individuals are intrinsically motivated towards self-organization and working autonomously, resource limitations may hinder the process, even if the will is there. For Unit Collaboration, one expert's recommendation was to frame the collaboration aspect as an activity of mutual artifact creation, and the expert therefore suggested inspecting the joint efforts going into the creation of any software artifact, such as revision repositories. When it comes to Personal Growth, the following two aspects were noted as important: first, how growth is perceived in the organization, and second, the relevance of decoupling the views of the provider and the receiver of any learning component.

5.3. Technical Excellence Specific Results
As can be observed from Figure 6, the questionnaire belonging to this principle received particularly positive feedback with respect to its industrial applicability potential. As a positive remark, all but one expert found the questions associated with each cluster of the Technical Excellence Principle easy to understand. Moreover, regarding addressing large-scale product development on the principle level, Q2 shows that the suitability of the proposed mechanism is agreed by all experts. On the cluster level, the questions tailored for the scaled aspects of Design and Coding Practices were also found sufficient in addressing large-scale development by all but one expert. The comment from that one expert was the necessity of including "interface and cross-component connectivity" under this cluster. When it comes to Testing, the scaled aspects received a positive response from three out of five experts. The remaining two highlighted the importance of an "overall test architecture" and suggested that the "test strategy" be explicitly considered at large scale as well. Data Driven DevOps likewise received a positive response from three of the experts with respect to completeness, while the other two highlighted the importance of "purposeful measuring" and of handling metrics and dashboards separately.

Figure 6: Representativeness of technical excellence clusters and addressing technical excellence at scale

The clusters of the Technical Excellence Principle received additional comments regarding their coherence and completeness. Design and Coding Practices was primarily evaluated as highly relevant and as having sufficiently good coverage with its underlying artifacts and methods. Consequently, it did not receive any negative feedback; however, one of the experts shared their preference for considering software-architecture-related aspects under this cluster more explicitly.
A follow-up comment emphasized the importance of reflecting architectural decisions both on the team level and on the large-scale product development level. Testing received additional input regarding the level of granularity of its questions. Two experts stated that a joint representation of two questions under this cluster may reduce the assessment time without compromising the quality of the results. In terms of Data Driven DevOps, an improvement suggestion concerned the clarification of the "incident documentation" artifact. It was noted by one of the experts that incident documentation may not be a very informative artifact if the area of the incident is not specified within the questions.

6. Discussion
In order to avoid the potential risk of interview fatigue, the scope of this research focuses on the questionnaires of two principles. In order to deliver the results in a coherent way, the subsections address the following: our findings that are not specific to either of the principles, findings that are specific to the human centricity principle, and findings that are specific to the technical excellence principle, under Section 6.1, Section 6.2 and Section 6.3 respectively.

6.1. Generic Findings
As the main finding, it is important to highlight that both questionnaires were found to be applicable and showed potential for being utilized in industry. As mentioned in Section 5, questions Q3, Q4 and Q5 refer to the experts' opinions regarding their own usage of this model for their own purposes, and the responses to these questions are very promising and encouraging. The artifact orientation in the questionnaire was very well received, which is an important reference point and learning outcome for future researchers to consider. Likewise, the understandability of the questionnaires was rated very high. Naturally, there are suggestions and minor improvement possibilities for each principle. The results of the expert interviews show that the questions of the Technical Excellence Principle have a better coverage than the questions of the Human Centricity Principle. Here, Psychological Safety stands out as a cluster with relatively lower coverage. Our interpretation is that, as a very intangible and subjective concept, psychological safety is hard to evaluate on an artifactual basis. Another important remark concerned setting targets appropriately. The ability to establish targets is fundamental to any improvement activity; however, setting targets carelessly risks jeopardizing the optimization and continuous improvement activities. This is usually perceived as a hidden risk, as individuals may believe that the organization is on its way to substantial optimization without noticing that the optimization is directed towards a wrong, or sub-optimal, target. Besides the aforementioned generic findings, below we also address our research questions based on our findings.
RQ1: Do the questions belonging to the Human Centricity and Technical Excellence principles cover the concepts related to the respective principles they represent? The results clearly indicate that the existing clusters sufficiently cover the principles they are positioned under. This is a very important finding, as the model components provide the foundation of the structural components of the overall assessment system.
RQ2: Is the proposed mechanism for addressing large-scale product development applicable from a questionnaire perspective?
As agile at large scale is one of the contemporary areas of research, and as organizational scale is one of the context-defining components of our modular approach, we provided a mechanism for addressing the aspects of clusters that are relevant only for large-scale organizations. Our mechanism for addressing the large-scale aspects of agility received very positive feedback and was found useful by the experts.

6.2. Human Centricity Specific Findings
Regarding human centricity, one of the essential comments was that leadership, whether low, medium or high level, plays an important role in an organization's ability to act autonomously. This was suggested as an aspect that would, once made explicit, increase the coverage of Unit Autonomy and Empowerment. In the model structure, the notion of leadership is implicitly considered under Unit Autonomy and Empowerment, yet it is expected by the interviewees to be represented more explicitly. Another aspect that was frequently discussed during the interviews was the contrast between enabling the members of the organization and actively supporting them. Under Effective Communication and Unit Collaboration, it was also discussed that technology and best-practice sharing, as well as community-of-practice-like initiatives, increase the cohesion among organizational structures and contribute to organizational synergy. Regarding unit collaboration, establishing rewarding mechanisms and objectives that support collaboration across units, and that do not inhibit the sharing of information and experience, was mentioned as a systematic enabler for an organization. To us, it is also clear that, once organizations establish rewarding mechanisms and objectives in a way that does not conflict with other units, there is a higher chance for collaboration to foster organically. It is evident that the new ways of working, along with the transformations COVID-19 brought into our professional lives, shifted the medium of work to online tools and platforms. This came up particularly under Personal Growth in relation to training and learning topics. It was commented that, be it technical or personal, the importance and effectiveness of on-site trainings should not be overlooked. It was discussed from the trainers' point of view that tracking the learning progress of an individual becomes very challenging and that observation-based evaluation opportunities are lost in an online setting. One particular aspect related to personal growth was the consideration of time and motivation as fundamental resources for an individual's growth journey. It was commented that, for many individuals, it is not necessarily about whether there are enough training offerings or external resources for a learning task, but rather about having dedicated hours for a particular training, or about knowing the purpose and meaning behind the learning activities. An additional discussion regarding human centricity went in the direction of taking action instead of just communicating the importance of a concept. With one of the experts, the necessity of making a distinction between communicating the importance of something and actively doing something was underlined, as this was perceived as an important aspect for capturing the actual state of agility being practiced rather than staying only theoretical. This distinction can be made explicit by employing verbs such as enabling, realizing or instantiating.
6.3. Technical Excellence Specific Findings
The overall findings regarding the technical excellence principle were very positive, and there were no major change requests. One of the senior experts from the testing domain described the assessment questionnaire as having a good structure for self-assessments and commented positively on its ability to guide individuals. Further, the clusters were perceived as very valuable for representing technical excellence, particularly in a software development organization. In parallel, there was a discussion about the testing cluster's level of granularity and coherence in terms of representing testing-related activities such as unit testing, integration testing, system testing, their automation and systematic advanced applications. The conclusion was to distribute automated testing and advanced testing methods as elements of higher capability levels, rather than having a standalone question for them.
For technical excellence, another main line of discussion was on infrastructure and environment orchestration. When it comes to technical excellence and the quality of software products, it was highlighted that, along with the dominance of new architectures and technologies such as microservices, containerized applications and automated pipelines, technical excellence requires organizations to have a new perspective on their delivery pipelines. Particularly in large-scale product development, delivery pipelines, their automation configurations and testing activities have become a critical player in the value streams. As a result, the importance of paying as much attention to environments and delivery pipelines as to the source code itself was stressed. A representative example could be having a containerized, shippable product, yet having failures in the container application itself. An additional highlight under technical excellence is that, although monitoring and data collection for systems have great potential for providing valuable insights, organizations need to identify what they want to measure and why. Otherwise, the result may be an overwhelming amount of data that is of little value and provides no actionable feedback to the system. Another discussion point with respect to addressing large-scale product development concerns was the following: while the alignment of terminology, practices and methods across multiple teams is beneficial, it should be kept in mind that enforcing certain best practices onto other teams may result in reduced productivity. One should pay utmost attention not to fall into the alignment trap, where the effort towards ensuring alignment starts to negatively affect autonomy. Additionally, when it comes to addressing aspects of large scale, it is important to remember that, while large-scale product development may require handling entirely new artifacts and methods, it may as well be the case that certain team-level artifacts and methods remain relevant in a slightly different form. An example of this could be that, while data collection and monitoring are relevant both on the team level and on the product level, the data to be monitored and the questions that need to be answered differ between a single team and a large-scale product development process.

7. Threats to Validity
In this section, we discuss the validity threats and our attempts to ensure a high quality of research by keeping these threats minimal.
The initial consideration of the threats to validity has been based on the four well-known threats described by Yin [15]: Construct Validity, External Validity, Internal Validity and Reliability. These threats to validity have been adapted to the domain of software engineering by Runeson et al. [16]. Within the scope of this paper, as our methodological focus is not case study research, we address the three threats most relevant to our methodology.

7.1. Construct Validity
Construct validity refers to ensuring a mutual understanding, between researchers and participants, of the concepts being discussed and the vocabulary being used. To address threats to construct validity, we provided clear examples in our interview guidelines to establish the boundaries for discussion. Further, exactly the same examples and artifacts were used across each of the interviews. We also asked the interviewees whether the provided examples helped them achieve clarity in the discussion. All experts confirmed that providing concrete examples within the questionnaire was helpful for them.

7.2. External Validity
External validity refers to the generalizability of the results generated by the research activities. To address external validity threats, we consulted experts from different divisions and put effort into gaining viewpoints based on geographical diversity. By doing so, our aim was to have a sample that is representative of different business strategies, work cultures and disciplines; yet, having seven experts partnering in our interviews remains a threat to external validity. External validity is also affected by the fact that, although it is a very diverse conglomerate, all experts are within the Siemens organization and may be expected to share common values and thinking patterns of the organization to a certain degree.

7.3. Reliability
Reliability refers to the reproducibility of the research results: if the research had been conducted by different researchers, the same results should have been derived. To ensure the reliability of our research, we included additional questions and objective interview guidelines for receiving feedback on the complexity of the interview. The interview results were provided to the experts for an additional approval step.

8. Conclusion
8.1. Summary
This study follows on from the previous two steps of our research [3], [4], which we published at SEAA'20 and XP'21 respectively. We share the results of the interview study that examines the questionnaires we have developed for the Human Centricity and Technical Excellence principles. Our aim in questionnaire development is to reach maximum coverage with a minimal set of questions, thereby reducing complexity. The results of our interview study show that not only do the structural components falling under human centricity and technical excellence provide sufficient coverage, but the provided questionnaires for both human centricity and technical excellence were also evaluated as providing sufficient coverage for a practical evaluation. Overall, only one expert was unwilling to use one of the two questionnaires. A final and perhaps most essential remark for us has been that 5 out of 7 interviewees agreed that they would be willing to use one of the questionnaires in real life, while 4 out of 7 stated that they would be willing to use both of the questionnaires. As already mentioned in Section 2.1.3, the development of the questionnaire was carried out on the basis of guidelines we developed beforehand.
The validation of the clusters of the two principles is a good indicator for us that the guidelines used for questionnaire development have proven useful; they will therefore be used for the development of the questions for the remaining clusters and principles.

8.2. Future Work
In the process of preparing this publication, we have completed the questions for the remaining clusters. First, we will integrate the interview results into the questionnaire and implement minor improvements as part of the intervention step of our action design research. Minor improvements regarding the example practices, as well as certain component name alterations, will also be continuously considered. We will finalize our aggregation mechanism, as having an aggregation mechanism is important for being able to provide quantitative results out of the collected assessment data. Having done this, we will apply the model in a pilot project in order to validate the overall suitability of the assessment model and method in an industrial setting by means of a case study.

References
[1] K. Beck, et al., Manifesto for agile software development, 2001. URL: http://www.agilemanifesto.org/.
[2] P. Rodríguez, J. Markkula, M. Oivo, K. Turula, Survey on agile and lean usage in Finnish software industry, in: Proceedings of the 2012 ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, IEEE, 2012, pp. 139–148.
[3] D. Tuncel, C. Körner, R. Plösch, Comparison of agile maturity models: Reflecting the real needs, in: 2020 46th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), IEEE, 2020, pp. 51–58.
[4] D. Tuncel, C. Körner, R. Plösch, Setting the scope for a new agile assessment model: Results of an empirical study, in: International Conference on Agile Software Development, Springer, 2021, pp. 55–70.
[5] S. Freudenberg, H. Sharp, The top 10 burning research questions from practitioners, IEEE Software 27 (2010) 8–9.
[6] T. Dingsøyr, T. E. Fægri, J. Itkonen, What is large in large-scale? A taxonomy of scale for agile software development, in: International Conference on Product-Focused Software Process Improvement, Springer, 2014, pp. 273–276.
[7] A. Sidky, J. Arthur, S. Bohner, A disciplined approach to adopting agile practices: The agile adoption framework, Innovations in Systems and Software Engineering 3 (2007) 203–216.
[8] T. P. Rout, K. El Emam, M. Fusani, D. Goldenson, H.-W. Jung, SPICE in retrospect: Developing a standard for process assessment, Journal of Systems and Software 80 (2007) 1483–1493.
[9] CMMI Institute, CMMI Model V2.0, 2018.
[10] A. R. Hevner, S. T. March, J. Park, S. Ram, Design science in information systems research, MIS Quarterly (2004) 75–105.
[11] M. K. Sein, O. Henfridsson, S. Purao, M. Rossi, R. Lindgren, Action design research, MIS Quarterly (2011) 37–56.
[12] A. Qumer, B. Henderson-Sellers, A framework to support the evaluation, adoption and improvement of agile methods in practice, Journal of Systems and Software 81 (2008) 1899–1919.
[13] R. M. Fontana, V. Meyer Jr, S. Reinehr, A. Malucelli, Progressive outcomes: A framework for maturing in agile software development, Journal of Systems and Software 102 (2015) 88–108.
[14] P. Mayring, Qualitative content analysis: A step-by-step guide, SAGE, 2021.
[15] R. K. Yin, Case study research and applications, Sage, 2018.
[16] P. Runeson, M. Höst, A. Rainer, B. Regnell, Case study research in software engineering: Guidelines and examples, John Wiley & Sons, 2012.