Do We Really Know How to Measure Software
                 Quality?

    Vineta Arnicane[0000−0003−3942−9229] , Juris Borzovs[0000−0001−7009−6384] ,
                  Anete Nesaule-Erina[0000−0001−7244−2148]

                     Faculty of Computing, University of Latvia
                            Raina blvd. 19, Riga, Latvia
                    {vineta.arnicane, juris.borzovs}@lu.lv,
                              anete.nesaule@gmail.com


       Abstract. The recent international standards ISO/IEC 25022:2016 and
       ISO/IEC 25023:2016 partially return to the seminal work of McCall et al.
       They introduce 122 lower-level measures (metrics in the terminology of
       McCall et al.). The authors of the standards themselves classify quality
       measures as Highly Recommended, Recommended, or Used at User’s
       Discretion, and as Generic or Specific. In this paper, quality measures
       are additionally classified from the point of view of objectivity of
       measurement and of usage in industry.

       Keywords: Software engineering · Software quality models · Quality
       characteristics · Quality metrics · Quality measures.


1    Introduction
We do not review software quality models here because the authors of [12]
have already done so. As James A. McCall wrote in Wiley Online Library in
2002 [11]:
    “All engineering disciplines have some basis in measurement. Software qual-
ity measurement frameworks have been developed to provide a measurement
approach for software engineering. The frameworks have a common structure.
At the highest level are quality factors, that comprise a definition of software
quality and represent attributes or characteristics of the software, relate to its
overall quality. The second level provides the criteria or software attributes that
relate to the factors, and their existence provides the related characteristics of
quality. The third level provides the “metrics” or measurements that measure
the degree to which those attributes exist.
    The use of software quality factors and metrics has become a common prac-
tice in the industry, although the applications are not consistent, systematic or
typically applied across projects or organizations.” In their seminal paper [10],
McCall et al. introduced 11 quality factors, 23 criteria (i.e., sub-factors), and
42 metrics to measure software quality, in support of the mission of the U.S. Air
Force Electronic Systems Division and the Rome Air Development Center to provide
standards and technical guidance to software acquisition managers. Two types of metrics


Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0)


were used. The first type, like a ruler, is a relative quantity measure. The second
type is a binary measure which determines the existence (1) or absence (0) of
something. The units of the metric were chosen as the ratio of actual occurrences
to the possible number of occurrences. Thus, the total quality of a software
product could be characterized as the sum of the metric values.
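
The scheme described above can be illustrated with a minimal Python sketch. The function names and the sample counts are hypothetical and are not taken from [10]; only the two metric types and the summation follow the description above.

```python
def ratio_metric(actual: int, possible: int) -> float:
    """Relative quantity metric: actual occurrences / possible occurrences."""
    if possible <= 0:
        raise ValueError("possible must be positive")
    if not 0 <= actual <= possible:
        raise ValueError("actual must lie between 0 and possible")
    return actual / possible


def binary_metric(present: bool) -> int:
    """Binary metric: 1 if the property exists, 0 if it is absent."""
    return 1 if present else 0


def total_quality(metric_values) -> float:
    """Total quality characterised as the sum of the metric values."""
    return sum(metric_values)


# Hypothetical example: 18 of 24 modules are commented (ratio metric),
# and a design document exists (binary metric).
values = [ratio_metric(18, 24), binary_metric(True)]
print(total_quality(values))  # 1.75
```

Because every metric value lies between 0 and 1, the sum is bounded by the number of metrics, which makes totals comparable only between products evaluated with the same set of metrics.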
    For unknown reasons, the idea of quantitative evaluation of software was not
incorporated into the international standard, neither in its first version in
1991 [2] nor in its subsequent revisions [3–6]. This view is also shared by [1]:
“The metrics in the lower level of the McCall’s, Boehm’s, Dromey’s and FURPS
quality models are neither clearly nor completely defined and connected.”
(However, in our opinion, this is not true of McCall’s lower-level definitions.)
    The recent standards ISO/IEC 25022:2016 [8] and ISO/IEC 25023:2016 [9]
partially return to the seminal work of McCall et al. They introduce 122
lower-level measures (metrics in the terminology of McCall et al.). In this
paper, quality measures are additionally classified from the point of view of
objectivity of measurement and of usage in industry.
    Section 2 gives a general explanation of the quality models and measures
provided by the ISO/IEC 25022:2016 and ISO/IEC 25023:2016 standards. Section 3
investigates the objectivity of the quality measures. Section 4 provides insight
into the capacity of the measures to assess subcharacteristics and, therefore,
characteristics of the quality models. Section 5 describes a survey on the usage
of quality measures in the enterprises of Latvia. Section 6 presents the results
of the survey, and Section 7 gives conclusions and outlines future work.


2   What do ISO/IEC 25022:2016 and ISO/IEC 25023:2016
    offer?

ISO/IEC 25022:2016 and ISO/IEC 25023:2016 are part of the series of International
Standards from ISO/IEC 25000 to ISO/IEC 25099 entitled Systems and software
engineering – Systems and software Quality Requirements and Evaluation, known by
the acronym SQuaRE. They provide measures that are considered appropriate for
assessing the degree to which a system or product satisfies each of the
subcharacteristics. Two quality models are defined in the standard
ISO/IEC 25010:2011 [7]: the Quality in use model and the System/software product
quality model. Each of these models consists of quality characteristics and their
subcharacteristics, as shown in Fig. 3 and Fig. 4.
    Each measure in ISO/IEC 25022:2016 and ISO/IEC 25023:2016 is classified
as generic or specific. Generic measures are generally applicable and can be used
in a wide range of situations, while specific measures are specialised for
particular needs. An example of a generic measure is MRe-1-G Reusability of
assets: how many assets in a system can be reused. It is measured as the ratio
between the assets that are designed and implemented to be reusable and all
assets in the system. An example of a specific measure is REc-6-S Website
visitors converted to customers: the proportion of visitors to a particular web
page(s) who become customers. It is measured using business analytics.
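
As a hedged illustration, the sketch below computes the MRe-1-G ratio and reads the generic/specific class from a measure identifier. It assumes that the trailing letter of an identifier (G or S) encodes the class, as the two examples above suggest; the function names and the sample counts are hypothetical.

```python
def measure_class(measure_id: str) -> str:
    """Return 'generic' or 'specific' from the trailing letter of an identifier.

    Assumption: identifiers have the form '<code>-<number>-<G|S>'.
    """
    suffix = measure_id.rsplit("-", 1)[-1]
    classes = {"G": "generic", "S": "specific"}
    if suffix not in classes:
        raise ValueError(f"unrecognised identifier: {measure_id!r}")
    return classes[suffix]


def reusability_of_assets(reusable_assets: int, total_assets: int) -> float:
    """MRe-1-G: assets designed and implemented to be reusable / all assets."""
    if total_assets <= 0:
        raise ValueError("total_assets must be positive")
    return reusable_assets / total_assets


print(measure_class("MRe-1-G"))       # generic
print(measure_class("REc-6-S"))       # specific
print(reusability_of_assets(12, 48))  # 0.25
```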



   The measures of the System/software product quality model are classified
according to recommendation level [9]. The authors of the standard ISO/IEC
25023:2016 advise always using the highly recommended measures. Recommended
measures should be used as quality measures when appropriate. Measures with
unknown reliability are meant to be used at the user’s discretion, as a
reference when developing a new quality measure [9].


3     How objective are measures?

A measure can be applicable and reliable only if its measurements can be
obtained objectively. We analysed the measures defined in ISO/IEC 25022:2016
and ISO/IEC 25023:2016; the resulting assessment is shown in Fig. 1. For


    Fig. 1. Assessment of the possibility of obtaining objective measurements for measures.


instance, we assessed RAv-1-G System availability, the proportion of the
scheduled system operation time during which the system is actually available,
as an objective measure. It is computed as the ratio between the system
operation time actually provided and the system operation time specified in the
operation schedule. This measure is objective because both parameters of the
ratio come either from statistics or from the schedule. Its measurements can be
used to compare two products.
    Sus-2-G Satisfaction with features, the satisfaction of the user with
specific system features, is an example of a subjective measure. It is measured
using questionnaires. People’s judgement of a system’s features can depend
greatly on their mood, level of tiredness, attitude to the system, etc.
    Ef-3-G Errors in task, the number of errors made by the user during a task,
is an example of a measure that can be measured objectively but whose result
depends heavily on the end user’s qualification in the field and on the level of
training in working with the system under test. For this reason the results
obtained are hardly suitable for comparing different systems or one system at
different stages of development.



    Fig. 2. Coloured quality characteristics for software or software product quality.


    The situation is better for the next group of measures, which can be
measured objectively but whose results are specific to the product and can be
used for comparison in the form of percentages. An example is REc-2-G Time to
achieve return on investment: the time taken to achieve the expected return on
investment. It can be calculated by business analytics based on careful
analysis, although a good dose of more or less subjective assumptions is also
involved. MTe-1-G Test function completeness (how completely the test functions
and facilities are implemented), which is calculated as the ratio between the
number of test functions implemented as specified and the number of test
functions required, represents a group of measures that can be measured
objectively if there are industry criteria that help to determine how many test
functions should be required. Otherwise this measure becomes very subjective.
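
The five groups discussed in this section can be summarised in a small data structure. The sketch below is only an illustration: the class labels are our own shorthand (hypothetical names), and the mapping contains just the example measures named in the text, not the full assessment.

```python
from enum import Enum


class Objectivity(Enum):
    """Shorthand labels (hypothetical) for the groups described in the text."""
    OBJECTIVE = "objective"
    SUBJECTIVE = "subjective"
    USER_DEPENDENT = "objective, but depends on the end user"
    PRODUCT_SPECIFIC = "objective, but specific to the product"
    CRITERIA_DEPENDENT = "objective only if industry criteria exist"


# Example measures named in this section, keyed by measure identifier.
EXAMPLES = {
    "RAv-1-G": Objectivity.OBJECTIVE,
    "Sus-2-G": Objectivity.SUBJECTIVE,
    "Ef-3-G": Objectivity.USER_DEPENDENT,
    "REc-2-G": Objectivity.PRODUCT_SPECIFIC,
    "MTe-1-G": Objectivity.CRITERIA_DEPENDENT,
}


def by_class(target: Objectivity) -> list[str]:
    """Return the identifiers of the example measures in the given group."""
    return sorted(mid for mid, cls in EXAMPLES.items() if cls is target)


print(by_class(Objectivity.OBJECTIVE))  # ['RAv-1-G']
```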


4     Can measures given in ISO/IEC 25022:2016 and
      ISO/IEC 25023:2016 really assess the quality
      subcharacteristics?

Let us try to understand which of the measures are usable from a practical point
of view, that is, which are objective and reliable. Combining our findings about
the objectivity of quality measures with the recommendations of the standard
gives the situation shown in Fig. 2. We put red crosses on the measures that the
authors of the standard ISO/IEC 25023:2016 consider to have unknown reliability.
Next, we put a black box around those measures that are objective according to
our analysis and are highly recommended by the authors of the standard.
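
The marking procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the type and function names are hypothetical, and the three sample measures are placeholders with made-up attributes, not entries from the standards or from our assessment.

```python
from dataclasses import dataclass
from enum import Enum


class Recommendation(Enum):
    HIGHLY_RECOMMENDED = "highly recommended"
    RECOMMENDED = "recommended"
    USERS_DISCRETION = "user's discretion"  # unknown reliability


@dataclass(frozen=True)
class Measure:
    measure_id: str
    recommendation: Recommendation
    objective: bool  # outcome of an objectivity assessment


def marking(measure: Measure) -> str:
    """Return the mark a measure would receive in a figure like Fig. 2."""
    if measure.recommendation is Recommendation.USERS_DISCRETION:
        return "red cross"
    if measure.objective and measure.recommendation is Recommendation.HIGHLY_RECOMMENDED:
        return "black box"
    return "unmarked"


# Placeholder measures with made-up attributes, for illustration only.
measures = [
    Measure("Xa-1-G", Recommendation.HIGHLY_RECOMMENDED, objective=True),
    Measure("Xa-2-G", Recommendation.HIGHLY_RECOMMENDED, objective=False),
    Measure("Xb-1-S", Recommendation.USERS_DISCRETION, objective=True),
]
for m in measures:
    print(m.measure_id, marking(m))
```

Only measures that pass both filters end up in a black box, so a subcharacteristic without any such measure has no measure that is both objective and highly recommended.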
    Looking at the measures for the subcharacteristics of the Quality in use
model, none of them is objectively measurable (see Fig. 3), which could be
expected. Accordingly, the quality characteristics cannot be measured well
either if their subcharacteristics cannot. The situation with the Software
product quality model looks much better (see Fig. 4). Usability
subcharacteristics are the most difficult to measure, while reliability,
security and performance efficiency subcharacteristics appear to be measurable
and reliable.


       Fig. 3. Measurability of quality characteristics for Quality in use model.


5     What measures are used in practice in Latvia?

5.1   Survey Objectives

The University of Latvia conducted a survey on the measures used in the Latvian
software development industry.
    The primary objective of the survey was to determine which measures of
quality characteristics Latvian companies use when they carry out software
development and maintenance activities.
    The aim was to find out which quality characteristics and subcharacteristics
are currently used in the IT industry of Latvia and whether all quality measures
included in the ISO/IEC 25022:2016 and ISO/IEC 25023:2016 standards are well
known to quality assurance specialists, testers and developers.


5.2   Survey Description

The survey targeted senior employees involved with quality assurance, testing
or development in software development companies or in the software development
departments of enterprises whose main business lies elsewhere.




Fig. 4. Measurability of quality characteristics for software or software product quality.


    The survey had three parts. The first part was an introductory section that
described the aims of the survey and how to fill it in. The second part
comprised questions about the company. The third, main part covered the 122
measures of 42 quality subcharacteristics defined in the standards ISO/IEC
25022:2016 and ISO/IEC 25023:2016.
    The respondents were invited to complete the survey online. Each measure had
predefined answers: 1) Yes; 2) No; 3) Other (an open answer for giving a specific
response). Respondents were informed that the survey terms were taken from the
standards ISO/IEC 25022:2016 and ISO/IEC 25023:2016. For each measure, the
definition was given, the formula was explained and the method of measurement
was described. Confidentiality and privacy were assured to all respondents and
the organizations that they represented.


6    Analysis and Summary of Survey Findings

A total of 17 companies participated in the survey, representing more than
4000 IT employees, including almost 800 testers. The survey results show which
measures of system and software product quality, as well as measures of quality
in use, are used in the IT industry of Latvia.
    A summary of the survey results is shown in Fig. 5. Measures are color-coded
in five groups. The red ones are used rarely, in less than 20% of organizations;
the yellow ones are used moderately; and the green ones are used frequently or
very frequently, some even in more than 80% of organizations.




    Fig. 5. Results: the extent to which measures are used in the IT industry of Latvia.


    An example of a measure rarely used in the surveyed organizations is
CIn-1-G Data formats exchangeability: what proportion of the specified data
formats is exchangeable with other software or systems. It is measured as the
ratio between the data formats exchangeable with other software or systems and
the data formats specified to be exchangeable.
    The next group of measures is colored orange; these measures are used in
20-40% of the surveyed organizations. Moderately used measures, colored yellow,
are used in 41-59% of the surveyed organizations. Green indicates that a measure
is used frequently: measures colored light green are used in 60-89% of the
surveyed organizations.
    The most frequently used measures are colored dark green. Only two measures
are used by more than 80% of the surveyed organizations. One is PTb-1-G Mean
response time and the other is Sus-1-G Overall satisfaction: the overall
satisfaction of the user, measured using questionnaires for users of the system
or software product.
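
The color coding can be expressed as a simple mapping from usage share to group. The sketch below is an assumption-laden illustration, not the procedure used to produce Fig. 5: the function names are hypothetical, the answer counts are invented, and the boundary between the two green groups is exposed as a parameter (defaulting to 80%) because the text mentions both 80% and 89%.

```python
def usage_share(yes_answers: int, respondents: int) -> float:
    """Percentage of surveyed organizations that answered 'Yes' for a measure."""
    if respondents <= 0:
        raise ValueError("respondents must be positive")
    return 100.0 * yes_answers / respondents


def colour_band(share: float, top: float = 80.0) -> str:
    """Map a usage share (in percent) to one of the five color groups.

    Assumption: `top` is the boundary between light green and dark green.
    """
    if share < 20:
        return "red"
    if share <= 40:
        return "orange"
    if share < 60:
        return "yellow"
    if share <= top:
        return "light green"
    return "dark green"


# Invented answer counts for a survey with 17 responding companies.
print(colour_band(usage_share(3, 17)))   # red
print(colour_band(usage_share(7, 17)))   # yellow
print(colour_band(usage_share(16, 17)))  # dark green
```

With 17 respondents each organization accounts for almost 6 percentage points, so a single answer can move a measure from one group to the next.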


7   Conclusions and Future Work
The authors of the standards ISO/IEC 25022:2016 and ISO/IEC 25023:2016 have
worked hard to define measures that allow the assessment of subcharacteristics
and, therefore, quality characteristics of the Quality in use and Software
product quality models. They divide the measures into two sets: generic measures
for use in a wide range of situations and specific measures for specialised
needs. For the Product quality model, they also classify the measures as highly
recommended, recommended, or of unknown reliability.
   If the measures of both quality models are assessed from the objectivity
viewpoint, we see that the Quality in use measures are all more or less
subjective, while most of the Product quality model’s measures are objective.
   The survey on the actual usage of quality measures in the IT industry of
Latvia shows that only two measures are used in more than 89% of the surveyed
enterprises. At the same time, one third of all measures mentioned in the
standards are used in less than 20% of enterprises. Surprisingly, the measures
in use are not necessarily reliable and objective. This calls for further
research to understand whether the measures that enterprises currently use are
sufficient for them and, if not, why they do not use other measures.


8    Acknowledgements

The work described in this paper was supported by the project “Innovative
information technologies” at the University of Latvia. We are grateful to all
respondents of the survey. Any errors or omissions are the faults of the authors.


References
1. Al-Qutaish, Rafa E.: Quality Models in Software Engineering Literature: An
   Analytical and Comparative Study. Journal of American Science 6(3), 166–175 (2010)
2. ISO/IEC 9126:1991 Software engineering – Product quality. ISO/IEC (1991)
3. ISO/IEC 9126-1:2001 Software engineering – Product quality – Part 1: Quality
   model. International Organization for Standardization, Geneva, Switzerland (2001)
4. ISO/IEC TR 9126-2:2003 Software engineering – Product quality – Part 2: External
   metrics. International Organization for Standardization, Geneva, Switzerland
   (2003)
5. ISO/IEC TR 9126-3:2003 Software engineering – Product quality – Part 3: Internal
   metrics. International Organization for Standardization, Geneva, Switzerland
   (2003)
6. ISO/IEC TR 9126-4:2004 Software engineering – Product quality – Part 4: Quality
   in use metrics. International Organization for Standardization, Geneva, Switzerland
   (2004)
7. ISO/IEC 25010:2011 Systems and software engineering – Systems and software
   Quality Requirements and Evaluation (SQuaRE) – System and software quality
   models. International Organization for Standardization, Geneva, Switzerland (2011)
8. ISO/IEC 25022:2016 Systems and software engineering – Systems and software
   quality requirements and evaluation (SQuaRE) – Measurement of quality in use.
   International Organization for Standardization, Geneva, Switzerland (2016)
9. ISO/IEC 25023:2016 Systems and software engineering – Systems and software
   Quality Requirements and Evaluation (SQuaRE) – Measurement of system and
   software product quality. International Organization for Standardization, Geneva,
   Switzerland (2016)
10. McCall, J., Richards, P., Walters, G.: Factors in Software Quality, three volumes,
   NTIS AD-A049-014, 015, 055 (1977)
11. McCall, James A.: Quality Factors. In: Wiley Online Library (2002),
   https://doi.org/10.1002/0471028959.sof265. Last accessed 26 Feb 2020
12. Miguel, José P., Mauricio, D., Rodríguez, G.: A Review of Software Quality Models
   for the Evaluation of Software Products. International Journal of Software
   Engineering & Applications (IJSEA) 5(6), 31–53 (2014)

