  Software Reliability Measurement Experiences Conducted in Alcatel Portugal

                                       Rui Lourenço, Alcatel Portugal, S.A.




Abstract

Software reliability measurement is essential for examining the degree of quality or reliability of a developed software system. This paper describes the experiments conducted at Alcatel Portugal concerning the use of software reliability models. The results are general and can be used to monitor software reliability growth in order to attain a certain quality within schedule. A method based on the analysis of the trend exhibited by the collected data is used to improve the predictions. The results show that:
•    It is difficult for the models to reproduce the observed failure data when changes in trend do not follow the models' assumptions.
•    The Laplace trend test is a major tool for guiding the partitioning of failure data according to the assumptions of reliability growth models.
•    Prediction yields good results over a time period of a few months, showing that reliability modeling is a major tool for test/maintenance planning and follow-up.

Introduction

Software reliability models are used to monitor, evaluate and predict the quality of software systems. Quantitative measures provided by these models are a key help to the decision-making process in our organization.
Since a lot of resources are consumed by software development projects, our goal in using software reliability models is to optimize the use of these resources, in order to achieve the best quality with lower costs and optimized schedules.
The models enable us to estimate software reliability measures such as:
•   The number of failures that will be found during some time period in the future
•   How much time will be required to detect a certain number of failures
•   The mean time interval between failures, and the resources (testing + correction) needed to achieve a given quality level
•   Comparative analyses: "how does my product compare with others?"

With respect to the software life cycle, the phases requiring careful reliability evaluation are:
•   Test: to quantify the efficiency of a test set and detect the saturation instant, i.e., the instant when the probability of test failure detection becomes very low.
•   Qualification: to demonstrate quantitatively that the software has reached a specified level of quality.
•   Maintenance: to quantify the efficiency of maintenance actions. At the start of operational life, the software might be less reliable as the operational environment changes. Maintenance actions restore the reliability to a specified level.

Data requirements needed to implement these models

To implement these software reliability models, a process needs to be set up in order to collect, verify and validate the error data to be used as input.
Since we use a defect management tool to submit, manage and track the defects detected during the software development life cycle phases mentioned above, it is relatively easy for us to retrieve error data from the software defects database.
This way, it is possible to collect historical and current error data from projects, in the form of time intervals between failures and/or number of failures per time unit, as software reliability models usually require.
Normally, the following data needs to be available before we start using the models:
•    The fault counts per time unit (where repeated failures are not counted)
•    The elapsed time between consecutive failures
•    The length of each time unit used
•    The effort spent on test per time unit

Modeling approach

The basic approach here is to model past failure data in order to predict future behavior. This approach employs either the observed number of failures discovered per time period, or the observed times between failures of the software. The models used therefore fall into two basic classes, depending upon the type of data the model uses:

1. Failures per time period
2. Times between failures

These classes are, however, not mutually disjoint. There are models that can handle either data type. Moreover, many of the models for one data type can still be applied even if the user has data of the other type, by applying data transformation procedures.
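As an illustration, the sketch below converts between the two data types. It is not the paper's tooling, and the assumption that failures are spread uniformly within each period is ours:

```python
# Minimal sketch of the data transformations between the two basic data
# types; failures within a period are assumed uniformly spread.

from typing import List

def counts_to_interfailure_times(counts: List[int], unit_length: float = 1.0) -> List[float]:
    """Approximate times between consecutive failures from per-period counts."""
    gaps, previous = [], 0.0
    for period, n in enumerate(counts):
        start = period * unit_length
        for k in range(1, n + 1):
            t = start + k * unit_length / (n + 1)  # evenly spaced inside the period
            gaps.append(t - previous)
            previous = t
    return gaps

def interfailure_times_to_counts(gaps: List[float], unit_length: float = 1.0) -> List[int]:
    """Bucket cumulative failure times into fixed-length time units."""
    cumulative, t = [], 0.0
    for g in gaps:
        t += g
        cumulative.append(t)
    counts = [0] * (int(cumulative[-1] // unit_length) + 1)
    for t in cumulative:
        counts[int(t // unit_length)] += 1
    return counts

# Round trip: [2, 0, 1] -> gaps -> [2, 0, 1]
print(interfailure_times_to_counts(counts_to_interfailure_times([2, 0, 1])))
```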
For example, one of the models we use with more success is the S-shaped (SS) reliability growth model. For this model, the software error detection process can be described as an S-shaped growth curve, reflecting the initial learning curve at the beginning, as the test team members become familiar with the software, followed by growth and then a leveling off as the residual faults become more difficult to uncover.
Like the Goel-Okumoto (GO) and the Rayleigh models, which we also use very often, it can be classified as a Poisson-type model (the number of failures per unit of time is an independent Poisson random variable). Their performance depends basically on 2 parameters:
•     One that estimates the total number of software failures to be eventually detected.
•    Another that measures the efficiency with which software failures are detected.
In order to estimate the models' parameters, we use the tool CASRE (Computer-Aided Software Reliability Estimation), a PC-based tool developed in 1993 by the Jet Propulsion Laboratory for the U.S. Air Force.
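For concreteness, the following sketch shows the two-parameter mean value functions behind the GO and S-shaped models and the kind of least-squares calibration a tool like CASRE automates. The weekly failure counts are invented for illustration, and the fitting routine (scipy's curve_fit) is our stand-in, not CASRE itself:

```python
# Poisson-type mean value functions: a = eventual total number of failures,
# b = efficiency with which failures are detected.

import numpy as np
from scipy.optimize import curve_fit

def m_goel_okumoto(t, a, b):
    # Expected cumulative failures by time t under the GO model
    return a * (1.0 - np.exp(-b * t))

def m_s_shaped(t, a, b):
    # Delayed S-shaped model: learning curve, growth, then leveling off
    return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

weeks = np.arange(1, 13)
cumulative_failures = np.array([3, 8, 19, 37, 60, 84, 104, 119, 129, 136, 140, 143])  # illustrative

(a_ss, b_ss), _ = curve_fit(m_s_shaped, weeks, cumulative_failures, p0=(150.0, 0.3))
print(f"SS fit: a = {a_ss:.0f} eventual failures, b = {b_ss:.2f}")
print(f"Expected failures found by week 20: {m_s_shaped(20, a_ss, b_ss):.0f}")
```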

Models assumptions

The modeling approach described here is primarily applicable from the testing phase onward. The software must have matured to the point that extensive changes are not being routinely made. The models cannot perform credibly if the software is changing so fast that gathering data on one day is not the same as gathering data on another day. Different approaches and models need to be considered if that is the case.
Another important issue of the modeling procedure is that we need to know the inflection points, i.e., the points in time when the software failures stop growing and start to decrease. Reliability growth models cannot follow these trend variations, so our approach consists of partitioning the data into stages before applying the models. Inflection points are the boundaries between these stages. A simple way to identify inflection points is by performing trend tests, such as the Laplace trend test [3].
The use of trend tests is particularly important for models such as the S-shaped, for which predictions can only be accurate as long as the observed data meet the model assumption of reliability decay prior to reliability growth. The model (S-shaped) cannot predict future reliability decay, so when this phenomenon occurs, a new analysis is needed and the model must be applied from the time period presenting reliability decay.
However, this is not the only way of looking at the problem. Assuming that the error detection rate in software testing is proportional to the current error content, and that the proportionality depends on the current test effort at an arbitrary testing time, a plausible software reliability growth model based on a Non-Homogeneous Poisson Process has also been used.

How to obtain predictions of future reliability

In a predictive situation, statements have to be made regarding the future reliability of software, and we can only make use of the information available at that time. A trend test carried out on the available data helps choose the reliability growth model(s) to be applied and the subset of data to which this (or these) model(s) will be applied.
As mentioned before, the models are applied as long as the environmental conditions remain significantly unchanged (no changes in the testing strategy, no specification changes, no new system installation...).
In fact, even in these situations, a reliability decrease may be noticed. Initially, one can consider that it is due to a local random fluctuation and that reliability will increase sometime in the near future. In this case predictions are still made without partitioning the data. If reliability keeps decreasing, one has to find out why, and new predictions may be made by partitioning the data into subsets according to the new trend displayed by the data.
If significant changes in the development or operational conditions take place, great care is needed, since reliability trend changes may result, leading to erroneous predictions. New trend tests have to be carried out.
If there is insufficient evidence that a different phase in the program's reliability evolution has been reached, application of reliability growth models can be continued. If there is an obvious reliability decrease, the reliability growth models' application has to be stopped until a new reliability growth period is reached again. Then, the observed failure data has to be partitioned according to the new trend.

Number of models to be applied

With respect to the number of models to be applied, previous studies indicated that there are no "universally best" models. This suggests that we should try several models and examine the quality of the predictions obtained from each of them, and that even doing so, we cannot guarantee obtaining good predictions.
During the development phases of a running project, it is not always possible to apply several models, because of lack of time, experience, and analytical and practical tools.




Usually people only apply one, two or three models to their data. Analysis of the collected data and of the environmental conditions helps us understand the evolution of software reliability, and partitioning the data into subsets helps us improve the quality of the predictions.

Models calibration and application

The models may be calibrated either after each new observed data point (step-by-step) or periodically, after observation of a given number of failures, say y (y-step-ahead); the two policies are contrasted in the sketch after the list below. Step-by-step prediction seems more interesting. However, one needs a good data collection process in place to implement this procedure, since data might not always be available immediately. In operational life, the longer inter-failure times allow step-by-step predictions.
Since we have a database with error data from the running projects in our organization (the defects are collected from the test phase onwards), we have a formal procedure to regularly retrieve, analyse and verify this data.
We then use a periodical approach to make predictions, which can be summarized as follows:
•    Every week, we retrieve error data from the projects whose software reliability we are interested in evaluating.
•    We analyze and validate this data and look for possible trends, in order to select the best data set for making predictions.
•    If the models' assumptions are met, we apply the models, validate them and analyze the results they provide.
•    We then collect feedback from the people involved in the projects and, if necessary, take actions that help improve the products' reliability.
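A minimal sketch of the two calibration policies, assuming a CASRE-like estimation routine `fit_model` that we introduce here purely for illustration:

```python
# Step-by-step (y = 1) versus y-step-ahead recalibration; `fit_model` is a
# hypothetical stand-in for a CASRE-like parameter estimation routine.

from typing import Callable, List, Sequence, Tuple

def recalibration_points(n_observations: int, y: int = 1) -> List[int]:
    """Prefix lengths after which the model is refitted."""
    return list(range(y, n_observations + 1, y))

def calibrate(data: Sequence[int],
              fit_model: Callable[[Sequence[int]], Tuple[float, float]],
              y: int = 1) -> List[Tuple[int, Tuple[float, float]]]:
    """Refit the model on each growing data prefix selected by the policy."""
    return [(k, fit_model(data[:k])) for k in recalibration_points(len(data), y)]

# Example with a dummy "model" that just totals the failures seen so far:
dummy_fit = lambda prefix: (float(sum(prefix)), 0.0)
print(calibrate([2, 5, 9, 14, 12, 10, 7, 4], dummy_fit, y=2))
```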

Laplace trend test

The Laplace trend test [3] is used to determine the software reliability trend, using failure data relating to the software:
•    Time intervals between failures, or
•    Number of failures per time unit.
This test calculates an indicator u(n), computed from the data (time intervals between failures or number of failures per time unit); a computational sketch for the count-data form is given after the list below. A negative u(n) suggests an overall increase in reliability between data item 1 and data item n. A positive u(n) suggests an overall decrease in reliability between data items 1 and n. Therefore, if we notice an increase (decrease) in u(n), then we have a period of local reliability decrease (growth).
The Laplace trend test is straightforward and much faster to use than models. The reliability study can be stopped at this stage if it is believed that the information obtained has, indeed, answered the proposed questions. Of course, the obtained information is restricted to:
•    Increase in reliability,
•    Decrease in reliability,
•    Stable reliability.
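A minimal sketch of the count-data form of the Laplace factor, following the form used in [3]; the weekly failure counts are invented:

```python
# Laplace factor u(k) for failure counts per time unit. u(k) < 0 suggests
# reliability growth, u(k) > 0 suggests reliability decrease.

import math
from typing import List

def laplace_factor(counts: List[int]) -> List[float]:
    u = []
    for k in range(2, len(counts) + 1):
        n = sum(counts[:k])                    # total failures in units 1..k
        if n == 0:
            u.append(0.0)
            continue
        # sum of (i - 1) * n_i over 1-based units i = 1..k
        weighted = sum(i * counts[i] for i in range(k))
        u.append((weighted / n - (k - 1) / 2.0) /
                 math.sqrt((k * k - 1) / (12.0 * n)))
    return u

counts = [2, 5, 9, 14, 12, 10, 7, 4, 3, 1]     # illustrative
for k, uk in enumerate(laplace_factor(counts), start=2):
    print(f"u({k}) = {uk:+.2f}")
```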

Case study

We are going to apply the previously described methodology to the software validation phase of one of the software projects currently in the maintenance phase in our company.
The project in question is a large telecom network management system, with more than 350 000 source lines of code. The volume and complexity of this software system make it difficult, if not impossible, to eliminate all software defects prior to its operational phase. Our aim was to evaluate quantitatively some operational quality factors, particularly software reliability, before the software package started its operational life.
The software validation phase for this project is a 4-step testing process: 1) integration test, 2) system test, 3) qualification test and 4) acceptance test. The first 3 steps correspond to the test phases usually defined for the software life cycle. The acceptance test consists of testing the entire software system in a real environment, which approaches the normal operating environment. It uses a system configuration (hardware and software) that has reached a sufficient level of quality after completing the first 3 test phases described above.
After the validation phase had started, software errors detected by the test team were submitted into the defects database.
Failure reports with at least one of the following characteristics were rejected:
•    Failures not due to software, but to data, documentation or hardware
•    Reports related to an improvement request
•    Results in accordance with specifications
•    Reported failures already accounted for
In order to collect the test effort spent per time unit on a given project, we used data existing in another database specially created to collect manpower figures.
Our goal was to evaluate the following (the sketch after this list illustrates how such figures follow from a fitted model):
•    The number of failures that still remained in the software after half of the planned test time was completed
•    The time-point when 95% of all software failures existing in the software (as forecasted by the models) were found
•    The mean time between failures achieved by the end of the system, qualification and acceptance tests
•    The amount of test effort (testing + correction) still needed to achieve the target of 95% of software defects found.
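As a hypothetical illustration of how the first two goals follow from a fitted mean value function (the parameter values below are invented, not the project's figures):

```python
# Remaining failures and the 95% time-point under an S-shaped fit: solve
# m(t) = 0.95 * a by bisection. a and b are assumed, illustrative values.

import math

def m_s_shaped(t: float, a: float, b: float) -> float:
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))

def time_to_fraction(a: float, b: float, fraction: float = 0.95) -> float:
    lo, hi = 0.0, 1.0
    while m_s_shaped(hi, a, b) < fraction * a:   # bracket the solution
        hi *= 2.0
    for _ in range(60):                          # bisection on a monotone m(t)
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if m_s_shaped(mid, a, b) < fraction * a else (lo, mid)
    return hi

a, b = 150.0, 0.30                               # illustrative fitted parameters
print(f"Failures remaining at week 12: {a - m_s_shaped(12, a, b):.0f}")
print(f"95% of forecasted failures found by week {time_to_fraction(a, b):.1f}")
```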




When we first decided to apply these models, we were halfway through the system test phase. At that time we were interested in determining the number of defects remaining in our application, so that we could reevaluate our test strategy. The first approach consisted in considering the entire set of software failures collected up to that time to model the software reliability.
To meet this goal, we selected a set of models that use failures per time period as input. The S-shaped (SS) and the Brooks/Motley (BM) models were chosen, independently of the severity (critical, major or minor) of the observed failures.
Figure 1 shows that the models had difficulty in modelling the entire failure process.

Figure 1: BM and SS models fitting to the first data set

Despite the fitting and adjustment problems observed, we can notice two different behaviours in the models' predictions. The SS model presents a more "optimistic" vision of the failure process than the BM. These differences are often observed, and to identify which model was trustworthy for predictions, some expert judgement was needed, since the validation statistics, namely the residue and the Chi-Squared statistics, were not enough to help us decide.
The following table summarizes the models' results:

[Table 1: summary of the models' results]

Notice that the total number of failures predicted by the BM model was extraordinarily big. This didn't mean that we didn't consider this model's prediction. Instead of using its asymptotic measures, we only considered the predictions for the next 20 time units, which revealed themselves to be more accurate.
After these results had been analysed, the project team agreed that the system test had serious chances of being delayed, so they had to rethink their test strategy.
Later on in the project, right after the qualification test phase had started, the questions were whether the software would be ready for shipping by the end of this phase and, in case it wasn't, how much effort (testing + corrections) was still required to achieve the target of finding 95% of all forecasted defects. It was an important decision to be made and the conclusions could have serious implications on the project schedule.
In order to improve the accuracy of the new predictions, we decided to restrict the data set to be used by applying the Laplace trend test.
As can be seen in figure 2, the Laplace trend test graphic allowed us to observe periods of local reliability growth and decrease in the data.

Figure 2: Laplace trend test applied to the observed failure data

Considering the models' assumptions, the periods selected for a new reliability evaluation were P2 for the SS model, since we can notice that there is a decrease in reliability followed by a reliability growth period, and P1 for the GO and BM models, since there is only reliability growth observed.
By running the models again, we noticed that the deviation was significantly reduced, thus improving the reliability evaluation (see figure 3).
Figure 3: SS and BM models fitting to the data by using the P2 and P1 data sets as inputs, respectively

The validation statistics told us that the observed residues were now lower, which gave us more confidence in the models' results.
The following table summarizes the new results observed:

[Table 2: summary of the new results]

Based on these results, plus the expert judgement provided by the project team, we considered the S-shaped model values as the reference (optimistic view).
However, there was still a question that needed an answer: how much test effort still had to be spent in order for the software to be 95% error free? To answer that question, a different model with a different approach was needed. Since test effort is clearly correlated with the defects found during the test phases, we decided to use test effort inputs in the S-shaped model instead of calendar time.
To include the test effort data in the model, we had to restrict the data range to the period for which we had reliable effort data figures. By doing so, it was possible for us to evaluate, with the same model, both the remaining failures in the software and the test effort needed to find a given amount of defects in the software system. A sketch of this test-effort substitution is given below.
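In this minimal sketch (the effort and failure figures are invented, and the paper's actual SSM adjustments are not reproduced), the S-shaped mean value function is evaluated against cumulative test effort W instead of calendar time, and the extra effort needed to reach 95% of the forecasted failures is read off the fitted curve:

```python
# S-shaped model driven by cumulative test effort W rather than calendar
# time; all figures are illustrative assumptions.

import numpy as np
from scipy.optimize import curve_fit

def m_ss_effort(W, a, b):
    # Expected cumulative failures after spending cumulative effort W
    return a * (1.0 - (1.0 + b * W) * np.exp(-b * W))

effort_per_week = np.array([12, 15, 18, 20, 20, 16, 14, 10])       # person-days
W = np.cumsum(effort_per_week)                                     # cumulative effort
cumulative_failures = np.array([5, 14, 30, 52, 74, 90, 100, 106])

(a, b), _ = curve_fit(m_ss_effort, W, cumulative_failures, p0=(120.0, 0.02))

# Effort at which 95% of the a forecasted failures have been found
grid = np.linspace(W[-1], 10 * W[-1], 100_000)
W95 = grid[np.searchsorted(m_ss_effort(grid, a, b), 0.95 * a)]
print(f"a = {a:.0f} failures; extra effort needed = {W95 - W[-1]:.0f} person-days")
```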
We decided to apply this new model (S-shaped modified, SSM) to the data set containing P2, as suggested by the Laplace trend test (see figure 2), with a few adjustments in order for the test effort to reflect more accurately the defects found. Figure 4 and table 3 below summarize the results obtained by using this model.

Figure 4: SSM model fitting to the P2 data set adjusted

[Table 3: SSM model results for the adjusted P2 data set]

As can be seen, the model fitting is quite accurate and reasonably adapted to the failure data observed. These results were a major help to the project team, who were able to make more accurate decisions based on the results provided by this model.
As mentioned before, expert judgement provided by people from the projects plays an essential role in the process of deciding which model's results to select. Unless we are pretty sure about the stability of our product, i.e., we know that we shouldn't expect too many defects in the near future and that the test environment is not supposed to change much, we cannot rely significantly on these results.

Conclusions

Software reliability models are an important aid to test/maintenance planning and reliability evaluation. However, it is well known that no particular model is better suited for predicting software behaviour for all software systems in all circumstances. Our work helps the already existing models give better predictions, since they are applied to data displaying trends in accordance with their assumptions.
With respect to the application of the proposed method to the failure data of our network management project, 2 models, namely the S-shaped and the Brooks/Motley, have been analysed according to their predictive capabilities. The results obtained show that:
•   The trend test helps partition the observed failure data according to the assumptions of reliability growth models; it also indicates the segment of data from which the occurrence of future failures can be predicted more accurately;
•   The prediction approach proposed for the validation phases yields good results over a time period of a few months, showing that reliability modeling constitutes a major aid for test/maintenance planning and follow-up.

                        References

[1] Derriennic H., and Gall G., "Use of Failure-Intensity Models in the Software
    Validation Phase for Telecommunications", IEEE Trans. on Reliability,
    Vol. 44, No. 4, December 1995, pp. 658-665.
[2] Goel A.L., and Okumoto K., "Time-dependent error detection rate model for
    software and other performance measures", IEEE Trans. on Reliability,
    Vol. R-28, No. 3, August 1979, pp. 206-211.
[3] Kanoun K., Martini M.R.B., and Souza J.M., "A Method for Software
    Reliability Analysis and Prediction: Application to the TROPICO-R
    Switching System", IEEE Trans. on Software Engineering, Vol. 17, No. 4,
    April 1991, pp. 334-344.
[4] Lyu M.R. (ed.), "Handbook of Software Reliability Engineering", published
    by IEEE Computer Society Press and McGraw-Hill Book Company.
[5] Yamada S., Hishitani J., and Osaki S., "Software Reliability Growth with a
    Weibull Test-Effort: A Model & Application", IEEE Trans. on Reliability,
    Vol. 42, No. 1, March 1993, pp. 100-106.



