=Paper=
{{Paper
|id=Vol-1284/paper20
|storemode=property
|title=Software Reliability Measurement Experiences Conducted in Alcatel Portugal
|pdfUrl=https://ceur-ws.org/Vol-1284/paper20.pdf
|volume=Vol-1284
|dblpUrl=https://dblp.org/rec/conf/quatic/Lourenco01
}}
==Software Reliability Measurement Experiences Conducted in Alcatel Portugal==
Rui Lourenço, Alcatel Portugal, S.A.
Abstract

Software reliability measurement is essential for examining the degree of quality or reliability of a developed software system. This paper describes the experiments conducted at Alcatel Portugal concerning the use of software reliability models. The results are general and can be used to monitor software reliability growth in order to attain a certain quality within schedule. A method based on the analysis of the trend exhibited by the collected data is used to improve the predictions. The results show that:
- It is difficult for the models to reproduce the observed failure data when changes in trend do not follow the models' assumptions.
- The Laplace trend test is a major tool for guiding the partitioning of failure data according to the assumptions of reliability growth models.
- Prediction yields good results over a time period of a few months, showing that reliability modeling is a major tool for test/maintenance planning and follow-up.
Introduction

Software reliability models are used to monitor, evaluate and predict the quality of software systems. Quantitative measures provided by these models are a key help to the decision-making process in our organization. Since a lot of resources are consumed by software development projects, our goal in using software reliability models is to optimize the use of these resources, in order to achieve the best quality with lower costs and optimized schedules.
They enable us to estimate software reliability measures such as:
- The number of failures that will be found during some time period in the future
- How much time will be required to detect a certain number of failures
- The mean time interval between failures, and the resources (testing + correction) needed to achieve a given quality level
- Comparative analyses: "how does my product compare with others"
With respect to the software life cycle, the phases requiring careful reliability evaluation are:
- Test: to quantify the efficiency of a test set and to detect the saturation instant, i.e., the instant when the probability of test failure detection becomes very low.
- Qualification: to demonstrate quantitatively that the software has reached a specified level of quality.
- Maintenance: to quantify the efficiency of maintenance actions. At the start of operational life, the software might be less reliable as the operational environment changes; maintenance actions restore the reliability to a specified level.
Data requirements needed to implement these models

To implement these software reliability models, a process needs to be set up to collect, verify and validate the error data to be used as input. Since we use a defect management tool to submit, manage and track defects detected during the software development life cycle phases mentioned above, it is relatively easy for us to retrieve error data from the software defects database. This way, it is possible to collect historical and current error data from projects, in the form of time intervals between failures and/or number of failures per time unit, as software reliability models usually require.
Normally, the following data needs to be available before we start using the models:
- The fault counts per time unit (where repeated failures are not counted)
- The elapsed time between consecutive failures
- The length of each time unit used
- The effort spent on test per time unit

Modeling approach

The basic approach here is to model past failure data to predict future behavior. This approach employs either the observed number of failures discovered per time period, or the observed times between failures of the software. The models used therefore fall into two basic classes, depending upon the type of data the model uses:
1. Failures per time period
2. Times between failures
These classes are, however, not mutually disjoint. There are models that can handle either data type. Moreover, many of the models for one data type can still be applied even if the user has data of the other type, by applying data transformation procedures.
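As a minimal sketch of such a transformation (with invented example data, not figures from our projects), interfailure times can be bucketed into failure counts per time unit, and counts can be spread back into approximate interfailure times:

```python
# Sketch of the data transformations mentioned above (hypothetical data):
# interfailure times <-> failure counts per time unit.
import numpy as np

def times_to_counts(interfailure_times, unit_length):
    """Bucket cumulative failure instants into fixed-length time units."""
    instants = np.cumsum(interfailure_times)
    n_units = int(np.ceil(instants[-1] / unit_length))
    counts, _ = np.histogram(instants, bins=n_units, range=(0, n_units * unit_length))
    return counts

def counts_to_times(counts, unit_length):
    """Approximate interfailure times by spreading each unit's failures evenly."""
    instants = []
    for i, n in enumerate(counts):
        for k in range(1, n + 1):
            instants.append((i + k / (n + 1)) * unit_length)
    return np.diff([0.0] + instants)

gaps = [5.0, 3.0, 8.0, 2.0, 4.0, 9.0]            # hours between consecutive failures
counts = times_to_counts(gaps, unit_length=8.0)  # failures per 8-hour unit
print(counts, counts_to_times(counts, 8.0))
```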
For example, one of the models we use with most success is the S-shaped (SS) reliability growth model. For this model, the software error detection process is described as an S-shaped growth curve, reflecting the initial learning curve at the beginning, as the test team members become familiar with the software, followed by growth and then a leveling off as the residual faults become more difficult to uncover.
Like the Goel-Okumoto (GO) and the Rayleigh models, which we also use very often, it can be classified as a Poisson-type model (the number of failures per unit of time is an independent Poisson random variable). The performance of these models depends basically on 2 parameters:
- One that estimates the total number of software failures to be eventually detected.
- Another that measures the efficiency with which software failures are detected.
In order to estimate the models' parameters, we use CASRE (Computer-Aided Software Reliability Estimation), a PC-based tool developed in 1993 by the Jet Propulsion Laboratory for the U.S. Air Force.
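To make the two parameters concrete: for the GO model the expected cumulative number of failures by time t is m(t) = a(1 - e^(-bt)), and for the delayed S-shaped model m(t) = a(1 - (1 + bt)e^(-bt)), where a is the total number of failures to be eventually detected and b the detection efficiency. A minimal curve-fitting sketch in Python (illustrative only, not the CASRE implementation; the weekly failure counts are made up):

```python
# Fit the GO and delayed S-shaped mean value functions to cumulative
# failure counts by least squares. Data below is invented for the demo.
import numpy as np
from scipy.optimize import curve_fit

def go_mean(t, a, b):
    """Goel-Okumoto NHPP: a = total expected failures, b = detection efficiency."""
    return a * (1.0 - np.exp(-b * t))

def ss_mean(t, a, b):
    """Delayed S-shaped NHPP: captures the test team's initial learning curve."""
    return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

weeks = np.arange(1, 21)         # 20 weekly observation periods
cum_failures = np.cumsum(        # hypothetical failures detected per week
    [2, 3, 5, 8, 11, 13, 14, 12, 10, 9, 7, 6, 5, 4, 3, 3, 2, 2, 1, 1])

for name, model in [("GO", go_mean), ("S-shaped", ss_mean)]:
    (a, b), _ = curve_fit(model, weeks, cum_failures, p0=[cum_failures[-1], 0.1])
    print(f"{name}: total failures a = {a:.0f}, efficiency b = {b:.3f}")
```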
Models assumptions

The modeling approach described here is primarily applicable from the testing phase onward. The software must have matured to the point that extensive changes are not being routinely made. The models cannot perform credibly if the software is changing so fast that gathering data on one day is not the same as gathering data on another day. Different approaches and models need to be considered if that is the case.
Another important issue of the modeling procedure is that we need to know the inflection points, i.e., the points in time when the software failures stop growing and start to decrease. Reliability growth models cannot follow these trend variations, thus our approach consists of partitioning the data into stages and then applying the models. Inflection points are the boundaries between these stages. A simple way to identify inflection points is by performing trend tests, such as the Laplace trend test [3].
The use of trend tests is particularly important for models such as the S-shaped, for which predictions can only be accurate as long as the observed data meet the model assumption of reliability decay prior to reliability growth. The S-shaped model cannot predict future reliability decay, so when this phenomenon occurs, a new analysis is needed and the model must be applied from the time period presenting reliability decay.
However, this is not the only way of looking at the problem. Assuming that the error detection rate in software testing is proportional to the current error content, and that the proportionality depends on the current test effort at an arbitrary testing time, a plausible software reliability growth model based on a Non-Homogeneous Poisson Process has also been used.
How to obtain predictions of future reliability

In a predictive situation, statements have to be made regarding the future reliability of software, and we can only make use of the information available at that time. A trend test carried out on the available data helps choose the reliability growth model(s) to be applied and the subset of data to which this (or these) model(s) will be applied.
As mentioned before, the models are applied as long as the environmental conditions remain essentially unchanged (no changes in the testing strategy, no specification changes, no new system installation...).
In fact, even in these situations, a reliability decrease may be noticed. Initially, one can consider that it is due to a local random fluctuation and that reliability will increase sometime in the near future. In this case predictions are still made without partitioning the data. If reliability keeps decreasing, one has to find out why, and new predictions may be made by partitioning the data into subsets according to the new trend displayed by the data.
If significant changes in the development or operational conditions take place, great care is needed, since reliability trend changes may result, leading to erroneous predictions. New trend tests have to be carried out.
If there is insufficient evidence that a different phase in the program's reliability evolution has been reached, the application of reliability growth models can be continued. If there is an obvious reliability decrease, the application of reliability growth models has to be stopped until a new reliability growth period is reached. Then, the observed failure data has to be partitioned according to the new trend.

Number of models to be applied

With respect to the number of models to be applied, previous studies indicated that there are no "universally best" models. This suggests that we should try several models and examine the quality of the predictions obtained from each of them, and that even in doing so we are not guaranteed to obtain good predictions.
During the development phases of a running project, it is not always possible to apply several models, because of lack of time, experience, and analytical and practical tools. Usually people only apply one, two or three models to their data. Analysis of the collected data and of the environmental conditions helps us understand the evolution of software reliability, and data partitioning into subsets helps us improve the quality of the predictions.
Models calibration and application

The models may be calibrated either after each new observed data point (step-by-step) or periodically, after observation of a given number of failures, say y (y-step-ahead). Step-by-step prediction seems more interesting. However, one needs to have a good data collection process set up to implement this procedure, since data might not always be available immediately. In operational life, longer inter-failure times allow step-by-step predictions.
Since we have a database with error data from running projects in our organization (the defects are collected from the test phase onwards), we have a formal procedure to regularly retrieve, analyse and verify this data. We then use a periodical approach to make predictions, which can be summarized as follows:
- Every week, we retrieve error data for the projects whose software reliability we are interested in evaluating.
- We analyze and validate this data and look for possible trends, in order to select the best data set to be used for the predictions.
- If the models' assumptions are met, we apply the models, validate them and analyze the results they provide.
- Then we collect feedback from the people involved in the projects and, if necessary, take actions that help improve the products' reliability.
A minimal sketch of the weekly retrieval step follows this list.
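The weekly retrieval step essentially reduces to grouping accepted failure reports by submission week. A sketch, assuming only that each report carries a submission date (the sample dates are invented, and the actual defect database query is not shown):

```python
# Sketch: aggregate defect submission dates into failure counts per week.
# The report dates are hypothetical; only the grouping logic matters here.
from collections import Counter
from datetime import date

def weekly_counts(submission_dates):
    """Count accepted failure reports per ISO (year, week)."""
    weeks = Counter(d.isocalendar()[:2] for d in submission_dates)
    return dict(sorted(weeks.items()))

reports = [date(2001, 3, 5), date(2001, 3, 7), date(2001, 3, 14), date(2001, 3, 20)]
print(weekly_counts(reports))   # {(2001, 10): 2, (2001, 11): 1, (2001, 12): 1}
```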
Laplace trend test

The Laplace trend test [3] is used to determine the software reliability trend, using failure data relating to the software:
- Time intervals between failures, or
- Number of failures per time unit.
This test calculates an indicator u(n), expressed according to the data (time intervals between failures or number of failures per time unit). A negative u(n) suggests an overall increase in reliability between data item 1 and data item n. A positive u(n) suggests an overall decrease in reliability between data items 1 and n. Furthermore, if we notice a local increase (decrease) in u(n), then we have a period of local reliability decrease (growth).
The Laplace trend test is straightforward and much faster to use than the models. The reliability study can be stopped at this stage if it is believed that the information obtained has, indeed, answered the proposed questions. Of course, the information obtained is restricted to:
- Increase in reliability,
- Decrease in reliability,
- Stable reliability.
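For failure counts per time unit, the Laplace factor over the first k units compares the weighted mean position of the failures with the midpoint (k - 1)/2, normalized by its standard deviation under a homogeneous Poisson process; this is the interval-data form given in [3]. A small sketch with invented weekly counts:

```python
# Sketch of the Laplace trend test for failure counts per time unit,
# following the interval-data formula in Kanoun et al. [3].
import numpy as np

def laplace_factor(counts):
    """u(k) for each prefix of the failure-count series.
    u < 0 suggests reliability growth, u > 0 reliability decrease."""
    counts = np.asarray(counts, dtype=float)
    u = []
    for k in range(2, len(counts) + 1):
        n = counts[:k]
        total = n.sum()
        # Weighted mean of the (0-based) unit indices where failures occurred.
        mean_index = np.sum(np.arange(k) * n) / total
        u.append((mean_index - (k - 1) / 2.0) / np.sqrt((k * k - 1) / (12.0 * total)))
    return np.array(u)

weekly_failures = [1, 2, 4, 7, 9, 8, 6, 5, 3, 2, 2, 1]   # hypothetical data
print(np.round(laplace_factor(weekly_failures), 2))       # sign changes mark inflection points
```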
Case study

We now apply the previously described methodology to the software validation phase of one of the software projects currently in the maintenance phase in our company.
The project in question is a large telecom network management system, with more than 350 000 source lines of code. The volume and complexity of this software system make it difficult, if not impossible, to eliminate all software defects prior to its operational phase. Our aim was to evaluate quantitatively some operational quality factors, particularly software reliability, before the software package started its operational life.
The software validation phase for this project is a 4-step testing process: 1) integration test, 2) system test, 3) qualification test and 4) acceptance test. The first 3 steps correspond to the test phases usually defined for the software life cycle. The acceptance test consists of testing the entire software system in a real environment, which approaches the normal operating environment. It uses a system configuration (hardware and software) that has reached a sufficient level of quality after completing the first 3 test phases described above.
After the validation phase started, software errors detected by the test team were submitted into the defects database. Failure reports that had at least one of the following characteristics were rejected:
- Failures not due to software, but to data, documentation or hardware
- Reports related to an improvement request
- Results in accordance with the specifications
- Failures already accounted for
In order to collect the test effort spent per time unit on a given project, we used data existing in another database, specially created to collect manpower figures.
Our goal was to evaluate (a numerical sketch of the 95% time-point estimation follows this list):
- The number of failures that still remained in the software after half of the planned test time was completed
- The time-point when 95% of all software failures existing in the software (as forecasted by the models) were found
- The mean time between failures achieved by the end of the system, qualification and acceptance tests
- The amount of test effort (testing + correction) still needed to achieve the target of 95% of software defects found
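Once a model's parameters a and b are estimated, the 95% time-point can be read off the fitted mean value function by root-finding. A sketch under assumed (not actual project) delayed S-shaped parameters:

```python
# Sketch: time-point at which 95% of the forecasted failures are found,
# under the delayed S-shaped model with assumed (not actual) parameters.
import numpy as np
from scipy.optimize import brentq

a, b = 1500.0, 0.12          # hypothetical fitted parameters
target = 0.95 * a

def ss_mean(t):
    return a * (1.0 - (1.0 + b * t) * np.exp(-b * t))

# m(t) is increasing, so bracket the root and solve m(t) = 0.95 * a.
t95 = brentq(lambda t: ss_mean(t) - target, 1e-6, 1e4)
print(f"95% of forecasted failures found after {t95:.1f} time units")
```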
When we first decided to apply these models, we were half way through the system test phase. At that time we were interested in determining the number of defects remaining in our application, so we could reevaluate our test strategy. The first approach consisted in considering the entire set of software failures collected up to that time to model the software reliability.
To meet this goal, we selected a set of models that used failures per time period as an input. The S-shaped (SS) and the Brooks/Motley (BM) models were chosen, independently of the severity (critical, major and minor) of the observed failures.
Figure 1 shows that the models had difficulty in modelling the entire failure process.

Figure 1: BM and SS models fitting to the first data set

Despite the fitting and adjustment problems observed, we can notice two different behaviours in the models' predictions. The SS model presents a more "optimistic" vision of the failure process than the BM. These differences are often observed, and to identify which model was trustworthy for predictions, some expert judgement was needed, since the validation statistics, namely the residue and the Chi-Squared statistics, were not enough to help us decide. The following table summarizes the models' results:

Notice that the total number of failures predicted by the BM model was extraordinarily big. This did not mean that we discarded this model's prediction; instead of using its asymptotic measures, we only considered the predictions for the next 20 time units, which revealed themselves more accurate.
After these results were analysed, the project team agreed that the system test had serious chances of being delayed, so they had to rethink their test strategy.
Later on in the project, right after the qualification test phase started, the questions were whether the software would be ready for shipping by the end of this phase and, in case it was not, how much effort (testing + corrections) was still required to achieve the target of 95% of all forecasted defects found. It was an important decision to be made, and the conclusions could have serious implications on the project schedule.
In order to improve the accuracy of the new predictions, we decided to restrict the data set to be used, by applying the Laplace trend test. As can be seen in figure 2, the Laplace trend test graphic allowed us to observe periods of local reliability growth and decrease in the data.

Figure 2: Laplace trend test applied to the observed failure data

Considering the models' assumptions, the periods selected for a new reliability evaluation were P2 for the SS model, since we can notice that there is a decrease in reliability followed by a reliability growth period, and P1 for the GO and BM models, since there is only reliability growth observed.
By running the models again, we noticed that the deviation was significantly reduced, thus improving the reliability evaluation (see figure 3).
Figure 3: SS and BM models fitting to the data by using the P2 and P1 data sets as inputs, respectively

The validation statistics told us that the observed residues were now lower, which gave us more confidence in the models' results. The following table summarizes the new results observed:

Based on these results, plus the expert judgement provided by the project team, we considered the S-shaped model values for reference (optimistic view).
However, there was still a question that needed an answer: how much test effort still had to be spent in order for the software to be 95% error free? To answer that question, a different model with a different approach was needed. Since test effort is clearly correlated with the defects found during the test phases, we decided to use test effort inputs in the S-shaped model, instead of calendar time.
To include the test effort data in the model, we had to restrict the data range to the period for which we had reliable effort data figures. By doing so, it was possible for us to evaluate, with the same model, both the remaining failures in the software and the test effort needed to find a given amount of defects in the software system. A sketch of this effort-based variant follows.
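The idea of replacing calendar time with cumulative test effort is in the spirit of test-effort-dependent models such as Yamada et al. [5]. A minimal sketch, with hypothetical effort and failure figures (this is not the exact SSM variant used in the project):

```python
# Sketch: feed the S-shaped model cumulative test effort instead of
# elapsed weeks, then estimate the extra effort to reach 95% of defects.
# All figures below are invented for illustration.
import numpy as np
from scipy.optimize import brentq, curve_fit

def ss_mean(w, a, b):
    """Delayed S-shaped mean value function over cumulative effort w."""
    return a * (1.0 - (1.0 + b * w) * np.exp(-b * w))

effort_per_week = np.array([30, 35, 40, 42, 45, 45, 40, 38, 35, 30], float)  # person-hours
cum_effort = np.cumsum(effort_per_week)
cum_failures = np.cumsum([3, 6, 9, 12, 11, 9, 7, 5, 3, 2])

(a, b), _ = curve_fit(ss_mean, cum_effort, cum_failures, p0=[cum_failures[-1], 0.01])

# Effort at which 95% of the forecasted defects would be found:
w95 = brentq(lambda w: ss_mean(w, a, b) - 0.95 * a, 1e-6, 1e6)
print(f"additional effort needed: {max(0.0, w95 - cum_effort[-1]):.0f} person-hours")
```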
We decided to apply this new model (S-shaped modified - SSM) to the P2 data set suggested by the Laplace trend test (see figure 2), with a few adjustments in order for the test effort to reflect more accurately the defects found. Figure 4 and table 3 below summarize the results obtained by using this model.

Figure 4: SSM model fitting to the P2 data set adjusted

As can be seen, the model fitting is quite accurate and reasonably adapted to the observed failure data. These results were a major help to the project team, which was able to make more accurate decisions based on the results provided by this model.
As mentioned before, expert judgement provided by the people from the projects plays an essential role in the process of deciding which model results to select. Unless we are pretty sure about the stability of our product, i.e., we know that we should not expect too many defects in the near future and that the test environment is not supposed to change much, we cannot rely significantly on these results.
Conclusions

Software reliability models are an important aid to test/maintenance planning and reliability evaluation. However, it is well known that no particular model is better suited for predicting software behaviour for all software systems in all circumstances. Our work helps the existing models give better predictions, since they are applied to data displaying trends in accordance with their assumptions.
With respect to the application of the proposed method to the failure data of our network management project, 2 models, namely the S-shaped and the Brooks/Motley, have been analysed according to their predictive capabilities. The results obtained show that:
- The trend test helps partition the observed failure data according to the assumptions of reliability growth models; it also indicates the segment of data from which the occurrence of future failures can be predicted more accurately;
- The prediction approach proposed for the validation phases yields good results over a time period of a few months, showing that reliability modeling constitutes a major aid for test/maintenance planning and follow-up.
References

[1] Derriennic H. and Gall G., "Use of Failure-Intensity Models in the Software Validation Phase for Telecommunications", IEEE Trans. on Reliability, Vol. 44, No. 4, December 1995, pp. 658-665.
[2] Goel A.L. and Okumoto K., "Time-Dependent Error-Detection Rate Model for Software and Other Performance Measures", IEEE Trans. on Reliability, Vol. R-28, No. 3, August 1979, pp. 206-211.
[3] Kanoun K., Martini M.R.B. and de Souza J.M., "A Method for Software Reliability Analysis and Prediction: Application to the TROPICO-R Switching System", IEEE Trans. on Software Engineering, Vol. 17, No. 4, April 1991, pp. 334-344.
[4] Lyu M.R. (ed.), "Handbook of Software Reliability Engineering", IEEE Computer Society Press and McGraw-Hill, 1996.
[5] Yamada S., Hishitani J. and Osaki S., "Software-Reliability Growth with a Weibull Test-Effort: A Model & Application", IEEE Trans. on Reliability, Vol. 42, No. 1, March 1993, pp. 100-106.