<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Software Reliability Measurement Experiences Conducted in Alcatel Portugal</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Rui Lourenço</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alcatel Portugal</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2001</year>
      </pub-date>
      <fpage>169</fpage>
      <lpage>174</lpage>
      <abstract>
<p>Software Reliability measurement is essential for examining the degree of quality or reliability of a developed software system. This paper describes the experiments conducted at Alcatel Portugal concerning the use of Software Reliability models. The results are general and can be used to monitor software reliability growth in order to attain a certain quality within schedule. A method based on the analysis of the trend exhibited by the collected data is used to improve the predictions. The results show that: (1) It is difficult for the models to reproduce the observed failure data when changes in trend do not follow the models' assumptions; (2) The Laplace trend test is a major tool for guiding the partitioning of failure data according to the assumptions of reliability growth models; (3) Prediction yields good results over a time period of a few months, showing that reliability modeling is a major tool for test/maintenance planning and follow-up.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Software reliability models are used to monitor, evaluate
and predict the quality of software systems. Quantitative
measures provided by these models are a key aid to the
decision-making process in our organization.</p>
<p>Since software development projects consume a lot of
resources, our goal in using software reliability models is
to optimize the use of these resources, in order to achieve
the best quality with lower costs and optimized schedules.</p>
<p>They enable us to estimate software reliability measures
such as:
. The number of failures that will be found during
some time period in the future
. How much time will be required to detect a certain
number of failures
. The mean time interval between failures, and the
resources (testing + correction) needed to achieve a
given quality level
. Comparative analyses: "how does my product
compare with others?"
With respect to the software life cycle, the phases
requiring careful reliability evaluation are:
. Test: To quantify the efficiency of a test set and
detect the saturation instant, i.e., the instant when the
probability of test failure detection becomes very
low.
. Qualification: To demonstrate quantitatively that the
software has reached a specified level of quality.
. Maintenance: To quantify the efficiency of
maintenance actions. At the start of operational
life, the software might be less reliable as the
operational environment changes. Maintenance
actions restore the reliability to a specified level.</p>
    </sec>
    <sec id="sec-2">
<title>Data requirements needed to implement these models</title>
<p>To implement these software reliability models, a process
needs to be set up in order to collect, verify and validate
the error data used as input.
Since we use a defect management tool to submit,
manage and track defects detected during the software
development life cycle phases mentioned above, it is
relatively easy for us to retrieve error data from
the software defects database.</p>
<p>This way, it is possible to collect historical and actual
error data from projects, in the form of time intervals
between failures and/or number of failures per time unit,
as software reliability models usually require.</p>
<p>Normally, the following data needs to be available before
we start using the models:
. The fault counts per time unit (where repeated
failures are not counted)
. The elapsed time between consecutive failures
. The length of each time unit used
. The effort spent on test per time unit</p>
    </sec>
    <sec id="sec-3">
<title>Modeling approach</title>
<p>The basic approach here is to model past failure data to
predict future behavior. This approach employs either the
observed number of failures discovered per time period,
or the observed times between failures of the software.
The models used therefore fall into two basic classes,
depending upon the type of data the model uses:</p>
      <sec id="sec-3-1">
        <title>1. Failuresper time period</title>
      </sec>
      <sec id="sec-3-2">
        <title>2. Times between failures</title>
<p>These classes are, however, not mutually disjoint. There
are models that can handle either data type. Moreover,
many of the models for one data type can still be applied
even if the user has data of the other type, by applying data
transformation procedures.</p>
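<p>A minimal sketch of one such transformation (one direction only): turning a sequence of inter-failure times into failure counts per time unit of a chosen length:</p>

```python
def interfailure_to_counts(gaps, unit_length):
    """Convert times between consecutive failures into failures per time unit.

    gaps: elapsed time between failure i-1 and failure i (same time scale
    as unit_length). Returns one count per time unit up to the last failure.
    """
    cumulative, failure_times = 0.0, []
    for g in gaps:
        cumulative += g
        failure_times.append(cumulative)
    counts = [0] * (int(failure_times[-1] // unit_length) + 1)
    for t in failure_times:
        counts[int(t // unit_length)] += 1
    return counts

print(interfailure_to_counts([1, 1, 3, 0.5], 2))  # → [1, 1, 2]
```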
<p>For example, one of the models we use with more success
is the S-shaped (SS) reliability growth model. For this
model, the software error detection process can be
described as an S-shaped growth curve reflecting the
initial learning curve at the beginning, as the test team
members become familiar with the software, followed by
growth and then a leveling off as the residual faults
become more difficult to uncover.</p>
<p>Like the Goel-Okumoto (GO) and the Rayleigh models,
which we also use very often, it can be classified as a
Poisson-type model (the number of failures per unit of
time is an independent Poisson random variable). Their
performance depends basically on 2 parameters:
. One that estimates the total number of software
failures to be eventually detected.
. Another that measures the efficiency with which
software failures are detected.</p>
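<p>A sketch of this two-parameter structure, using the mean value functions these model names usually denote (an assumption on our part: `a` is the eventual failure total and `b` the detection efficiency):</p>

```python
import math

def m_go(t, a, b):
    """Goel-Okumoto NHPP mean value function: expected cumulative failures by time t."""
    return a * (1.0 - math.exp(-b * t))

def m_ss(t, a, b):
    """Delayed S-shaped mean value function: the extra (1 + b*t) factor
    produces the initial learning-curve lag before detections ramp up."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))

# With the same parameters, the S-shaped curve lags behind GO early on:
a, b = 100.0, 0.1
print(round(m_go(10, a, b), 1), round(m_ss(10, a, b), 1))
```

<p>Both curves start at zero and saturate at a, which is why the parameter pair can be read directly as "how many failures in total" and "how fast we find them".</p>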
<p>In order to estimate the models' parameters, we use the
tool CASRE (Computer-Aided Software Reliability Estimation).
This is a PC-based tool that was developed in 1993 by the
Jet Propulsion Laboratory for the U.S. Air Force.</p>
      </sec>
    </sec>
    <sec id="sec-4">
<title>Model assumptions</title>
      <p>
The modeling approach described here is primarily
applicable from the testing phase onward. The software
must have matured to the point that extensive changes are
not being routinely made. The models cannot perform
credibly if the software is changing so fast that data
gathered on one day is not comparable with data
gathered on another day. Different approaches and
models need to be considered if that is the case.
Another important issue of the modeling procedure is
that we need to know the inflection points, i.e., the points
in time when the software failures stop growing and start
to decrease. Reliability growth models cannot follow
these trend variations, thus our approach consists of
partitioning the data into stages prior to applying
the models. Inflection points are the boundaries between
these stages. A simple way to identify inflection points is
by performing trend tests, such as the Laplace trend test
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
<p>The use of trend tests is particularly important for models
such as the S-shaped, for which predictions can only be
accurate as long as the observed data meet the model
assumption of reliability decay prior to reliability growth.
The model (S-shaped) cannot predict future reliability
decay, so when this phenomenon occurs, a new
analysis is needed and the model must be applied from the
time period presenting reliability decay.</p>
<p>However, this is not the only way of looking at the
problem. Assuming that the error detection rate in
software testing is proportional to the current error
content, and that the proportionality depends on the current test
effort at an arbitrary testing time, a plausible software
reliability growth model based on a Non-Homogeneous
Poisson Process has also been used.</p>
<p>How to obtain predictions of future reliability:
In a predictive situation, statements have to be made
regarding the future reliability of software, and we can
only make use of the information available at that time. A
trend test carried out on the available data helps choose
the reliability growth model(s) to be applied and the
subset of data to which this (or these) model(s) will be
applied.</p>
<p>As mentioned before, the models are applied as long as
the environmental conditions remain essentially
unchanged (no changes in the testing strategy, no specification
changes, no new system installation...).</p>
<p>In fact, even in these situations, a reliability decrease may be
noticed. Initially, one can consider that it is due to a local
random fluctuation and that reliability will increase
sometime in the near future. In this case predictions are
still made without partitioning the data. If reliability keeps
decreasing, one has to find out why, and new predictions
may be made by partitioning the data into subsets according
to the new trend displayed by the data.</p>
<p>If significant changes in the development or operational
conditions take place, great care is needed, since reliability
trend changes may result, leading to erroneous
predictions. New trend tests have to be carried out.
If there is insufficient evidence that a different phase in
the program's reliability evolution has been reached,
application of reliability growth models can be continued.
If there is an obvious reliability decrease, application of
reliability growth models has to be stopped until a new
reliability growth period is reached again. Then, the
observed failure data has to be partitioned according to
the new trend.</p>
      <sec id="sec-4-1">
<title>Number of models to be applied</title>
<p>With respect to the number of models to be applied,
previous studies indicated that there are no "universally
best" models. This suggests that we should try several models
and examine the quality of the predictions obtained from
each of them, and that even in doing so, we cannot
guarantee obtaining good predictions.</p>
<p>During the development phases of a running project, it is
not always possible to apply several models, because of
lack of time, experience, and analytical and practical
tools. Usually people only apply one, two or three models
to their data. Analysis of the collected data and of the
environmental conditions helps us understand the
evolution of software reliability, and data partitioning into
subsets helps us improve the quality of the predictions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
<title>Model calibration and application</title>
<p>The models may be calibrated either after each new
observed data point (step-by-step) or periodically after
observation of a given number of failures, say y
(y-step-ahead). Step-by-step prediction seems more interesting.
However, one needs to have a good data collection
process set up to implement this procedure, since data
might not always be available immediately. In operational
life, longer inter-failure times allow step-by-step
predictions.</p>
<p>Since we have a database with error data from running
projects in our organization (the defects are collected
from the test phase onwards), we have a formal procedure
to regularly retrieve, analyse and verify this data.
We then use a periodical approach to make predictions,
which can be summarized as follows:
. Every week, we retrieve error data from the projects
whose software reliability we are interested in evaluating.
. We analyze and validate this data and look for
possible trends, in order to select the best data set that
could be used for making predictions.
. If the models' assumptions are met, we apply the models,
validate them and analyze the results they provide.
. Then we collect feedback from people involved in
the projects and, if necessary, take actions that help in
improving the products' reliability.</p>
      <sec id="sec-5-1">
        <title>Laplace trend test</title>
        <p>
          The Laplace trend test [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] is used to determine the
software reliability trend using failure data relating to the
software:
. Time interval between failures, or
. Number of failures per time unit.
        </p>
<p>This test calculates an indicator u(n), expressed according
to the data (time interval between failures or number of
failures per time unit). A negative u(n) suggests an overall
increase in reliability between data item 1 and data item n.
A positive u(n) suggests an overall decrease in reliability
between data items 1 and n. Furthermore, if we notice a local
increase (decrease) in u(n), then we have a period of local
reliability decrease (growth).</p>
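<p>For failure counts per time unit, the indicator takes the form sketched below (our reading of the grouped-data Laplace factor described in [3]; an analogous expression exists for inter-failure times):</p>

```python
import math

def laplace_factor(counts):
    """Laplace trend factor u(k) for failure counts per time unit.

    counts[i] is the number of failures observed in time unit i+1.
    u < 0 suggests overall reliability growth over units 1..k,
    u > 0 suggests an overall reliability decrease.
    """
    k, total = len(counts), sum(counts)
    if k < 2 or total == 0:
        raise ValueError("need at least two time units and one failure")
    # Mean index of the observed failures, compared with the mid-point
    # expected under a trend-free (homogeneous Poisson) process.
    weighted = sum(i * n for i, n in enumerate(counts))  # (i-1)*n(i), 0-based
    return (weighted / total - (k - 1) / 2.0) / math.sqrt((k * k - 1) / (12.0 * total))

print(laplace_factor([10, 8, 6, 4, 2]))  # negative: reliability growth
print(laplace_factor([2, 4, 6, 8, 10]))  # positive: reliability decrease
```

<p>Plotting u(n) as n grows is what produces the trend graphic used later to pick the data partitions: a local maximum of the curve marks the boundary between a reliability-decrease period and a reliability-growth period.</p>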
<p>The Laplace trend test is straightforward and much faster
to use than the models. The reliability study can be stopped at
this stage if it is believed that the information obtained
has, indeed, answered the proposed questions. Of course,
the obtained information is restricted to:
. Increase in reliability,
. Decrease in reliability,
. Stable reliability.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Case study</title>
      <p>We are going to apply the previously described
methodology to the software validation phase of one of
the software projects currently in the maintenance phase
in our company.</p>
<p>The project in question is a large telecom network
management system, with more than 350 000 source lines
of code. The volume and complexity of this software
system make it difficult, if not impossible, to eliminate all
software defects prior to its operational phase. Our aim
was to evaluate quantitatively some operational quality
factors, particularly software reliability, before the
software package started its operational life.</p>
<p>The software validation phase for this project is a 4-step
testing process: 1) integration test, 2) system test, 3)
qualification test and 4) acceptance test. The first 3 steps
correspond to the test phases usually defined for the
software life cycle. Acceptance test consists of testing the
entire software system in a real environment, which
approaches the normal operating environment. It uses a
system configuration (hardware and software) that has
reached a sufficient level of quality after completing the
first 3 test phases described above.
After the validation phase started, software errors
detected by the test team were submitted to the defects
database.</p>
<p>Failure reports that had at least one of the following
characteristics were rejected:
. Failures not due to software, but to data, documentation
or hardware
. Reports related to an improvement request
. Results in accordance with specifications
. Reported failures already accounted for
In order to collect the test effort spent per time unit on a
given project, we used data existing in another database
specially created to collect manpower figures.</p>
<p>Our goal was to evaluate:
. The number of failures that still remained in the
software after half of the planned test time was
completed
. The time-point when 95% of all software failures
existing in the software (as forecast by the models)
were found
. The mean time between failures achieved by the end
of system, qualification and acceptance tests
. The amount of test effort (testing + correction) still
needed to achieve the target of 95% of software
defects found.</p>
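<p>For the second goal, the Goel-Okumoto form gives a closed-form answer once the model is fitted. A sketch, assuming a fitted detection-efficiency parameter b in the same time units as the data:</p>

```python
import math

def go_time_to_fraction(b, fraction=0.95):
    """Time at which the Goel-Okumoto model a*(1 - exp(-b*t)) predicts
    that a given fraction of all eventual failures has been found.
    Solving 1 - exp(-b*t) = fraction gives t = -ln(1 - fraction) / b,
    independent of the failure total a.
    """
    return -math.log(1.0 - fraction) / b

# E.g. with a hypothetical b = 0.1 per week:
print(round(go_time_to_fraction(0.1), 1), "weeks to reach 95% of forecast failures")
```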
<p>When we first decided to apply these models, we were
halfway through the system test phase. At that time
we were interested in determining the number of defects
remaining in our application so we could reevaluate our
test strategy. The first approach consisted in considering
the entire set of software failures collected up to that time
to model the software reliability.</p>
<p>To meet this goal, we selected a set of models that used
failures per time period as input. The S-shaped (SS)
and the Brooks/Motley (BM) models were chosen,
independently of the severity (critical, major and minor)
of the observed failures.</p>
<p>Figure 1 shows that the models had difficulty in
modelling the entire failure process.</p>
<p>Despite the fitting and adjustment problems observed, we
can notice two different behaviours in the models'
predictions. The SS model presents a more
"optimistic" vision of the failure process than the BM.
These differences are often observed, and to identify
which model was trustworthy for predictions, some expert
judgement was needed, since the validation statistics, namely
the residue and the Chi-squared statistics, were not
enough to help us decide.</p>
<p>The following table summarizes the models' results.
Notice that the total number of failures predicted by the
BM model was extraordinarily big. This didn't mean that
we didn't consider this model's prediction. Instead of using
its asymptotic measures, we only considered the
predictions for the next 20 time units, which proved
more accurate.</p>
<p>After these results were analysed, the project team
agreed that the system test was at serious risk of being
delayed, so they had to rethink their test strategy.
Later on in the project, right after the qualification test
phase had started, the questions were whether the
software would be ready for shipping by the end of this
phase and, in case it wasn't, how much effort (testing +
corrections) was still required to achieve the target of
95% of all forecast defects found. It was an important
decision to be made and the conclusions could have
serious implications for the project schedule.</p>
<p>In order to improve the accuracy of the new predictions,
we decided to restrict the data set to be used by applying
the Laplace trend test.</p>
<p>As can be seen in figure 2, the Laplace trend test
graphic allowed us to observe periods of local reliability
growth and decrease in the data.
Considering the models' assumptions, the periods selected
for a new reliability evaluation were P2 for the SS model,
since we can notice that there is a decrease in reliability
followed by a reliability growth period, and P1 for the GO
and BM models, since there is only reliability growth
observed.</p>
<p>By running the models again we noticed that the deviation
was significantly reduced, thus improving the reliability
evaluation (see figure 3). Figure 4 and table 3 below
summarize the results obtained by using this model.
The validation statistics told us that the observed
residues were now lower, which gave us more confidence
in the models' results.</p>
<p>The following table summarizes the new results observed.
Based on these results, plus the expert judgement
provided by the project team, we considered the S-shaped
model values for reference (optimistic view).</p>
<p>However, there was still a question that needed an answer:
how much test effort still had to be spent in order for the
software to be 95% error free? To answer that
question, a different model with a different approach was
needed. Since test effort is clearly correlated with the
defects found during the test phases, we decided to use
test effort inputs in the S-shaped model instead of
calendar time.
To include the test effort data in the model, we had to
restrict the data range to the period from which we had
reliable effort data figures. By doing so, it was possible
for us to evaluate, with the same model, the remaining
failures in the software and the test effort needed to find a
given amount of defects in the software system.
We decided to apply this new model (S-shaped modified,
SSM) to the data set suggested by the Laplace trend test
(see figure 2), with a few adjustments in order for the test
effort to reflect the failure process more accurately.
As can be seen, the model fitting is quite accurate and
reasonably adapted to the failure data observed. These
results were a major help to the project team, who were
able to make more accurate decisions based on the results
provided by this model.</p>
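<p>A sketch of such an effort-driven variant (hypothetical parameter values; cumulative test effort in person-weeks simply replaces calendar time as the model's argument):</p>

```python
import math

def m_ss_effort(cum_effort, a, b):
    """S-shaped mean value function driven by cumulative test effort
    rather than calendar time: a is the eventual failure total, b the
    detection efficiency per unit of effort."""
    return a * (1.0 - (1.0 + b * cum_effort) * math.exp(-b * cum_effort))

def residual_failures(cum_effort, a, b):
    """Failures the model expects to remain after spending cum_effort."""
    return a - m_ss_effort(cum_effort, a, b)

# Hypothetical fitted parameters: 400 eventual failures, b = 0.02 per person-week.
a, b = 400.0, 0.02
effort_so_far = 150.0
print(round(residual_failures(effort_so_far, a, b), 1))
```

<p>With the same fitted curve, inverting it numerically answers the effort question directly: the extra effort needed is the value of cum_effort at which the predicted fraction found reaches 95%, minus the effort already spent.</p>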
<p>As mentioned before, expert judgement provided
by people from the projects plays an essential role in the
process of deciding which model results to select. Unless
we are pretty sure about the stability of our product, i.e.,
we know that we shouldn't expect too many defects in the
near future and the test environment is not supposed to
change much, we cannot rely significantly on these
results.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
<p>Software reliability models are an important aid to
test/maintenance planning and reliability evaluation.
However, it is well known that no particular model is
better suited for predicting software behaviour for all
software systems in all circumstances. Our work helps
the existing models give better predictions,
since they are applied to data displaying trends in
accordance with their assumptions.
With respect to the application of the proposed method to
the failure data of our network management project, 2
models, namely the S-shaped and Brooks/Motley, have
been analysed according to their predictive capabilities.
The results obtained show that:</p>
<p>. The trend test helps partition the observed failure data
according to the assumptions of reliability growth
models; it also indicates the segment of data from
which the occurrence of future failures can be
predicted more accurately;
. The prediction approach proposed for the validation
phases yields good results over a time period of a few
months, showing that reliability modeling constitutes
a major aid tool for test/maintenance planning and
follow-up.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] Derriennic H., and Gall G.,
          <article-title>"Use of Failure-Intensity Models in the Software Validation Phase for Telecommunications"</article-title>,
          <source>IEEE Trans. on Reliability</source>, Vol.
          <volume>44</volume>, No.
          <issue>4</issue>, December
          <year>1995</year>, pp.
          <fpage>658</fpage>-<lpage>665</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><surname>Goel</surname> <given-names>A.L.</given-names></string-name>, and Okumoto K.,
          <article-title>"Time-dependent error detection rate model for software and other performance measures"</article-title>,
          <source>IEEE Trans. on Reliability</source>, Vol. R-<volume>28</volume>, No.
          <issue>3</issue>, August
          <year>1979</year>, pp.
          <fpage>206</fpage>-<lpage>212</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><surname>Kanoun</surname> <given-names>K.</given-names></string-name>, Martini M.R.B., and de Souza J.M.,
          <article-title>"A Method for Software Reliability Analysis and Prediction: Application to the TROPICO-R Switching System"</article-title>,
          <source>IEEE Trans. on Software Engineering</source>, Vol.
          <volume>17</volume>, No.
          <issue>4</issue>, April
          <year>1991</year>, pp.
          <fpage>334</fpage>-<lpage>344</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><surname>Lyu</surname> <given-names>M.R.</given-names></string-name>,
          <article-title>"Handbook of Software Reliability Engineering"</article-title>, published by IEEE Computer Society Press and McGraw-Hill Book Company.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name><surname>Yamada</surname> <given-names>S.</given-names></string-name>, Hishitani J., and Osaki S.,
          <article-title>"Software Reliability Growth with a Weibull Test-Effort: A Model &amp; Application"</article-title>,
          <source>IEEE Trans. on Reliability</source>, Vol.
          <volume>42</volume>, No. 1, March
          <year>1993</year>, pp.
          <fpage>100</fpage>-<lpage>106</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>