=Paper= {{Paper |id=Vol-2010/paper1 |storemode=property |title=Service Reliability And Availability Model With Petri Nets: A New Hybrid Approach For Service Availability |pdfUrl=https://ceur-ws.org/Vol-2010/paper1.pdf |volume=Vol-2010 |authors=Lucrezia Palummo,Rachele Meriggiola,Emanuele Guidolotti,Damiano De Luca |dblpUrl=https://dblp.org/rec/conf/ciise/PalummoMGL17 }} ==Service Reliability And Availability Model With Petri Nets: A New Hybrid Approach For Service Availability== https://ceur-ws.org/Vol-2010/paper1.pdf
 SErvice Reliability and Availability model with Petri
 Nets: a new hybrid approach for service availability

L. Palummo a,*, R. Meriggiola a,*, E. Guidolotti *, D. De Luca +
* System Engineering Department, Aster S.p.A., via Tiburtina 1166, 00156 Rome - Italy
+ Telespazio, Via Tiburtina 965, 00156 Rome - Italy
a - corresponding author: lucrezia.palummo@aster-te.it, rachele.meriggiola@aster-te.it

                                                     Copyright © held by the author

    Abstract – Most existing analytical models for predicting service reliability and availability assume a static behavior of the service and do not take into account the correlation between the invoked system components. In order to take into account the dynamic aspects of a service, such as functional chains, operational processes and logistic support, a hybrid approach is introduced here: a dynamic SERA (SErvice Reliability Availability) Model, including a service simulation model based on hybrid Petri Nets. The main goal of the proposed model is to determine the reliability/availability of a service taking into account the characteristics of the service (functional chains and operative processes), as well as the SW/HW dependability figures (MTBF, MDT). In the proposed approach the service and its invoked system components are represented through Hybrid Petri Nets, where the SW/HW failures are modeled with stochastic distributions through kinematic Monte-Carlo time simulations.
    In order to refine and validate the proposed model, a case study based on a simple user registration service has been developed. The results show the feasibility of the proposed approach, along with a set of metrics used to quantify service performance on a statistical basis and evaluate service quality.

    Keywords – RAMS, Availability, Service, Simulation, Petri Nets

                      I. INTRODUCTION
    Service Reliability significantly affects the operational transitions and the potential users, and the degree of adherence to requirements can affect customer satisfaction and the perception of service quality. If a user request is not completed on schedule, the service is perceived as unreliable, since the requested output is delivered late. Service unreliability can have a great impact on the system and its users.
    The increasing demand for flexibility and extensibility of services has resulted in a wide adoption of web services and SOA (Service-Oriented Architecture) applications [1,2,3]. Even though several studies and service models have been used in the past to estimate and improve reliability and availability [4,5,6,7], the evaluation of modern systems remains a challenging problem due to the increased level of complexity. One of the most commonly applied approaches is the analytical methodology, which produces accurate results. Unfortunately, it becomes inapplicable when the models are too large and complex or when the problem involved is non-linear.
    Reliability modeling and analysis to improve Service Reliability have been proposed using the two-state model or finite state machine, Model-Based approaches and dedicated algorithms [8,9,10,11], in order to reduce the inefficiency of approximation methods and to use simulation to predict the behavior of the system/service. Various simulation methodologies such as Monte Carlo simulation, Discrete event (DE) simulation, Subset simulation, Hybrid subset simulation, Simulated annealing, Stochastic simulation, Digital simulation, and Markov System Dynamics (MSD) simulation can be used in reliability engineering [12], as well as a newer method, RAMSAS, based on an SoS (System of Systems) Model, which uses suitable model-driven and simulation techniques to evaluate the reliability performance of the system and, possibly, to compare different design alternatives and parameter settings [13].
    Simulation thus becomes necessary to predict the performance of the Service and to drive the design.

                      II. BASIC CONCEPTS
A. Reliability Definition
    Service reliability should not be confused with network reliability, which is instead related to the overall availability of the system. For this reason the following definitions are introduced here:
    Reliability – the probability that a service will be continuously available over a given period of time.
    Service – an available system function which can be used by a person or a machine and is based on a sequence of operations (called transactions) focused on state transitions.
    Service Reliability – the probability that a service infrastructure will be continuously available in a given time, considering hardware failures, software faults and human errors; it focuses on the state of service execution.
    Transactions are instead specific instances of a service use (e.g. listening to a radio station; a user connecting a PC to the Internet; a shopper paying for a purchase with a credit card).
    In several cases the overall Service of the System is composed of different services that share HW, and some functions are interlocked between them. A service is evaluated by a list of success criteria to be fulfilled in order to achieve a continuous delivery of required outputs and the execution of transactions. Such a list is defined using the attributes of the service.
    A figure of merit is any quantitative expression, expressed by means of a probability or other statistical parameters, used to describe a specific aspect of the study target. For example, the expected downtime during one operative year is a figure of merit for the reliability of a maintained system. A metric is instead a quantity used to evaluate the degree of adherence to a requirement (expressed by a figure of merit).
    Service Reliability can be measured by several metrics, related to different aspects:
    1. End-User: Service Accessibility, Continuity, Release
    2. Internal Metrics: outages, duration, task interruption, failure distribution, incomplete instance and features not available.
    3. Performances: total delay time during transaction, delivered products with or without delay.

    For availability assessment, various suitable methods can be applied:
    • Analytical method
    • Markov process
    • Monte-Carlo simulation
    This last numerical technique allows the evaluation of availability taking into account, in a realistic way, all aspects associated with the design, logistics and operations. The main advantage of Monte-Carlo simulation is the capability to represent complex system scenarios with deterministic or probabilistic delays.

           III. HYBRID APPROACH TO SERVICE AVAILABILITY
    One way to determine Service availability is to combine the traditional methods used to evaluate Availability (Combinatorial method, Enumeration method, Simulation method) with the Service attributes. Traditionally these methods are used separately in the analysis framework, but in the proposed approach these four assessment kinds are unified with the goal of achieving a complete prediction analysis.
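    The Monte-Carlo technique listed among the assessment methods above can be illustrated with a minimal sketch. It is not the simulation model of the paper: it assumes a single repairable component, exponentially distributed times to failure (a probabilistic delay) and a fixed down time (a deterministic delay). The function name, its parameters and the sample values are hypothetical and chosen only for illustration.

```python
import random


def simulate_availability(mean_up_h, down_h, horizon_h, runs=2000, seed=0):
    """Estimate availability as the mean fraction of up time over many histories.

    Assumptions (not taken from the paper): time to failure is exponential
    with mean `mean_up_h`; every failure causes a fixed outage of `down_h`.
    """
    rng = random.Random(seed)
    total_fraction = 0.0
    for _ in range(runs):
        clock = 0.0
        up_time = 0.0
        while clock < horizon_h:
            # Probabilistic delay: draw the time to the next failure.
            time_to_failure = rng.expovariate(1.0 / mean_up_h)
            up_time += min(time_to_failure, horizon_h - clock)
            clock += time_to_failure
            if clock >= horizon_h:
                break
            # Deterministic delay: the component is down for a fixed time.
            clock += down_h
        total_fraction += up_time / horizon_h
    return total_fraction / runs


if __name__ == "__main__":
    # Hypothetical values: 240 h mean up time, 4 h down time, 30-day horizon.
    estimate = simulate_availability(mean_up_h=240.0, down_h=4.0, horizon_h=720.0)
    print(f"Estimated availability: {estimate:.4f}")
```

    Each simulated history alternates up and down periods until the observation horizon is reached; averaging the up-time fraction over many histories gives a statistical estimate of availability.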
    Fig. 1 – Hybrid approach to service availability modeling

    The steps of the process to define a hybrid approach model are listed as follows:
    1. Define the estimators of a Figure of Merit of the Service (Service)
    2. Define a flexible model taking into account the use case of the service and the SW/HW dependability figures (MTBF, MDT) (Combinatorial Method)
    3. Use the representation of a complex system with the Hybrid Petri Nets (Simulation, Enumeration Method)
    4. Model the SW/HW failures with stochastic distribution through kinematic Monte-Carlo time simulations; inputs are injected in the model and the delivered outputs are computed taking into account the functional chains and the operational processes (Simulation, Service)

B. Definitions of Availability
    The concept of Availability was originally introduced for repairable systems, which are required to operate 24/7; in this case a failure could randomly occur during the operational life and a maintenance intervention is required to restore operations in a minimum time. There are several definitions of Availability in the literature. A general definition is:

$$A_{op} = 1 - \frac{\text{Downtime}}{\text{Total Time}} = \frac{\text{Uptime}}{\text{Total Time}} \qquad (1)$$

    where the Downtime includes a repair time (corrective and preventive maintenance time), a management time and a logistic time. In several cases it is also worthwhile to consider the O&M organization, plans, procedures and tools dedicated to system management during the operational phase. In this case, Operational Availability can be defined as:

$$A_{op} = \frac{MTBF}{MTBF + MDT} \qquad (2)$$

    where MTBF is the Mean Time Between Failures and MDT is the Mean Down Time, equal to MDT = MTTR + LDT, where LDT is the Logistic Delay Time. For some applications, the user-oriented approach can characterize the system in a "black-box" manner, specifying availability according to the number of, for instance, delivered products, services, or mission data with respect to user demands or a nominal scenario [14]. If the availability is specified by a percentage or number of successfully delivered products, the Service Availability (SA) shall be expressed as the ratio between the number Nc of completed requests and the number Nt of total requests:

$$SA = \frac{N_c}{N_t} \qquad (3)$$
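    As a minimal sketch, the three availability definitions (1)-(3) can be written as small functions. The function names and the numeric inputs are hypothetical and serve only to illustrate the arithmetic; they are not values from the study.

```python
def availability_from_downtime(downtime_h, total_time_h):
    """Eq. (1): A_op = 1 - Downtime / Total_Time = Uptime / Total_Time."""
    if total_time_h <= 0:
        raise ValueError("total_time_h must be positive")
    return 1.0 - downtime_h / total_time_h


def operational_availability(mtbf_h, mttr_h, ldt_h):
    """Eq. (2): A_op = MTBF / (MTBF + MDT), with MDT = MTTR + LDT."""
    mdt_h = mttr_h + ldt_h
    return mtbf_h / (mtbf_h + mdt_h)


def service_availability(completed_requests, total_requests):
    """Eq. (3): SA = Nc / Nt."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return completed_requests / total_requests


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(availability_from_downtime(downtime_h=72.0, total_time_h=720.0))
    print(operational_availability(mtbf_h=240.0, mttr_h=3.0, ldt_h=1.0))
    print(service_availability(completed_requests=95, total_requests=100))
```

    Equations (1) and (2) describe the system from the inside (time spent up or down), whereas equation (3) is the black-box, user-oriented view based only on counted requests.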
    To discuss service reliability, that is, the persistence of service quality over time or the absence of service failures over time, it is necessary to know what failure means for a service. The key idea is that service failures are usually traceable to events or conditions in the infrastructure whose occurrence (or failure to occur) causes the service failure. That is, service failure mechanisms are found in the delivery infrastructure for the service. It is apparent that models for failure of transactions in a given service will depend heavily on the specific details of that service. This Section develops ideas for service reliability/availability modeling for each kind of service, using the same network topology (nodes & paths) as a way of illustrating how those details are used in the creation and use of service reliability models.
    The process of reliability/availability modeling is described as follows:
    1. Define the steps of the reference Use Case and the HW/SW involved in executing the Service, according to the SE methodology.
    2. Map the steps of the service execution; identify the nodes and find all possible paths related to the steps of service execution; the paths represent the connections between HW nodes, each of which involves a software function. Transform the paths into logical equations by applying the "&" (AND) operator between nodes in the same path and the "||" (OR) operator between parallel paths.
    3. Draw the operational workflow of the Service.
    4. Transform the logical expression of the service into a Reliability Block Diagram (RBD), taking into account possible redundancy configurations.
    5. Collect the system information (e.g. architecture, the reliability figures of each equipment and SW application, maintainability figures such as mean time to restore and logistic delay time).
    6. Compute the Service Reliability and Availability figures according to the Prediction models based on the RBD.
    7. List the traceable System events or conditions whose occurrence or failure leads to the service failure and identify the permanent or transient failures that affect the metrics (FMECA or FTA can be used).
    This method can objectively estimate the Service reliability/availability starting from the functional chains and operational processes, according to the chronological execution of the service and considering what the architecture needs to fulfill its performance. Moreover, considering the RBD and the support of techniques such as Failure Modes, Effects and Criticality Analysis and Fault Tree Analysis, it is possible to identify the failures and their impact on the Metrics and Figures of Merit.
    The flexible model described above is the basis for driving the simulation, without which the time evolution and the transitions of the service states cannot be considered.

A. Proposed Methodology
    One of the main goals of the current study was to prove the effectiveness of the proposed hybrid approach to the evaluation of Service Reliability and Availability. For this reason a feasibility study based on a simple test case of a user registration service was performed. The study made it possible to refine the proposed model and to clarify several aspects of the Simulation Model. In this Section the proposed methodology is reported along with details of its application to the study case.
    The purpose of the user registration service is to allow the end-user to insert a set of initial parameters to be recorded in a database in order to receive credentials for future login and access to a system. In the current example the user registration service is deployed on three different sub-systems, each consisting of a SW and a HW component:
    1. Client – in charge of providing a Graphical User Interface to the user for data entry and display
    2. Server – in charge of managing the registration requests and querying the database
    3. Database (DB) – in charge of recording the registration requests and providing the related feedback

    Fig. 2 – User Registration Service

    The user accesses the Client (e.g. a web page via browser) and inserts into the Client GUI the data required for the user registration; after a data check, the Client creates a registration request and sends it to the central server for request management and database query. The database provides feedback to the server about the correct user registration, and the server returns it to the Client in order to inform the end-user of the accomplished registration process (Figure 2).
   1) Define the Use Case
    In a complex system the definition of a service can be based on the definition of the corresponding use case of the System, along with the tracing of the System components invoked by the process. This task can be demanding for services invoking several configuration items or entire sub-systems, and in general it requires a preliminary analysis of the System and its components. Every logical step of the use case should be identified and correlated with the involved hardware and software components. During this phase it is also mandatory to define the level of abstraction to apply to the use case definition: for example, in the current test case HW and SW components are considered as single, independent units, each characterized by its own reliability and availability, and no further level of detail is required. Fixing the
abstraction level is fundamental to determine the level of detail of the reliability/availability representation of each component, to be used as input to the SERA computation. Table I reports the use case of the service example (user registration), along with the involved HW/SW components.

               TABLE I    STUDY CASE
               Use Case of a User Registration Service

   Step  Description                                                          HW               SW
   1     The User accesses the System GUI                                     Client           Client
   2     The User inserts the registration parameters and submits them        Client           Client
   3     The System checks the registration request                           Server           Server
   4     The System queries the database for a new user registration          Server           Server
   5     The System creates a user account choosing the proper user profile   DB               DB
   6     The System provides the user with the credentials for accessing      Server + Client  Server + Client
         the system

    Along with the use case corresponding to the service, it is also required to define the operational workflow of the analyzed service (e.g. by means of sequence diagrams or equivalent), in order to fix the sequence of logical operations that the System must perform to execute the service and the order in which the System components are invoked (Figure 3).

Fig. 3 – Operational workflow of a service for the proposed study case

   2) Compute the Reliability Block Diagram
    Once the use case and the operational workflow are defined, it is necessary to understand how the connections between the different HW/SW components can affect the global reliability (and availability) of the functional chain. For this reason it is required to define the Reliability Block Diagram (RBD) according to the System HW/SW architecture. This analysis is generally based on system design documentation and on RAMS analysis documentation; in the proposed study case, the RBD is a simple chain (Figure 4), where no redundancy has been applied. Any occurring failure will lead to an interruption of the service, which could be permanent or temporary.

Fig. 4 – RBD of the analyzed User Registration Service

   3) Compute the HW/SW Availability
    After the RBD definition and the detection of the single points of failure, the availability figures of each HW/SW component invoked by the service process must be defined as input data for the service availability analysis. Even though several parameters could be used as availability metrics, the current study focuses on the most common and widely applied ones:
    • Mean Time Between Failures (MTBF)
    • Mean Down Time (MDT)
    These parameters are computed according to the Prediction models based on the RBD. For the current test case, the values are reported in Table II.

               TABLE II    COMPONENT MTBF and MDT
               Use Case of a User Registration Service

   Sub-System  Component  Number of Failures (for 30 days)  MTBF (h)  MDT (h)
   Client      SW         3                                 240       4
   Client      HW         1                                 720       4
   Server      SW         2                                 360       4
   Server      HW         1                                 720       4
   Database    SW         3                                 240       1 or 4
   Database    HW         1                                 720       4

   4) Define the Service Failures
    Once the model is defined, it is required to analyze and define the System events/conditions whose occurrence or failure leads to the service failure. Service failures can be defined as permanent (no recovery of the service) or transient (a temporary failure which causes the system to be unavailable within a predefined time period). In the proposed example, all failures occurring on HW have been defined as permanent, as well as failures occurring on Client and Server SW; failures occurring on Database SW have instead been defined as transient, assuming that the database is able to record the registration request and reprocess it later, keeping it as pending. This definition leads to the concept that a failure can affect both the functional aspect (the service process is interrupted) and the performance aspect of a service (the service process is not interrupted but the service is not in line with the expected performance).
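    The figures of Table II and the series RBD of Figure 4 can be combined in a short analytical sketch. It assumes that component failures are independent, so that the availability of a chain without redundancy is the product of the component availabilities, each computed as MTBF / (MTBF + MDT). It treats every failure as an outage and therefore ignores the permanent/transient distinction introduced above. The names used in the code are hypothetical, and the sketch is a static cross-check rather than the simulation model of the paper.

```python
OBSERVATION_WINDOW_H = 30 * 24  # Table II counts failures over 30 days

# (sub-system, component) -> (failures in 30 days, MDT in hours), from Table II.
# The Database SW row lists "1 or 4" h; 4 h is used here as the default.
TABLE_II = {
    ("Client", "SW"): (3, 4.0),
    ("Client", "HW"): (1, 4.0),
    ("Server", "SW"): (2, 4.0),
    ("Server", "HW"): (1, 4.0),
    ("Database", "SW"): (3, 4.0),
    ("Database", "HW"): (1, 4.0),
}


def mtbf_hours(failures, window_h=OBSERVATION_WINDOW_H):
    """MTBF as observation window divided by the number of failures."""
    return window_h / failures


def component_availability(failures, mdt_h):
    """Availability of one component: MTBF / (MTBF + MDT)."""
    mtbf_h = mtbf_hours(failures)
    return mtbf_h / (mtbf_h + mdt_h)


def series_availability(components):
    """Availability of a series RBD (no redundancy).

    Assumption: component failures are independent, so the chain
    availability is the product of the component availabilities.
    """
    result = 1.0
    for failures, mdt_h in components.values():
        result *= component_availability(failures, mdt_h)
    return result


if __name__ == "__main__":
    for (sub_system, component), (failures, mdt_h) in TABLE_II.items():
        a = component_availability(failures, mdt_h)
        print(f"{sub_system:8s} {component}: MTBF={mtbf_hours(failures):.0f} h, A={a:.4f}")
    print(f"Service, Database SW MDT = 4 h: A={series_availability(TABLE_II):.4f}")

    # Second case of Table II: Database SW with MDT = 1 h.
    fast_db = dict(TABLE_II)
    fast_db[("Database", "SW")] = (3, 1.0)
    print(f"Service, Database SW MDT = 1 h: A={series_availability(fast_db):.4f}")
```

    Dividing the 720 h window by the failure counts reproduces the MTBF column of Table II (240, 360 and 720 h). Because the chain has no redundancy, the service availability is lower than that of any single component.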

               TABLE III    SERVICE FAILURES
               Use Case of a User Registration Service

   Sub-System  Component  Permanent  Transient
   Client      SW         X
   Client      HW         X
   Server      SW         X
   Server      HW         X
   Database    SW                    X
   Database    HW         X

   5) Define the Service Metrics
    In the definition of a SERA model it is fundamental to define a quantitative approach to evaluate the service availability, along with the metrics to evaluate the robustness and the performance of the analyzed service. Several figures of merit can be defined to support such an analysis; here only a subset of the most significant metrics has been taken into account, according to the study case.
    Along with the service availability metrics, the current study introduced two distinct sets of metrics to evaluate the analyzed service: the Internal Metrics, used to monitor a specific aspect of the service related to single system components, and the Performance Metrics, used to quantify the service performance.
    • Internal Metrics
        o Failure Distribution per Component
    • Performance Metrics
        o Number of Failed Requests
        o Number of Recovered Requests with delay

   6) Build the Simulation Model
    After the collection of all the inputs required to describe the service and the invoked components, along with the definition of the metrics to evaluate the SERA, the final step consists of building a flexible and robust model capable of representing the service on the basis of all the information provided by the hybrid approach. This process is described extensively in Section IV.

           IV. SERVICE AVAILABILITY SIMULATION MODEL

A. Introduction
    The main advantages of developing dedicated simulations are the capability to provide a quantitative evaluation of SERA and a rigorous representation of a specific Service and of its dependency on the interaction of the system components. The proposed approach makes it possible to compute and monitor the SERA evolution over time, as well as to evaluate its sensitivity to single system components. SERA simulations make it possible to perform feasibility studies, providing powerful and flexible support to the system design phase. This aspect proved particularly useful in order to track and prevent unexpected service failures and/or system trends at system level.
    Nevertheless, there are also some drawbacks to the choice of performing a SERA simulation with respect to a traditional, qualitative evaluation (e.g. FMEA/FMECA): simulations are time-consuming, require longer start-up times, and have high computational costs, which can be minimized by adopting proper computational facilities and an adequate level of abstraction in the representation of the system components.

B. Comparative overview of Simulation Models
    The availability of a service naturally depends on the availability of the involved components; for this reason any service availability model will be based on the operational workflow including the involved components. A service model has to represent a specific functionality, always related to an input request (e.g. a number of products to be released) and to an output generation (the released products). According to this view, service availability can be modeled, among the several possibilities, by three main approaches:
    • Adaptation of System simulators
    • System Engineering SysML Models
    • Petri Nets
    System simulators are quite efficient at representing system behavior and they can be adapted to retrieve the information related to a specific service, but at the price of an increased level of complexity in its representation and of a limitation of the retrieved information (dedicated service modeling).
    System Engineering SysML Models are also an efficient tool to represent system behavior and they can be modeled to include a service representation; this nevertheless requires the availability of a refined SysML Model at an enlarged level of detail, a prerequisite that is often not satisfied for medium and complex systems.
    Petri Nets are a flexible mathematical method used to represent discrete, continuous and stochastic variables [15]. Historically, Petri nets (PNs) have been widely used to model discrete systems (computer systems, manufacturing systems, communication systems), but in recent years, with the introduction of a representation for continuous and stochastic variables, their use has been extended to other fields (e.g. biology) [16]. Hybrid Petri Nets make it possible to represent stochastic and discrete behavior of system components at the same time with a good level of flexibility and scalability; this is the reason why they have been selected to determine the service availability on a statistical basis (Monte-Carlo simulations).

C. The Hybrid Petri Nets Model
    Petri nets have been used to represent the use case workflow (see Section III). A Petri net is a mathematical modeling language used to describe distributed systems. It is
not the purpose of the current paper to describe Petri Nets, which are
extensively covered in the related literature [15]; only the basic elements
required to understand the model are briefly recalled here. A Petri net is
formed by the following elements:

•   Places/States (P) – circular elements used to describe the state of a
    system component at a given time t.
•   Tokens – black marks describing the data flowing into the system; at each
    time step of the simulation tokens are added to or removed from places
    according to the arc connections and the transition type.
•   Transitions (T) – rectangular elements used to describe the data flow from
    one place to another. At each time step of the simulation, transitions can
    fire and change the status of the places. The Hybrid Petri Nets Model
    includes several transition types; the most used for Service Availability
    modeling are:
         o Discrete – tokens are added/removed as discrete values;
         o Continuous – tokens are managed as continuous (fractional) values;
         o Stochastic – the token flow is managed according to a stochastic
              distribution along the time span of the simulation.
•   Arcs (arrows) – connections between places and transitions, which describe
    the token flow conditions and directions.

Fig. 5 – Detail of a Hybrid Petri Net

    Figure 5 shows a detail of a Hybrid Petri Net sample model for the user
registration test case. P1 represents the place of the input user registration
requests (the input data flow consists of 250 tokens). The Client HW and SW
components are represented by places P2 and P3. T1, T3 and T5 are the discrete
transitions that, firing at every time step, allow the tokens (input requests)
to move to the next place. The chain T1-T3-T5 represents the nominal sequence
of the service: the input registration request is saved on HW and processed by
the Client SW.
    T2 and T4 are instead stochastic transitions and represent the occurrence
of a failure, according to an input distribution. When a failure occurs, the
data (tokens) are removed from the operational sequence (P2 and P3) and a
failure counter (P4 and P5) is updated. This simple mechanism makes it
possible to model a permanent failure of the Client component, according to
the Service Failure definition reported in Section 3.1.4.
    This means that the data contained in P2 at the moment a failure occurs on
T2 will not be recovered and will be considered lost. In this very simple
example the failure is assumed to last for one time step, but in the final
model the failure time span has been extended to last for a specified time.
Failure counters are fundamental, since they make it possible to monitor the
failure trend of each component along a Monte Carlo simulation and to compute
the internal metrics defined in Section III. In each Monte Carlo simulation
run the Service Availability is computed by comparing the number of tokens
present at the initial and final places of the represented chain.

Fig. 6 – Failure distribution per component

  1) Failure Modeling
    The proposed approach models the failure occurrence of each HW/SW
component through the Probability Density Function (PDF) of a stochastic
distribution. For the specific test case the PDF of an exponential
distribution was used:

$$f(t) = \lambda e^{-\lambda t} \tag{3}$$

where λ is the Failure Rate, computed as:

$$\lambda = \frac{1}{\mathrm{MTBF}} \tag{4}$$

The adopted failure rates are derived from the MTBF values reported in Section
3.1.3. Figure 6 shows the distribution of failures over a time interval of 30
days for a Monte Carlo simulation of 50 runs.

  2) Maintenance & Recovery
    The SERA Model takes into account both permanent and transient failures;
this aspect led to the introduction of a system chronology and a maintenance
process. In the current case study the Petri Net elements have been used to
build a clock to monitor the system chronology and a simple maintenance
process to test the effectiveness of the model on real
conditions. For each component affected by transient failures (Section III)
the model checks the time of day at which a failure occurs. If the failure
occurs between 8:00 and 18:00, an immediate maintenance intervention is
assumed, with an MDT of 4 hours. If instead the failure occurs between 18:00
and 8:00 of the following day, the intervention is delayed until 8:00, with an
MDT of 1 hour (the nominal process is then restored at 9:00). In both cases,
during the unavailability time window the received registration requests are
collected in a queue, waiting for the component to be restored. The request
queue is then emptied according to the simulated processing times and the
remaining input requests to process. The introduction of this maintenance
modeling is fundamental to reproduce the real conditions of a processing queue
and the related delay in the final request release. It is important to
underline that the logistic support can also drive the system design and
operation phases; in this case study the Logistic Delay Times have been chosen
only to highlight how the failure recovery can influence the number of
completed requests, a fundamental requirement for service success. Further
evaluations, such as how different logistic scenarios affect Service
Availability, will be discussed in future studies.

   3) Simulation Scenario
    The current study was tested with a series of dedicated Monte Carlo
simulations using the open-source Snoopy Petri Net Tool [17]. Snoopy proved to
be efficient and suited to the feasibility study, with one limitation: timed
transitions could not be used, which restricted the modeling options for the
recovery times.
    The SERA Petri Net model of the User Registration Service was implemented
and tested on different time windows, from 30 days up to several years in the
case of realistic MTBF values. Please note that for the current test case the
time interval of each simulation was set to 30 days (720 h) at a time step of
1 h, with low MTBF values, with the declared purpose of highlighting the
capabilities of the method and the limits of the model. In the model each HW
and SW component can be affected by failures, according to the modeling
described above.

                         V. RESULTS

    The results of the SERA Model are reported in Figure 7. A specific
percentage value of Service Availability was computed for each Monte Carlo
run, showing the statistical trend due to the interaction of stochastic
failures. The most remarkable result is that the computed value of SA (average
value: 97.94%) is very far from the theoretical predictions (99.99%), which
are based on a static view of system components. It is also worth noting that
the SA can be characterized by its distribution, in order to define a trend
and derive an expected value. In the reported case study the SA distribution
shows that the statistically expected value of SA is between 97% and 98%, a
value that decreases if the distribution is computed for the SAr. These
results show how the application of a SERA model based on Monte Carlo
simulations can provide a reliable estimate of the service availability, to be
compared with the results from operational life.

Fig. 7 – Global & restricted Service Availability

    The second remarkable result from the simulations is the large difference
between the SA and the SAr, especially in the two worst cases, where the
random distribution of failures had a dramatic impact on the SAr value (<65%
and <25% respectively). This result led to the conclusion that, in a real
condition of stochastic distribution of failures, there is a significant
probability (>1/50) of obtaining at least one service availability value far
below the commonly accepted standards (80% or higher). Above all, this is not
due to the failure of a specific component, but to the way the failures occur
and the time at which they occur. In the SAr < 25% case, one failure occurred
on DB-SW at 19:00, generating a large request queue; another failure occurred
at 10:00, giving rise to an unavailability window of 16 h out of a total of
17 h (from 9:00 to 10:00 the component was available). In other words, the
results show how, in a real stochastic distribution of failures, the
maintenance policy can have a significant weight on the final service
availability performance. Such a result is fundamental especially in the
system design phase, where the selection and the RAMS analysis of the single
components are not the only elements to take into account for the
implementation of reliable services. The proposed method provided a refined
picture of the reliability and availability of a service at a very low cost (a
single Petri Net), proving its flexibility and its effectiveness.

                              CONCLUSIONS

    The traditional Prediction Model gives a steady-state Availability figure
(asymptotic condition) without taking into account the time evolution and the
transitions in the state of the service. The proposed SERA Model makes it
possible to determine the failure distribution and its impact on the service
outputs, taking into account the logistic support and the operational process.
Through Monte Carlo runs it is possible to predict the mean Availability value
and its distribution on a statistical basis, replicating the operational
conditions. The proposed approach proved to be especially suitable as a
support method for system design, making it possible to detect possible
criticalities and to predict a reliable value of Service Availability on a
statistical basis.
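
    The simulation loop described in Section IV (exponential failure sampling
per component, time-of-day maintenance policy, repeated Monte Carlo runs) can
be illustrated with a minimal Python sketch. It is not the Snoopy Petri Net
model used in this study: it replaces the token flow with a simple hourly
up/down check, it measures availability as the fraction of hours in which
every component is up, and it assumes that each run starts at midnight. The
component names and the MTBF values in `MTBF_H` are hypothetical. Only the
720 h horizon, the 1 h time step, the 50 runs and the maintenance rule are
taken from the text.

```python
import math
import random

HOURS = 720   # 30 days at a 1 h time step (from the text)
RUNS = 50     # number of Monte Carlo runs (from the text)

# Hypothetical components and MTBF values in hours (not the study's inputs).
MTBF_H = {"client_hw": 2000.0, "client_sw": 400.0, "db_sw": 300.0}


def downtime_hours(hour_of_day):
    """Hours of unavailability for a failure occurring at the given hour."""
    if 8 <= hour_of_day < 18:
        return 4                          # immediate intervention, MDT = 4 h
    hours_until_8 = (8 - hour_of_day) % 24
    return hours_until_8 + 1              # wait until 08:00, then MDT = 1 h


def simulate_run(mtbf_h, hours, rng):
    """Return the fraction of hours in which all components are up."""
    down_until = {name: 0 for name in mtbf_h}
    up_hours = 0
    for t in range(hours):
        for name, mtbf in mtbf_h.items():
            if t < down_until[name]:
                continue                  # component still under repair
            lam = 1.0 / mtbf              # failure rate, lambda = 1 / MTBF
            # Probability that an exponential failure time falls in this 1 h step.
            p_fail = 1.0 - math.exp(-lam)
            if rng.random() < p_fail:
                # Assumption: hour 0 of the run is midnight.
                down_until[name] = t + downtime_hours(t % 24)
        if all(t >= end for end in down_until.values()):
            up_hours += 1
    return up_hours / hours


def monte_carlo(mtbf_h, hours=HOURS, runs=RUNS, seed=0):
    """Run independent simulations and return one availability value per run."""
    rng = random.Random(seed)
    return [simulate_run(mtbf_h, hours, rng) for _ in range(runs)]


if __name__ == "__main__":
    results = monte_carlo(MTBF_H)
    print(f"mean availability: {sum(results) / len(results):.4f}")
    print(f"worst run:         {min(results):.4f}")
```

    A histogram of the list returned by `monte_carlo` gives the kind of
availability distribution discussed in Section V; the numbers themselves
depend entirely on the assumed MTBF values and are not comparable with the
results of the study.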
                         ACKNOWLEDGMENTS

    The authors thank L. Tirone, S. Sorge and the System Engineering team from
ASTER for the support provided and the constructive comments. The authors are
also grateful to A. Di Bona from Telespazio for his support in defining the
logistic support aspects according to the O&M organization.

                               REFERENCES

[1]   G. Eason, B. Noble, and I.N. Sneddon, “On certain integrals of
      Lipschitz-Hankel type involving products of Bessel functions,” Phil.
      Trans. Roy. Soc. London, vol. A247, pp. 529-551, April 1955.
[2]   V. Cortellessa, V. Grassi, Reliability modeling and analysis of
      service-oriented architectures, in Test and Analysis of Web Services,
      2007.
[3]   M. del Mar Gallardo, J. M. P. Merino, G. Rodríguez, Integration of
      Reliability and Performance Analyses for Active Network Services.
      Electronic Notes in Theoretical Computer Science, 2005.
[4]   M. Rout, P. Bhuyan, A Survey Report on Reliability Models and
      Frameworks in SOA, IJARCSSE, 2014.
[5]   C. Xie, B. Li, X. Wang, A Web Service Reliability Model Based on
      Birth-Death Process. SEKE, 2011.
[6]   Y. Dai, B. Yang, J. Dongarra, G. Zhang, Cloud Service Reliability:
      Modeling and Analysis, 2010.
[7]   Y. Bai, H. Zang, F. Yangzhen, Reliability modeling and analysis of cloud
      service based on complex network. Prognostics and System Health
      Management Conference, 2016.
[8]   S. M. Iyer, M. K. Nakayama, and A. V. Gerbessiotis, “A markovian
      dependability model with cascading failures,” IEEE Transactions on
      Computers, vol. 58, pp. 1238–1249, September 2009.
[9]   C. A. Ardagna, E. Damiani, R. Jhawar, V. Piuri, A Model-Based Approach
      to Reliability Certification of Services. Digital Ecosystems
      Technologies (DEST), 2012 6th IEEE International Conference on.
[10]  L. Jereb, Efficient Reliability Modeling and Analysis of
      Telecommunication Networks (1998).
[11]  Thirumaran, M., et al. "Finite State Machine Based Evaluation Model
      for Web Service Reliability Analysis." International Journal of Web &
      Semantic Technology 2.4 (2011): 125.
[12]  M. Srinivasa Rao, V. N. A. Naikan, Review of Simulation Approaches in
      Reliability and Availability Modeling. International Journal of
      Performability Engineering, Vol. 12 No. 4, July 2016.
[13]  A. Garro, A. Tundis, On the Reliability Analysis of Systems and Systems
      of Systems: the RAMSAS method and related extensions, IEEE Systems
      Journal, 9(1):232-241, 2015, ISSN 1932-8184, IEEE Systems Council.
[14]  ECSS-Q-ST-30-09C Availability analysis. 31 July 2008.
[15]  David, R., & Alla, H. (2010). Discrete, continuous, and hybrid Petri
      nets. Springer Science & Business Media.
[16]  Goss, P. J., & Peccoud, J. (1998). Quantitative modeling of stochastic
      systems in molecular biology by using stochastic Petri nets. Proceedings
      of the National Academy of Sciences, 95(12), 6750-6755.
[17]  Heiner, M., Herajy, M., Liu, F., Rohr, C., & Schwarick, M. (2012).
      Snoopy–a unifying Petri net tool. Application and Theory of Petri Nets,
      398-407.