Integrating System Modeling and Cost Models Using Meta-Modeling Techniques

Viktor Steiner (Evopro Innovation Ltd., Budapest, Hungary, steiner.viktor@evopro.hu)
Gergely Mezei (BUTE DAAI, Budapest, Hungary, gmezei@aut.bme.hu)

Abstract. The precise estimation of time and resource consumption plays a pivotal role in planning software development projects at their earliest development phase. Since cost parameters are mostly determined by the architecture, a possible approach is to design a platform independent architectural model of the prospective software and estimate the cost based on it. In this paper, we introduce a method that produces a cost estimate by processing the architectural model of the software being designed. The method analyzes the architectural models and utilizes a modified version of Function Point Analysis to determine the probable cost based on the analysis. The paper also presents a preliminary verification process to evaluate the accuracy of the cost estimation method. The main achievement of the introduced method is that it estimates cost in platform independent units, which can be refined to give accurate cost estimations for different platform implementations.

Keywords: Meta-modeling, Cost Estimation, Function Point Analysis

1 Introduction

Cost estimation is a major challenge in the software industry. It is hard to find features that can precisely predict the expected cost of the complete development process. Since architecture has the greatest impact on development costs, architectural models can provide a basis for the estimation. In this paper, we provide a method that analyzes the architectural models created in the design phase and estimates the expected cost of the software to be built.

Our solution is unique among cost modeling methods, since it applies Multi-Paradigm Modeling techniques to achieve its goal. Compared to other existing cost estimation techniques, our approach does not require creating a separate cost model manually. Instead, we analyze architectural models and generate the cost model from them. Our method maps model elements of the software architecture domain to the concepts of the cost modeling domain. Since we use a platform independent architectural and cost modeling domain, our results can be applied early in the development process, before deciding which technology to use for the implementation. This also means that our method is useful for facilitating the decision between possible implementation technologies, because the cost estimation result can be refined into estimations on different platforms so that the cost predictions can be compared.

The paper is organized as follows: In Section 2, we give a short summary of the state of the art in the field of cost estimation approaches. Section 3 introduces the Visual Modeling and Transformation System [1], which we used to implement the cost estimation method. In Sections 4 and 5, our cost estimation method is presented. The method is a modified version of Function Point Analysis [2], adapted to SysML [3] models. Section 6 introduces a verification process, which is used to test the accuracy of the results provided by the cost estimation method. Finally, Section 7 summarizes the outcome of our work and outlines the main directions of future work.
2 Related Work

Existing cost estimation methods typically do not use existing resources such as requirements, specifications, or architectural models to calculate the probable cost; instead, they rely on their own cost model, which must be prepared separately. This is problematic for various reasons: (i) It takes extra time and effort to estimate the probable development cost. (ii) Cost estimation becomes a mostly manual task, since the cost model does not rely explicitly on existing resources. Manual steps increase the probability of errors in the estimation. (iii) A cost modeling expert is always needed to prepare and analyze the cost model. These issues arise in most of the existing cost estimation methods, for example, in COCOMO II [4].

However, there are some methods that use resources from the development process. An estimation technique that measures development effort based on use cases [5] and another that uses requirements as a basis [5] can be mentioned here. Although these methods use available development resources, they still need too many manual steps to produce the estimation. This is because both requirements and use cases are high level concepts, which can hardly be formalized in a way that would enable programmatic analysis. On the other hand, use cases and requirements are available very early in the development process, therefore the estimation can be performed earlier than with our method. However, performing the analysis on architectural models can be much more accurate, since more formalized data is available. Since these two approaches can be performed at different points of the development process, they can complement each other: use cases and requirements give a vague initial estimation, which is later refined into a more accurate one.

Although we have not found methods that estimate development cost from architectural models, there are some methods that are based on similar concepts, of which [6] is the most closely related. That method uses architectural models to predict performance and to facilitate architectural design decisions. The latter is among our goals as well, as our method can be used to compare and evaluate different architectural versions based on their estimated costs.

3 The modeling environment

A cost estimation algorithm requires a modeling environment in which architectural system models can be created, and in which models can be processed and analyzed programmatically. We chose SysML [3] to model the architecture. The main reason behind this decision was that SysML is a widely used and accepted general purpose systems modeling language, and we have used it in several previous projects at Evopro Innovation Ltd. However, our method is only partially specific to SysML; it can be applied with other architecture modeling languages as well. The selected modeling tool, in which the SysML language environment was created, is the Visual Modeling and Transformation System (VMTS) [1]. In VMTS, any modeling language can be defined by creating its metamodel. The framework offers a highly customizable workbench to edit the models visually, and the models can be processed programmatically using the VMTS Domain Specific Language API. Our SysML dialect was defined by a meta-model based on the OMG SysML and the related parts of the UML specification.

Creating the whole meta-model of the UML and SysML languages was not our goal; since we intended to calculate our cost estimation in the architectural design phase, we focused on those parts of the UML/SysML meta-models that describe the architecture. As described in Sections 4 and 5, we identified the following aspects of SysML as required: (i) the Block Definition Diagram, (ii) the Internal Block Diagram, (iii) the Requirement Diagram, (iv) the Use Case Diagram and (v) the Sequence Diagram. As the first step of our work, we created these languages and customized their visual appearance and behavior according to the SysML standard.
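To illustrate the kind of programmatic model processing the method relies on, the following sketch shows how model elements of a given metamodel type can be collected from a model. The sketch is written in Python with hypothetical class and method names; the actual implementation uses the .NET-based VMTS Domain Specific Language API, which differs in its details.

    # Minimal sketch of programmatic model processing. The Node class and
    # its fields are illustrative stand-ins for VMTS model elements.
    class Node:
        def __init__(self, meta_type, name, children=None):
            self.meta_type = meta_type      # e.g. "Block", "UseCase", "Actor"
            self.name = name
            self.children = children or []

    def find_by_meta_type(root, meta_type):
        """Collect every model element of the given metamodel type."""
        found = [root] if root.meta_type == meta_type else []
        for child in root.children:
            found.extend(find_by_meta_type(child, meta_type))
        return found

    # Example: enumerate the Blocks of a Block Definition Diagram.
    model = Node("Model", "Example",
                 [Node("Block", "Server"), Node("Block", "Database")])
    print([b.name for b in find_by_meta_type(model, "Block")])  # ['Server', 'Database']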
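The FPA terms retained for the implementation can be summarized in a few simple data structures. The following sketch (in Python; the class names and fields are our own illustration, not part of the definitions in [2]) captures the vocabulary that the mapping in Section 5 builds on.

    from dataclasses import dataclass, field
    from enum import Enum

    class TransactionKind(Enum):
        EI = "External Input"
        EO = "External Output"
        EQ = "External Inquiry"

    @dataclass
    class RET:                        # Record Element Type (see Section 5.2)
        name: str
        dets: list = field(default_factory=list)  # Data Element Types (field names)

    @dataclass
    class ILF:                        # Internal Logical File
        name: str
        rets: list = field(default_factory=list)

    @dataclass
    class Transaction:                # mapped to a top-level use case (Section 5.3)
        name: str
        kind: TransactionKind
        dets: int = 0                 # data element types crossing the boundary
        ftrs: int = 0                 # file type references (distinct ILFs touched)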
4 Cost estimation

Over the past decades, different methods have been developed for software cost estimation. Our goal was to find the method among these best suited to our purposes. The selected method had to be: (i) current, (ii) used in the software industry (to ensure that it predicts the development cost correctly) and (iii) publicly documented, to avoid copyright issues. Moreover, we decided to focus on solutions capable of estimating the size of systems created with object oriented principles. Finally, we chose the method described in [7], which is a collection of methods, each usable in a different phase of software development projects. We only needed the ones that deal with calculating the size of the software, since the size has the greatest impact on development costs, and it can be measured in the architectural design phase.

In our method, the cost is estimated based on the software size, which is typically measured in two ways: (i) Source Lines of Code (SLOC) and (ii) Function Points. In the case of SLOC, the number of source lines required to implement the software in a particular language is measured. In contrast, function point measuring methods quantify the functionality of software in an abstract, platform independent unit. We selected the latter, since it is platform and technology independent. Moreover, function points, despite being abstract measurement units, can be converted to an estimated number of source lines, based on past development experience.

4.1 Function Point Analysis

Function point measuring methods are collectively referred to as Function Point Analysis (FPA) methods. FPA has no official standard; several different implementations exist. In our solution, we used the approach described in [7], and a more detailed version of the same method in [2]. As we mentioned before, FPA measures the size of software based on its functionality. In FPA's interpretation, the functions of a software are always transactions, which are executed on some kind of data set. Therefore, the function point count is determined by logically related data sets, and by the transactions associated with them. The basic terms of FPA can be seen in Fig. 1.

Fig. 1. Overview of the basic terms of Function Point Analysis

We modified the original Function Point Analysis to adapt it to SysML architectural models. Firstly, we examined which of the basic FPA terms are necessary in order to implement the method properly:

• Application Boundary: It specifies the communication interface between the system and the outside world.
• Internal Logical File (ILF): A logically related set of data, maintained by the application.
• Transaction: An elementary process, which obtains data through the application boundary. We distinguish three kinds of transactions:
  ─ External Input (EI): A transaction obtaining data from the environment.
  ─ External Output (EO): A transaction submitting data to the outside environment.
  ─ External Inquiry (EQ): A transaction, which gets data from inside of the application, through the application boundary. The data is queried according to query parameters. The requested data cannot be derived (calculated) data.

We discovered that the above terms are necessary to implement the method, and they can be matched to SysML concepts, as described in Section 5. However, the remaining terms from Fig. 1 are not needed for the implementation. An External Interface File (EIF) is an ILF maintained by another application. It is not part of our model, because it is not an Object Oriented Programming concept, and our focus was on estimating the cost based on OOP software models. Transformation & Transition: A transformation is a sequence of mathematical calculations transforming the input data into the required form. A transition is an event, which changes the state of the application. These concepts could clarify the results of the estimation, but they are not elaborated in the architectural design phase.
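The identification of transactions from a Use Case diagram can thus be automated with a simple traversal. The sketch below assumes a minimal in-memory representation of the diagram; the use case and actor names, and the shape of the association list, are illustrative only.

    def top_level_use_cases(use_cases, associations):
        """Use cases directly associated with an actor cross the
        application boundary and are treated as FPA transactions."""
        actor_linked = {uc for (_actor, uc) in associations}
        return [uc for uc in use_cases if uc in actor_linked]

    use_cases = ["Manage publications", "Render preview"]
    associations = [("Administrator", "Manage publications")]
    print(top_level_use_cases(use_cases, associations))
    # -> ['Manage publications']; only this becomes an FPA transaction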
5 Mapping Function Point Analysis to SysML

In this section, we present how we mapped the basic FPA terms to SysML concepts, and how function points are calculated by analyzing SysML models.

5.1 Application boundary

The first step of FPA is to define the application boundary. Here, we have to analyze the data used by the application and determine whether it is maintained by the application. If it is, then the data resides inside the application boundary; otherwise, it belongs to the outside environment. When we design the architecture of software in SysML, the application boundary is likewise defined first; however, it is not displayed explicitly as a model item: software modeling usually starts with the definition of functionality, in the form of Use Case diagrams. In Use Case diagrams, actors belong to the environment, and the highest level use cases they are associated with are matched to FPA transactions that obtain data through the application boundary.
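The classification rule and the rating step can be sketched as follows. The entity type with its parents and children lists is a hypothetical stand-in derived from the one-to-many relations of the data model; the complexity thresholds and weights are the commonly published FPA values, which the method reads from the table in Section 9 of [2].

    def is_ret(entity):
        """An entity is a RET iff it has exactly one parent and all of its
        children are RETs themselves; otherwise it is an ILF."""
        return len(entity.parents) == 1 and all(is_ret(c) for c in entity.children)

    def ilf_weight(ret_count, det_count):
        """Rate an ILF by its RET and DET counts (low = 7, average = 10,
        high = 15 function points; standard FPA thresholds assumed)."""
        row = 0 if ret_count <= 1 else (1 if ret_count <= 5 else 2)
        col = 0 if det_count <= 19 else (1 if det_count <= 50 else 2)
        matrix = [[7, 7, 10],
                  [7, 10, 15],
                  [10, 15, 15]]
        return matrix[row][col]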
5.2 Data types

The second step of FPA is to identify and rate the data sets maintained by the application. These data sets always appear as ILFs. An ILF is a user identifiable group of logically related data that resides entirely within the application boundary and is maintained by External Inputs. An ILF has an inherent meaning, it is internally maintained, it has some logical structure and it is stored in a file, as defined in Section 9 of [2]. According to this definition, ILFs are almost identical to the persistent entities of an OOP application, whose structure and connections can be modeled on an Entity Relationship diagram, or in our case, on a SysML Block Definition diagram. However, this data model must be programmatically distinguishable from the other system elements that are also modeled on Block Definition diagrams. This can be achieved by performing a small modification on the original SysML meta-model and adding an attribute – a flag – to the Package meta-element. The FPA analyzer can then decide whether or not to search a given Package for data model elements. After identifying the ILFs, the next step is to evaluate their complexity and rate them.

Complexity analysis is based on two concepts: (i) A Record Element Type (RET) is a user recognizable sub-group of data elements within an ILF. (ii) A Data Element Type (DET) is a unique, user recognizable, non-recursive (non-repetitive) and dynamic field in a RET. Additionally, a DET can invoke transactions or can act as additional information regarding transactions. During the evaluation process, the Record Element Types and Data Element Types within an ILF are counted.

By definition, an ILF itself is also a user recognizable group of data elements, therefore it always consists of at least one RET. ILFs consist of more than one RET if they contain multiple logically related sub-groups of data, which have no meaning on their own and can only be interpreted inside the ILF. The RET concept can be illustrated using the following two cases:

• There are two logically related sets of data (A and B), which have a common subset, a key field, by which they are connected to each other. Both A and B can be interpreted on their own, thus each is considered a separate ILF.
• There are two sets of data (A and B), where B is a subset of A. In this case, B cannot be interpreted on its own. Therefore, it cannot be an ILF; it can only be considered a RET inside A. An example of this case is a music CD that contains songs. Both the CD and the songs have attributes, but the songs cannot be interpreted on their own, without the data contained by the CD.

When interpreting the RET concept on SysML models, we assumed that the data model of the designed application is available in the form of SysML Block Definition diagrams. The persistent entities of the data model can be interpreted as ILFs or RETs, as mentioned above. According to the definition, a data set can only be considered a RET if it is a real subset of an ILF. This means that an entity in the data model can only be considered a RET if it has exactly one parent and its children are RETs themselves. If this condition is satisfied, the entity is considered a RET; if not, it is considered an ILF. Note that the parent-child relation here means the usual one-to-many relation used in Entity-Relationship diagrams.

The Data Element Type concept can be easily mapped to SysML models. According to the definition, a DET is very similar to a data field of a persistent entity. Since we have already identified the ILFs and RETs, the only remaining task is to count the data fields for each identified RET, and the evaluation of the data model is complete. The actual weight of an ILF can be read out from the corresponding cell of the table defined in Section 9 of [2].
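The two counting steps can be sketched as follows. The sequence and operation objects are hypothetical stand-ins for the corresponding SysML model elements as exposed through the modeling API; the resulting (DET, FTR) pair is then looked up in the weight tables of [2].

    def evaluate_transaction(sequence):
        """Count DETs from the first (boundary) operation of the sequence
        and FTRs from the distinct ILF types referenced by any operation."""
        boundary_op = sequence[0]
        dets = len(boundary_op.parameter_types) + len(boundary_op.return_types)
        ftrs = set()
        for op in sequence:
            for t in op.parameter_types + op.return_types:
                if t.is_ilf:            # only ILF types count as file type references
                    ftrs.add(t.name)    # each referenced ILF is counted once
        return dets, len(ftrs)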
5.3 Transactions

The third step of Function Point Analysis consists of the identification and evaluation of transactions. At this step, our method is based on the Use Case diagrams of the system. As pointed out previously, top level use cases – which are directly connected to actors – represent transactions, thus their automatic identification is easy. On the other hand, the programmatic determination of the transaction kind (External Input, External Output or External Inquiry) is not possible. This is mainly because, to distinguish between the types, we would need to:

• Determine the main direction of the data flow. This would distinguish input transactions (EI) from output transactions (EO and EQ). We use the term "main direction" instead of simply "direction" because, according to the FPA definition, all three transaction types can send data in both directions. For example, an input transaction can send back a status code, which indicates whether the transaction was successful or not. In SysML, however, we can only show the direction(s) in which data flows; there is no such concept as main direction.
• Check whether the output data is derived or not. This would distinguish between the output transaction types (EO and EQ). Based on the FPA definition, derived data is the result of some kind of calculation. In the case of SysML models, the only indicator of data derivation is that the data type changes during the execution of a transaction. This, however, is not an accurate criterion, because it is possible that no calculation is performed and the data is merely transformed into another form. For example, an array of items is transformed into a linked list of the same items: the type of the data changes, but the actual information remains the same.

Consequently, in the case of SysML models, we cannot distinguish the FPA transaction types from each other. Because of this, we decided to give up the idea of fully automated cost analysis and add some additional information to the SysML meta-model, which helps identify transaction types. As mentioned before, transactions are mapped to top level use cases in SysML models. Therefore, we decided that the most practical solution is to add an attribute to the Use Case meta-element, which marks the transaction type.

After setting this attribute, the evaluation of transactions can be performed automatically. The process consists of two steps: counting (i) the data element types and (ii) the referenced file types.

The first step is to count the data element types. Parameter and return values get into and out of the application through the application boundary. In the case of OO applications, the boundary is usually an interface, whose operations start the execution of transactions. In SysML, this concept can be modeled with Use Case and Sequence Diagrams, where a Sequence Diagram is associated with each use case shown on the Use Case Diagrams. The Sequence Diagrams show the order of the operations that implement the particular use case. Here, we only analyze the first operation of a sequence, which is the interaction point between the application and the environment. The parameter and return types of that operation are used to calculate the complexity of the transaction being analyzed, therefore these are the types that have to be counted. Note that there are transactions that cannot be analyzed this way. For example, take a transaction that queries the database for some data and displays the results on the GUI. The operation that triggers the transaction does not give back a return value, it just updates some part of the GUI, but it is clear that data gets through the application boundary. We solved this problem by creating a new descendant of the Operation element in the SysML meta-model, called GUIOperation. On this kind of operation, the modeler can set the properties that it updates; thus, the complexity of the function can be fine-tuned.

The second step is to count the file type references, which are (by definition) the unique ILFs that a transaction references during its execution. To count them, the prerequisites are the same as in the previous step. Namely, each top level use case should have a corresponding Sequence Diagram, which shows the order of the operations implementing the particular use case. When the necessary diagrams are ready, processing them to find the referenced file types is an easy task. We only need to analyze the operations of a transaction and count their parameter and return types. Each type is counted only once, because file type references are unique by definition.

By now, we have identified the transactions and determined their types. We have counted the referenced file types and data element types, thus the evaluation process of transactions is complete. The actual weights of the transactions can be read out from the corresponding cells of the tables in Sections 5, 6 and 7 of [2].
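Eq. (1) and the subsequent conversion amount to only a few lines of code. In the sketch below, the C# conversion factors (minimum, most likely and maximum SLOC per function point) are not taken from the QSM table [8] itself but inferred from the test-case figures reported in Section 6, so they should be treated as assumptions.

    def function_points(ilf_weights, ei_weights, eo_weights, eq_weights):
        """Eq. (1): the unadjusted function point count."""
        return (sum(ilf_weights) + sum(ei_weights)
                + sum(eo_weights) + sum(eq_weights))

    # Assumed C# factors (SLOC per FP), inferred from Section 6: 29 / 54 / 70.
    CSHARP_SLOC_PER_FP = {"min": 29, "likely": 54, "max": 70}

    def sloc_estimate(fp, factors=CSHARP_SLOC_PER_FP):
        return {name: fp * factor for name, factor in factors.items()}

    print(sloc_estimate(421))
    # -> {'min': 12209, 'likely': 22734, 'max': 29470}, matching Section 6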
5.4 Evaluation

The last step of FPA is to calculate the Function Point count, using the following formula [2]:

FP = \sum_{i=1}^{m} ILF_i + \sum_{i=1}^{n} EI_i + \sum_{i=1}^{o} EO_i + \sum_{i=1}^{p} EQ_i ,   (1)

where ILF_i, EI_i, EO_i and EQ_i are the weights of the files and transactions calculated according to the methods defined in the earlier sections, and m, n, o and p are their respective counts. The resulting value is a platform independent quantity that not only measures software functionality, but can also be converted into platform specific source code estimations. For the conversion, a table is used, which is maintained and updated by Quantitative Software Management Inc. [8] The table contains factors to convert Function Points into SLOC estimations for different programming languages. The conversion factors are based on historical data from completed software projects (currently 2192 different projects). The converted SLOC values provide a basis for comparing source code estimations on different platforms, and for selecting the most suitable platform.

Note that the original FPA method uses a Value Adjustment Factor to fine-tune the Function Point count based on non-functional requirements. This adjustment can be applied to our result as well, as described in Sections 11 and 13 of [2].

6 Verification

In order to check that our method produces correct results, a verification method was implemented. Our original plan was to apply the method from the beginning of a new project and validate its results after completing the project. We realized, however, that this would take months, and we had difficulties convincing the project management: we had strict time constraints in the current projects, and the project management also wanted a preliminary validation of the results before introducing the proposed method in real development. We therefore decided to use a simpler but less precise method: we generated SysML models from the source code of already completed projects. We used the source code to generate an architectural model of the complete software and ran the cost estimation method on it, thereby assessing its accuracy. We were aware that the accuracy in this case depends on how precisely the generated model complies with the source code, and on how much the generated models differ from the architectural models created in the design phase. We made assumptions (e.g. that the architecture does not change to a large extent during the development), keeping in mind that the verification method is only a preliminary way to demonstrate the correctness of our method, which can be fine-tuned later, once it is used in production scenarios.

As the subject project, we used an mCPS system developed by Evopro Innovation Ltd. [9] The system consists of the following components: (i) database, (ii) server, (iii) a thick administrator client and (iv) a mobile client. The first three components were built on Microsoft technologies (Microsoft Azure, .NET WCF and WPF), while the mobile client applications were implemented on every significant mobile platform (Android, iOS, Windows Phone, Windows 8). We parsed the source code mainly with the open source tool NRefactory [10], which produced an AST. From the AST, we generated the architectural model using the DSL API of VMTS. Note that the method used here is not specific to this case study; it can be applied to any project written mainly in C#.

We created two test cases, i.e. combinations of components that build up the test system, on which the verification process was performed: (i) Database + server: here, the application boundary is an API, since the database and the server do not have any graphical user interface. (ii) Database + server + admin client: in this case, the application boundary is the user interface of the admin client.

After performing the verification process, we compared the real and the estimated source lines of code (SLOC). We calculated the estimated SLOC values from function points based on the previously mentioned table [8]. However, there is no way to convert a fraction of an FP value to an SLOC estimation in one language and the remainder in another. This affects the accuracy of the estimation, because the tested application is not a pure C# application in either test case; there are some T-SQL and XAML parts in the components as well. The estimated SLOC values, however, are probabilistic values (most likely, minimum and maximum values), which compensates for this inaccuracy of the conversion factors.

In test case 1 (Database + server), we identified 421 Function Points, which translates to 12209 – 29470 (most likely 22734) lines of C# code. The real application consisted of 19696 lines of C# and 3055 lines of T-SQL code (22751 in all). In test case 2 (Database + server + admin client), the result of the estimation was 699 Function Points, which converts to 20271 – 48930 (most likely 37746) lines of C# code. The actual SLOC values were 38365 lines of C#, 3055 lines of T-SQL and 5056 lines of XAML code (46476 in all).

As shown above, the first estimate is almost exactly the same as the actual value; the difference from the most likely estimate is less than 0.1%. The second estimate is not as accurate, but the actual value still lies between the estimated limits. After analyzing the verification results, we discovered that the cause of the relative inaccuracy in the second test case was the amount of XAML code, since a high percentage of it was duplicated. This indicates that the verification algorithm should be enhanced with the capability of detecting code duplication. Apart from this, the results were convincing, and the project management decided that the method is ready to be tested in a production environment.
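The reported figures can be reproduced with the conversion logic sketched in Section 5.4. The following self-contained check (with the same inferred C# factors) confirms that test case 1 deviates by well under 0.1% from the most likely estimate, and that both actual totals fall within the estimated limits.

    FACTORS = {"min": 29, "likely": 54, "max": 70}  # C# SLOC per FP, inferred

    def check(fp, actual_sloc):
        est = {name: fp * f for name, f in FACTORS.items()}
        error_pct = 100 * abs(actual_sloc - est["likely"]) / actual_sloc
        within = est["min"] <= actual_sloc <= est["max"]
        return est, round(error_pct, 2), within

    print(check(421, 22751))   # error ~0.07%, within limits
    print(check(699, 46476))   # error ~18.8%, within limits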
7 Conclusion

We designed and implemented a method that analyzes architectural models and provides a cost estimation based on them. We did so because we discovered that, nowadays, there is a great need for a cost estimation method that produces results automatically, based on already available resources. Our technique is implemented using VMTS [1], a meta-modeling tool capable of creating and processing architectural models in the SysML language [3]. The advantage of the solution is that it does not need separate cost models; the information is extracted from the architectural models automatically. Note that although we rely on an extended version of SysML, the extensions serve to create a more precise architecture model and are not used solely for cost estimation.

By automating the cost estimation, no extra time and effort has to be allocated to it. Another benefit is that the method uses a modified version of Function Point Analysis [2], which produces a platform independent result. In this way, the estimation process can be performed before even deciding on which platform the software should be implemented. The result of the estimation can also be refined into source code estimations on different platforms. In this way, possible implementations using different technologies can be compared and the best one selected. Besides presenting the method, we elaborated a basic, preliminary verification method and discussed its results. Although the verification was not based on a real production process, its promising results show that the method is worth examining further. In the future, we plan to test the method in several production scenarios, use it in real development environments, and add support for estimating the cost of mixed-platform projects.

8 Acknowledgement

This work was partially supported by the European Union and the European Social Fund through project FuturICT.hu (grant no.: TAMOP-4.2.2.C-11/1/KONV-2012-0013) organized by VIKING Zrt. Balatonfüred.

9 References

1. Visual Modeling and Transformation System (VMTS): https://www.aut.bme.hu/en/Pages/Research/VMTS/Introduction
2. David Longstreet: Function Point Training and Analysis Manual: http://www.softwaremetrics.com/Function%20Point%20Training%20Booklet%20New.pdf
3. OMG SysML 1.3: http://www.omg.org/spec/SysML/1.3/
4. Constructive Cost Model II (COCOMO II): http://csse.usc.edu/csse/research/COCOMOII/cocomo_main.html
5. Arlene F. Minkiewicz: Estimating Software from Use Cases & Estimating Software from Requirements: http://legacy.pricesystems.com/research/white_papers.asp
6. Steffen Becker, Heiko Koziolek, Ralf Reussner: The Palladio component model for model-driven performance prediction. Journal of Systems and Software, vol. 82, pp. 3–22, January 2009
7. STSC: Software Development Cost Estimating Guidebook: http://www.stsc.hill.af.mil/consulting/sw_estimation/softwareguidebook2010.pdf
8. Quantitative Software Management Inc.: Function Point Languages Table: http://www.qsm.com/resources/function-point-languages-table
9. mCPS - End to End Mobile Publication: http://www.evoprogroup.com/page/mcps
10. Daniel Grunwald: Using NRefactory for Analyzing C# Code: http://www.codeproject.com/Articles/408663/Using-NRefactory-for-analyzing-Csharp-code