<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Software Migration Project Cost Estimation using COCOMO II and Enterprise Architecture Modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexander Hjalmarsson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matus Korman</string-name>
          <email>matusk@ics.kth.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Lagerström</string-name>
          <email>robertl@ics.kth.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Royal Institute of Technology</institution>
          ,
          <addr-line>Osquldas v. 10, 10044 Stockholm</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Large amounts of software are running on what are considered to be legacy platforms. These systems are often business critical and cannot be phased out without a proper replacement. Migration of these legacy applications can be troublesome due to poor documentation and a changing workforce. Estimating the cost of such projects is nontrivial. Expert estimation is the most common method, but it relies heavily on the experience, knowledge, and intuition of the estimator. The use of a complementary estimation method can increase the accuracy of the assessment. This paper presents a metamodel that combines enterprise architecture modeling concepts with the COCOMO II estimation model. Our study proposes a method combining expert estimation with the metamodel-based approach to increase the estimation accuracy. The combination was tested with four project samples at a large Nordic manufacturing company, which resulted in a mean magnitude of relative error of 10%.</p>
      </abstract>
      <kwd-group>
        <kwd>Software migration estimation</kwd>
        <kwd>Enterprise architecture modeling</kwd>
        <kwd>Software engineering</kwd>
        <kwd>Expert estimations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        With a software product portfolio spanning hundreds of legacy systems,
maintenance becomes a problem. Expensive hardware and a lack of developers
experienced in the environment drive up the cost of maintenance each year. These legacy
systems are often crucial to the business and cannot be phased out without a proper
replacement [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Even though new computing technologies have emerged on the market, a
considerable amount of software still runs on legacy systems. It is estimated that
around 200 billion lines of Cobol code are running in live operation and that 75% of
the world’s business data are processed in Cobol [
        <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
        ]. With an estimated shortfall of
Cobol developers in the 2015-2020 timeframe, as the older generation leaves the
workforce, migration from the legacy mainframes is bound to become a
priority for many organizations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. There are many difficulties involved in the
migration process. Understanding the design and functionality of the legacy systems
may be troublesome because many of these systems have poor, if any,
documentation. The involvement of a system expert is therefore often required [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
These experts need to analyze the old systems to create accurate requirement
specifications regarding technical functionality. This documentation is crucial for the
developers and architects involved in the migration process.
      </p>
      <p>
        Because of the importance of these systems, the replacement often needs both to
suit new business objectives and to maintain functionality for legacy systems that
have not yet been migrated. These factors all come into play when estimating the cost
of a software migration project. A case study by [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] showed that as much as
72% of 145 studied maintenance projects used expert opinion as the method for
estimating software development costs. Another survey showed that, out of 26 studied
industrial projects, 81% were based on expert estimates [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. One of the problems with
expert estimates is that they can be strongly biased and misled by irrelevant
information, which can lead to over-optimism and inaccurate estimates. This often
causes project overruns, which may be avoided with an unbiased estimation model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        There are claims that combining estimates from independent sources,
preferably obtained with different approaches, will on average improve the estimation
accuracy. Research has shown that a combination of model and expert estimates
performs up to 16% better than the best single decision [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        This paper proposes a metamodel based on the ArchiMate modeling language [
        <xref ref-type="bibr" rid="ref8">8,9</xref>
        ]
combined with the COnstructive COst MOdel II (COCOMO II) [10]. In our case
study we found that the estimation capabilities of the proposed metamodel together
with expert estimation are acceptable. Therefore, we suggest that the metamodel
be used as a complement to expert estimation in order to provide a more accurate
assessment of migration projects.
      </p>
      <p>The remainder of this paper is structured as follows: Section 2 describes
COCOMO II; Section 3 presents enterprise architecture modeling; Section 4 describes
the proposed estimation metamodel; Section 5 presents the case study; and Section 6
concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2 COCOMO II</title>
      <p>The first version of COCOMO, the COnstructive COst MOdel, was released in the early
1980s. It became one of the most frequently used and most appreciated software cost
estimation models of that time. Since then, COCOMO has been revised
several times to keep the model up to date with
continuously evolving software development trends. The latest version of COCOMO,
called COCOMO II, had its estimation capabilities calibrated in the year 2000 with
the help of information from 161 project data points and eight experts [10].</p>
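      <p>In the COCOMO II model, the final cost in person-months (PM) is calculated as:
<disp-formula id="eq-1"><label>(1)</label><tex-math><![CDATA[PM = A \times \mathit{Size}^{E} \times \prod_{i=1}^{17} EM_i]]></tex-math></disp-formula>
where A is a calibration constant that depends on the organization's practices and the
type of software migrated, Size is the size of the software in thousands of source lines
of code (KSLOC), and E is an exponent that scales the effort with project size, reflecting
the fact that cost and size are not linearly related. The EM_i are the so-called effort
multipliers.</p>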
      <sec id="sec-2-1">
        <title>2.1 Scale Factors</title>
        <p>The exponent E is derived using the following formula:
<disp-formula id="eq-2"><label>(2)</label><tex-math><![CDATA[E = B + 0.01 \times \sum_{j=1}^{5} SF_j]]></tex-math></disp-formula>
where B is a constant (0.91 in the COCOMO II.2000 calibration [10]) and the SF_j are five
scale factors: precedentedness, development flexibility,
architecture/risk resolution, team cohesion, and process maturity. Boehm et al. [10]
selected these five factors to describe economies or diseconomies of scale in
software projects. This is based on the theory that, depending on these variables, the
productivity in a project can increase or decrease as the project gets larger.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Effort Multipliers</title>
        <p>COCOMO II [10] contains seventeen so-called effort multipliers (EMs). These cost
drivers affect the software development project in either a positive or a negative way. The
EMs are divided into four categories: product factors, platform factors, personnel factors,
and project factors. Each category has its own set of factors.
The product factors are required software reliability (RELY), database size
(DATA), product complexity (CPLX), developed for reusability (RUSE), and
documentation match to life-cycle needs (DOCU). The platform factors are execution
time constraint (TIME), main storage constraint (STOR), and platform volatility
(PVOL). The personnel factors are analyst capability (ACAP), programmer
capability (PCAP), personnel continuity (PCON), applications experience (APEX),
platform experience (PLEX), and language and tool experience (LTEX). The project
factors are use of software tools (TOOL), multisite development (SITE), and
required development schedule (SCED).</p>
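        <p>As a minimal illustration of how the effort equation and the scale exponent fit together, the following Python sketch computes the effort of a hypothetical project. The constants A = 2.94 and B = 0.91 are the uncalibrated COCOMO II.2000 values [10]; the function names, the project size, and the scale-factor and effort-multiplier values are assumptions chosen only for illustration and are not data from the case study.</p>
        <preformat preformat-type="code">
```python
from math import prod

A_DEFAULT = 2.94  # multiplicative constant of COCOMO II.2000 (uncalibrated)
B_DEFAULT = 0.91  # base exponent of COCOMO II.2000


def scale_exponent(scale_factors, b=B_DEFAULT):
    """Equation 2: E = B + 0.01 * sum of the five scale factors."""
    return b + 0.01 * sum(scale_factors)


def effort_person_months(ksloc, scale_factors, effort_multipliers, a=A_DEFAULT):
    """Equation 1: PM = A * Size^E * product of the effort multipliers."""
    e = scale_exponent(scale_factors)
    return a * ksloc ** e * prod(effort_multipliers)


# Hypothetical project: 10 KSLOC, illustrative scale-factor values,
# and all seventeen effort multipliers set to 1.0 (no adjustment).
sf = [3.72, 3.04, 4.24, 3.29, 4.68]
em = [1.0] * 17
pm = effort_person_months(10.0, sf, em)
print(f"E = {scale_exponent(sf):.4f}, effort = {pm:.1f} person-months")
```
</preformat>
        <p>Because E is larger than 1 for these assumed values, doubling the size more than doubles the estimated effort, which is the diseconomy of scale that the scale factors are meant to capture.</p>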
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Enterprise Architecture Modeling</title>
      <p>Enterprise architecture analysis has emerged during the last decade as an approach to
assessing different types of non-functional requirements in a company. Migration
projects are common in enterprises today, so including cost estimation for
these projects in enterprise architecture could appeal to architects. Research in the
area has proposed a framework for enterprise architecture analysis using ArchiMate
and a computational model, the Predictive, Probabilistic Architecture Modeling
Framework (P2AMF) [11]. P2AMF enables calculations on the entities in, for instance,
an ArchiMate model. This framework is the basis of the metamodel used to
enable COCOMO II estimations.</p>
      <sec id="sec-3-1">
        <title>3.1 ArchiMate</title>
        <p>
          ArchiMate is a modeling language intentionally resembling the Unified Modeling
Language (UML) [
          <xref ref-type="bibr" rid="ref8">8,9</xref>
          ]. ArchiMate is used as the basis of the graphical
notation because of its generality, which makes it possible to extend existing
metamodels with change project estimation and provides a solid ground for
future adaptations.
        </p>
        <p>
          The ArchiMate language consists of three core concepts, namely the active
structure, passive structure, and behavioral elements. The passive structure elements
are elements on which behavior is performed while the active structure is the entity
performing the behavior. These concepts are then specialized in each of the three
layers specified in ArchiMate [
          <xref ref-type="bibr" rid="ref8">8,9</xref>
          ]: the business layer, which offers products and
services to external customers; the application layer, which supports the business layer
with application services that are realized by software applications; and the
technology layer, which contains the infrastructure services needed to run applications,
realized by computers, communication hardware, and system software. Classes
found in ArchiMate include, for instance, business process, software application, and
infrastructure service.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 The Predictive, Probabilistic, Architecture Modeling Framework (P2AMF)</title>
        <p>The Predictive, Probabilistic Architecture Modeling Framework (P2AMF) is a generic
framework for system analysis [11]. It is based on the Object Constraint Language (OCL),
which is used to describe expressions on Unified Modeling Language (UML) models.
P2AMF is fully implemented
in the Enterprise Architecture Analysis Tool (EAAT) [12,13]. In this work, the framework has
been used to calculate the formulas of the COCOMO II model.</p>
        <p>With P2AMF, the algorithmic formula used in the model
can yield a probability distribution indicating the probable cost range of the project
rather than a single mean value. This, in combination with the ArchiMate language,
provides a strong basis for using P2AMF for cost estimation. However, due to
space limitations we have not made use of the probability distributions in this paper.</p>
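        <p>To give an impression of what a probabilistic cost range could look like, the following Python sketch propagates an uncertain size through the COCOMO II effort formula using plain Monte Carlo sampling. It does not use P2AMF or EAAT and is not the approach implemented in those tools. The triangular size distribution, the exponent of 1.10, the size range, and all function names are assumptions made only for this illustration.</p>
        <preformat preformat-type="code">
```python
import random


def effort(ksloc, a=2.94, e=1.10, em_product=1.0):
    """COCOMO II effort in person-months for a given size in KSLOC."""
    return a * ksloc ** e * em_product


def simulate(low, mode, high, runs=10_000, seed=1):
    """Sample the size from a triangular distribution; return sorted efforts."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sorted(effort(rng.triangular(low, high, mode)) for _ in range(runs))


# Hypothetical size uncertainty: between 4 and 8 KSLOC, most likely 5.5 KSLOC.
samples = simulate(low=4.0, mode=5.5, high=8.0)
p10 = samples[len(samples) // 10]
p50 = samples[len(samples) // 2]
p90 = samples[9 * len(samples) // 10]
print(f"P10 = {p10:.1f}, P50 = {p50:.1f}, P90 = {p90:.1f} person-months")
```
</preformat>
        <p>The spread between the lower and upper percentiles is the kind of cost range that a probability distribution would convey, in contrast to the single point estimates used in the remainder of this paper.</p>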
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 The Proposed Estimation Metamodel</title>
      <p>This section presents the metamodel for migration project cost estimation. The
metamodel is heavily influenced by COCOMO II [10] and the metamodels previously
proposed in [14] and [15]. The most relevant parts of COCOMO II are included in
the proposed metamodel, while Lagerström’s previous work has served as an influence
and guideline for the metamodel construction and is thus left out of this description.</p>
      <p>
        ArchiMate is in general used to describe the layers of an enterprise’s architecture,
for example to show which applications are used in which business processes.
ArchiMate is tailored for describing as-is and to-be scenarios [
        <xref ref-type="bibr" rid="ref8">8,9</xref>
        ]. In this paper we
present a specialization of ArchiMate that handles project-specific factors. The
project-specific metamodel elements are then combined with the regular ArchiMate
metamodel classes to calculate the migration cost estimate.
      </p>
      <p>The combined metamodel contains the seventeen effort multipliers as well as the
five scale factors. The metamodel differentiates between the three
ArchiMate layers as well as the new project-specific metamodel classes (see Fig. 1):
the business layer (in red) contains the class “Personnel”; the application layer (in
green) contains the classes “ApplicationComponent,” “ApplicationFunction,” and
“ApplicationService”; the infrastructure layer (in yellow) contains the class
“InfrastructureService”; and the project entities (in blue) are
“SoftwareDevelopmentProcess,” “SoftwareDevelopmentProject,” “Activity,”
“Change,” and “EffortDivisor.”</p>
    </sec>
    <sec id="sec-5">
      <title>5 Case Study</title>
      <p>Our study was conducted at a large Nordic manufacturing company. The data points
used to validate and calibrate the metamodel are projects that were closed during the
last six months and in which more than 2000 SLOC were produced.
The data was collected through interviews with managers,
developers, and architects in the projects. Project reports were also used to validate
the information elicited and as a source of the project costs (effort in
person-hours/man-months). In total we looked at four different migration projects. Due to
space limitations we provide some more details regarding Project B (below) before
presenting the analysis and results. The complete study is described in
[16].</p>
      <sec id="sec-4-1">
        <title>5.1 Project B</title>
        <p>This project was initiated to replace an old application with a new
one running on the company’s standardized platform, with support and
development agreements included. The old application was based on old technology and could
not run on modern PCs such as the ones based on the x64 architecture. The software
is used to determine variables of the propeller shaft used in vehicles produced by the
company. It is used only by the experts in the area, and the old application ran
on only one PC. Overall, the project was deemed successful. Deviations from the project
schedule occurred due to the complexity of the algorithms that were implemented.
The project used an iterative software development method, working in sprints
with demonstrations to customers after each sprint. The project had an 18%
overrun of the estimated budget due to new requirements added to the migrated
version of the software, which increased the scope of the project. Sizing Project
B was straightforward, as it consisted only of migrating one application. The project
resulted in 5,500 SLOC developed on the .NET platform. Table 1 presents the data
for Project B.</p>
        <p>Table 1. Data for Project B: the ratings assigned to the COCOMO II effort multipliers (RELY, DATA, CPLX, RUSE, DOCU, TIME, STOR, PVOL, ACAP, PCON, APEX, PLEX, LTEX, TOOL, SITE, and SCED), on a scale that includes LOW, NOMINAL, HIGH, and VERY HIGH, together with the actual outcome.</p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2 Validation Method</title>
        <p>The validation consists of measuring the accuracy of the model. The accuracy is
measured using the Magnitude of Relative Error (MRE) and the
Mean Magnitude of Relative Error (MMRE) [17]:
<disp-formula id="eq-3"><label>(3)</label><tex-math><![CDATA[MRE = \frac{|E - \hat{E}|}{E}]]></tex-math></disp-formula>
<disp-formula id="eq-4"><label>(4)</label><tex-math><![CDATA[MMRE = \frac{1}{n} \sum_{i=1}^{n} MRE_i]]></tex-math></disp-formula>
where E is the actual result, Ê is the estimate, and n is the number of projects.</p>
        <p>A model has an acceptable accuracy level if at least 75% of the project estimates
have an accuracy of 75% or higher, that is, an MRE of at most 25% [17]. This measure is
called the prediction quality (PRED) and has
been used frequently when comparing models and methods within the area of
software estimation [14,18]. The prediction quality is given by
<disp-formula id="eq-5"><label>(5)</label><tex-math><![CDATA[PRED(q) = \frac{k}{n}]]></tex-math></disp-formula>
where n is the total number of projects and k is the number of projects whose
MRE is at most q.</p>
        <p>An acceptable accuracy level for a model can be denoted PRED(0.25) ≥ 0.75,
meaning that at least 75% of the project estimates shall be within 25% of the actual result.</p>
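        <p>The three accuracy measures defined in this section (MRE, MMRE, and PRED) can be sketched in a few lines of Python. The function names and the effort values below are hypothetical and serve only to illustrate the definitions; they are not the case-study data.</p>
        <preformat preformat-type="code">
```python
def mre(actual, estimate):
    """Magnitude of relative error of a single estimate."""
    return abs(actual - estimate) / actual


def mmre(actuals, estimates):
    """Mean of the MRE values over all projects."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)


def pred(actuals, estimates, q=0.25):
    """Share k/n of the projects whose MRE is at most q."""
    hits = sum(1 for a, e in zip(actuals, estimates) if q >= mre(a, e))
    return hits / len(actuals)


# Hypothetical efforts in person-months for four projects.
actual_pm = [120.0, 50.0, 200.0, 30.0]
model_pm = [100.0, 30.0, 210.0, 33.0]
print(f"MMRE = {mmre(actual_pm, model_pm):.3f}")
print(f"PRED(0.25) = {pred(actual_pm, model_pm):.2f}")
```
</preformat>
        <p>In this hypothetical data set three of the four estimates lie within 25% of the actual effort, so the set would just meet the acceptance level of PRED(0.25) = 0.75 even though one project is estimated poorly.</p>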
      </sec>
      <sec id="sec-4-3">
        <title>5.3 Accuracy</title>
        <p>Even before calibration, the model conforms rather well to the data gathered. The two
largest projects, Projects A and C, are within the predictive quality margin of 25%
(MREs of 16% and 4%, respectively). Project B is not estimated accurately and has an MRE of 44%. The
model underestimates the effort needed for the project, which may partly be because
of the additional effort caused by the problems found in the old application that
was migrated.</p>
        <p>Compared to the expert estimates, the model produces competitive estimates. In
Table 2, the mean magnitude of relative error has been computed for four different
estimates: the model estimate, the expert estimate, and two combinations of them. The two
combinations are the optimal combinations of model and expert
estimates for two different purposes. Optimal predictive quality (Opt. pred) ensures that
all projects are within 25% of the real effort outcome. The optimal mean relative
error (Opt. MRE) uses the combination that gives the lowest average MRE for the
projects.</p>
        <p>
          Opt. pred uses 24% model and 76% expert. Opt. MRE uses 59% model and
41% expert. Table 2 shows that aiming for the optimal predictive quality
would lower the mean magnitude of relative error, while the optimal MRE
achieves a very good mean magnitude of relative error. The results also show
that combining the expert judgments with the model improves both the
predictive quality and the MMRE. This is in line with previous research [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
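        <p>One way to obtain such combination weights is sketched below in Python. The sketch assumes that the combined estimate is a weighted average with a single weight shared by all projects and that the weight is found by a simple grid search; the text does not state how the optimal combinations were computed, so this is an assumption. All function names and data values are hypothetical.</p>
        <preformat preformat-type="code">
```python
def mre(actual, estimate):
    return abs(actual - estimate) / actual


def mmre(actuals, estimates):
    return sum(mre(a, e) for a, e in zip(actuals, estimates)) / len(actuals)


def pred(actuals, estimates, q=0.25):
    return sum(1 for a, e in zip(actuals, estimates) if q >= mre(a, e)) / len(actuals)


def combine(model, expert, w):
    """Weighted average; w is the share given to the model estimate."""
    return [w * m + (1.0 - w) * x for m, x in zip(model, expert)]


def best_weights(actuals, model, expert, steps=100):
    """Grid search over w in [0, 1] for the two criteria described in the text."""
    grid = [i / steps for i in range(steps + 1)]
    # Opt. MRE: the weight that gives the lowest MMRE.
    w_mre = min(grid, key=lambda w: mmre(actuals, combine(model, expert, w)))
    # Opt. pred: the highest PRED(0.25); ties are broken by the lower MMRE.
    w_pred = min(
        grid,
        key=lambda w: (
            -pred(actuals, combine(model, expert, w)),
            mmre(actuals, combine(model, expert, w)),
        ),
    )
    return w_mre, w_pred


# Hypothetical efforts in person-months; not the case-study projects.
actual_pm = [120.0, 50.0, 200.0, 30.0]
model_pm = [100.0, 30.0, 210.0, 33.0]
expert_pm = [150.0, 60.0, 170.0, 24.0]
w_mre, w_pred = best_weights(actual_pm, model_pm, expert_pm)
print(f"Opt. MRE weight on model: {w_mre:.2f}")
print(f"Opt. pred weight on model: {w_pred:.2f}")
```
</preformat>
        <p>Because the grid includes the weights 0 and 1, the combined estimate found this way can never have a higher MMRE on the fitted data than the better of the two single sources. With only a handful of projects, however, such weights are fitted to the same data they are evaluated on and should be read with that in mind.</p>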
      </sec>
      <sec id="sec-4-4">
        <title>5.4 Calibration</title>
        <p>Calibrating COCOMO II with organization-specific data typically results in better
estimates [10]. One way of calibrating COCOMO II to existing project data is to
adjust the multiplicative constant A (see [10,16] for the exact calibration equations).
Local calibration usually improves the prediction accuracy because the model relies on
subjective factors. Further, the lifecycle activities in the projects covered
by COCOMO II may differ from the ones in the particular organization [10].</p>
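        <p>The exact calibration equations are given in [10,16]. As a rough sketch of the idea, the following Python code fits the constant A by least squares in log space, which amounts to averaging the logarithmic difference between the actual effort and the effort computed without A. This closed form is a common way to perform a local calibration and is assumed here rather than taken from the study; the function names and project data are hypothetical.</p>
        <preformat preformat-type="code">
```python
from math import exp, log, prod


def unscaled_effort(ksloc, exponent, effort_multipliers):
    """Size^E times the product of the effort multipliers (the effort without A)."""
    return ksloc ** exponent * prod(effort_multipliers)


def calibrate_a(projects):
    """Least-squares fit of ln(A) in log space.

    Each project is a tuple (actual_pm, ksloc, exponent, effort_multipliers).
    Minimizing the squared residuals ln(actual) - ln(A) - ln(unscaled)
    gives ln(A) as the mean of ln(actual) - ln(unscaled).
    """
    residuals = [
        log(actual) - log(unscaled_effort(ksloc, e, ems))
        for actual, ksloc, e, ems in projects
    ]
    return exp(sum(residuals) / len(residuals))


# Hypothetical history: (actual PM, KSLOC, exponent E, effort multipliers).
history = [
    (40.0, 10.0, 1.10, [1.0, 1.1]),
    (20.0, 5.5, 1.10, [0.9, 1.0]),
    (95.0, 22.0, 1.10, [1.0, 1.0]),
]
a_local = calibrate_a(history)
print(f"Locally calibrated A = {a_local:.2f}")
```
</preformat>
        <p>Since the fit minimizes squared residuals of the logarithms, it does not directly minimize the MRE of the individual projects, so a calibrated A need not improve every accuracy measure at once.</p>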
        <p>The calibration increased the value of the multiplicative constant A used
in the effort estimation from 2.94 to 3.23. As can be seen in Table 3, the calibration
yields a lower MMRE for the model estimation. This is
because the calibration minimizes the sum of squared residuals in log space rather
than the MRE. Opt. pred was achieved using 31% model and 69% expert, while Opt.
MRE was achieved using 46% model and 54% expert.</p>
        <p>The results of the case study validate that the combination of COCOMO II with the
ArchiMate modeling language works as predicted and that the model estimates are on
par with those of the managers at the case study company. The combination of model
and expert estimates performs far better than either the model or the expert
estimates alone. Without calibration, the optimal MMRE strategy achieved an MMRE of 12%
with PRED(.25) = 75%. When adding the constraint of PRED(.25) = 100%, the
MMRE rose to 18%, which was slightly better than the expert estimates (22%) and on
par with the model (18%).</p>
        <p>One question that might arise is: why combine EA and COCOMO II rather than
use COCOMO II alone? As we see it, there is a strength in using EA models as input
together with project-specific data. ArchiMate as-is and to-be models that already
contain information can easily be reused for every software migration project, and the
project-specific information is the only part that needs to be updated. Also, many
companies today struggle with maintaining their EA models since new projects alter
the as-is architecture continuously. With this approach one could align the as-is and
to-be models with all the ongoing projects and automatically update the models once
the projects are finished. Finally, it provides architects with an instrument to work with
when creating to-be models and assessing whether future scenarios are appropriate for
change projects.</p>
        <p>In this paper we have presented a metamodel for software migration project
estimation. The metamodel was constructed based on metrics from COCOMO II,
modeling elements from ArchiMate, and the analysis engine of P2AMF. The
metamodel was tested in four cases at a large Nordic manufacturing firm. Our results
show that the metamodel itself performs rather well but, as COCOMO II suggests, it
performs even better when calibrated with data from the company under analysis.
Software cost estimation research has shown that model estimates and expert
estimates complement each other well and that the combination often
outperforms either approach alone. This was also the case in our study. Therefore, we
conclude that our proposed metamodel is useful, especially after company-specific
calibration and in combination with expert estimates.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bennett</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Legacy Systems - Coping with Success</article-title>
          .
          <source>IEEE Software 12(1)</source>
          ,
          <fpage>19</fpage>
          --
          <lpage>23</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Datamonitor: COBOL - Continuing to Drive
          <source>Value in the 21st Century. London: Datamonitor</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Barnett</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The Future of the Mainframe</article-title>
          . London: Ovum (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bisbal</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>A Survey of Research into Legacy System Migration</article-title>
          . Dublin: Trinity College Dublin (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kitchenham</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pfleeger</surname>
            ,
            <given-names>S. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McColl</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eagan</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>An Empirical Study of Maintenance and Development Estimation Accuracy</article-title>
          .
          <source>Journal of Systems and Software</source>
          <volume>64</volume>
          (
          <issue>1</issue>
          ),
          <fpage>57</fpage>
          --
          <lpage>77</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Molokken</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jørgensen</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>A Review of Software Surveys on Software Effort Estimation</article-title>
          .
          <source>Empirical Software Engineering</source>
          ,
          <fpage>223</fpage>
          --
          <lpage>230</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Blattberg</surname>
            ,
            <given-names>R. C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoch</surname>
            ,
            <given-names>S. J.</given-names>
          </string-name>
          :
          <source>Database Models and Managerial Intuition: 50% model + 50% manager. Management Science</source>
          <volume>36</volume>
          (
          <issue>8</issue>
          ),
          <fpage>887</fpage>
          --
          <lpage>899</lpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. The Open Group: ArchiMate 1.0 Specification. http://pubs.opengroup.org/architecture/archimate-doc/ts_archimate/</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>