<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Analysis of Students' Team Scores at the 2022 Software Estimation Challenge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Donatien Koulla Moulla</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean-Marc Desharnais</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alain Abran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>École de Technologie Supérieure</institution>
          ,
          <addr-line>1100, rue Notre-Dame Ouest, Montréal, Québec, H3C 1K3</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Maroua</institution>
          ,
          <addr-line>Maroua, P.O. Box 46</addr-line>
          ,
          <country country="CM">Cameroun</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of South Africa</institution>
          ,
          <addr-line>The Science Campus, Florida, 1710</addr-line>
          ,
          <country country="ZA">South Africa</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents an analysis of the 2022 edition of the 'Software Estimation Challenge' organized by the COSMIC Group. The challenge is based on best practices in software effort estimation, including the use of the COSMIC – ISO 19761 standard for sizing software requirements and the early sizing of software functional and non-functional requirements allocated to software functions. The three major components of this challenge consist of sizing the software requirements of a case study, developing an estimation model, and using that model to estimate the development effort for the case study provided. While a previous study was based on a survey of a sub-group of 22 teams that participated in the 2022 edition of this challenge, the study reported here is based on the analysis of students' team scores across teams and contexts of participation. To help the teams' tutors and students plan and prepare for future challenges, this study presents an analysis of how teams performed on each of the challenge tasks. In summary, the teams performed best in the tasks limited to the application of preprogrammed statistical formulas, and much worse in tasks requiring analytic skills for the sizing of the requirements or for applying their reasonably well-built estimation model to the practical case study they had sized (that is, moving from theory to practice).</p>
      </abstract>
      <kwd-group>
        <kwd>Software competition</kwd>
        <kwd>software estimation</kwd>
        <kwd>COSMIC</kwd>
        <kwd>ISO 19761</kwd>
        <kwd>gamification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Software project estimation is important for allocating resources and planning reasonable work
schedules. Although there is a large body of literature on software estimation, estimating software
projects remains a challenging activity, and software managers still struggle to deliver their projects
according to the estimated budget, deadlines, and expected functionalities.</p>
      <p>Identifying the best practices in software estimation and teaching them to students who are often
more interested in technical tasks than management tasks is challenging. There are also difficulties in
teaching software estimation when it comes to moving from theory to practice [1], and there is typically
a lack of case studies for learning purposes and for acquiring estimation skills [2].</p>
      <p>To help students make this link, some approaches have been proposed, such as in [3], which provides
data for educational purposes, covering various estimation topics with related questions at the end of each
chapter. A more proactive approach is Humphrey's Personal Software Process (PSP) and Team
Software Process (TSP), which assist engineers in developing their own individual software engineering
skills, including estimation, through coaching and guidance [4]. Other studies have proposed gamification as
a better alternative for teaching software engineering estimation [5, 6, 7]. This approach creates a
competitive and collaborative environment for learners to improve their interest, knowledge, and skills
in the application of software estimation best practices. Examples include the SimSE (Software
Engineering Simulation Environment) game developed by Navarro [8] and the 'Easy Estimation with Story
Points' game developed by Agile Learning Labs [9]. Similarly, Marin et al., in 'An adventure Serious
Game for Teaching Effort Estimation in Software Engineering' [10], designed and developed the "Back
to Penelope" game based on the COSMIC sizing method to make effort estimation more attractive for
students: this game consists of, among other tasks, sizing software functions from class diagrams.
Appendix A provides an overview of the COSMIC sizing method.</p>
      <p>The Ouhbi and Pombo survey [11] of software engineering faculty and experts also reported that students'
engagement is a great concern and that, to address this issue, new teaching methodologies are needed,
including gamification and problem-based learning. However, the implementation of a game-based
learning approach is challenging for instructors [12].</p>
      <p>To increase students' awareness of the importance of software estimation and of the necessity of
acquiring the required estimation knowledge and skills, an international software estimation challenge,
in which university students compete in teams, was designed and implemented in 2020 by the COSMIC
Group (Common Software Measurement International Consortium, a not-for-profit organization
responsible for maintaining the COSMIC – ISO 19761 standard for functional size measurement; see
https://cosmic-sizing.org). This challenge addresses the concerns about students' engagement,
gamification, and problem-based learning discussed in [11], as well as the instructors' challenges
discussed in [12]. The estimation challenge is designed based on best practices in software effort
estimation, including the use of the COSMIC – ISO 19761 standard for sizing software requirements,
and consists of determining the size of the software to be developed as described in a case study,
developing an estimation model using a given set of historical data, and estimating the effort to develop
both the given functional and non-functional requirements.</p>
      <p>The aim of this challenge is to provide students with a team-based learning experience in software
effort estimation. Furthermore, the study material on estimation best practices recommended by the
COSMIC Group, as well as the challenge preparatory material it provides on the web, guides university
teachers-mentors in structuring and delivering the necessary teaching material to prepare their
participating students' teams for the challenge, either as an elective or a mandatory course activity.</p>
      <p>In the 2022 edition of this challenge, 53 teams participated, for a total of 233 students from five
countries. An initial study [13], based on a survey of a sub-group of 22 teams who participated in the
2022 edition of this challenge, reported that in students’ opinions, this kind of competition experience
was helpful in learning the COSMIC software sizing method and improving their own knowledge and
skills in software estimation. The study reported here differs from [13] in the following way: it is based
on the analysis of students’ team scores, rather than on opinions, and analyzes these scores across teams
and contexts of participation. Of course, both types of study are useful and complementary.</p>
      <p>This paper presents the design of this challenge in terms of the estimation tasks to be carried out by
students, as well as an analysis of how teams from various contexts performed overall and across each
of the challenge tasks. The findings reported here can provide future challenge participants with
additional insights into the knowledge to be acquired prior to the challenge. These findings can also help
teams' mentors to improve their software estimation teaching strategies and related training materials.</p>
      <p>The remainder of this paper is organized as follows. Section 2 presents an overview of the estimation
approach for this challenge, including the COSMIC software-sizing method. Section 3 presents the
students' team scores in the 2022 challenge, as well as the strengths and weaknesses identified across the
estimation tasks. Section 4 presents the discussion and the insights learned.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Challenge Design</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Challenge structure</title>
      <sec id="sec-3-1">
        <title>This Estimation Challenge was designed as a three-phase process (Figure 1):</title>
        <p>Phase A - Sizing of the case study.
2 COSMIC: Common Software Measurement International Consortium - a non for profit organization responsible for maintaining the
COSMIC – ISO 19761 standard for functional size measurement - see https://cosmic-sizing.org</p>
        <p>The challenge input is a Case Study describing a number of systems and software requirements,
some with detailed information and others at a high level only. This phase consisted of four tasks for
sizing the software requirements provided in the case study with COSMIC – ISO 19761 (a minimal
illustrative sketch follows the list):
• Task 1 – Precise sizing of the functional requirements described at a detailed level.
• Task 2 – Approximate sizing of the functional requirements available only at a high level of
description.
• Task 3 – Sizing of the system non-functional requirements allocated to software functions.
• Task 4 – Calculation of the total COSMIC size of the case study.</p>
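        <p>To illustrate Tasks 1 and 4, the following minimal sketch sums hypothetical COSMIC data
movements; the functional processes and counts below are invented for illustration only and are not
taken from the challenge material. In COSMIC, each Entry (E), eXit (X), Read (R), and Write (W) data
movement contributes 1 CFP (COSMIC Function Point), and the size of a functional process is the sum
of its data movements.</p>
        <preformat>
# Minimal sketch of COSMIC sizing (Tasks 1 and 4).
# Each data movement (Entry, eXit, Read, Write) counts for 1 CFP.
# The functional processes below are hypothetical, for illustration only.

functional_processes = {
    # name: (entries, exits, reads, writes)
    "Register student": (1, 1, 2, 1),
    "List courses":     (1, 2, 1, 0),
}

def process_size(movements):
    """Size of one functional process = sum of its data movements (CFP)."""
    return sum(movements)

detailed_size = sum(process_size(m) for m in functional_processes.values())  # Task 1

# Task 4: total size = detailed size (Task 1) + approximate size (Task 2)
# + size of the non-functional requirements allocated to software (Task 3).
approx_size_cfp = 30  # hypothetical Task 2 result
nfr_size_cfp = 12     # hypothetical Task 3 result
total_size_cfp = detailed_size + approx_size_cfp + nfr_size_cfp
print(f"Total COSMIC size: {total_size_cfp} CFP")
        </preformat>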
        <sec id="sec-3-1-1">
          <title>Phase B - Building an estimation model.</title>
          <p>The challenge input for Phase B is a dataset of project data in Excel format, including size,
effort, etc. Phase B consisted of two tasks (a minimal illustrative sketch follows the list).</p>
          <p>• Task 5 – Constructing a linear regression estimation model using the dataset provided.
• Task 6 – Documenting the error intervals of the estimation model.</p>
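          <p>As a minimal sketch of Tasks 5 and 6, written in Python with invented data (the actual challenge
dataset is provided in Excel), the code below fits effort = a + b × size by least squares and computes the
MMRE and the coefficient of determination (R2):</p>
          <preformat>
import numpy as np

# Invented project data; the real challenge dataset is provided in Excel.
size_cfp = np.array([25, 40, 60, 85, 120, 150], dtype=float)       # COSMIC size (CFP)
effort_h = np.array([210, 330, 470, 640, 900, 1100], dtype=float)  # effort (person-hours)

# Task 5: least-squares fit of effort = a + b * size.
b, a = np.polyfit(size_cfp, effort_h, deg=1)  # slope first, then intercept

# Task 6: error intervals of the estimation model.
predicted = a + b * size_cfp
mmre = np.mean(np.abs(effort_h - predicted) / effort_h)  # Mean Magnitude of Relative Error
ss_res = np.sum((effort_h - predicted) ** 2)
ss_tot = np.sum((effort_h - effort_h.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot  # coefficient of determination

print(f"effort = {a:.1f} + {b:.2f} * size   (MMRE = {mmre:.2%}, R2 = {r2:.3f})")
          </preformat>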
        </sec>
        <sec id="sec-3-1-2">
          <title>Phase C- Effort estimation for the case study.</title>
          <p>The challenge inputs for Phase C are the outputs of Phases A and B, as well as a specified
development environment. This phase consists of the following two tasks (a minimal illustrative
sketch follows the list):
• Task 7 – Estimating the development effort for the case study sized in Phase A, using the estimation
model developed in Phase B.</p>
          <p>• Task 8 – Communicating the estimation outcomes to management in a PowerPoint format.</p>
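          <p>Continuing the invented example above, Task 7 reduces to applying the Phase B model to the total
size obtained in Phase A and reporting the estimate with its error interval:</p>
          <preformat>
# Task 7: apply the Phase B model to the Phase A size (invented values).
total_size_cfp = 180  # total COSMIC size from Task 4
a, b = 35.0, 7.2      # intercept and slope from Task 5
mmre = 0.18           # error interval from Task 6

estimate = a + b * total_size_cfp
low, high = estimate * (1 - mmre), estimate * (1 + mmre)
print(f"Estimated effort: {estimate:.0f} person-hours "
      f"(plausible range: {low:.0f} to {high:.0f})")
          </preformat>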
          <p>Figure 2 presents the suggested durations per task and their respective scoring percentages, for a total
maximum of 3 hours for the challenge and a maximum score of 100% across all tasks.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2.2. Recommended resources in preparation for the challenge</title>
      <p>Phase A: For sizing the requirements, the following documents, available free on the web, are
recommended:
- COSMIC Measurement Manual for ISO 19761 Part 1: Principles, Definitions and Rules,
https://cosmic-sizing.org/publications/measurement-manual-v5-0-may-2020-part-1-principles-definitions-rules/;
- COSMIC Measurement Manual for ISO 19761 Part 2: Guidelines,
https://cosmic-sizing.org/publications/measurement-manual-v5-0-may-2020-part-2-guidelines/;
- Course Registration ('C-REG') System Case Study, v2.0.1,
https://cosmic-sizing.org/publications/course-registration-c-reg-system-case-study-v2-0-1/;
- Software Project Estimation book (Chapters 5, 6 &amp; 9) [14];
- Early Software Sizing with COSMIC: Practitioners Guide,
https://cosmic-sizing.org/publications/early-software-sizing-with-cosmic-practitioners-guide/;
- Non-Functional Requirements and COSMIC Sizing Practitioner's Guide,
https://cosmic-sizing.org/publications/non-functional-requirements-and-cosmic-sizing-practitioners-guide/.</p>
      <p>These resources provide definitions, rules, and guidelines for the COSMIC method, including
many examples and some of the best practices in software project estimation.</p>
      <p>Phase B: To develop an estimation model and identify related error intervals, any textbook on
linear regression models is recommended, including Chapters 5, 6, and 9 of [14].</p>
      <p>Of course, professors-mentors can recommend to their students whatever additional material they
deem relevant in preparation for the challenge.</p>
    </sec>
    <sec id="sec-5">
      <title>3. Teams’ Performance in the 2022 Estimation Challenge</title>
      <sec id="sec-5-1">
        <title>An estimation challenge, by design, is not expected to be easy:</title>
      <p>• On the one hand, a challenge provides an opportunity for all students to learn new skills and to
demonstrate their ability to apply them to a case study.
• On the other hand, it should allow the identification of the students' teams that perform best in
software estimation.</p>
      <p>This section presents the teams' performance in terms of the teams' total scores, followed by the
scores per task and per context of participation.</p>
      <sec id="sec-5-1">
        <title>3.1. Teams' overall challenge performance</title>
        <p>The distribution of the overall challenge scores is presented in Table 1, in intervals of 10 points
from the highest to the lowest; the average was 56% and the median 61% for all teams.
• The three winning teams, from two distinct universities, achieved scores over 80%, clearly above
all other teams.
• 25 out of 53 teams achieved scores between 60 and 79%, doing reasonably well in this challenge.
Together with the three winning teams, 64% of the teams scored higher than 60%.
• Teams with scores lower than 15% did not perform some of the tasks.</p>
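        <p>As an aside, a distribution such as that of Table 1 can be computed in a few lines; the scores below
are invented, not the actual team scores:</p>
        <preformat>
from collections import Counter

# Invented team scores (the actual 53 team scores are not reproduced here).
scores = [82, 81, 85, 74, 66, 61, 58, 47, 33, 12]

# Bin the scores into 10-point intervals, as in Table 1.
bins = Counter((s // 10) * 10 for s in scores)
for low in sorted(bins, reverse=True):
    print(f"{low}-{low + 9}%: {bins[low]} team(s)")
        </preformat>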
      </sec>
      <sec id="sec-5-2">
        <title>3.2. Teams' performance per task</title>
        <p>This section presents an analysis of the teams' performance per task, including the average, median,
and distribution of scores per task across the 53 teams (see Table 2). In Table 2, Tasks 3 and 4 are
combined, since Task 4 consists of a simple addition of the size of the non-functional requirements to
the size of the functional requirements.
From Table 2 and Figure 4, it can be observed that the teams' average scores were:
• the best (i.e., over 60%) in the two tasks limited to the application of pre-programmed statistical
formulas to the provided set of project data:
o Task 5: Building an effort estimation model using the linear regression technique,
o Task 6: Calculating the estimation intervals in terms of the Mean Magnitude of Relative
Error (MMRE) and the coefficient of determination (R2); the standard definitions are recalled below.
• lower (below 51%) in all other tasks, which required analytic skills for the sizing of the requirements
or for applying their reasonably well-built estimation model to the practical case study
(that is, moving from theory to practice).</p>
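        <p>For reference, the standard definitions of these two indicators, with E_i the actual effort of project i,
the hat denoting the effort estimated by the model, the bar denoting the mean actual effort, and n the
number of projects, are:</p>
        <preformat>
\mathrm{MMRE} = \frac{1}{n} \sum_{i=1}^{n} \frac{| E_i - \hat{E}_i |}{E_i}
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n} (E_i - \hat{E}_i)^2}{\sum_{i=1}^{n} (E_i - \bar{E})^2}
        </preformat>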
        <p>Figure 4 shows a major left skew, with very low average scores in the following tasks, indicating
that a large number of teams either had not been aware of the underlying concepts or had not figured
out how to apply them in the practical situation presented in the case study:
- Task 2: application of early sizing techniques to high-level requirements.
- Tasks 3-4: sizing of non-functional requirements allocated to software functions.
- Task 7: application in practice of the mathematical models to the case study they had sized.
- Task 8: presentation of key findings to management using the mandatory PowerPoint
mode of communication (note that a few teams did not prepare such a PowerPoint,
arguing that the information was already available in the details of the previous task,
without acknowledging that 'information has to be communicated').</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.3. Teams’ performance by participation context</title>
    </sec>
    <sec id="sec-7">
      <title>3.3.1. Participation contexts</title>
      <p>The participation context of the 53 teams is characterized in the following way:
A. An elective context outside the course curriculum: Within this context, the students did not
receive any formal academic recognition. The seven teams within this ‘elective’ context
were mostly master’s degree students from five countries: Cameroon, Canada, Egypt,
Mexico, and Turkey.</p>
      <p>B. Mandatory academic activity within a scheduled course: In this context, students received
formal academic recognition, a portion of which was based on their teams’ overall score at
the challenge. The students within this ‘mandatory’ context were all bachelor’s degree
students who were formally registered in a software project management course. This group
can be further distinguished in the following way:
B1. From a single course at a Turkish university, co-taught by two teachers as a single
teaching group, the students split themselves into 22 teams. This was the first university to
participate in this challenge.</p>
      <p>B2. From a single course at a Canadian university, where students were registered in two
teaching groups with two distinct teachers using the same course content and set of
exercises, together with a single lab instructor for both groups: these students split
themselves into 25 teams. This was the second participation of this university with the same
teachers and lab instructor in charge of tutoring the students in preparation for the challenge
for both years of participation.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3.2. Average score per context</title>
      <p>To analyze whether these three contexts led to differences in team performance, the full dataset was
split into the corresponding subsets. Figure 5 presents the average (in blue) and median (in red) scores by
participation context.</p>
      <sec id="sec-8-1">
        <title>From Figure 5:</title>
      <p>• The teams in their first year of participation (i.e., the A and B1 contexts at the master's and bachelor's
levels, respectively) achieved similar scores.
• The teams from the university with two years of participation (i.e., the B2 context at the
bachelor's level) achieved the highest scores in comparison with the two other contexts.
In Figure 5, the average and median values are similar within each participation context.</p>
    </sec>
    <sec id="sec-8-2">
      <title>3.3.3. Distribution of scores across contexts</title>
      <p>Histograms of the distribution of scores across the contexts are presented in Figure 6. It should be
noted that, on the horizontal axis, the range of scores varies across contexts, with the following
observations:
- The maximum score was 70% in the elective context A.
- The maximum score is higher than 100% in contexts B1 and B2, since a few teams received
'bonus' points for additional relevant estimation information provided by the best teams.
In terms of the distribution of the team scores across contexts, it can be observed that:
• for the A and B1 contexts, in their first year of participation, many teams had scores lower than 50%,
none above 70% for the A context, and few for the B1 context.
• for the B2 context, in its 2nd year of participation, there is a single team with a very low score,
while all others have a score higher than 60%. It should be noted that, in this context, the team
mentors improved their teaching and mentoring based on lessons learned in the previous
participation, which benefited the students.</p>
    </sec>
    <sec id="sec-9">
      <title>3.4. Teams' performance by team sizes</title>
      <p>From Table 3, it can be observed that the teams' performance, in terms of their average score,
improved with team size. It was also observed that the average team size for contexts A and B1 was
three students, whereas the average team size for the B2 context was twice as large, at six students
per team.</p>
    </sec>
    <sec id="sec-10">
      <title>4. Discussion and Insights Learned</title>
      <p>The study reported here is based on the analysis of students' team scores across teams and contexts
of participation. The students who participated in the challenge were from different countries, the size
of the teams varied, and the participating teams received different training in software requirement
sizing and estimation tasks. Teaching strategies, as well as students' motivations and abilities, also
influence their knowledge acquisition and skills development.</p>
      <p>For the analysis of the teams' performance, the teams were classified into three main contexts of
participation: A (Elective), B1 (Mandatory – 1st year of participation), and B2 (Mandatory – 2nd year of
participation). With respect to overall performance on this challenge, the teams from the B2 context
came from an environment with previous experience in the challenge that had developed some prior
'institutional knowledge': the same teachers and instructor, but with different students and students'
teams. To facilitate fair play in the 2022 edition of this challenge, these teachers and instructor
volunteered, through the COSMIC Group, to share publicly with all 2022 registered teams an improved
version of the initial training material they had developed for their first year of participation.</p>
      <p>While a second year of institutional participation, with additional expertise in mentoring participating
teams, may be posited to have provided an edge in terms of improved performance, it may equally be
argued that the larger teams themselves provided such an edge.</p>
      <p>In summary, the teams performed:
• best (i.e., over 60%) in the two tasks limited to the application of preprogrammed statistical
formulas to the provided dataset of project data:
- Task 5: Building an effort estimation model using a regression, and
- Task 6: Calculating the estimation intervals.
• much worse (below 51%) in tasks requiring analytic skills for the sizing of the requirements or
for applying their reasonably well-built estimation model to the practical case study the teams
had sized (that is, moving from theory to practice).
In addition, some of the teams might not have been mentored and trained in the most recent best
practices developed by the COSMIC Group and included in this challenge, such as the early sizing
techniques for software requirements with few details, as well as the sizing of system non-functional
requirements allocated to software functions.</p>
      <p>The findings reported here can provide future challenge participants with additional insights into
the knowledge to be acquired prior to the challenge. These findings can also help teams' mentors to improve
their software estimation teaching strategies and related training materials. In future work, we plan to
investigate the quantitative data we gathered in much greater depth. Moreover, we plan to carry out further
analyses by taking into account contextual information per team/student (e.g., process
compliance/understanding, knowledge/background of students). We also plan to include a more
in-depth discussion of the distinctive results of the B2 teams.</p>
    </sec>
    <sec id="sec-11">
      <title>5. Acknowledgment</title>
      <p>We are grateful to the anonymous reviewers.</p>
    </sec>
    <sec id="sec-12">
      <title>6. References</title>
      <p>[1] A. Dagnino, Increasing the effectiveness of teaching software engineering: A university and industry partnership, in: Proceedings of the 27th Conference on Software Engineering Education and Training (CSEE&amp;T), IEEE, Klagenfurt, Austria, 2014, pp. 49-54. doi: 10.1109/CSEET.2014.6816781.
[2] N.C. Flores, M.A. Muñoz and J.G. Hernández Reveles, Guide to teach Lean Startup methodology to software engineering students using a serious game, in: Proceedings of the Mexican International Conference on Computer Science (ENC), IEEE, Morelia, Mexico, 2021, pp. 1-8. doi: 10.1109/ENC53357.2021.9534827.
[3] B. Boehm, Software Engineering Economics, in: M. Broy, E. Denert (Eds.), Pioneers and their Contributions to Software Engineering, Springer, Berlin, Heidelberg, 2001.
[4] W.S. Humphrey, PSP(SM): A Self-Improvement Process for Software Engineers, Addison-Wesley Professional, Upper Saddle River, 2005.
[5] L.S. Furtado and S.R.B. Oliveira, A teaching proposal for the software measurement process using gamification: an experimental study, in: Proceedings of the IEEE Frontiers in Education Conference (FIE), IEEE, Uppsala, Sweden, 2020, pp. 1-8. https://doi.org/10.1109/FIE44824.2020.9274194.
[6] M.M. Alhammad and A.M. Moreno, Gamification in software engineering education: A systematic mapping, Journal of Systems and Software, vol. 141, 2018, pp. 131-150. https://doi.org/10.1016/j.jss.2018.03.065.
[7] G. Rong, H. Zhang and D. Shao, Applying competitive bidding games in software process education, in: Proceedings of the 26th International Conference on Software Engineering Education and Training (CSEE&amp;T), IEEE, San Francisco, CA, 2013, pp. 129-138. https://doi.org/10.1109/CSEET.2013.6595244.
[8] E. Navarro, SimSE: A Software Engineering Simulation Environment for Software Process Education, Ph.D. thesis, Donald Bren School of Information and Computer Sciences, University of California, Irvine, 2006.
[9] C. Sims, H.L. Johnson, and Agile Learning Labs, How to play the Team Estimation Game, 2012. URL: https://agilelearninglabs.com/2012/05/how-to-play-the-team-estimation-game/.
[10] B. Marin, M. Vera, and G. Giachetti, An adventure Serious Game for Teaching Effort Estimation in Software Engineering, in: Proceedings of the International Workshop on Software Measurement and International Conference on Software Process and Product Measurement (IWSM/MENSURA), CEUR Workshop Proceedings, Haarlem, The Netherlands, 2019, pp. 71-86.
[11] S. Ouhbi and N. Pombo, Software engineering education: Challenges and perspectives, in: Proceedings of the IEEE Global Engineering Education Conference (EDUCON), IEEE, Porto, Portugal, 2020, pp. 202-209.
[12] E. Jääskä and K. Aaltonen, Teachers' experiences of using game-based learning methods in project management higher education, Project Leadership and Society, vol. 3, 2022, pp. 1-12.
[13] T. Hacaloglu, B. Say, H. Unlu, N.K. Omural and O. Demirors, A Survey on COSMIC Students Estimation Challenge, in: Proceedings of the 31st International Workshop on Software Measurement and International Conference on Software Process and Product Measurement (IWSM/MENSURA), CEUR Workshop Proceedings, Izmir, Turkey, 2022.
[14] A. Abran, Software Project Estimation: The Fundamentals for Providing High Quality Information to Decision Makers, Wiley &amp; IEEE-CS Press, Hoboken, New Jersey, 2015.
[15] COSMIC, The benefits of COSMIC software sizing, 2022. URL: https://cosmic-sizing.org/cosmic-sizing/intro/benefits-of-cosmic/.
[16] ISO/IEC 19761: Software engineering - COSMIC: a functional size measurement method, ISO, Geneva, 2011 (reviewed and confirmed in 2019).
[17] C. Commeyne, A. Abran, and R. Djouab, Effort estimation with story points and COSMIC function points - an industry case study, Software Measurement News, vol. 21, no. 1, 2016.
[18] M. Salmanoglu, T. Hacaloglu, and O. Demirors, Effort estimation for agile software development: comparative case studies using COSMIC functional size measurement and story points, in: Proceedings of the 27th International Workshop on Software Measurement and International Conference on Software Process and Product Measurement (IWSM/MENSURA), ACM, Göteborg, Sweden, 2017, pp. 41-49.
[19] C. Symons, A. Abran, C. Ebert, and F. Vogelezang, Measurement of Software Size: Advances Made by the COSMIC Community, in: Proceedings of the 26th International Workshop on Software Measurement and International Conference on Software Process and Product Measurement (IWSM/MENSURA), IEEE, Berlin, Germany, 2016, pp. 75-86. doi: 10.1109/IWSM-Mensura.2016.021.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>