Teaching Empirical Software Engineering Using Expert Teams

Marco Kuhrmann, University of Southern Denmark
kuhrmann@acm.org

Bernd Bruegge, Stephan Krusche (Hrsg.): SEUH 2017

Abstract

Empirical software engineering aims at making software engineering claims measurable, i.e., to analyze and understand phenomena in software engineering and to evaluate software engineering approaches and solutions. Due to the involvement of humans and the multitude of fields for which software is crucial, software engineering is considered hard to teach. Yet, empirical software engineering increases this difficulty by adding the scientific method as an extra dimension. In this paper, we present a Master-level course on empirical software engineering in which different empirical instruments are utilized to carry out mini-projects, i.e., students learn about scientific work by doing scientific work. To manage the high number of about 70 students enrolled in this course, a seminar-like learning model is used in which students form expert teams. Beyond the base knowledge, expert teams obtain an extra specific expertise that they offer as a service to other teams, thus fostering cross-team collaboration. The paper outlines the general course setup and the topics addressed, and it provides initial lessons learned.

1 Introduction

Software engineering aims at the systematic application of principles, methods, and tools for the development of complex systems. This comprises the software technology as well as the software management part of this discipline [5], and in each of these parts, humans are involved. Due to this human involvement and the multitude of fields for which software has become crucial, software engineering is considered hard to teach. The literature is rich and discusses different experimental settings [15], in-class projects [23], or project courses [6] in general, each addressing technology (e.g., analysis, coding, and testing) or management issues (e.g., project management, the software process, and teams and soft skills [7, 30]).

However, most software engineering courses address system/product development. Yet, when can a project be considered efficient? How to select methods having a higher probability of success in a specific context? How can the dis-/advantages of certain technologies, methods, or tools be evaluated? In order to make software engineering claims measurable, empirical software engineering is applied to (i) analyze/understand phenomena in software development, (ii) identify/evaluate strengths and weaknesses of software engineering approaches, and (iii) investigate the state of the art/practice to identify promising solutions/approaches. This makes empirical software engineering hard to apply for researchers and practitioners, but it makes it even harder to teach.

Wohlin et al. [45] consider the engineering method and the empirical method variants of the scientific method [3]. That is, teaching empirical software engineering means teaching the scientific method and its adaptation to software engineering. However, scientific work differs from "pure" system development. For example, while a development project can be carried out in a semester project, a sound empirical investigation is harder to implement, since resources required for this purpose would then be missing in development. Also, students would need to know, e.g., how to set up experiments or surveys, how to conduct them, and how to analyze and make use of the findings; again, not directly contributing to a small project with a deadline and a working piece of software as the desired outcome.

So, what to do? Wohlin [43] considers three general options to teach empirical software engineering: integration in software engineering courses, as a separate course, and as part of a research methods course. Yet, these approaches have some difficulties. For instance, in a theoretical course, students would hear about different empirical instruments, could train selected methods, or review and discuss research papers. According to Dale's Cone of Learning [10], those activities would largely remain at the passive level (see further Section 2). So, what would remain? Understanding the scientific method in general and empirical software engineering in particular, and seeing its value, requires hands-on work. That is, staying in Dale's model, a course on empirical software engineering also needs to cover the active levels of the cone.

Objectives: This paper aims at providing a course that helps students learn scientific work by doing scientific work. However, scientific work requires collaboration, causes effort, and consumes time. Furthermore, quite often, students lack skills crucial to scientific work, such as carrying out a comprehensive literature search, exact problem definition, statistics, or professional writing.

Therefore, the main challenge to be addressed is to define a teaching format that (i) provides students with the basic knowledge concerning scientific work, (ii) enables students to understand the role of empirical research in software engineering, and (iii) trains scientific work by carrying out (small) research projects and running through a research cycle, including presentation, writing, and reviewing.

Contribution: The paper at hand contributes a course design for an empirical software engineering course and experiences from a first implementation. The overall design follows an approach that brings teaching closer to research [27], and that was successfully applied to different methodical topics, in particular software process modeling [24] and advanced project management [26]. Different from the other implementations of the base concept, the course presented in the paper at hand addresses large classes (50+ students). To keep this course manageable, expert teams were introduced. Each of these teams focuses on a specific competency beyond the general knowledge and offers this competency to other teams. That is, in addition to the intra-team collaboration, a cross-team collaboration pattern is implemented. The course evaluation shows the selected approach to be reasonable; in particular, students consider the course challenging yet good. More importantly, students changed their view on scientific work and started to consider it valuable.

Outline: The remainder of this paper is organized as follows: Section 2 sets the scene by providing background information. Section 3 presents the course design including learning goals, organization, course layout, team structures, and deliverables. Section 4 provides insights into the initial implementation and evaluation. The paper is concluded in Section 5 with a discussion of the lessons learned so far.

2 Fundamentals and Related Work

Empirical software engineering and its integration with software engineering curricula was, for instance, elaborated by Wohlin [43], who mentioned three general levels of integration: integration in software engineering courses, as a separate course, and as part of a research methods course. Wohlin argues that introducing empirical software engineering will provide more opportunities to conduct empirical studies in student settings. However, he also mentions a need to balance educational and research objectives. Similar arguments are provided by Dillon [12], who states that successful observation of a phenomenon as part of an empirical study should not be an end in itself, and that students should have enough time to get familiar with the related ideas and concepts associated with the phenomenon. However, this shows the two different streams in using empirical instruments in teaching: On the one hand, students are educated such that they can serve as subjects in empirical studies (including all the risks mentioned by Runeson [37]), and, on the other hand, empirical studies are used as teaching tools (as has been done in economics for years, cf. [1, 33]). And it must not be questioned that empirical instruments provide a good basis to organize whole courses or individual sessions, e.g., [24, 26].

[Figure 1: Dale's cone of learning (according to [10]). After two weeks, we tend to remember 10% of what we read, 20% of what we hear, 30% of what we see, 50% of what we see and hear (e.g., watching a movie or a demonstration), all passive levels; and 70% of what we say (e.g., participating in a discussion, giving a talk) and 90% of what we say and do (e.g., simulating the real experience, doing the real thing), the active levels.]

However, these approaches aim at utilizing empirical instruments to support courses. Usually, students only get in touch with empirical instruments as subjects in an empirical inquiry, and they have to carry out tasks, e.g., in a controlled experiment, e.g., [15-17, 25, 26]. Teaching empirical software engineering as a subject, however, would require a self-contained course, or, as Wohlin [43] mentioned: a self-contained course or a part of a course on research methods. In respect of Dale's Cone of Learning [10], such a course would need to cover the different levels of the learning cone (Figure 1). Yet, while the passive parts of the cone are easy to implement, addressing the active levels is way more challenging, since this requires the students to carry out actual research.

In [27], we proposed a teaching model to better align research with Master-level courses, mainly utilizing empirical instruments to re-organize exercise parts to bring students closer to real cases, but in a protected environment, which, inter alia, allows for simulating critical or even failure situations [25, 26]. Having applied this approach to several more method-focused courses, the experience gathered so far was used to transfer the approach to empirical software engineering. The paper at hand thus provides a new building block in software engineering education, proposing an initially evaluated template for setting up courses on empirical software engineering.

3 Overall Course Design

This section presents the learning goals, the general organization model, the overall course design, the group setup, and the topics handed out to the students. Furthermore, in this section, we explain how the different student groups form the team of experts throughout the course.

Learning Goals: With the course contents, structure, and team setup presented, the course addresses the learning goals summarized in Table 1.

Table 1: Summary of the course's learning goals.

G1 Learn the scientific way of work: Students are introduced to the scientific method, learn the relevant terminology and concepts, and learn about the process of planning, conducting, and reporting scientific work.
G2 Learn to work with scientific literature: Students learn how to find, read, and critically evaluate scientific literature. Students carry out reviews of real conference papers (training with given criteria), and students carry out group-based peer reviews of the essays written in the course.
G3 Obtain detailed knowledge about scientific methods: Complementing the overview, students get detailed knowledge about selected scientific methods. The students are enabled to explain, discuss, and apply the chosen methods.
G4 Carry out a scientific study: In small teams, students learn science by doing science. Studies are carried out by team setups comprising theory and practice teams; cross-cutting teams provide support, e.g., for reporting, data analysis, and data visualization.
G5 Train and improve communication and collaboration skills: Students go through large parts of a scientific investigation and, thus, need to collaborate with other teams. Furthermore, they need to give presentations about their topics, and they have to write "conference" papers (course essays) to report their findings.

Empirical Methods: A variety of empirical methods/techniques is subject to teaching. Before going into the details, we briefly summarize the methods selected for the course in Table 2. The instruments listed in Table 2 are of interest when setting up the actual "research work" for the students, since every method has certain constraints. For instance, while smaller experiments or (partial) literature studies are suitable for an educational setting, a real case study is difficult to implement (time and effort). The actual selection is further discussed in Section 3.3.

Table 2: Summary of the main empirical instruments (study types) to be considered.

Experiment: Experiments investigate effects of treatments under controlled conditions. They are rigorously designed, and results constitute tests of a theory. Experiments can be, for instance, (semi-)formal, address multiple factors, and be conducted under lab conditions or in the field [45].
Case Study: Case studies aim to investigate a phenomenon in its natural context. They help answer explanatory questions, and they should be based on an articulated theory regarding the phenomenon of interest. Case studies can be implemented in a variety of setups, e.g., single-case, multi-case, and longitudinal case studies [39].
Survey Research: A survey aims at collecting information from or about people to describe, compare, or explain their knowledge, attitudes, and behavior [14]. Surveys can, for instance, be implemented as interview studies or as online questionnaires [29].
Simulation: Simulation refers to the use of a simulation model as an abstraction of a real system or process. Typical purposes for using such models are experimentation, increased understanding, prediction, or decision support. Simulations can, for instance, be carried out as people-based or computer simulations [31, 45].
Literature Study: A literature study aims at collecting reported evidence to (i) capture and structure a domain of interest, (ii) aggregate available knowledge, and (iii) synthesize generalized knowledge about the topic of interest. Literature studies in software engineering come as systematic reviews [19] or as systematic mapping studies [36].

3.1 General Organization Model

To address the different learning goals and, at the same time, to cover the variety of different empirical instruments, the course implements expert teams according to the general organization model illustrated in Figure 2. For each empirical method, two types of teams are built. A theory team is supposed to build competency about the actual method, e.g., what the method is about in detail, how to apply it, and how to report findings. Practice teams take over an actual research task to be implemented following a specific method, e.g., an experiment or a literature review. Theory teams then provide advice to the practice teams and monitor the implementation of the task, and practice teams request services from the theory team, e.g., feedback on the procedures. Furthermore, practice teams can be connected with each other. Figure 2 defines the two relationships "I" (Interface) for joint research, i.e., research on one topic but from different perspectives, and "S" (Shared) for an independently conducted research task, i.e., a shared research design is independently implemented by multiple teams. Finally, for specific topics that address cross-cutting concerns, like statistical analyses or data visualization, cross-cutting teams are established. These teams serve all theory and practice teams.

[Figure 2: Overall organization model of the Scientific Methods course. Practice teams request advice from theory teams, which provide advice; cross-cutting teams provide advice to both; practice teams can be related via joint research (I) or a shared research design (S).]

3.2 General Course Layout

Figure 3 illustrates the overall structure of the course Scientific Methods and shows how the different topics (Section 3.3) are aligned in the course.

The course starts with a general introduction to the topic, which covers the foundations of scientific work (session 1) and basic knowledge regarding reporting and presenting scientific work (session 2). In the first session, the different "mini-projects" are introduced, and students select their topics for the semester (the final selection from the topic pool is shown in Table 3). In the second session (as part of an introduction to publication processes), the review process is introduced. Based on this introduction, students are handed an assignment in which they have to review two randomly selected papers following a review template (session 3; homework). In session 4, students get an introduction (or re-cap) on the basic mathematics of empirical research, e.g., hypothesis construction, statistical tests, and errors. In the first part of the course (4 weeks), students are introduced to the subject, basic elements of the work procedures are introduced, and students carry out first activities.

While the actual scientific methods (Table 2) were only presented as "teasers" in the first block, the second part of the course starts with detailed elaborations on these methods. The teams that opted for theory topics (Table 3) present their respective methods. The presentations include an overview of the method and a description of how the method is applied (in general and illustrated by examples), and they conclude with recommendations regarding the implementation for the practice teams. After the presentations, the theory teams switch their role and become "consultants" for the practice teams (cf. Figure 2).

The following five weeks are fully devoted to project work, i.e., the practice teams work on their topics.
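To make the statistical material of session 4 (hypothesis construction, statistical tests, and error types) tangible, the following sketch shows the kind of analysis a practice team might later run: a two-sided permutation test for a difference in means. The data and the test-first/test-last framing are invented for illustration and are not taken from the course.

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=10_000, seed=1):
    """Estimate a two-sided p-value for the difference in means of a and b.

    H0: both samples come from the same distribution. We repeatedly shuffle
    the pooled data and count how often a random split produces a mean
    difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(perm_a) - mean(perm_b)) >= observed:
            hits += 1
    return hits / n_perm

# Invented example data: defect counts under test-first (a) vs. test-last (b)
a = [4, 2, 3, 5, 3, 2, 4]
b = [6, 5, 7, 4, 6, 8, 5]
p = permutation_test(a, b)
print(f"estimated p-value: {p:.4f}")  # small p: reject H0 at alpha = 0.05
```

A permutation test is chosen here because it needs no distributional assumptions, which keeps the focus on the logic of hypothesis testing (and on Type I/II errors via the alpha threshold) rather than on test selection.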
In this 5-week slot, two in-class sessions are scheduled in which the teams report the current project state. These sessions comprise guest lectures by researchers, who present their research and explain how it was conducted, and tutorials, such as implementing a survey as an online questionnaire.

[Figure 3: Overview of the overall structure of the Scientific Methods course (4 hours per session): introduction and fundamentals; presentation and scientific writing; paper reviews; introduction to empirical research; presentations of theory and tutorial topics; self-directed learning, i.e., working on the selected topics such as SLRs or surveys (incl. presenting and writing), with status control and guest lectures, and theory experts helping practice teams; presentations of secondary studies, experiments and simulations, and survey research; peer review of the group papers; evaluation and wrap-up.]

In the next slot, the outcomes of the respective projects are presented. In parallel, students started writing their essays, which have to be handed in the week after the last student group presentation. These essays are written as conference papers following the rules of a scientific conference, i.e., structure, page limits, and so forth. The papers are collected and distributed for peer review among the groups.

The course layout from Figure 3 directly addresses the learning goals (Table 1): the first part addresses learning goals G1 and G2, the second part addresses learning goals G3, G4, and G5, and the last part addresses G2 again.

3.3 Topic Overview

The choice of topics for the course presented is influenced by (i) available topics from ongoing research, (ii) available options to replicate completed research, and (iii) a share of theoretical topics for students who are reluctant to work "in the wild". Table 3 gives a short overview by naming the topics, categorizing them, and providing information regarding maximum team size and references to respective research projects/publications, if applicable. These topics represent the finally selected topics from a pool of about 30 proposals. As mentioned before, the list comprises a number of theory and tutorial topics. Groups having selected those topics did not carry out "real" research, but built up the methodical competence and consulted the practice teams. That is, it was ensured that each practice team had a consultant team (in addition to the teacher) available.

Table 3: Overview of the topics in the Scientific Methods course including a classification (Kind), team setup (M: max. team size, A: actual team size), and provided references (R: reference publications explaining the method, S: reference/input studies on this particular research project).

No. | Topic                                               | Kind       | M | A | R           | S
1   | What is a Survey?                                   | Theory     | 3 | 2 | [20, 29]    | [4, 13]
2   | What is an Experiment?                              | Theory     | 3 | 3 | [2, 45]     | [7, 8]
3   | What is a Systematic Review?                        | Theory     | 3 | 3 | [19]        | [11, 42]
4   | What is a Systematic Mapping Study?                 | Theory     | 3 | 3 | [35, 36]    | [21, 34, 44]
5   | What is a Simulation?                               | Theory     | 3 | 3 | [31, 45]    | [28, 32, 41]
6   | What is a Case Study? (C)                           | Theory     | 3 | 3 | [38, 39]    | [9, 40]
7   | What are Threats to Validity? (C)                   | Theory     | 3 | 3 | [45]        | —
8   | Introduction to R (C)                               | Tutorial   | 4 | 4 | self-search | —
9   | Introduction to Data Visualization (C)              | Tutorial   | 4 | 4 | self-search | —
10  | Test approaches in Agile SW-development             | Experiment | 4 | 3 | [45]        | [15-17]
11  | Perception of SE Semester Projects (Students)       | Survey     | 4 | 4 | [20, 29]    | —
12  | Perception of SE Semester Projects (Teachers)       | Survey     | 4 | 4 | [20, 29]    | —
13  | Quality Management in SPI                           | SLR        | 4 | 3 | [19]        | [11, 18, 42]
14  | Industry expectations on Testing Research (Group A) | Survey     | 4 | 4 | [20, 29]    | —
15  | Industry expectations on Testing Research (Group B) | Survey     | 4 | 4 | [20, 29]    | —
16  | Success Factors in SPI                              | SMS        | 4 | 3 | [19]        | [11, 21, 42]
17  | Agility as SPI Paradigm (Group A)                   | SLR        | 4 | 4 | [19]        | [11, 21, 42]
18  | Agility as SPI Paradigm (Group B)                   | SLR        | 4 | 3 | [19]        | [11, 21, 42]
19  | Comparison of Place Cell Models                     | Simulation | 4 | 4 | [31, 45]    | [28, 32]
20  | Comparison of Navigation Strategies                 | Simulation | 4 | 4 | [31, 45]    | [41]

The practice topics from Table 3 were selected from ongoing research (or from completed research identified as worth replicating). For these topics, existing research collaborations were triggered to identify topic sponsors. For instance, potential topic sponsors were asked: Do you have ongoing research that we could contribute to? Do you have research designs that we could use? Do you have data that you would like to have a preliminary analysis for? However, the conditions were made clear: (i) the topics must be manageable within 4 weeks, (ii) for secondary studies, a pre-digested dataset has to be delivered, (iii) sponsors must not expect a full and mature, i.e., publication-ready, result set, and (iv) sponsors should be willing to carry out quality assurance tasks and, if applicable, provide some consultancy, or even give a guest lecture on the respective research.

3.4 Team Structure and Collaboration

This section introduces the actual team setup of the initial implementation. Figure 4 illustrates the bird's-eye perspective on the team setup and shows the relation of the theory teams and the practice teams. The team numbers in Figure 4 correspond to the topic numbers from Table 3.

[Figure 4: Overview of the team structure: theory and practice teams, method-based team clusters (survey research, controlled experiment, systematic reviews, systematic mapping, computer simulation), and teams addressing cross-cutting concerns. Legend: P = presentation, E = essay, T = tutorial, I = interface, S = shared study design. Deliverable types are explained in detail in Table 4.]

Due to the sponsors and the research they brought to the table, practice teams had three different types of projects: individual projects, interfaced projects, and shared projects. In interfaced projects (teams 11 and 12), students set up a study on the same subject, but had to take different perspectives and apply slightly different methods. Nevertheless, both teams needed coordination, notably concerning the questionnaire designs and the scheduling of interview slots. In shared projects, students either shared a study design (and applied it to different target groups, e.g., teams 14 and 15) or implemented independently conducted research based on an identical task (e.g., teams 17 and 18).

Survey Research Teams: The survey research teams (Figure 5) worked on two tasks: one interfaced task and one shared task. The interfaced task means that both teams worked on related research designs derived from the shared topic: analyze the perception of the semester projects from the perspective of the students and from the perspective of the teachers. For this, several individual and joint sessions were organized, inter alia, to elaborate shared questions for the respective questionnaires to allow for discussing the overall topic from different perspectives, e.g., students' vs. teachers' perspectives on project topics or group setups.

The shared task means that an external sponsor shared a research design, which was handed out to two groups. Both groups implemented the research design, yet surveyed different groups whose contact information was provided by two local industry clusters. In this case, the two groups received a predefined research kit and had to implement this kit, i.e., organize and conduct interviews.

[Figure 5: Group setup of the survey teams (theory, practice, cross-cutting): the teacher provides a topic to teams 11 and 12, which work on the same topic from different perspectives via related research designs (interface); an external sponsor provides a shared topic, task, and research design to teams 14 and 15, which conduct the same task independently (shared); the theory team (team 1) provides advice to the practice teams, and the cross-cutting teams provide advice to all other teams.]

Systematic Review Teams: Two systematic review (SLR) topics were selected from the topic pool. Since SLRs are time-consuming, especially in the actual search and selection stages, both teams were provided with pre-digested datasets emerging from a systematic mapping study (scoping study: [21]) and two selected sub-studies [18, 22] thereof. For the SLRs, two external sponsors were acquired, who contributed to the topic and research design definition, provided the pre-digested datasets, and supported the quality assurance of preliminary results.

The general organization follows the setup shown in Figure 5, yet the shared study design followed a slightly different approach. Both teams 17 and 18 received the same research kit and were asked to carry out the same tasks in an independent manner. The purpose was to have the systematic review carried out by two different groups to, eventually, demonstrate the expected difference in the results caused by personal decisions of the respective reviewers.

Cross-cutting Concerns: The teams covering the cross-cutting concerns have a special role in this setup. In particular, every team has to report its results. Since there were no case studies among the topic proposals¹, team 6 was asked to focus on (case) study reporting and to offer the respective knowledge to the practice teams. Topics 7, 8, and 9 are true cross-cutting topics, i.e., all practice teams have to discuss threats to validity, have to carry out some sort of data analysis, and have to visualize their findings. Therefore, the cross-cutting concerns teams are (potentially) consulted by all the other teams.

3.5 Deliverables and Examination

Each team has to deliver a number of deliverables, which are summarized in Table 4.

Table 4: Summary of the expected deliverable types (related to the teams from Figure 4).

R1 Review: In the first individual assignment, students have to deliver reviews for two randomly selected conference papers (following a given template, about 1 page per review).
P Presentation: Each team has to give a 15-minute presentation on its topic. The presentations are scheduled in topic slots as shown in Figure 3.
T Tutorial: For teams 8 and 9, the students have to prepare a 15-minute tutorial, which can be done in class as well as "offline".
E Essay: Each team has to submit an up-to-10-page essay in which the project is described. For the theory teams, the essay must comprise definitions, summaries about the application of a method based on further studies, checklists for the practice teams, and observations of the practice teams. The practice teams report the findings from their respective projects. The essay is developed in LaTeX following the latest ACM conference templates.
R2 Review: Each team has to review two papers from other project teams. Other than R1, this review is carried out as a group task.

4 Evaluation and Discussion

We report our experiences and lessons learned from the initial implementation of the course. Furthermore, we provide some discussion using the in-course feedback collected in two evaluation rounds.

4.1 Course Evaluation

Table 5: Questionnaire used for the mid-term evaluation (simplified version for space limitations; LI = Likert scale, 0/1 = decision on a statement).

General Criteria
GC1 (LI) Please rate the course according to the following criteria: general course complexity, course speed, course content volume, and the appropriateness of the course in terms of ECTS.
GC2 (LI) Please rate the following course components: lecture, exercise, relation to practice.
Free-form comments (1-minute paper)
FF1 (Text) Please name up to 5 points you considered good.
FF2 (Text) Please name up to 5 points you considered bad and that need improvement.
FF3 (Text) Anything else you want to say?
Course-integrated Mini-Projects
MP1 (0/1) What is your general opinion about the mini-projects?
MP2 (LI) Please evaluate the statements: changed my view on science, improved learning experience, better understanding of concepts, helpful for later career, and built expertise to share.
MP3 (LI) If you had the choice, would you have more focus on mini-projects or more classic exercises?
MP4 (Text) Is there anything else you want to say?
GC and MP Extension (final evaluation only)
MP5 (LI) Looking back, the mini-projects contributed to my learning experience.
MP6 (LI) Looking back, the team work within the mini-projects was good.
MP7 (LI) Looking back, the cross-team collaboration among the different project groups was good.
GC3 (Text) What is your major take-home asset from the course?

The questionnaire comprises quantitative as well as qualitative questions: the general criteria (GC) serve the general analysis of whether students consider the course fair².
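The Likert-scale answers (e.g., for GC1 and GC2) are later condensed into modal values and averages for reporting. A minimal sketch of that aggregation, with invented response data (ratings coded 1 = best to 5 = worst):

```python
from collections import Counter
from statistics import mean

def summarize_likert(ratings):
    """Return (modal value, average) for Likert ratings coded 1..5."""
    counts = Counter(ratings)
    modal = counts.most_common(1)[0][0]  # most frequent rating
    return modal, round(mean(ratings), 2)

# Invented ratings for one criterion (1 = very good ... 5 = very bad)
ratings = [2, 2, 3, 1, 2, 4, 3, 2, 2, 3]
md, avg = summarize_likert(ratings)
print(f"MD={md}, Avg={avg}")  # MD=2, Avg=2.4
```

Reporting the modal value alongside the average is useful for ordinal Likert data, since the average alone can hide a skewed or bimodal distribution of answers.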
The second part of the questionnaire The evaluation presented in this section is based on comprises the 1-minute-paper part (FF) in which stu- two evaluation rounds, which were carried out in the dents are asked for providing feedback to capture the seventh session (mid-term) and the closing session current mood in the course and to support the course’s (final). The evaluation was conducted using the ques- improvement. Finally, in the third part, the perceived tionnaire presented in Table 5. value of the course-integrated mini-projects (MP) is 1 As this was the first time the course was run this way, case study evaluated. The subsequent sections provide the quan- research was excluded from the portfolio due to the expected effort titative and qualitative analysis of the mid-term feed- of running a “true” case study. Also, only one experiment group was accepted. Yet, these methods will be included in upcoming course instances as soon as there is sufficient experience available 2 Note: These questions are kept stable since [27] in order to regarding the options to integrate these methods properly. also validate the teaching model proposed; see also [24]. Bernd Bruegge, Stephan Krusche (Hrsg.): SEUH 2017 26 Marco Kuhrmann - Teaching Empirical Software Engineering Using Expert Teams back, and present the evaluation of the mini-projects the figure shows that the majority of the students rate and the perceived value. the course good to very good. The overall grades from 4.1.1 Standard Quantitative Evaluation the parts GC1 and GC2 are shown in Table 6 (with In total, 68 students are active in the course of which 1.00 as best and 5.00 as worst grade). 39 students participated in the mid-term evaluation and 38 in the final evaluation respectively. 
The sub- Criterion Mid-Term Final sequent discussion is focused on the final evaluation, MD Avg MD Avg and results from the mid-term evaluation are pre- Course Complexity 3 2.69 % 3 2.87 sented, but only used for discussing changing percep- Course Speed 3 2.85 % 3 3.05 tions over time. Content Volume 2 2.26 % 2 2.58 The question GC1 addresses the general rating of Appropriateness 2 3.00 1 4 2.87 the course and whether the students consider the Lecture 2 2.23 3 2.21 course appropriate. Figure 6 shows the absolute rating Exercise 2 2.44 1 2 2.32 and shows that students consider the course’s volume high to very high (19 out of 38), but at the same time, Relation to Practice 2 2.21 1 2 2.11 the majority of the students consider complexity and Table 6: Overall rating of the course (MD: modal speed fair. In summary, 24 out of 38 students consider values; Avg: average ratings; mid-term: n=39, final: the appropriateness of the ECTS for this course fair to n=38; arrows indicate the trend from the mid-term absolutely appropriate. to the final evaluation). Appropriat. Final 4 12 8 13 1 4.1.2 Qualitative Standard Evaluation Mid 4 12 8 10 5 For the qualitative evaluation, the 1-minute-paper part of the questionnaire is used (Table 5, ques- Final 3 16 5 2 2 tions FFx ). In particular, for question FF1 , 35/32 Volume (mid/final) students provided (positive) feedback; for Mid 8 16 12 3 FF2 , 32/29 students provided feedback regarding neg- Final 8 22 6 2 ative points/aspects to be improved, and, finally, for FF3 , 10/6 students provided further comments. In Speed Mid 11 23 5 total, we received about 130 statements for the mid- term evaluation and about 125 comments in the final Complexity Final 10 24 3 1 evaluation, which we group and analyze in the follow- ing. The statements of the students were categorized Mid 2 12 22 2 1 based on keywords; the threshold for a category was 0% 20% 40% 60% 80% 100% set to three mentions. 
Very High High Fair Low Very Low Category Mid-Term Final Figure 6: Evaluation of the general criteria part GC1 . Pro Con Pro Con Group work, feedback, com- 11 7 munication Content and understanding 8 9 Final 6 18 10 4 Exercise Mini-projects 8 5 11 4 Mid 8 13 12 5 1 Work pattern 8 4 10 1 Final 11 12 13 2 Content/material volume 12 4 Lecture Class size 3 3 Mid 11 15 7 5 1 Relevance 4 4 Rel. Practice Final 11 17 6 3 1 Guest lectures* 3 1 6 5 Mid 8 19 8 4 Volume for ECTS* 6 4 0% 20% 40% 60% 80% 100% Table 7: Categorized and condensed qualitative feed- Very Good Good Fair Bad Very Bad back (free-form text questions) Categories marked with ‘*’ were added during the final evaluation. Figure 7: Evaluation of the general criteria part GC2 . Table 7 provides the condensed qualitative feed- Question GC2 aims at computing an overall grade back in nine categories. Group work, in particular, for the course and considered the three components the involvement of students, the communication and lecture, exercise, and the relation of the course to quick feedback cycles were considered positive. Also, practice. Figure 7 shows the absolute mentions, and the content collection and the understandability of Bernd Bruegge, Stephan Krusche (Hrsg.): SEUH 2017 27 Marco Kuhrmann - Teaching Empirical Software Engineering Using Expert Teams the content was considered positive; whereas the vol- ume of the content was considered critical (comment, Final 11 16 6 4 1 mid-term: “maybe 20% less would be more manage- able.”). The chosen way of work, i.e., expert teams in Mid 9 14 9 6 1 combination with the mini-projects, shows an indif- 0% 20% 40% 60% 80% 100% ferent picture. On the one hand, students appreciate this approach as it allows for focusing, continuous More focus on mini-projects Somewhat focus... work on one subject, and building expertise in specific Indifferent Somewhat more... methods. 
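The two quantitative summaries reported above can be sketched in a few lines: the modal value (MD) and average (Avg) per Likert item as in Table 6, and the keyword-based grouping of free-text statements with the three-mention threshold behind Table 7. This is a minimal illustration, not the tooling actually used in the course; the function names and the example keyword map are assumptions for the sketch.

```python
from collections import Counter

def summarize_likert(responses):
    """Modal value (MD) and average (Avg) of 1..5 Likert responses,
    as reported in Table 6. Ties keep the first-encountered rating."""
    md = Counter(responses).most_common(1)[0][0]
    avg = round(sum(responses) / len(responses), 2)
    return md, avg

def categorize(statements, keyword_map, threshold=3):
    """Assign free-text statements to categories via keyword matching;
    drop categories with fewer than `threshold` mentions (Table 7)."""
    tally = Counter()
    for statement in statements:
        text = statement.lower()
        for category, keywords in keyword_map.items():
            if any(keyword in text for keyword in keywords):
                tally[category] += 1
    return {cat: n for cat, n in tally.items() if n >= threshold}
```

A usage example with made-up data: `summarize_likert([3, 3, 2, 4, 3, 2])` returns `(3, 2.83)`, i.e., a modal rating of 3 with an average of 2.83.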
However, on the other hand, students consider certain aspects critical. For instance, the focus brought by the mini-projects allows for obtaining detailed knowledge on one method, yet the students are concerned about the other approaches, which were not in their respective scope. One argument presented was a non-optimal synchronization among the expert teams. Arguments presented in favor of the work model were real research and cross-team collaboration; arguments against this work model regarded the late availability of the research kits and the size of the projects. A few students also suggested reducing the mini-projects to "normal" assignments that would allow for covering more topics/methods ("Mini-projects are basically good, but more variety would be nice.").

Another critical aspect of the course was the amount of reading material provided. While two students explicitly mentioned "learning to analyze scientific papers" (and later on also "write") as positive, six (mid-term) students considered the material to read and analyze too much; yet, in the final evaluation, this aspect was not mentioned anymore. In the mid-term evaluation, six students stated the number of ECTS points for this course too small; in the final evaluation, four students (still) consider the amount of credit points inappropriate. Furthermore, students from study programs other than Software Engineering, MSc were enrolled. Hence, the relevance for their specific education lines was questioned. Also, three students explicitly questioned the relevance to their current studies and future activities. Finally, the course had almost 70 students enrolled, which is almost three times the class size for which the pattern was applied so far. This class size was mentioned as critical, especially the "crowded class room".

4.1.3 Evaluation of the Course-integrated Mini-Projects

For the question MP1 (What is your general opinion about the mini-projects?), 36 students mentioned that they like the mini-project approach, and two students mentioned that they would prefer the classic lecture-exercise model to the mini-project approach. Figure 8 further shows that, eventually, five out of 38 students tend towards applying more classic teaching elements, i.e., the classic lecture-exercise model (question MP3). Yet, 11 students would prefer putting even more focus on the mini-projects.

[Figure 8: Perception of the mini-project approach compared to the classic lecture-exercise model (mid-term: n=39; final: n=38; scale from "More focus on mini-projects" to "More classic lectures/exercises").]

Figure 9 shows the absolute mentions for MP2. The figure shows that the mini-projects are considered valuable to improve the understanding of concepts and the general learning experience. Since the mini-projects aim at building expert teams, i.e., teams that build a specific expertise to share with the other teams, it is important to see the students' perspective regarding this goal. The figure shows that, finally, 20 out of 38 students think they have built a respective expertise to share, yet 11 students are indifferent. Compared to the numbers from the mid-term evaluation, the data shows the students evaluating their gained knowledge and experience better toward the end of the course.

[Figure 9: Evaluation (absolute) of the general criteria section MP2 (mid-term: n=39; final: n=38; statements: Improved learning experience, Better understanding of concepts, Help my later career, I built expertise to share, Changed my view on science; scale from Fully Agree to Fully Disagree).]

Another point of interest is the students' perception of scientific work. Quite often, students have little contact with scientific work until the late stages of their studies, which makes scientific work somewhat abstract and hard to align with the students' day-to-day work3. Thus, it is of certain interest to learn whether the "practical" scientific work changed the students' perception and if they see a positive impact for their later professional development. Figure 9 shows 32 students (fully) agreeing that the course changed their view, and 24 also see an impact on their later career; both numbers increased toward the course's end. A statement from the free-text form (mid-term evaluation) provides a good summary: "in my opinion lecturing scientific method without working with the methods it's only knowing about the methods, not learning them."

3 In [26], we already mentioned that a strong focus on projects might influence the willingness of students to accept and apply methods/techniques not directly addressing the actual project goals.

Finally, Figure 10 provides a reflection. The general perception of the mini-projects and the team work is positive. However, the students considered the cross-team collaboration not optimal, which shows room for improvement. Studying the feedback shows that some teams just "disappeared" and the other teams could not interact anymore; yet, this requires further analysis of the evaluation data.

[Figure 10: Final evaluation of the mini-project approach by the students (n=38; statements MP5-MP7; scale from Fully Agree to Fully Disagree).]

5 Conclusion

In this paper, we presented a course design to teach empirical software engineering, which follows the principle learn scientific work by doing scientific work. The concept presented aims at implementing such a course with larger classes and, in order to manage a large number of students, utilizes expert teams to allow for specialization and to foster cross-team collaboration. Expert teams are supposed to build a specialization beyond the base knowledge and to bring this specialized knowledge into a cross-team collaboration, e.g., a (theoretical) expertise on a specific method is used to consult practice teams that apply this method and, vice versa, practice teams report experience to a theory team of how the method "feels" in practice.

A reference implementation was run at the University of Southern Denmark in the fall semester 2016 with about 70 students enrolled. (At SDU's engineering faculty, project-based learning is a foundational principle, which continuously puts students into project situations and leaves little space to reflect on topics such as scientific methods, since those do not directly and immediately contribute to the product development.) The students formed 20 teams carrying out mini-projects, which were of theoretical, practical, or cross-cutting nature and addressed a variety of different empirical methods. All practical projects comprised "real" research tasks sponsored by internal or external researchers, coming either from ongoing research or from completed research that was considered worth replicating. An evaluation by the students shows that the course and the work pattern are considered appropriate. In particular, students valued the collaborative work on real research tasks. Yet, they were also concerned about the effectiveness of the work pattern; in particular, students are concerned about potentially missing insights into further methods. However, the initial evaluation shows a generally positive attitude towards the expert team and mini-project approach.

However, since this was the first time that (i) the course was run this way and (ii) the teaching model used [27] had not yet been implemented at this scale, the initial implementation revealed potential for improvements. For instance, the class size was considered critical, and the heterogeneity of the class needs to be considered, too. For future implementations, the course should be limited to one study program only. This would allow for better tailoring the course to the respective audience. Due to the explorative nature of the reported course instance, no teaching assistants were involved, which resulted in a dramatically high workload. For future instances, teaching assistants should be involved to reduce the workload, e.g., to speed up organization processes like topic sponsor acquisition or research kit preparation. Furthermore, the volume of the course contents needs adjustment. Finally, several aspects await an in-depth analysis, e.g., an analysis of the workload and the work distribution: Is the topic selection and task assignment fair? Is the cross-team collaboration working as expected? Future work will therefore focus on analyzing the communication within the course (based on approx. 650 emails, more than 50 meetings in total, paper and presentation reviews, and a confidential written evaluation of the cross-team collaboration by the students). Also, an independent quality assurance of the students' deliverables beyond the examination (e.g., supplemental material like extra article sources, or the quality and completeness of research data) is an option to better understand the appropriateness of the tasks and the suitability of the topic composition, and helps improving the course's goal definitions.

Acknowledgement

We owe special thanks to all the students who actively participated in the course, who accepted the challenge to be the "guinea pigs", and who jumped into cold water and conducted real research. We also want to thank the different topic sponsors, who shared their
research topics and (ongoing) work in this course.

References

[1] S. Ball, T. Emerson, J. Lewis, and J. T. Swarthout. Classroom experiments. Available from http://serc.carleton.edu/sp/library/experiments/index.html, 2012.
[2] V. Basili, R. Selby, and D. Hutchens. Experimentation in software engineering. Trans. on Software Engineering, 12(7):733-743, 1986.
[3] V. R. Basili. The experimental paradigm in software engineering. In Proceedings of the International Workshop on Experimental Software Engineering Issues: Critical Assessment and Future Directions, volume 706 of LNCS, pages 3-12. Springer, 1993.
[4] J. D. Blackburn, G. D. Scudder, and L. N. V. Wassenhove. Improving speed and productivity of software development: a global survey of software developers. Trans. on Software Engineering, 22(12):875-885, Dec 1996.
[5] M. Broy and M. Kuhrmann. Projektorganisation und Management im Software Engineering. Xpert.press. Springer, 2013.
[6] B. Brügge, S. Krusche, and L. Alperowitz. Software engineering project courses with industrial clients. Trans. Comput. Educ., 15(4):17:1-17:31, Dec. 2015.
[7] R. O. Chaves, C. G. von Wangenheim, J. C. C. Furtado, S. R. B. Oliveira, A. Santos, and E. L. Favero. Experimental evaluation of a serious game for teaching software process modeling. Trans. on Education, 58(4):289-296, Nov 2015.
[8] M. Ciolkowski, C. Differding, O. Laitenberger, and J. Münch. Empirical investigation of perspective-based reading: A replicated experiment. Technical Report 13/97, International Software Engineering Research Network (ISERN), 1997.
[9] D. S. Cruzes, N. B. Moe, and T. Dybå. Communication between developers and testers in distributed continuous agile testing. In International Conference on Global Software Engineering, pages 59-68. IEEE, Aug 2016.
[10] E. Dale. Audiovisual methods in teaching. Dryden Press, 3rd edition, 1969.
[11] K. Dikert, M. Paasivaara, and C. Lassenius. Challenges and success factors for large-scale agile transformations. J. Syst. Softw., 119(C):87-108, Sept. 2016.
[12] J. Dillon. A Review of the Research on Practical Work in School Science. Technical report, King's College, 2008.
[13] D. M. Fernández and S. Wagner. Naming the pain in requirements engineering: A design for a global family of surveys and first results from Germany. Inf. Softw. Technol., 57:616-643, 2015.
[14] A. Fink. The Survey Handbook. Sage Publications Inc., 2nd edition, 2002.
[15] D. Fucci and B. Turhan. A replicated experiment on the effectiveness of test-first development. In International Symposium on Empirical Software Engineering and Measurement, pages 103-112. IEEE, Oct 2013.
[16] D. Fucci, B. Turhan, and M. Oivo. Impact of process conformance on the effects of test-driven development. In International Symposium on Empirical Software Engineering and Measurement, pages 10:1-10:10. ACM, 2014.
[17] D. Fucci, B. Turhan, and M. Oivo. On the effects of programming and testing skills on external quality and productivity in a test-driven development context. In International Conference on Evaluation and Assessment in Software Engineering, pages 25:1-25:6. ACM, 2015.
[18] J. W. Jacobson, M. Kuhrmann, J. Münch, P. Diebold, and M. Felderer. On the role of software quality management in software process improvement. In International Conference on Product-Focused Software Process Improvement. Springer, Nov 2016.
[19] B. A. Kitchenham, D. Budgen, and P. Brereton. Evidence-Based Software Engineering and Systematic Reviews. CRC Press, 2015.
[20] B. A. Kitchenham and S. L. Pfleeger. Personal Opinion Surveys, pages 63-92. Springer London, London, 2008.
[21] M. Kuhrmann, P. Diebold, and J. Münch. Software process improvement: A systematic mapping study on the state of the art. PeerJ Computer Science, 2(1):1-38, 2016.
[22] M. Kuhrmann, P. Diebold, J. Münch, and P. Tell. How does software process improvement address global software engineering? In International Conference on Global Software Engineering, pages 89-98. IEEE, Aug 2016.
[23] M. Kuhrmann, D. M. Fernández, and A. Knapp. Who cares about software process modelling? A first investigation about the perceived value of process engineering and process consumption. In International Conference on Product-Focused Software Process Improvement, volume 7983 of LNCS, pages 138-152. Springer, 2013.
[24] M. Kuhrmann, D. M. Fernández, and J. Münch. Teaching software process modeling. In International Conference on Software Engineering, pages 1138-1147, 2013.
[25] M. Kuhrmann and J. Münch. Distributed software development with one hand tied behind the back: A course unit to experience the role of communication in GSD. In 1st Workshop on Global Software Engineering Education (in conjunction with ICGSE 2016). IEEE, 2016.
[26] M. Kuhrmann and J. Münch. When teams go crazy: An environment to experience group dynamics in software project management courses. In International Conference on Software Engineering (ICSE), pages 412-421. ACM, May 2016.
[27] M. Kuhrmann. A practical approach to align research with master's level courses. In International Conference on Computational Science and Engineering. IEEE, 2012.
[28] T. Kulvicius, M. Tamosiunaite, J. Ainge, P. Dudchenko, and F. Wörgötter. Odor supported place cell model and goal navigation in rodents. Journal of Computational Neuroscience, 25(3):481-500, December 2008.
[29] J. Linåker, S. M. Sulaman, R. M. de Mello, and M. Höst. Guidelines for conducting surveys in software engineering. Technical report, Lund University, January 2015.
[30] J. Münch, D. Pfahl, and I. Rus. Virtual software engineering laboratories in support of trade-off analyses. International Software Quality Journal, 13(4), 2005.
[31] J. J. Nutaro. Building Software for Simulation: Theory and Algorithms, with Applications in C++. John Wiley & Sons, Ltd., 2010.
[32] J. O'Keefe and N. Burgess. Geometric determinants of the place fields of hippocampal neurons. Nature, 381:425-428, May 1996.
[33] J. Parker. Using laboratory experiments to teach introductory economics. Working paper, Reed College, http://academic.reed.edu/economics/parker/ExpBook95.pdf, accessed 2014-10-23.
[34] N. Paternoster, C. Giardino, M. Unterkalmsteiner, T. Gorschek, and P. Abrahamsson. Software development in startup companies: A systematic mapping study. Inf. Softw. Technol., 56(10):1200-1218, Oct. 2014.
[35] K. Petersen, R. Feldt, S. Mujtaba, and M. Mattsson. Systematic mapping studies in software engineering. In International Conference on Evaluation and Assessment in Software Engineering, pages 68-77. ACM, 2008.
[36] K. Petersen, S. Vakkalanka, and L. Kuzniarz. Guidelines for conducting systematic mapping studies in software engineering: An update. Inf. Softw. Technol., 64:1-18, August 2015.
[37] P. Runeson. Using students as experiment subjects - an analysis on graduate and freshmen student data. In International Conference on Empirical Assessment in Software Engineering, pages 95-102, 2003.
[38] P. Runeson and M. Höst. Guidelines for conducting and reporting case study research in software engineering. Empirical Software Engineering, 14(2):131-164, 2009.
[39] P. Runeson, M. Höst, A. Rainer, and B. Regnell. Case Study Research in Software Engineering: Guidelines and Examples. John Wiley & Sons, 2012.
[40] A. Sarma, X. Chen, S. Kuttal, L. Dabbish, and Z. Wang. Hiring in the global stage: Profiles of online contributions. In International Conference on Global Software Engineering, pages 1-10. IEEE, Aug 2016.
[41] M. Tamosiunaite, J. Ainge, T. Kulvicius, B. Porr, P. Dudchenko, and F. Wörgötter. Path-finding in real and simulated rats: assessing the influence of path characteristics on navigation learning. Journal of Computational Neuroscience, 25(3):562-582, 2008.
[42] G. Theocharis, M. Kuhrmann, J. Münch, and P. Diebold. Is Water-Scrum-Fall reality? On the use of agile and traditional development practices. In International Conference on Product-Focused Software Development and Process Improvement, volume 9459 of LNCS, pages 149-166. Springer, Dec 2015.
[43] C. Wohlin. Empirical software engineering: Teaching methods and conducting studies. In Proceedings of the International Workshop on Empirical Software Engineering Issues: Critical Assessment and Future Directions, volume 4336 of LNCS, pages 135-142. Springer, 2007.
[44] C. Wohlin, P. Runeson, P. A. da Mota Silveira Neto, E. Engström, I. do Carmo Machado, and E. S. de Almeida. On the reliability of mapping studies in software engineering. J. Syst. Softw., 86(10):2594-2610, 2013.
[45] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, and A. Wesslén. Experimentation in Software Engineering. Springer, 2012.