=Paper= {{Paper |id=Vol-2104/paper_236 |storemode=property |title=Challenge-Based Learning in Computational Biology and Data Science |pdfUrl=https://ceur-ws.org/Vol-2104/paper_236.pdf |volume=Vol-2104 |authors=Emilio Serrano,Martin Molina,Daniel Manrique,Javier Bajo |dblpUrl=https://dblp.org/rec/conf/icteri/SerranoMMB18 }} ==Challenge-Based Learning in Computational Biology and Data Science== https://ceur-ws.org/Vol-2104/paper_236.pdf
       Challenge-based learning in Computational
               Biology and Data Science

         Emilio Serrano? , Martin Molina, Daniel Manrique?? , Javier Bajo
                {emilioserra,mmolina,dmanrique,jbajo}@fi.upm.es

     Department of Artificial Intelligence, Universidad Politécnica de Madrid, Spain



         Abstract. Data Science is an interdisciplinary field devoted to extract
         knowledge from large amounts of data. There is a great variety of pro-
         grams that address the teaching of this field with a growing demand of
         professionals. However, data science pedagogy tends to emphasize gen-
         eral aspects of data and the use of tools instead of the its scientific
         dimension. This position paper describes an ongoing educational inno-
         vation project for the use of the Challenge-based Learning approach to
         teach and learn Data Science. In this approach, students work on solv-
         ing complex and real world problems while the learning is obtained by
         iterating through three main phases: engage, investigate, and act.

         Keywords: Challenge-based learning, active learning, experiential learn-
         ing, project based learning, data science, computational biology.


1      Introduction

Data science (DS) is an interdisciplinary field devoted to identify patterns and ex-
tract knowledge by mining large amounts of structured and unstructured data.
Among others, DS includes: machine learning, data processing, statistical re-
search, and their related methods. This science has become a revolution that
has changed our manner of doing business, health, politics, education and in-
novation [11]. Scientific breakthroughs will be increasingly assisted by advanced
computing capabilities and DS methods that help researchers manipulate and
explore massive datasets [9].
    Challenge-based learning (CBL) is a new learning approach created by Apple
Inc. in collaboration with teachers and leaders in the education community. CBL
is “an engaging, multidisciplinary approach that starts with standards-based
content and lets students leverage the technology they use in their daily lives
to solve complex, real-world problems” [5]. In CBL, students work with other
students, their teachers, and experts in their communities and around the world
to develop deeper knowledge of the subjects they are studying.
    Data science is in a privileged position with respect to other branches of
knowledge to articulate learning through experiences and challenges [18]. The
?
     ORCID ID: 0000-0001-7587-0703
??
     ORCID ID: 0000-0002-0792-4156
Kaggle platform [2] periodically releases a series of competitions on real problems
such as “Predicting a Biological Response” [4]; which offered 20,000$ to the
best predictive model that linked a biological response of molecules to their
chemical properties. These public competitions have the potential to involve
actively the student in a real, significant, and related problematic situation;
including a framework for the implementation of a solution to the challenge.
    This position paper presents an ongoing educational innovation project fo-
cusing on using the CBL approach in a DS course, as part of a Computational
Biology master degree at the Technical University of Madrid (UPM). Students
will work on challenges at the level of a Kaggle competition with special pref-
erence for active and multidisciplinary problems. Based on the 2016 update for
the CBL framework proposed by Apple Inc. [12], students will learn by following
three main phases: engage, investigate, and act.
    The paper outline is as follows: after describing the background of the pre-
sented innovation project in section 2, the project details are given in section
3. These include the scope and students’ profile, the project goals, the time-
line and educational resources, the evaluation, resulting products, and diffusion
plan. Section 4 explains the expected contribution to the improvement of learn-
ing quality. Finally, section 5 concludes and presents future lines of research and
work.


2   Background
The great diversity of applications and the growing demand of experts in the DS
field has made courses, books and manuals in DS proliferate [18]. The standard
pedagogical method that we can appreciate in these courses consists of four
steps:

 1. The explanation of the different machine learning branches (supervised, un-
    supervised, and by reinforcement).
 2. The detail of some learning paradigms under some of these branches; such
    as decision trees or artificial neural networks.
 3. The illustration of these paradigms using toy datasets such as Weather or
    Iris [21].
 4. Assignments with a straightforward application of the ideas previously ex-
    posed using some DS framework such as Weka [7] or Caret [8].

    We executed an educational innovation project last year, where the limi-
tations of this standard pedagogical approach were revealed [17]. Instead, an
experiential learning (EL) method was successfully adopted in a Deep Learning
course, included in the Master in Data Science (EIT Digital Master School),
offered at UPM.
    EL brings real life experiences into the classroom which must be integrated
with the goals and objectives of the discipline theory [13]. The students reflecting
on their product is a fundamental part of EL [20]. Different learning approaches
based on the Kolb cycle [10] were proposed, applied, and evaluated in the deep
learning course within the frame of the educational innovation project. According
to this cycle, effective learning involves: having a concrete experience, observation
of and reflection on that experience, the formation of abstract concepts (analysis)
and generalizations (conclusions), and testing them by active experimentation,
resulting in new experiences (iterations in the cycle).
    Some of the results of this previous project [17] were presented in a position
paper for an international conference [18], a Spanish conference paper [19], and
software tool to complement JupyterHub for Teaching 1 .


3     Project details
This section explains the ongoing project details [16] to allow interested profes-
sors to extrapolate our case to their specific environment. In this new project,
the CBL approach is studied in a new course with different students’ profiles.

3.1    Scope and students’ profile
The project will be developed with students attending the master in Compu-
tational Biology, offered by the Computer Science School at the UPM. More
specifically, in the module of “Knowledge representation and acquisition”.
    The profile of these Computational Biology students is multidisciplinary, be-
longing part of them to the world of biology and part of them to branches of
information technology. Data science and CBL provide an exceptional framework
to establish synergies between these two main profiles, e.g. applying computer
science techniques to biology or vice-versa. In this vein, one of the requisites of
the challenges will be to include both profiles in each work group.
    Students will also have the opportunity to apply the knowledge acquired from
other master’s modules such as “Statistical Analysis and Data Visualization” or
“Machine Learning” to real world problems. The results of this project will be
directly applicable to other modules, whether of this master degree or of the
master in Data Science (EIT Digital Master School)2 , and also in courses of the
Bachelor Degree in Computer Engineering3 such as “Data Mining”.

3.2    Goals
The goals of the presented project are the following:

 – G1. Development of methods for CBL in Data Science and Computational
   Biology. This goal includes the instantiation of methodologies and general
   frameworks of CBL to the specific field to be treated. Among these frame-
   works, we can point out: the 2016 update for the CBL framework proposed
   by Apple Inc [12]; and the “Challenge Based Learning” report carried out
1
  https://jupyterhub-deploy-teaching.readthedocs.io
2
  http://www.fi.upm.es/?id=masterdatascience
3
  https://www.fi.upm.es/?id=gradoingenieriainformatica&idioma=english
                Fig. 1: CBL framework proposed by Apple Inc.


  by the Monterrey Institute of Technology and Higher Education in 2016 [6].
  The phases of the Apple Inc framework are depicted in Fig. 1.
– G2. Study of specific challenges in the field of Data Science and Compu-
  tational Biology. Application of the explored, extended, and instantiated
  methods in O1 to Data Science. This goal addresses the selection of concrete
  challenges and for a specific student profile. The students will work on a
  challenge at the Kaggle competition level with special preference for active
  and multidisciplinary problems. Examples of past competitions that fit the
  students’ profile are “Predicting a Biological Response”, “Merck Molecular
  Activity Challenge”, “Shelter Animal Outcomes”, “Leaf Classification”, or
  “Zoo Animal Classification”.
– G3. Integration and documentation of tools for the support of CBL in the
  course of ”Representation and Acquisition of Knowledge”. This goal includes
  the analysis and documentation of tools available to support the CBL in this
  specific module. Although Kaggle has a working environment, Kaggle Ker-
  nels4 , this framework will be combined with other tools, preferably free and
  open source, that meet the needs of the CBL. Among others and according
  to the 2016 update for the ABR framework proposed by Apple Inc. [12],
4
    https://www.kaggle.com/kernels
      students will need: a calendar, space for collaboration, and storage of docu-
      ments. Project management tools such as Trello5 and Asana6 as well as free
      alternatives will be considered.

3.3     Timeline and educational resources
The following three major tasks will be addressed with a clear correspondence
to the three goals explained above:

 – T1. Developing of methods for CBL in Data Science and Computational
   Biology.
 – T2. Analyzing and selecting specific challenges in the field of Data Science
   and Computational Biology.
 – T3. Integrating and documenting tools for the support of CBL in the Rep-
   resentation and Acquisition of Knowledge.

    The project kicks-off on February 15, 2018 and an ends on November 15,
2018, giving 9 months of project numbered from 1 to 9. In an iterative and
incremental approach such as the Scrum methodology [15, 14], the following
timeline is proposed in Fig. 2.




    Fig. 2: Timeline of the project with an iterative and incremental approach.


    As shown in Fig. 2, three iterations are considered for each task. This allows
the tasks results to feed back to previous tasks. Moreover, there is an overlap
into the different tasks as expected in an iterative and incremental approach.
    The following educational resources will be used:

 – Scientific repositories available in the UPM as ScienceDirect7 .
 – The Institutional Teaching Platform of the UPM (Moodle)8 .
 – Data repositories and contests websites in Data Science such as Kaggle.
 – Resources of the Department of Artificial Intelligence as web servers.
5
  https://trello.com/
6
  https://asana.com
7
  www.sciencedirect.com
8
  moodle.upm.es
3.4    Evaluation

The evaluation of the project will be carried out through the generation of an
e-portfolio by the students attending the module under study. The e-portfolio
is a digital collection of evidence which includes: demonstrations, resources, and
achievements obtained by students. According to the CBL framework proposed
by Apple Inc [12], the following sections will be evaluated:

 – Report of the great ideas to investigate.
 – The proposal of the challenge, the essential question to answer and the mo-
   tivation about the significance of the challenge.
 – Guiding issues, questions that will guide the search for a solution.
 – Learning plan and schedule.
 – Research report, in Jupyter IPython notebook format9 or alternative to en-
   sure the reproducibility and repeatability of the results achieved.
 – Proposed solution, presentation including prototypes, concepts, and expert
   feedback.
 – Implementation and evaluation plans.
 – Evaluation results.
 – Final presentations.
 – Journals with personal and group experience.
 – Final reflections on what was learned.


3.5    Resulting products

This section describes the tangible products resulting from the project (method-
ological guides, reports, educational resources, etcetera) with a description of
their potential for internal and external transfer. The following deliverables will
be elaborated:

 – D1. Report on methods for CBL in Computational Biology and Data Science.
 – D2. Report on the analysis and selection of appropriate challenges for the
   learning of Computational Biology and Data Science.
 – D3. Manual of tools for the support to the CBL in the Representation and
   Acquisition of Knowledge.
 – D4. Report on the evaluation of results based on e-portfolios created by
   students.
 – D5. Journal and conference papers for the dissemination of results.

   As described in section 3.1 these deliverables have an internal transfer in the
UPM to other modules of the master for which it is proposed, other masters, and
other degrees. These products can also be a competitive advantage in the orga-
nization of massive open online courses (MOOCs) by presenting a pedagogical
prescriptions.
9
    jupyter.org
3.6   Diffusion plan
The main diffusion materials generated in the project will be the deliverables 3,
4 and 5 explained in section 3.5. There will also be contemplated:
 – the construction of a website that collects all the deliverables,
 – news for diffusion at the UPM,
 – microblogging posts (Twitter) in the department and the school,
 – radio interviews to disseminate educational innovation.


4     Contribution to the improvement of teaching quality
Thanks to the application of the CBL approach, a significant improvement of
learning and teaching quality is expected. Among others, this improvement will
be reflected in:
 1. A deeper understanding of Data Science for Computational Biology, allowing
    to diagnose and analyze problems before proposing solutions.
 2. A greater commitment to involve the student both in the definition of the
    Data Science problem to be addressed and in the solution that will be de-
    veloped to solve it.
 3. Development of skills to investigate, create models, materialize them, and
    work collaboratively and multidisciplinary.
 4. A closer approach to the reality of their profession, establishing relationships
    with specialists in the Kaggle platform that contribute to their professional
    growth.
 5. Strengthening the connection between what they learn in the Master’s and
    what they perceive in the professional world.
 6. Development of high-level communication skills, through the use of social
    tools such as Kaggle forums and media production techniques, to create and
    share the solutions developed by them.
    Moreover, this teaching innovation project [16] will be aligned with the results
obtained from our previous project using EL in Data Science [17, 18]. Therefore,
the same benefits obtained in the Deep Learning course are expected in the
“Knowledge representation and acquisition” module. Among others, the student
will:
 1. learn to select relevant information about how learning paradigms work and
    the information they offer, instead of considering them as black boxes where
    the model built has no relevance and only quality metrics are studied;
 2. learn to study the details and data of the concrete problem and to obtain
    good understanding of the data;
 3. perceive the iterative nature of DS, by building different prediction models
    considering the data and results from previous models;
 4. and, research on new methods and their extension or variation for new and
    challenging problems, instead of just applying well-known solutions to well-
    known problems.
5   Conclusion and future works

This position paper has presented an ongoing educational research project to
use the challenge-based learning approach for DS in the context of a master
in Computational Biology. The paper has revised the background, including a
number of shortcomings repeated in current DS courses, and the results obtained
in a previous project, based on considering experiential learning for DS. The
ongoing project details have been presented: the scope and students’ profiles,
goals, timeline, teaching resources, evaluation, resulting products, and diffusion
plan. The project details allow interested professors to extrapolate our case to
their specific environment and audience.
    The contribution to the improvement of the teaching quality of the project
has also been explored, highlighting a deeper understanding of Data Science for
Computational Biology. This allows students to diagnose and analyze problems
before proposing solutions. DS is not only about data and tools to manage
them as classic DS courses may suggest. DS is more about “science” and the
scientific questions we can answer with data. Therefore, a major advantage in
using CBL for DS is that teachers do not present the answers before students
ask the scientific questions by themselves.
    Our main future works include to create a survey on the acceptance and
quality of challenges proposed, to study new theoretical frameworks for applying
CBL in DS, and the exploration (or implementation) of software tools to develop
e-portfolios. Moreover, two students have been hired to assist in the search and
development of specific challenges for Computational Biology.


Acknowledgments

This research work is supported by the Universidad Politécnica de Madrid under
the education innovation project “Aprendizaje basado en retos para la Biologı́a
Computacional y la Ciencia de Datos”, code IE1718.1003; and by the the Spanish
Ministry of Economy, Indystry and Competitiveness under the R&D project
Datos 4.0: Retos y soluciones (TIN2016-78011-C4-4-R, AEI/FEDER, UE).


References
 1. Data Science Specialization, Johns Hopkins University. https://www.coursera.
    org/specializations/jhu-data-science. Accessed: March of 2018.
 2. Kaggle: Academic Machine Learning Competitions. https://inclass.kaggle.
    com/. Accessed: May of 2017.
 3. Machine Learning MOOC, Stanford University. https://www.coursera.org/
    learn/machine-learning. Accessed: March of 2018.
 4. Predicting a Biological Response. https://www.kaggle.com/c/bioresponse. Ac-
    cessed: March of 2018.
 5. Challenge Based Learning. A Classroom Guide. goo.gl/vAwsg8, 2011. Accessed:
    March of 2018.
 6. Edu Trends. Aprendizaje Basado en Retos. goo.gl/dA3ux8, 2016. Accessed: March
    of 2018.
 7. E. Frank, M. A. Hall, G. Holmes, R. Kirkby, and B. Pfahringer. Weka - a machine
    learning workbench for data mining. In O. Maimon and L. Rokach, editors, The
    Data Mining and Knowledge Discovery Handbook, pages 1305–1314. Springer, 2005.
 8. M. K. C. from Jed Wing, S. Weston, A. Williams, C. Keefer, and A. Engelhardt.
    caret: Classification and Regression Training, 2012. R package version 5.15-044.
 9. A. Hey, S. Tansley, and K. Tolle. The Fourth Paradigm: Data-intensive Scientific
    Discovery. Microsoft Research, 2009.
10. D. A. Kolb. Experiential Learning: Experience as the Source of Learning and
    Development. Prentice Hall, 1 edition, Oct. 1983.
11. V. Mayer-Schonberger and K. Cukier. Big Data: A Revolution That Will Transform
    How We Live, Work, and Think. Houghton Mifflin Harcourt, Boston, 2013.
12. M. Nichols, K. Cator, and M. Torres. Challenge Based Learner User Guide. Red-
    wood City, CA: Digital Promise, 2016.
13. Qualters and C. Wehlburg. Experiential Education: Making the Most of Learning
    Outside the Classroom: New Directions for Teaching and Learning, Number 124.
    J-B TL Single Issue Teaching and Learning. Wiley, 2010.
14. A. R. Santos, A. Sales, P. Fernandes, and M. Nichols. Combining challenge-based
    learning and scrum framework for mobile application development. In Proceedings
    of the 2015 ACM Conference on Innovation and Technology in Computer Science
    Education, ITiCSE ’15, pages 189–194, New York, NY, USA, 2015. ACM.
15. K. Schwaber and M. Beedle. Agile Software Development with Scrum. Prentice
    Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2001.
16. E. Serrano, D. Manrique, J. Bajo, and M. Molina. Aprendizaje basado en retos
    para la Biologı́a Computacional y la Ciencia de Datos. https://goo.gl/d7ZwZt.
    Accessed: March of 2018.
17. E. Serrano, M. Molina, D. Manrique, and L. Baumela. Métodos, experiencias
    y herramientas para el aprendizaje experiencial de la Ciencia de Datos. https:
    //goo.gl/Yy7XeT. Accessed: March of 2018.
18. E. Serrano, M. Molina, D. Manrique, and L. Baumela. Experiential learning in data
    science: From the dataset repository to the platform of experiences. In C. Analide
    and P. Kim, editors, Intelligent Environments 2017 - Workshop Proceedings of the
    13th International Conference on Intelligent Environments, Seoul, Korea, August
    2017, volume 22 of Ambient Intelligence and Smart Environments, pages 122–130.
    IOS Press, 2017.
19. E. Serrano, M. Molina, D. Manrique, L. Baumela, and D. Zanardini. Aprendizaje
    experiencial en ciencia de datos: satisfacción de los estudiantes para tres modelos de
    enseñanza y aprendizaje [Experiential learning in data science: student satisfaction
    for three models of teaching and learning]. 2017.
20. M. Silberman. The Handbook of Experiential Learning. Wiley, 2007.
21. I. H. Witten, E. Frank, and M. A. Hall. Data mining: practical machine learning
    tools and techniques. Morgan Kaufmann, Burlington, MA, 2011.