=Paper=
{{Paper
|id=Vol-1905/recsys2017_poster8
|storemode=property
|title=Towards a Recommender System for Undergraduate Research
|pdfUrl=https://ceur-ws.org/Vol-1905/recsys2017_poster8.pdf
|volume=Vol-1905
|authors=Felipe Del Rio,Denis Parra,Jovan Kuzmicic,Erick Svec
|dblpUrl=https://dblp.org/rec/conf/recsys/del-RioPKS17
}}
==Towards a Recommender System for Undergraduate Research==
<pdf width="1500px">https://ceur-ws.org/Vol-1905/recsys2017_poster8.pdf</pdf>
<pre>
    Towards a Recommender System for Undergraduate Research
                                  Felipe del Rio                                                       Denis Parra
                 Pontificia Universidad Catolica de Chile                               Pontificia Universidad Catolica de Chile
                             Santiago, Chile                                                        Santiago, Chile
                              fidelrio@uc.cl                                                       dparra@ing.puc.cl

                                Jovan Kuzmicic                                                          Erick Svec
                 Pontificia Universidad Catolica de Chile                               Pontificia Universidad Catolica de Chile
                              Santiago, Chile                                                       Santiago, Chile
                           jpkuzmic@ing.puc.cl                                                     evsvec@ing.puc.cl

ABSTRACT
Several studies indicate that attracting students to research careers requires engaging
them from their early undergraduate years. Following this, the Engineering School at PUC
Chile has developed an undergraduate research program that allows students to enroll in
research in exchange for course credits. Moreover, we developed a web portal to inform
students about the program, but participation remains lower than expected. In order to
promote student engagement, we attempt to build a personalized recommender system of
research opportunities for undergraduates. With this goal in mind we investigate two
tasks. The first is identifying students who are more willing to participate in this kind
of program. The second is generating a personalized list of recommendations of research
opportunities for each student. To evaluate our approach, we perform a simulated
prediction experiment with data from our school, which currently has more than 4,000
active undergraduate students. Results indicate that there is considerable potential to
create a personalized recommender system for this purpose. Our research can be used as a
baseline for colleges seeking strategies to encourage research activities among
undergraduate students.

KEYWORDS
Recommender Systems, Undergraduate Research

ACM Reference format:
Felipe del Rio, Denis Parra, Jovan Kuzmicic, and Erick Svec. 2017. Towards a
Recommender System for Undergraduate Research. In Proceedings of RecSys
2017 Posters, Como, Italy, August 27-31, 2 pages.

RecSys 2017 Poster Proceedings, August 27-31, Como, Italy

1    INTRODUCTION
In a globalized world, academic institutions are compelled to offer rich learning
experiences to their students, with a complex curriculum that includes extra-academic
activities [1]. In order to address this issue, the School of Engineering at PUC Chile
established an undergraduate research program in 2011, known as IPre (in Spanish,
Investigación en Pregrado), which allows students to receive course credits when joining
a research project with faculty advice. The mission of the IPre program is to contribute
to the academic and professional development of engineering undergraduates by enhancing
their research skills [3].
   Context and Problem. Nowadays, the IPre program has an offer-demand system focused on
student-faculty interaction on a web platform. Herein, professors offer Research
Opportunities on a general board where students can browse and apply to available
projects. In this way, students have access to research topics that are new to them and
can work in different attractive areas. Although this platform promotes the exchange of
ideas, student engagement in undergraduate research programs faces major challenges [4],
and IPre is not an exception. In order to promote these programs, recent literature has
aimed to identify undergraduates' motivation with research activities [2, 6]. In this
line, we have detected lack of knowledge about the IPre program and the available
research opportunities as a major factor; thus we herein propose a personalized approach
to enroll students in undergraduate research.
   Objective and Tasks. In order to address the challenge of promoting student engagement
in our undergraduate research program, and considering the success of personalization for
increasing user engagement in several areas and communities, we decided to explore the
potential of a recommender system. In this work we study the feasibility of such a system
through two tasks, using data collected from the current online IPre system over the last
five years: (i) identifying students who would be likely to participate in the
undergraduate research program, and (ii) recommending relevant research opportunities to
undergraduate Engineering students.
   Results and Contributions. Our results indicate that it is possible to identify which
students will be more likely to participate, with a precision of up to 72.7%. The task of
recommending, however, is more challenging. We compared several methods and parameters
and were able to obtain a model that comes close to MAP=0.2, but further research is
required to reach a more accurate recommendation approach. Nonetheless, these results set
an appropriate baseline to further improve our current IPre system.

2    DATASET & FEATURES
We used a dataset from the IPre program over the 2012-2016 period, representing
applications of students to undergraduate research opportunities. The dataset comprises
user profiles of 10,546 undergraduate students of the Engineering School, among whom
1,134 students applied to 1,017 available research opportunities. Students could apply to
more than one opportunity, so we recorded 1,624 applications in total, with 81.4% of the
applications accepted.
   Task 1 was about predicting whether student u_i applied to research opportunities or
not (1: applied, 0: did not apply). In this task we compared three feature sets: (a) Base:
semesters enrolled, number of credits approved, (b) Base + ipre: features in (a) plus a
boolean
indicating previous applications to IPre, and (c) Base + ipre + gpa:
features in (b) plus GPA.

Table 1: Task 1, predict if student applies to opportunities.
                                Accuracy   Precision   F-1 Score
          Baseline               10.9%       10.9%       0.20
          LogReg                 91.2%       62.4%       0.55
          GBT                    92.0%       72.7%       0.54
          SVM                    90.1%       67.4%       0.28

          Base (GBT)             89.1%       25.0%       0.01
          Base+ipre (GBT)        92.1%       71.7%       0.55
          Base+ipre+gpa (GBT)    92.0%       72.7%       0.54
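
The three metrics reported in Table 1 can be computed from binary predictions as in the
minimal sketch below. The function name and the toy labels are assumptions made for
illustration; they are not taken from the IPre data or from the authors' code.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) for 0/1 labels.

    1 means "applied" and 0 means "did not apply", as in Task 1.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1


# Toy sample (assumed, not the paper's data): 109 applicants out of 1,000 students.
labels = [1] * 109 + [0] * 891
always_positive = binary_metrics(labels, [1] * 1000)
always_negative = binary_metrics(labels, [0] * 1000)
```

On this toy sample, a predictor that labels every student as an applicant reaches an
accuracy and precision of 10.9% and an F-1 of about 0.20, while one that labels nobody as
an applicant reaches 89.1% accuracy with zero precision and F-1. With classes this
unbalanced, accuracy alone says little, which is why precision and F-1 are reported too.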
   For Task 2, predicting which research opportunities the students applied to, we framed
recommendation as a classification task, i.e., predicting whether student u_i would apply
to research opportunity o_j (1: positive, 0: negative). We used three feature sets:
(a) Base: cosine similarity between the research opportunity abstract and the
descriptions of approved courses, (b) Base + ht: features in (a) plus a boolean
indicating that the student was taught by the faculty member offering the opportunity,
and (c) Base + ht + dept: features in (b) plus the percentage of approved courses taught
by the same department as the faculty member offering the opportunity (e.g. computer
science).

Figure 1: Task 2 MAP by classifier using all features.
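
As an illustration of the three Task 2 feature sets, the sketch below builds them for one
(student, opportunity) pair. It assumes plain term-count vectors over lowercase alphabetic
tokens and a concatenation of all course descriptions into one student profile; the text
does not state how documents were vectorized, and every function and field name here is
hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def task2_features(abstract, courses, faculty_id, faculty_dept):
    """Build the Base, ht and dept features for one (student, opportunity) pair.

    `courses` is a hypothetical list of dicts, one per approved course,
    with the keys 'description', 'teacher_id' and 'dept'.
    """
    profile = term_counts(" ".join(c["description"] for c in courses))
    similarity = cosine_similarity(term_counts(abstract), profile)
    # ht: was the student ever taught by the faculty member offering it?
    had_teacher = any(c["teacher_id"] == faculty_id for c in courses)
    # dept: share of approved courses taught by the offering department
    same_dept = sum(1 for c in courses if c["dept"] == faculty_dept)
    dept_share = same_dept / len(courses) if courses else 0.0
    return {"cosine": similarity, "ht": had_teacher, "dept": dept_share}
```

A TF-IDF weighting could replace the raw counts in `term_counts` without changing the
rest of the sketch.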
3   EVALUATION METHODOLOGY & RESULTS
All data before 2014 is used for training and everything afterwards for testing. In both
tasks we test a baseline classifier, logistic regression (LogReg), gradient boosted trees
(GBT) and support vector machines (SVM). For Task 1, predicting whether the user applies
to opportunities or not, the dataset is highly unbalanced, since 89.7% of the students do
not apply to opportunities. We measure classifier performance with accuracy, precision
and F-1 score. As a baseline we use a model that predicts the most common class.
    For Task 2, predicting which opportunities a student actually applied to, we classify
several opportunities for each student and rank them by their prediction score. Then, we
use the ranking metric Mean Average Precision (MAP) [5] to evaluate the performance. The
baseline method consisted of generating a random list of recommendations. In this task,
we analyzed recommendation list size (k), feature sets and the algorithm used.

4   RESULTS
Task 1: Predict if student applies to opportunities. Table 1 shows the results in two
groups: (a) comparing methods (using all features), and (b) comparing features (using the
best method). Here we see that all methods (LogReg, GBT and SVM) outperform the baseline
in all metrics. The best methods, though, are GBT (accuracy=92%, precision=72.7%,
F-1=0.54) and LogReg (accuracy=91.2%, precision=62.4%, F-1=0.55). This result is very
high considering the class imbalance. In terms of feature sets, the Base set (semesters
enrolled and number of credits approved) is boosted especially by considering whether the
student previously applied to an IPre opportunity; i.e., such students will most likely
apply again.
    Task 2: Recommending research opportunities. We analyze this task in two stages.
First, using all the features, we compare methods, as seen in Figure 1. We found that all
methods outperform a random baseline, but LogReg and GBT perform the best, reaching a MAP
of up to 0.20. Our top method scored 14.6 times higher than the baseline for k = 20 and
close to 10 times higher on a longer recommendation list. Then, using the LogReg method,
we study different feature sets, as seen in Figure 2. We observe that knowing whether the
student had a class with the professor offering the research opportunity significantly
improves the prediction compared to only matching the content descriptions of courses and
research opportunities. A smaller yet important boost to the recommendation is also given
by including department information in the model.

Figure 2: Task 2 MAP by feature sets using Log. Reg.

5   CONCLUSION
In this work we showed the feasibility of: (a) identifying students prone to apply to
research opportunities, and (b) recommending research opportunities to undergraduate
students. There is still room for improvement by adding new features and other
recommendation approaches (such as factorization machines or neural networks). We are
currently conducting a user study to verify the generalizability of our results. We
expect this work to serve as a baseline for institutions implementing these features in
their academic systems.

REFERENCES
[1] Karen W Bauer and Joan S Bennett. 2003. Alumni perceptions used to assess
    undergraduate research experience. The Journal of Higher Education 74, 2 (2003),
    210–230.
[2] John Aubrey Douglass and Chun-Mei Zhao. 2013. Undergraduate Research
    Engagement at Major US Research Universities. Research & Occasional Paper
    Series: CSHE. 14.13. Center for Studies in Higher Education (2013).
[3] Joseph A Harsh, Adam V Maltese, and Robert H Tai. 2011. Undergraduate research
    experiences from a longitudinal perspective. Journal of College Science Teaching
    41, 1 (2011), 84.
[4] CA Merkel. 2001. Undergraduate research for six universities. Unpublished report
    for the Association of American Universities. Pasadena, CA: California Institute of
    Technology. Retrieved on 4, 15 (2001), 07.
[5] Denis Parra and Shaghayegh Sahebi. 2013. Recommender systems: Sources of
    knowledge and evaluation metrics. In Advanced Techniques in Web Intelligence-2.
    Springer, 149–175.
[6] Kirsten Zimbardi and Paula Myatt. 2014. Embedding undergraduate research
    experiences within the curriculum: a cross-disciplinary study of the key
    characteristics guiding implementation. Studies in Higher Education 39, 2 (2014),
    233–250.
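
Supplementary sketch: Task 2 ranks opportunities by prediction score and evaluates the
ranking with MAP at list size k. The code below is one common way to compute that metric,
given as a sketch under assumptions. Average precision is normalized here by
min(k, number of opportunities the student applied to), which is a frequent convention
but is not stated in the text, and all names are hypothetical.

```python
def rank_by_score(scores):
    """Order opportunity ids by descending prediction score.

    `scores` maps an opportunity id to the classifier score for one student.
    """
    return sorted(scores, key=scores.get, reverse=True)


def average_precision_at_k(ranked, relevant, k):
    """Average precision of the top-k items of `ranked` (no duplicate ids)."""
    relevant = set(relevant)
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / min(k, len(relevant))


def mean_average_precision(rankings, applications, k):
    """Mean of AP@k over students.

    `rankings` maps a student id to a ranked list of opportunity ids;
    `applications` maps a student id to the opportunities actually applied to.
    """
    scores = [
        average_precision_at_k(ranked, applications.get(student, ()), k)
        for student, ranked in rankings.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, a student who applied to o1 and o3 and receives the ranking o1, o2, o3 gets
an average precision of (1/1 + 2/3) / 2, about 0.83, at k = 3.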

</pre>