=Paper=
{{Paper
|id=Vol-1905/recsys2017_poster8
|storemode=property
|title=Towards a Recommender System for Undergraduate Research
|pdfUrl=https://ceur-ws.org/Vol-1905/recsys2017_poster8.pdf
|volume=Vol-1905
|authors=Felipe Del Rio,Denis Parra,Jovan Kuzmicic,Erick Svec
|dblpUrl=https://dblp.org/rec/conf/recsys/del-RioPKS17
}}
==Towards a Recommender System for Undergraduate Research==
Felipe del Rio (fidelrio@uc.cl), Denis Parra (dparra@ing.puc.cl), Jovan Kuzmicic (jpkuzmic@ing.puc.cl), and Erick Svec (evsvec@ing.puc.cl), Pontificia Universidad Catolica de Chile, Santiago, Chile

===ABSTRACT===
Several studies indicate that attracting students to research careers requires engaging them from their early undergraduate years. Following this, the Engineering School at PUC Chile has developed an undergraduate research program that allows students to enroll in research in exchange for course credits. We also developed a web portal to inform students about the program, but participation remains lower than expected. In order to promote student engagement, we attempt to build a personalized recommender system of research opportunities for undergraduates. With this goal in mind we investigate two tasks: first, identifying students who are more willing to participate in this kind of program; second, generating a personalized list of recommendations of research opportunities for each student. To evaluate our approach, we perform a simulated prediction experiment with data from our school, which currently has more than 4,000 active undergraduate students. Results indicate that there is great potential to create a personalized recommender system for this purpose. Our research can be used as a baseline for colleges seeking strategies to encourage research activities among undergraduate students.

===KEYWORDS===
Recommender Systems, Undergraduate Research

ACM Reference format: Felipe del Rio, Denis Parra, Jovan Kuzmicic, and Erick Svec. 2017. Towards a Recommender System for Undergraduate Research. In Proceedings of RecSys 2017 Posters, Como, Italy, August 27-31, 2 pages.

===1 INTRODUCTION===
In a globalized world, academic institutions are compelled to offer rich learning experiences to their students, with a complex curriculum that includes extra academic activities [1]. To address this issue, the School of Engineering at PUC Chile established an undergraduate research program in 2011, known as IPre (in Spanish, Investigación en Pregrado), which allows students to receive course credits when joining a research project under faculty advice. The mission of the IPre program is to contribute to the academic and professional development of engineering undergraduates by enhancing their research skills [3].

'''Context and Problem.''' Nowadays, the IPre program has an offer-demand system focused on student-faculty interaction on a web platform. Here, professors post Research Opportunities to a general board where students can browse and apply to available projects. In this way, students gain access to research topics that are new to them and can work in different attractive areas. Although this platform promotes the exchange of ideas, student engagement in undergraduate research programs faces major challenges [4], and IPre is not an exception. In order to promote these programs, recent literature has aimed to identify undergraduates' motivation for research activities [2, 6]. In this line, we have identified lack of knowledge about the IPre program and the available research opportunities as a major factor, and we therefore propose a personalized approach to enroll students in undergraduate research.

'''Objective and Tasks.''' In order to address the challenge of promoting student engagement in our undergraduate research program, and considering the success of personalization for increasing user engagement in several areas and communities, we decided to explore the potential of a recommender system. In this work we study the feasibility of such a system through two tasks, using data collected from the current online IPre system over the last five years: (i) identifying students who would be likely to participate in the undergraduate research program, and (ii) recommending relevant research opportunities to undergraduate Engineering students.

'''Results and Contributions.''' Our results indicate that it is possible to identify which students will be more likely to participate, with a precision of up to 72.7%. The task of recommending opportunities is indeed more challenging: we compared several methods and parameters and obtained a model that gets close to MAP=0.2, but further research is required to reach a more accurate recommendation approach. Nonetheless, these results set an appropriate baseline for improving our current IPre system.

===2 DATASET & FEATURES===
We used a dataset from the IPre program over the 2012-2016 period, representing applications of students to undergraduate research opportunities. The dataset comprises user profiles of 10,546 undergraduate students of the Engineering School; among them, 1,134 students applied to 1,017 available research opportunities. Students could apply to more than one opportunity, so we recorded 1,624 applications in total, with 81.4% of the applications accepted.

Task 1 consists of predicting whether student u<sub>i</sub> applied to research opportunities or not (1: applied, 0: did not apply). In this task we compared three feature sets: (a) Base: semesters enrolled and number of credits approved; (b) Base + ipre: the features in (a) plus a boolean indicating previous applications to IPre; and (c) Base + ipre + gpa: the features in (b) plus GPA.
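For illustration only, the following is a minimal sketch of how Task 1 can be framed as binary classification over the three feature sets above. It assumes a scikit-learn style workflow, a temporal train/test split, and hypothetical column names (semesters_enrolled, credits_approved, applied_ipre_before, gpa, applied, year); it is not the authors' implementation.

<pre>
# Minimal sketch of Task 1 (did the student apply?) as binary classification.
# Column names and the scikit-learn pipeline are illustrative assumptions,
# not the authors' actual code.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, f1_score

FEATURE_SETS = {
    "base": ["semesters_enrolled", "credits_approved"],
    "base+ipre": ["semesters_enrolled", "credits_approved", "applied_ipre_before"],
    "base+ipre+gpa": ["semesters_enrolled", "credits_approved",
                      "applied_ipre_before", "gpa"],
}

def evaluate_task1(students: pd.DataFrame) -> None:
    """students: one row per student with the feature columns above,
    a boolean 'applied' label and an integer 'year' used for the split."""
    train = students[students["year"] < 2014]   # temporal split, as in the paper
    test = students[students["year"] >= 2014]
    for name, cols in FEATURE_SETS.items():
        clf = GradientBoostingClassifier(random_state=0)
        clf.fit(train[cols], train["applied"])
        pred = clf.predict(test[cols])
        print(f"{name:14s} "
              f"acc={accuracy_score(test['applied'], pred):.3f} "
              f"prec={precision_score(test['applied'], pred, zero_division=0):.3f} "
              f"f1={f1_score(test['applied'], pred):.3f}")
</pre>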
For Task 2, predicting which research opportunities the students applied to, we cast recommendation as a classification task, i.e., predicting whether student u<sub>i</sub> would apply to research opportunity o<sub>j</sub> (1: positive, 0: negative). We used three feature sets: (a) Base: the cosine similarity between the research opportunity abstract and the descriptions of the courses the student has approved; (b) Base + ht: the features in (a) plus a boolean indicating that the student was taught by the faculty member offering the opportunity; and (c) Base + ht + dept: the features in (b) plus the percentage of approved courses taught by the same department as the faculty member offering the opportunity (e.g., computer science).
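As a purely illustrative sketch of the Task 2 features, the snippet below builds a feature vector for one (student, opportunity) pair. The TF-IDF representation and the dictionary-style data structures are assumptions; the paper only states that the Base feature is a cosine similarity between the opportunity abstract and the approved-course descriptions.

<pre>
# Sketch of the Task 2 feature sets (Base, ht, dept) for one student-opportunity
# pair. TF-IDF and the data structures are assumptions made for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def base_feature(course_descriptions, opportunity_abstract):
    """Cosine similarity between the opportunity abstract and the student's
    approved-course descriptions, concatenated into one profile text."""
    profile = " ".join(course_descriptions)
    vectors = TfidfVectorizer().fit_transform([profile, opportunity_abstract])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

def pair_features(student, opportunity):
    """Feature vector for one (student, opportunity) pair: Base + ht + dept.
    'student' and 'opportunity' are hypothetical dicts with the keys used below."""
    return [
        base_feature(student["course_descriptions"], opportunity["abstract"]),
        float(opportunity["faculty"] in student["teachers"]),               # ht
        student["dept_course_share"].get(opportunity["department"], 0.0),   # dept
    ]
</pre>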
===3 EVALUATION METHODOLOGY & RESULTS===
All data before 2014 is used for training and everything afterwards for testing. In both tasks we test a baseline classifier, logistic regression (LogReg), gradient boosted trees (GBT), and support vector machines (SVM). For Task 1, predicting whether a student applies to opportunities or not, the dataset is highly imbalanced, since 89.7% of the students do not apply to opportunities. We measure classifier performance with accuracy, precision, and F-1 score. As a baseline we use a model that predicts the most common class.

For Task 2, predicting which opportunities a student actually applied to, we classify several opportunities for each student and rank them by their prediction score. We then use the ranking metric Mean Average Precision (MAP) [5] to evaluate performance. The baseline method consists of generating a random list of recommendations. In this task, we analyzed the recommendation list size (k), the feature sets, and the algorithm used.
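To make the ranking evaluation concrete, here is a small sketch of MAP over the top-k recommendations per student. It follows the standard definition of Average Precision and MAP@k; it is not taken from the authors' code, and the student-indexed dictionaries are assumptions.

<pre>
# Sketch of the ranking metric: Mean Average Precision over the top-k
# recommended opportunities per student (standard definition, for illustration).
def average_precision(ranked_items, relevant_items, k=20):
    """AP@k for one student: ranked_items is the model's ordering by score,
    relevant_items the set of opportunities the student actually applied to."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked_items[:k], start=1):
        if item in relevant_items:
            hits += 1
            score += hits / i          # precision at this cut-off
    return score / min(len(relevant_items), k) if relevant_items else 0.0

def mean_average_precision(rankings, ground_truth, k=20):
    """MAP@k across students: 'rankings' maps student id -> ranked list,
    'ground_truth' maps student id -> set of applied opportunities."""
    aps = [average_precision(rankings[u], ground_truth[u], k) for u in ground_truth]
    return sum(aps) / len(aps) if aps else 0.0
</pre>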
===4 RESULTS===
'''Task 1: Predicting if a student applies to opportunities.''' Table 1 shows the results in two groups: (a) comparing methods (using all features), and (b) comparing feature sets (using the best method). All methods (LogReg, GBT and SVM) outperform the baseline in all metrics. The best methods are GBT (accuracy=92.0%, precision=72.7%, F-1=0.54) and LogReg (accuracy=91.2%, precision=62.4%, F-1=0.55). This result is very high considering the class imbalance. In terms of feature sets, the Base set (semesters enrolled and number of credits approved) is boosted especially by considering whether the student previously applied to an IPre opportunity; i.e., students who applied in the past will most likely apply again.

{| class="wikitable"
|+ Table 1: Task 1, predicting if a student applies to opportunities
! !! Accuracy !! Precision !! F-1 Score
|-
| Baseline || 10.9% || 10.9% || 0.20
|-
| LogReg || 91.2% || 62.4% || 0.55
|-
| GBT || 92.0% || 72.7% || 0.54
|-
| SVM || 90.1% || 67.4% || 0.28
|-
| Base (GBT) || 89.1% || 25.0% || 0.01
|-
| Base+ipre (GBT) || 92.1% || 71.7% || 0.55
|-
| Base+ipre+gpa (GBT) || 92.0% || 72.7% || 0.54
|}

'''Task 2: Recommending research opportunities.''' We analyze this task in two stages. First, using all the features, we compare methods, as seen in Figure 1. We found that all methods outperform a random baseline, but LogReg and GBT perform best, reaching a MAP of up to 0.20. Our top method scored 14.6 times higher than the baseline for k = 20 and closer to 10 times higher on a longer recommendation list. Then, using the LogReg method, we study the different feature sets, as seen in Figure 2. We observe that knowing whether the student had a class with the professor offering the research opportunity significantly improves prediction compared to only matching the content descriptions of courses and research opportunities. A smaller yet important boost in recommendation quality is also given by adding department matching information to the model.

Figure 1: Task 2 MAP by classifier using all features.

Figure 2: Task 2 MAP by feature sets using Log. Reg.

===5 CONCLUSION===
In this work we showed the feasibility of: (a) identifying students prone to apply to research opportunities, and (b) recommending research opportunities to undergraduate students. There is still room for improvement by adding new features and other recommendation approaches (such as factorization machines or neural networks). We are currently conducting a user study to verify the generalizability of our results. We expect this work to serve as a baseline for institutions implementing these features in their academic systems.

===REFERENCES===
[1] Karen W. Bauer and Joan S. Bennett. 2003. Alumni perceptions used to assess undergraduate research experience. The Journal of Higher Education 74, 2 (2003), 210–230.

[2] John Aubrey Douglass and Chun-Mei Zhao. 2013. Undergraduate Research Engagement at Major US Research Universities. Research & Occasional Paper Series: CSHE.14.13. Center for Studies in Higher Education (2013).

[3] Joseph A. Harsh, Adam V. Maltese, and Robert H. Tai. 2011. Undergraduate research experiences from a longitudinal perspective. Journal of College Science Teaching 41, 1 (2011), 84.

[4] C. A. Merkel. 2001. Undergraduate research for six universities. Unpublished report for the Association of American Universities. Pasadena, CA: California Institute of Technology (2001).

[5] Denis Parra and Shaghayegh Sahebi. 2013. Recommender systems: Sources of knowledge and evaluation metrics. In Advanced Techniques in Web Intelligence-2. Springer, 149–175.

[6] Kirsten Zimbardi and Paula Myatt. 2014. Embedding undergraduate research experiences within the curriculum: a cross-disciplinary study of the key characteristics guiding implementation. Studies in Higher Education 39, 2 (2014), 233–250.