Segmenting Student Answers to Textual Exercises Based on Topic Modeling

Jan Philip Bernius, Department of Informatics, Technical University of Munich, Munich, Germany, janphilip.bernius@tum.de
Anna Kovaleva, Department of Informatics, Technical University of Munich, Munich, Germany, anna.kovaleva@tum.de
Bernd Bruegge, Department of Informatics, Technical University of Munich, Munich, Germany, bruegge@in.tum.de

Abstract—Giving feedback when grading textual exercises in very large courses is a challenge, especially when instructors want to provide consistent feedback to each student in real time, already during the lecture. This paper outlines a real-time assessment approach based on topic modeling and reuse. Segmenting student answers fosters a structured form of feedback and improves the reusability of feedback. We present the design of an answer segmentation system that is intended to be integrated with an assessment system for textual exercises. The resulting system aims at quicker and more consistent feedback for textual exercises and an improved learning experience for students.

I. INTRODUCTION

With a growing number of students enrolled at universities worldwide,¹ large courses have thousands of participating students. Large courses pose a problem for instructors when grading textual exercises. The main problem is asynchronous assessment, which usually takes a week or even longer. To reduce this delay, we teach interactive lectures. In these lectures, we combine theory and exercises live, grade the exercises immediately and provide quick feedback to students [1]. This increases student comprehension and deepens understanding.

Technology to foster interaction and discussion within large courses exists [2, 3]. So do scalable exercise systems with automatic assessment for programming and modeling exercises [4, 5]. Textual exercises are commonly used in examinations, but no automatic assessment solution is available on the market for this exercise type.

Open-answer questions require time-consuming work from instructors, including designing the exercises and assessing them manually. Manual assessment is costly because student answers vary widely. To reduce effort, instructors tend to reuse exercises from previous years. Grading is a repetitive process: instructors look for common mistakes or predefined solution patterns. The students' learning success benefits from detailed and personalized feedback [6]. Large-scale courses are only feasible if feedback comments can be reused. Individual feedback can still rely on the domain expertise of the teacher. When multiple graders are involved, they need means to create consistent feedback for learners.

This paper outlines a segmentation algorithm to be applied to student answers to textual exercises. It is intended to be used as part of an assessment system for textual exercises, fostering reuse of feedback between students and increasing consistency between assessments [7].

¹ United Nations, "UN Global Assessment on Higher Education Reveals Broad Socio-Economic, Gender Disparities," https://news.un.org/en/story/2017/04/555642-un-global-assessment-higher-education-reveals-broad-socio-economic-gender, 2017.

S. Krusche, S. Wagner (Hrsg.): SEUH 2020 72. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

II. SEGMENTING STUDENT ANSWERS

We abstract the topic modeling approach and preserve its core idea: every answer is a collection of topics, and many topics are distributed among different answers [8]. We compensate for the small number of words in each answer by reducing topics to keywords. Another strategy adapted from other works is "vocabulary introduction" [9]: as soon as new keywords are introduced, a new segment begins. The presented approach differs from using a thesaurus or an ontology because the keywords are not known in advance. Instead, they are calculated separately for every problem.

The algorithm can be separated into three phases: Text Preprocessing, Keyword Extraction and Segmentation. Figure 1 depicts the algorithm's flow of events, which the following sections describe in detail. Segments can serve as a baseline for manual, structured instructor feedback. They can also serve as the unit from which assessment systems generate feedback automatically [7].

[Figure 1: UML activity diagram. Recoverable activities include Remove punctuation, Convert to lower-case, Remove stop words, Lemmatize all words, Extract keywords, Stem keywords, Search for topic shifts and Merge text blocks, with the guards [new keywords found] and [same keywords found]. Object nodes include Answer, Keyword, Sentence, TextBlock, TopicShift and Segment.]
Fig. 1. The segmentation algorithm's flow of events depicted using a UML activity diagram.

A. Text Preprocessing

Student answers are of inconsistent quality with regard to spelling, formatting and use of punctuation. Poor data quality impacts the segmentation quality negatively. Due to the nature of the system, manual preprocessing is not practical. Student submissions must not be modified, because feedback should be based on the original answer only. Instead, we correct common irregularities in an intermediate format that is used only for further calculations.

Removing stop words from text is a very common way to clean textual data for Natural Language Processing (NLP) tasks [9, 10]. Words like "I", "the", "what" and "did" carry little lexical content and can be removed.

Lemmatization is the process of reducing a word to its base (dictionary) form. Naturally, students use different forms of a word: singular or plural, different tenses, degrees of comparison, etc. The result of the text preprocessing is a set of lemmatized lower-case words without any punctuation or stop words.

B. Keyword Extraction

We generalize the topic modeling idea that every student's submission is a collection of topics shared among different answers. To compensate for data scarcity, we reduce each topic to a single keyword. The resulting keywords are the ten most frequently used words in the answers to an exercise. We chose this number empirically based on our data.

C. Segmentation

The segmentation is performed in two steps. First, the answers are split into initial text blocks. Second, adjacent text blocks are compared and merged if no new keywords are introduced. The result is a set of segments for each answer.

To identify sentences, we use a pre-trained model of the Punkt tokenizer [11, 12] and a custom implementation for bulleted lists. To identify clauses, we rely on conjunctions: a subordinating conjunction starts a new clause. To reduce false positives, we apply this rule only to sentences longer than 20 words. This clause identification approach is incomplete, but sufficient for this use case.

We use a stemmer to unify different forms of a word in the text. We define segments based on lexical cohesion and vocabulary introduction [9, 13]. Within each student answer, we compare the extracted keywords of adjacent text blocks. A change in keywords signals a topic shift. If the keywords are equal, the text blocks are merged into a single segment.

III. SUMMARY

In this paper, we have presented a high-level overview of a new algorithm based on topic modeling and text segmentation. The algorithm segments student answers into topically coherent text blocks. Following a "divide & conquer" approach, we first divide student answers into small initial segments and then merge them into larger text blocks according to topic boundaries.

Segments allow for more structured assessment approaches, similar to how modeling exercises can be assessed today. This enables semi-automated assessment systems to be used in the assessment process, reducing the delay between exercise and feedback. Further, tools can help to keep feedback consistent between students, because comparisons can be made between segments.

The algorithm's results can be improved in two areas: (1) deriving keywords and text blocks using statistical models, topic models or decision trees, and (2) using a thesaurus to recognize synonyms. Future work is needed to evaluate this algorithm in a lecture setting.

REFERENCES

[1] S. Krusche, A. Seitz, J. Börstler, and B. Bruegge, "Interactive learning: Increasing student participation through shorter exercise cycles," in 19th Australasian Computing Education Conference. ACM, 2017, pp. 17–26.
[2] J. Knobloch and E. Gigantiello, "AMATI: Another massive audience teaching instrument," in 15. Workshop für Software Engineering im Unterricht der Hochschulen, 2017, pp. 63–68.
[3] R. Mayer, A. Stull, K. DeLeeuw, K. Almeroth, B. Bimber, D. Chun, M. Bulger, J. Campbell, A. Knight, and H. Zhang, "Clickers in college classrooms: Fostering learning with questioning methods in large lecture classes," Contemporary Educational Psychology, vol. 34, pp. 51–57, 2009.
[4] S. Krusche and A. Seitz, "Artemis: An automatic assessment management system for interactive learning," in 49th ACM Technical Symposium on Computer Science Education. ACM, 2018, pp. 284–289.
[5] S. Krusche and A. Seitz, "Increasing the Interactivity in Software Engineering MOOCs - A Case Study," in 52nd Hawaii International Conference on System Sciences, 2019, pp. 1–10.
[6] A. Poulos and M. J. Mahony, "Effectiveness of feedback: the students' perspective," Assessment & Evaluation in Higher Education, vol. 33, no. 2, pp. 143–154, 2008.
[7] J. P. Bernius and B. Bruegge, "Towards the Automatic Assessment of Text Exercises," in 2nd Workshop on Innovative Software Engineering Education, 2019, pp. 19–22.
[8] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," J. Mach. Learn. Res., vol. 3, pp. 993–1022, 2003.
[9] M. A. Hearst, "TextTiling: Segmenting text into multi-paragraph subtopic passages," Computational Linguistics, vol. 23, pp. 33–64, 1997.
[10] A. Hulth, "Improved automatic keyword extraction given more linguistic knowledge," in Conference on Empirical Methods in Natural Language Processing. ACL, 2003, pp. 216–223.
[11] S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python, 1st ed. O'Reilly Media, Inc., 2009.
[12] T. Kiss and J. Strunk, "Unsupervised multilingual sentence boundary detection," Comput. Linguist., vol. 32, no. 4, pp. 485–525, 2006.
[13] M. A. K. Halliday and R. Hasan, Cohesion in English, ser. English Language Series. London: Longman, 1976.
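The preprocessing phase of Section II-A can be sketched as a short pipeline that follows the order shown in Fig. 1: remove punctuation, convert to lower-case, remove stop words, lemmatize. This is a minimal illustration, not the authors' implementation. The stop-word list `STOP_WORDS` and the lookup-table lemmatizer `toy_lemmatize` are hypothetical stand-ins for a full stop-word list and a real lemmatizer. The function returns new tokens and leaves the submitted string untouched, as Section II-A requires.

```python
import string
from typing import Callable, Dict, List

# Hypothetical, deliberately tiny stop-word list (a real system would use a full list).
STOP_WORDS = {"i", "the", "what", "did", "a", "an", "is", "are", "of", "to", "and", "in", "it"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS: Dict[str, str] = {"classes": "class", "objects": "object", "were": "be", "was": "be"}


def toy_lemmatize(word: str) -> str:
    """Return the lemma from the lookup table, or the word unchanged."""
    return LEMMAS.get(word, word)


def preprocess(answer: str, lemmatize: Callable[[str], str] = toy_lemmatize) -> List[str]:
    """Build the intermediate token representation; the original answer is not modified."""
    # 1. Remove punctuation.
    no_punct = answer.translate(str.maketrans("", "", string.punctuation))
    # 2. Convert to lower-case.
    lowered = no_punct.lower()
    # 3. Remove stop words.
    tokens = [t for t in lowered.split() if t not in STOP_WORDS]
    # 4. Lemmatize all remaining words.
    return [lemmatize(t) for t in tokens]
```

For example, `preprocess("The classes were coupled, and the objects did not communicate!")` yields `['class', 'be', 'coupled', 'object', 'not', 'communicate']`. Passing the lemmatizer as a parameter keeps the pipeline independent of any particular NLP library.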
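Keyword extraction as described in Section II-B amounts to counting word frequencies over all preprocessed answers to one exercise and keeping the most frequent words. The sketch below assumes the input is the token lists produced by the preprocessing step. `extract_keywords` is a hypothetical name, and the default `k = 10` mirrors the empirically chosen value from the text.

```python
from collections import Counter
from typing import List


def extract_keywords(preprocessed_answers: List[List[str]], k: int = 10) -> List[str]:
    """Return the k most frequent words across all preprocessed answers of one exercise."""
    counts = Counter(word for answer in preprocessed_answers for word in answer)
    # Counter.most_common orders equal counts by first occurrence, so the result is deterministic.
    return [word for word, _ in counts.most_common(k)]
```

For example, `extract_keywords([["class", "coupling"], ["coupling", "cohesion"]], k=1)` returns `["coupling"]`. Because keywords are computed per exercise, the same function yields different keyword sets for different problems, in line with Section II.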
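The initial text blocks of Section II-C come from sentence and clause splitting. The sketch below approximates that step under several assumptions. A simple regular expression replaces the pre-trained Punkt tokenizer [11, 12], and bulleted lists are not handled. `SUBORDINATING` is a hypothetical and incomplete list of subordinating conjunctions. Only the 20-word threshold is taken from the text.

```python
import re
from typing import List

# Hypothetical, incomplete set of English subordinating conjunctions.
SUBORDINATING = {"because", "although", "while", "whereas", "since", "unless", "if", "when"}
MIN_WORDS = 20  # only sentences longer than this are split into clauses


def split_sentences(text: str) -> List[str]:
    """Naive stand-in for Punkt: split after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def split_clauses(sentence: str) -> List[str]:
    """Start a new clause at each subordinating conjunction, for long sentences only."""
    words = sentence.split()
    if len(words) <= MIN_WORDS:
        return [sentence]
    clauses: List[str] = []
    current: List[str] = []
    for word in words:
        if word.lower().strip(",;:") in SUBORDINATING and current:
            clauses.append(" ".join(current))
            current = []
        current.append(word)
    clauses.append(" ".join(current))
    return clauses


def initial_text_blocks(answer: str) -> List[str]:
    """Split an answer into sentences, then long sentences into clauses."""
    return [clause for s in split_sentences(answer) for clause in split_clauses(s)]
```

The length check mirrors the paper's trade-off. Short sentences are kept whole because a conjunction in a short sentence is more likely to produce a false clause boundary.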
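The merge step of Section II-C can be read as vocabulary introduction [9]. Under this reading, a text block that introduces a stemmed keyword not yet present in the current segment starts a new segment, and any other block is merged into the current one. This is one possible interpretation, sketched under assumptions. `toy_stem` is a hypothetical crude suffix stripper used in place of a real stemmer, and keywords are matched on whitespace-separated, punctuation-free tokens.

```python
import string
from typing import List, Set


def toy_stem(word: str) -> str:
    """Hypothetical crude suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def block_keywords(block: str, stemmed_keywords: Set[str]) -> Set[str]:
    """Return the stemmed keywords that occur in a text block."""
    words = block.lower().translate(str.maketrans("", "", string.punctuation)).split()
    return {toy_stem(w) for w in words} & stemmed_keywords


def segment(blocks: List[str], keywords: List[str]) -> List[str]:
    """Merge adjacent blocks unless a block introduces new keywords (a topic shift)."""
    stemmed = {toy_stem(k) for k in keywords}
    segments: List[str] = []
    seen: Set[str] = set()  # keywords of the current segment
    for block in blocks:
        found = block_keywords(block, stemmed)
        if segments and found <= seen:
            segments[-1] += " " + block  # no new keywords: merge into current segment
        else:
            segments.append(block)  # first block or new keywords: start a new segment
            seen = set(found)
    return segments
```

Blocks without any keyword are merged into the preceding segment, so only the introduction of new keywords creates a boundary. For example, with the keywords `["coupling", "cohesion", "facade"]`, two sentences about coupling form one segment. A following sentence about cohesion starts a second segment.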