Using Sentence Compression to Develop Visual Analytics for Student Responses to Short Answer Questions

Aneesha Bakharia
Queensland University of Technology
Queensland, Australia
aneesha.bakharia@gmail.com

Shane Dawson
University of South Australia
South Australia, Australia
shaned07@gmail.com

ABSTRACT
In this paper, we report on early research to visualize and summarize student responses to short answer questions. Recently published graph-based multi-sentence compression algorithms have been successfully applied to summarize opinions, a domain with many similarities to short answer responses. Initial investigations indicate that visual analytics for short answer questions can be derived from the output of graph-based multi-sentence compression algorithms. A proposed open source short answer analytics tool is also briefly discussed, along with an evaluation plan. The proposed analytics tool will give lecturers and tutors a high-level overview of how students have responded to questions, help them identify knowledge gaps, and support them in providing feedback to students.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing

General Terms
Algorithms, Measurement, Design, Human Factors.

Keywords
Learning Analytics, Visual Analytics, Sentence Compression, Natural Language Processing, Summarization, Graph Layout.

1. INTRODUCTION
Short answer questions are a useful form of summative and formative assessment, as they allow students to explain concepts in their own words without being given prompts. Depending on the number of student responses, grading and providing feedback for short answer questions can be a tedious task. Multiple choice questions, which are easily graded automatically, are predominantly used when student numbers are large, such as in MOOCs. Within flipped classroom scenarios, students are given pre-lecture readings and required to answer short answer questions, with the lecturer analyzing the student answers and addressing knowledge gaps during the lecture. Visual analytics that help lecturers and tutors gain an overview of student responses are therefore becoming increasingly important.

A rapid turnaround between students submitting their responses and the lecturer analyzing them is required. This research is not focused on automatically grading short answer questions; rather, the focus is on using visual analytics to provide insight into how students have answered questions and to allow lecturers to easily determine the feedback and support that students require.

Lecturers and tutors require a way to analyse and visualize student responses so that they can:
- understand how students have responded to a question
- review the vocabulary being used
- identify knowledge gaps
- provide feedback to groups of students with similar knowledge gaps

The visual analytics tool proposed in this paper will apply sentence compression to summarize student responses to short answer questions. Graph-based sentence compression algorithms have recently been developed and applied to summarize multiple related sentences [3] and opinions within textual reviews [4]. Student responses are made up of short sentences, and students tend to use common phrases, so there is high similarity and redundancy between responses. This makes the summarization of student responses an ideal candidate for sentence compression. Graph-based approaches have also been applied to automatically grade student responses [4].

2. MULTI-SENTENCE COMPRESSION
The first algorithm investigated is the multi-sentence compression algorithm by Filippova [3]. The Filippova algorithm takes a cluster of similar or related sentences and outputs a single short sentence that summarizes the most salient theme conveyed in the cluster. The algorithm constructs a word graph and uses the shortest paths between words in the graph to produce a summary sentence. The algorithm is easy to implement because sentences only need to be tokenized and part-of-speech tagged. The Filippova algorithm is the first sentence compression algorithm that requires neither "hand-crafted rules, nor a language model to generate reasonably grammatical output".

All of the words contained in the sentences form the nodes of the word graph. A word graph is a directed graph in which an edge from word A to word B represents an adjacency relation. It also contains start and end nodes (i.e., punctuation). Part-of-speech information is used to prevent a verb and a noun with the same spelling from being merged into a single node, which would result in the summary sentence containing ungrammatical sequences. Edges connect words that are adjacent in a sentence, with the edge weight incremented by 1 each time one word follows another.

Filippova [3] argues that a good sentence compression goes through all the nodes that represent important concepts but does not pass through the same node several times. This is achieved by inverting the edge weights and finding the K shortest paths from the start node to the end node of the word graph, discarding paths that do not contain a verb. The path with the minimum total weight is selected as the summary sentence. Additional graph scoring and ranking metrics are used to take strong links between words into consideration and to determine salient words.
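The word graph and path search described above can be illustrated with a short Python sketch. It is a simplified illustration, not the Filippova [3] algorithm or the Takahe implementation. It assumes responses are already tokenized and tagged with Penn Treebank tags (the tag set NLTK's default tagger produces), merges words only when both the lowercased form and the tag match, uses 1/count as the inverted edge weight (Filippova's actual weighting and reranking are more elaborate), and finds the K cheapest simple paths by best-first search. The function names and example responses are hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical sentinel nodes marking sentence start and end.
START, END = ("<s>", "START"), ("</s>", "END")


def build_word_graph(tagged_sentences):
    """Build a directed word graph with inverted adjacency counts as weights."""
    counts = defaultdict(int)
    for sentence in tagged_sentences:
        # Node = (lowercased word, POS tag): a noun and a verb with the
        # same spelling stay separate nodes.
        nodes = [START] + [(w.lower(), pos) for w, pos in sentence] + [END]
        for a, b in zip(nodes, nodes[1:]):
            counts[(a, b)] += 1  # +1 each time b follows a
    graph = defaultdict(dict)
    for (a, b), c in counts.items():
        graph[a][b] = 1.0 / c  # invert: frequent transitions become cheap
    return graph


def k_shortest_compressions(graph, k=10, min_words=6):
    """Return up to k (cost, sentence) pairs for the cheapest simple START->END paths.

    Best-first search over partial paths: with positive weights, complete
    paths are popped in order of increasing total cost. Exponential in the
    worst case, which is acceptable for a small illustration.
    """
    results = []
    counter = 0  # tie-breaker so heapq never has to compare paths
    heap = [(0.0, counter, [START])]
    while heap and len(results) < k:
        cost, _, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            words = path[1:-1]
            has_verb = any(pos.startswith("VB") for _, pos in words)
            if len(words) >= min_words and has_verb:
                results.append((cost, " ".join(w for w, _ in words)))
            continue
        for nxt, weight in graph[node].items():
            if nxt not in path:  # never pass the same node twice
                counter += 1
                heapq.heappush(heap, (cost + weight, counter, path + [nxt]))
    return results


# Hypothetical, pre-tagged student responses.
responses = [
    [("both", "DT"), ("have", "VBP"), ("a", "DT"), ("termination", "NN"), ("test", "NN")],
    [("they", "PRP"), ("both", "DT"), ("have", "VBP"), ("a", "DT"),
     ("termination", "NN"), ("condition", "NN")],
    [("both", "DT"), ("can", "MD"), ("run", "VB"), ("infinitely", "RB")],
]

for cost, text in k_shortest_compressions(build_word_graph(responses), k=3, min_words=5):
    print(f"{cost:.2f}  {text}")
```

The demo uses k=3 and a 5-word minimum because the toy responses are short. The four-word path "both can run infinitely" is cheap but is filtered out by the minimum length, while paths shared by several responses ("both have a termination ...") come out first.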
3. VISUAL ANALYTICS BASED ON MULTI-SENTENCE COMPRESSION
Graph-based multi-sentence compression produces K candidate summary sentences, and the candidate with the minimum shortest-path score is selected as the summary. When applying the algorithm to develop visual analytics for short answer questions, we propose to use all K candidates, because they capture different common pathways, and these may contain branches that identify different concepts or vocabulary used by students.

The following 3 approaches are being considered as summarization and visualization tools for short answer questions:
- Approach 1: Display the K candidate sentences derived from the Filippova [3] algorithm. This is a purely textual display of the sentences.
- Approach 2: Construct a graph from the K candidate sentences and use a graph layout algorithm to display the graph in a visual manner [5]. The advantage over the textual display is that loops of words and branches between words are more easily identifiable.
- Approach 3: Display the full word graph and highlight the K candidate paths (sentences) on it. This visualization would allow the lecturer/tutor to see the full range of words used. Approach 3 is not presented in this paper but will be evaluated in future work.

4. INITIAL INVESTIGATION
An initial investigation of using the Filippova [3] algorithm to produce a summary and visualization of student responses to short answer questions has been conducted using the open dataset provided by Mohler and Mihalcea [6]. The dataset consists of three assignments, each of seven short answer questions, given to an introductory computer science class at the University of North Texas. Each assignment includes the question, the teacher's answer, and the student responses (usually a few short sentences).

An open source implementation of the Filippova [3] multi-sentence compression algorithm from the Takahe library (https://github.com/boudinfl/takahe) was used. Tokenization and part-of-speech tagging were performed using NLTK [1]. Numerous spelling errors were noted but were not corrected for the initial investigation. The minimum number of words in the derived compressions was set to 6, and the number of candidate sentences generated for the 2 examples shown in this paper was 10. The visualization for each example was created using the Yifan Hu [5] layout in Gephi.

The top 10 summary sentences (Approach 1) and the graph visualization of the word graph constructed from the summary sentences (Approach 2) are presented for 2 questions from the Mohler and Mihalcea [6] dataset in Sections 4.1 and 4.2.

Initial results show that the summary candidate sentences provide a good overview of the common concepts used by students. The graph visualization of the word graph also shows multiple branches and loops.

4.1 Example 1
The summary candidate sentences in the first example show that most students have included two of the key similarities between iteration and recursion: that both have a termination condition and that both can execute infinitely. However, students have not used the word “repetition” and instead refer to programming code syntax (i.e., control statement).

Question: What are the similarities between iteration and recursion?
Teacher's Answer: They both involve repetition; they both have termination tests; they can both occur infinitely.

Table 1. Top 10 candidate summary sentences for Example 1

Score   Candidate Summary Sentence
0.016   both are based on control statement.
0.015   both are based on a control statement.
0.023   they are based on control statement.
0.021   they are based on a control statement.
0.021   both are based on control statement, termination test.
0.02    both are based on control statement both can infinitely.
0.02    both are based on a control statement , termination test.
0.017   both are based on control statement , both involve a termination test.
0.019   both are based on a control statement , both can infinitely.
0.023   based on control statement , both can occur infinitely.

Figure 1. Graph visualization of the top 10 summary sentence candidates in Example 1.
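The graph construction behind Approach 2 can be sketched in Python, assuming the candidates are plain strings like those in Table 1. This is not the code used to produce Figure 1, and the function names are hypothetical. The sketch builds a directed word-adjacency graph whose edge weights count how many candidates share each transition, lists the branch points where candidates diverge, and writes a Source,Target,Weight edge table of the kind Gephi's spreadsheet (CSV) import accepts; a layout such as Yifan Hu would then be applied inside Gephi.

```python
import csv
import io
import re
from collections import Counter


def candidate_graph(candidates):
    """Count directed word transitions across the K candidate sentences."""
    edges = Counter()
    for sentence in candidates:
        # Split into words and standalone punctuation ("statement." -> "statement", ".").
        tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
        for a, b in zip(tokens, tokens[1:]):
            edges[(a, b)] += 1
    return edges


def branch_points(edges):
    """Words followed by more than one distinct word: where candidates diverge."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, set()).add(b)
    return {w: sorted(nxt) for w, nxt in successors.items() if len(nxt) > 1}


def to_gephi_csv(edges):
    """Serialize the graph as a Source,Target,Weight edge table."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return out.getvalue()


# Four of the Table 1 candidates for Example 1.
candidates = [
    "both are based on control statement.",
    "both are based on a control statement.",
    "they are based on control statement.",
    "both are based on control statement, termination test.",
]

edges = candidate_graph(candidates)
print(branch_points(edges))
print(to_gephi_csv(edges))
```

On these four candidates, the branch points are "on" (followed by "a" or "control") and "statement" (followed by "," or "."). These are the divergences that a graph layout makes easier to spot than a list of sentences.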
4.2 Example 2
In the second example, very few students include "abstraction" in their answer, or concepts associated with “abstraction” such as “encapsulation”. Most students mention reusability and easier maintenance/debugging, but it is actually “abstraction” that leads to easier maintenance/debugging of object-oriented code. The proposed visualizations would therefore allow the lecturer/tutor to identify the concepts that students have missed or explained incorrectly, and guide the lecturer in providing feedback.

Question: What are the main advantages associated with object-oriented programming?
Teacher's Answer: Abstraction and reusability.

Table 2. Top 10 candidate summary sentences for Example 2

Score   Candidate Summary Sentence
0.025   existing classes can be reused program.
0.025   existing classes can be reused program maintenance.
0.023   existing classes can be reused program maintenance and verification are easier.
0.042   objects can be reused program maintenance.
0.04    existing classes can be reused and program.
0.038   existing classes can be reused and program maintenance.
0.05    the classes can be reused program .
0.034   objects can be reused program maintenance and verification are easier.
0.047   the classes can be reused program maintenance.
0.059   objects can be reused and program.

Figure 2. Graph visualization of the top 10 summary sentence candidates in Example 2.

5. TOOL DESIGN AND FUNCTIONALITY
We intend to create an open source tool, to be made available on GitHub, that incorporates the sentence compression algorithm and the visualizations proposed in Section 3. The tool will allow lecturers to view the student responses that match loops and branches in the word graph, helping them determine context by viewing exemplar student responses. The tool will also allow lecturers to attach feedback to nodes and paths in the word graph as a means of providing specific and targeted feedback to students. Integration with the quiz export formats of popular Learning Management Systems is also planned.

6. PROPOSED EVALUATION
A between-subjects comparative study is being planned. The study will comprise two groups. Group A will be required to read all student responses and identify student knowledge gaps. Group B will use the visualizations produced from the output of sentence compression to identify knowledge gaps in the same student responses. The knowledge gaps identified by Group A and Group B will then be compared. Participants in Group B will be shown all 3 approaches described in Section 3 and asked to rate each approach based on principles of visual analytics.

7. CONCLUSION
In this paper, we explored the use of graph-based multi-sentence compression as the basis for the visual analysis of student responses to short answer questions. The multi-sentence compression algorithm was introduced and potential visualizations were discussed. Example visualizations were then presented, along with plans to embed the visualizations within an open source tool that can integrate with quiz responses from Learning Management Systems. Preliminary results suggest that visualizations derived from the output of sentence compression can help lecturers identify knowledge gaps and provide feedback to groups of students with similar knowledge gaps. In future work, an evaluation of the visualizations will be conducted. The keyphrase extraction algorithm for reranking summary sentences [2] and the Opinosis algorithm [4] will also be evaluated in addition to the Filippova algorithm [3].

8. REFERENCES
[1] Bird, S., Klein, E., & Loper, E. 2009. Natural Language Processing with Python. O'Reilly Media Inc.
[2] Boudin, F., & Morin, E. 2013. Keyphrase Extraction for N-best Reranking in Multi-Sentence Compression. In Proceedings of the NAACL HLT 2013 Conference.
[3] Filippova, K. 2010. Multi-sentence compression: Finding shortest paths in word graphs. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 322-330). Association for Computational Linguistics.
[4] Ganesan, K., Zhai, C., & Han, J. 2010. Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 340-348). Association for Computational Linguistics.
[5] Hu, Y. F. 2005. Efficient and high quality force-directed graph drawing. The Mathematica Journal, 10, 37-71.
[6] Mohler, M., & Mihalcea, R. 2009. Text-to-text Semantic Similarity for Automatic Short Answer Grading. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL 2009), Athens, Greece.