Using Sentence Compression to Develop Visual Analytics
    for Student Responses to Short Answer Questions
Aneesha Bakharia
Queensland University of Technology
Queensland, Australia
aneesha.bakharia@gmail.com

Shane Dawson
University of South Australia
South Australia, Australia
shaned07@gmail.com

ABSTRACT
In this paper, we report on early research to visualize and summarize student responses to short answer questions. Recently published graph-based multi-sentence compression algorithms have been successfully applied to summarize opinions, a domain with many similarities to short answer responses. Initial investigations reveal that visual analytics for short answer questions can be derived from the output of graph-based multi-sentence compression algorithms. A proposed open source short answer analytics tool is also briefly discussed, along with an evaluation plan. The proposed analytics tool will allow lecturers and tutors to gain a high-level overview of how students have responded to questions, identify knowledge gaps and provide feedback to students.

Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing

General Terms
Algorithms, Measurement, Design, Human Factors.

Keywords
Learning Analytics, Visual Analytics, Sentence Compression, Natural Language Processing, Summarization, Graph Layout.

1. INTRODUCTION
Short answer questions are a useful form of summative and formative assessment because they allow students to explain concepts in their own words without being given prompts. Depending upon the number of student responses, grading and providing feedback for short answer questions is a tedious task. Multiple choice questions, which are easily graded automatically, are therefore predominantly used when student numbers are large, such as in MOOCs. In flipped classroom scenarios, students are given pre-lecture readings and required to answer short answer questions, with the lecturer analyzing the student answers and addressing knowledge gaps during the lecture. The need for visual analytics to help lecturers and tutors gain an overview of student responses is therefore becoming increasingly important.

A rapid turnaround between students submitting their responses and the lecturer analyzing those responses is required. This research is not focused on automatically grading short answer questions; rather, the focus is on using visual analytics to provide insight into how students have answered questions and to allow lecturers to easily determine the feedback and support that students require.

Lecturers and tutors require a way to analyze and visualize student responses so that they can:
  • understand how students have responded to a question
  • review the vocabulary being used
  • identify knowledge gaps
  • provide feedback to groups of students with similar knowledge gaps

The visual analytics tool proposed in this paper will apply sentence compression to summarize student responses to short answer questions. Graph-based sentence compression algorithms have recently been developed and applied to summarize multiple related sentences [3] and opinions within textual reviews [4]. Student responses are made up of short sentences, and students use many common phrases, so there is high similarity and redundancy between responses. This makes summarizing them an ideal candidate for sentence compression. Graph-based approaches have also been applied to automatically grade student responses [4].

2. MULTI-SENTENCE COMPRESSION
The first algorithm that has been investigated is the multi-sentence compression algorithm by Filippova [3]. The Filippova algorithm takes a cluster of similar or related sentences and outputs a single short sentence that summarizes the most salient theme conveyed in the cluster. The algorithm constructs a word graph and uses the shortest paths between words in the graph to produce a summary sentence. The algorithm is easy to implement because sentences need only be tokenized and part-of-speech tagged. The Filippova algorithm is the first sentence compression algorithm that does not require "hand-crafted rules, nor a language model to generate reasonably grammatical output".

All of the words contained in the sentences form the nodes in the word graph. A word graph is a directed graph in which an edge from word A to word B represents an adjacency relation. The graph also contains start and end nodes (i.e., punctuation). Part-of-speech information is used to prevent words that share a surface form but differ in part of speech (e.g., a verb and a noun) from being merged in the word graph, since such merges would produce ungrammatical sequences in the summary sentence. Edges in the word graph connect words that are adjacent in a sentence, and an edge's weight is incremented by 1 each time one word directly follows the other.

Filippova [3] states that a good compression passes through all the nodes that represent important concepts but does not pass through the same node several times. This is achieved by inverting the edge weights and finding the K shortest paths from the start node to the end node of the word graph, discarding paths that do not contain a verb. The path
through the graph with the minimum total weight is selected as the summary sentence. Additional graph scoring and ranking metrics are used to take into account strong links between words and to determine salient words.

3. VISUAL ANALYTICS BASED ON MULTI-SENTENCE COMPRESSION
Graph-based multi-sentence compression produces K candidate summary sentences, and the candidate with the minimum shortest-path score is selected as the summary. When applying the algorithm to develop visual analytics for short answer questions, we propose to use all K candidates, because they capture different common pathways, and these pathways may contain branches that reveal the different concepts or vocabulary used by students.

The following 3 approaches are being considered as summarization and visualization tools for short answer questions:
  • Approach 1: Display the K candidate sentences derived from the Filippova [3] algorithm. This is a purely textual display of the sentences.
  • Approach 2: Construct a graph from the K candidate sentences and use a graph layout algorithm to display the graph in a visual manner [5]. The advantage over the textual display of the sentences is that loops and branches between words are more easily identifiable.
  • Approach 3: Display the full word graph and highlight the K candidate paths (sentences) on the graph display. This visualization would allow the lecturer/tutor to see the range of words used. Approach 3 is not presented in this paper but will be evaluated in future work.

4. INITIAL INVESTIGATION
An initial investigation into using the Filippova [3] algorithm to produce a summary and visualization of student responses to short answer questions has been conducted using the open dataset provided by Mohler and Mihalcea [6]. The dataset consists of three assignments of seven short answer questions each, given to an introductory computer science class at the University of North Texas. Each assignment includes the question, the teacher's answer, and the student responses (usually a few short sentences). An open source implementation of the Filippova [3] multi-sentence compression algorithm from the Takahe library (https://github.com/boudinfl/takahe) was used. Tokenization and part-of-speech tagging were performed using NLTK [1]. Numerous spelling errors were noted but were not fixed for the initial investigation. The minimum number of words in the derived compressions was set to 6, and 10 candidate sentences were generated for each of the 2 examples shown in this paper. The visualization for each example was created using the Yifan Hu [5] layout in Gephi.

The top 10 summary sentences (Approach 1) and the graph visualization of the word graph constructed from the summary sentences (Approach 2) are included for 2 questions from the Mohler and Mihalcea [6] dataset in Sections 4.1 and 4.2.

Initial results show that the summary candidate sentences provide a good overview of the common concepts used by students. The graph visualization of the word graph also shows multiple branches and loops.

4.1 Example 1
The summary candidate sentences in the first example show that most students have included the 2 key similarities between iteration and recursion: both have a termination condition and both can execute infinitely. However, students have not used the word "repetition"; instead they refer to programming code syntax (i.e., control statement).

Question: What are the similarities between iteration and recursion?
Teacher's Answer: They both involve repetition; they both have termination tests; they can both occur infinitely.

Table 1. Top 10 candidate summary sentences for Example 1

Score   Candidate Summary Sentence
0.016   both are based on control statement.
0.015   both are based on a control statement.
0.023   they are based on control statement.
0.021   they are based on a control statement.
0.021   both are based on control statement, termination test.
0.02    both are based on control statement both can infinitely.
0.02    both are based on a control statement , termination test.
0.017   both are based on control statement , both involve a termination test.
0.019   both are based on a control statement , both can infinitely.
0.023   based on control statement , both can occur infinitely.

Figure 1. Graph visualization of the top 10 summary sentence candidates in Example 1.

4.2 Example 2
In the second example, very few students include "abstraction" in their answers, or concepts associated with "abstraction" such as "encapsulation". Most students mention reusability and maintenance/debugging, but it is actually "abstraction" that leads to easier maintenance/debugging of object
oriented programming code. The proposed visualizations would therefore allow the lecturer/tutor to identify the concepts that students have missed or explained incorrectly, and would guide the lecturer in providing feedback.

Question: What are the main advantages associated with object-oriented programming?
Teacher's Answer: Abstraction and reusability.

Table 2. Top 10 candidate summary sentences for Example 2

Score   Candidate Summary Sentence
0.025   existing classes can be reused program.
0.025   existing classes can be reused program maintenance.
0.023   existing classes can be reused program maintenance and verification are easier.
0.042   objects can be reused program maintenance.
0.04    existing classes can be reused and program.
0.038   existing classes can be reused and program maintenance.
0.05    the classes can be reused program .
0.034   objects can be reused program maintenance and verification are easier.
0.047   the classes can be reused program maintenance.
0.059   objects can be reused and program.

Figure 2. Graph visualization of the top 10 summary sentence candidates in Example 2.

5. TOOL DESIGN AND FUNCTIONALITY
We intend to create an open source tool, to be made available on GitHub, that incorporates the sentence compression algorithm and the visualizations proposed in Section 3. The tool will allow lecturers to view the student responses that match word graph loops and branches, helping them to determine context from exemplar student responses. The tool will also allow lecturers to attach feedback to nodes and paths in the word graph as a means of providing specific and targeted feedback to students. Integration with quiz tool export formats from popular Learning Management Systems is also planned.

6. PROPOSED EVALUATION
A between-subjects comparative study is being planned. The study will comprise two groups. Group A will be required to read all student responses and identify student knowledge gaps. Group B will use the visualizations produced from the output of sentence compression to identify knowledge gaps in the student responses. The knowledge gaps identified by Group A and Group B will then be compared. Participants in Group B will be shown all 3 approaches described in Section 3 and asked to rate each approach based on principles of visual analytics.

7. CONCLUSION
In this paper, ideas on using graph-based multi-sentence compression as the basis for the visual analysis of student responses to short answer questions were explored. The multi-sentence compression algorithm was introduced and ideas for potential visualizations were discussed. Example visualizations were then presented, along with plans to embed the visualizations within an open source tool that is able to integrate with quiz responses from Learning Management Systems. Preliminary results indicate that visualizations derived from the output of sentence compression can help lecturers identify knowledge gaps and provide feedback to groups of students with similar knowledge gaps. In future work, an evaluation of the visualizations will be conducted. The keyphrase extraction algorithm for reranking summary sentences [2] and the Opinosis algorithm [4] will also be evaluated in addition to the Filippova algorithm [3].

8. REFERENCES
[1] Bird, S., Klein, E., & Loper, E. 2009. Natural Language Processing with Python. O'Reilly Media Inc.
[2] Boudin, F., & Morin, E. 2013. Keyphrase Extraction for N-best Reranking in Multi-Sentence Compression. In Proceedings of the NAACL HLT 2013 Conference.
[3] Filippova, K. 2010. Multi-sentence compression: Finding shortest paths in word graphs. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 322-330). Association for Computational Linguistics.
[4] Ganesan, K., Zhai, C., & Han, J. 2010. Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics (pp. 340-348). Association for Computational Linguistics.
[5] Hu, Y. F. 2005. Efficient and high quality force-directed graph drawing. The Mathematica Journal, 10, 37-71.
[6] Mohler, M., & Mihalcea, R. 2009. Text-to-text Semantic Similarity for Automatic Short Answer Grading. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL 2009), Athens, Greece.
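The following minimal Python sketch illustrates the word-graph construction and K-shortest-path search described in Section 2 using only the standard library. It is not the Takahe implementation used in Section 4, and it simplifies Filippova's algorithm [3] under explicit assumptions. A node is a (lowercased word, POS tag) pair, so identical pairs are always merged. Edge weights are the inverse of adjacency counts, following the simplified description in Section 2 rather than the full weighting in [3]. The per-word score (total path weight divided by number of words) is an illustrative choice. Input sentences are lists of (word, tag) pairs with Penn Treebank tags, such as those returned by NLTK's pos_tag [1]. The names START, END, build_word_graph and k_shortest_compressions are hypothetical.

```python
import heapq
from collections import defaultdict

# Hypothetical sentinel nodes; Takahe and [3] use their own start/end markers.
START = ("<start>", "START")
END = ("<end>", "END")


def build_word_graph(tagged_sentences):
    """Build a directed word graph from sentences of (word, POS tag) pairs.

    Simplification (assumption): a node is a (lowercased word, tag) pair, so
    tokens sharing both are always merged; Filippova's context-based
    disambiguation of repeated or ambiguous words is omitted.
    """
    counts = defaultdict(int)
    for sentence in tagged_sentences:
        nodes = [START] + [(word.lower(), tag) for word, tag in sentence] + [END]
        for a, b in zip(nodes, nodes[1:]):
            counts[(a, b)] += 1  # +1 each time b directly follows a
    graph = defaultdict(dict)
    for (a, b), count in counts.items():
        graph[a][b] = 1.0 / count  # inverted: frequent adjacencies are cheap
    return graph


def k_shortest_compressions(graph, k=10, min_words=6):
    """Return up to k (score, words) pairs, lowest total path weight first.

    Best-first search over simple START-to-END paths: because all weights
    are positive, complete paths are popped in non-decreasing total weight.
    Paths shorter than min_words or without a verb (VB* tag) are skipped.
    The score (total weight / number of words) is an illustrative choice.
    """
    results = []
    heap = [(0.0, [START])]
    while heap and len(results) < k:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            words = path[1:-1]
            has_verb = any(tag.startswith("VB") for _, tag in words)
            if len(words) >= min_words and has_verb:
                results.append((cost / len(words), [w for w, _ in words]))
            continue
        for nxt, weight in graph[node].items():
            if nxt not in path:  # never pass through the same node twice
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return results
```

Because the search enumerates partial paths exhaustively, it is only practical for small graphs built from a handful of short responses. A real tool would bound the search or rely on an existing implementation such as Takahe.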
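For Approach 2 in Section 3, the K candidate sentences can themselves be turned into a small graph before layout. The sketch below is an illustration under assumptions, not the procedure used to produce Figures 1 and 2. It counts word-to-word transitions across the candidates and reports words that branch to more than one distinct successor. It also writes a Source,Target,Weight CSV edge list, a column layout that Gephi's spreadsheet import accepts. Tokenizing candidates by whitespace and the function names candidate_graph, branch_points and to_edge_csv are hypothetical choices.

```python
import csv
import io
from collections import Counter, defaultdict


def candidate_graph(candidates):
    """Count word-to-word transitions across the K candidate sentences."""
    edges = Counter()
    for sentence in candidates:
        words = sentence.split()  # assumption: whitespace tokenization
        for a, b in zip(words, words[1:]):
            edges[(a, b)] += 1
    return edges


def branch_points(edges):
    """Map each word with more than one distinct successor to its successors."""
    successors = defaultdict(set)
    for a, b in edges:
        successors[a].add(b)
    return {w: sorted(nxt) for w, nxt in successors.items() if len(nxt) > 1}


def to_edge_csv(edges):
    """Serialize the edges as Source,Target,Weight CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()
```

Applied to the first three Example 1 candidates in Table 1 with trailing punctuation removed ("both are based on control statement", "both are based on a control statement", "they are based on control statement"), branch_points reports that "on" branches to "a" and "control". This mirrors the variation visible in the table.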