Corpus Methods and Textual Visualization To Enhance Learning in Core Writing Courses

David Kaufer
Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh, PA 15213
+1 412-268-1074
kaufer@andrew.cmu.edu

Suguru Ishizaki
Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh, PA 15213
+1 412-268-4013
suguru@cmu.edu

ABSTRACT
Writing tasks require countless composing decisions that are typically beyond the conscious grasp of writers. Much of the skill of being "text-aware" inheres in understanding that texts produced from classroom assignments are not just composed of words and sentences but of highly structured and often highly predictive composing decisions. However, the decision-making underlying writing is an extremely abstract idea that is hard to make tangible for students. Although a significant number of pedagogical approaches have been investigated in the past three decades, the means to help students acquire a more tangible understanding of, and control over, their composing decisions has not been addressed.

We propose to address this gap by developing a corpus-based learning tool that helps students notice and reflect on composition decisions in their writing and, as a result, become more self-aware and reflective writers. This approach builds on an existing corpus-based text analysis tool called DocuScope, which was used successfully for these purposes in a graduate pilot course for over a decade. The goal of this project is to extend this approach to support the core writing courses at our university.

Keywords
Textual Awareness, Textual Visualization, Corpus-Based Instruction

1. INTRODUCTION
Writing tasks require countless composing decisions that are typically beyond the conscious grasp of writers. Much of the skill of being "text-aware" inheres in understanding that texts produced from classroom assignments are not just composed of words and sentences but of highly structured and often highly predictive composing decisions. A fundamental goal of Carnegie Mellon's core writing courses is to help students develop this textual awareness so that they are able to make appropriate compositional decisions for different text types. Unfortunately, the decision-making underlying writing is an extremely abstract notion and hard to make tangible for students. While various pedagogical approaches have been investigated over the past 30+ years, making tangible the decision-making underlying writing has eluded these approaches.

The goal of our project is to develop a suite of corpus-based learning tools that will help students notice hidden structures and composing decisions in writing, and become more self-aware and reflective writers.

2. OUR APPROACH
Our approach builds on a graduate-level writing course developed and taught by Kaufer over a decade, in collaboration with Ishizaki. In the course, students used DocuScope [1], a dictionary-based tool for rhetorical text analysis with a suite of tools for interactive visualization, to visualize differences in the rhetorical strategies underlying their drafts and across the different genres they were assigned to write. DocuScope transformed the writing classroom into a design-studio-like environment for writing, where, unlike in a typical writing course, students could compare their writing at a glance as if they were comparing posters on a wall (Figure 1). DocuScope then allowed students to select a specific piece of writing to view how certain rhetorical strategies are implemented in terms of composing decisions (Figure 2).

We informally observed that the visualizations helped enhance students' awareness of (a) their composing decisions and (b) the relationship of their decision-making to their writing context and the genre of text they were seeking to produce. Although we have no definitive understanding of how this works, we suspect that allowing students to see their composing decisions visualized after the fact creates grounded evidence for claiming ownership of those decisions and for using those decisions to explain their situated goals of composing with sharpened clarity.

In our current project, our goal is to extend the use of DocuScope to a much larger scale by embedding it in a freshman writing course and a popular professional writing course. Each student will receive feedback based on text analysis that compares and situates his or her writing against the historical student data. Students of any cohort on any assignment will be able to compare their writing against a historical cohort writing on the same assignment. More specifically, we are developing a tool for automatically generating visual reports that highlight salient structures and composition decisions in the students' own writing in relation to the historical data as well as writing by other students in class. We hypothesize that enhancing students' awareness of their low-level composition choices can enhance their overall metacognitive awareness as writers.

Figure 1. LEFT: Multi-Text Visualization (MTV). This screenshot shows three genres of a writing course. Yellow dots indicate a single discrete student writer's text on the self-portrait assignment. Red dots indicate a single discrete student writer's text on the observer-portrait assignment. Orange dots indicate a single discrete student writer's text on the scenic writing assignment. The X-axis represents the amount of "first person" in each text. The Y-axis represents the amount of "description" (writing for the eyes and ears) in each text. Notice that the self-portraits are separated from the other genres on first person. Notice that the scenic texts are separated from the other genres on description. RIGHT: Single-Text Visualization (STV). In this screenshot, we see how a student writer or teacher can drill down from MTV and see how DocuScope categories tag individual words and word strings. A number of categories are highlighted. Notice how the word "suggested" is tied to the facilitating category through color-coding. To suggest something is to help another facilitate action.

3. CHALLENGES
While the course taught by Kaufer was successful [2, 3], the text analysis tool was not fully automated. Running DocuScope therefore required a manual process that had to be handled by the instructor (Kaufer). This original context worked as well as it did because (1) the instructor was extremely familiar with the tool and (2) he was able to assist students in interpreting the analysis.

In order to scale the use of this environment to core writing courses with many sections and different instructors, we must make it highly user-friendly and capable of presenting results clearly to non-experts in writing (i.e., students). Accordingly, we are currently addressing the following specific research questions.

• What are optimal ways to integrate automated reporting into undergraduate writing instruction? We are exploring how these reports can be integrated meaningfully for students in our core writing classes. We are also examining the extent to which these reports can positively impact student understanding of structures and composition decisions in their own writing.

• What are the optimal statistical methods for uncovering the most salient composing choices from data generated from DocuScope? In order to fully automate the analysis and report generation, we are exploring statistical methods for uncovering salient features in a student's writing.

• What are optimal ways to visualize the results of statistical analysis? We are exploring optimal ways in which students' composing decisions can be visualized.

4. DEMO
In this demonstration, we will provide an overview of the technology we have developed so far, including the tool to mine the corpus and the visualizations (i.e., reports) we are experimenting with to provide feedback to students.

We are currently working with a team of statistics professors and students to help us answer some of these questions. By the time of the workshop, we should have more concrete results about which visual feedback is helpful to students. We will also discuss our pedagogical philosophy for the way students can productively use this feedback, as well as some of the challenges of getting this ambitious project off the ground.

5. ACKNOWLEDGMENTS
Our thanks to Danielle Wetzel, Necia Werner, Xizhen Cai, Ann Lee, Joel Greenhouse, Arianna Garofalo, Chushan Chen and Binghui Ouyang for vital help on this project.

6. REFERENCES
[1] Ishizaki, S., & Kaufer, D. (2011). Computer-aided rhetorical analysis. In P. McCarthy & C. Boonthum (Eds.), Applied Natural Language Processing and Content Analysis: Advances in Identification, Investigation, and Resolution. Hershey, PA: IGI Global.

[2] Kaufer, D., Geisler, C., Vlachos, P., & Ishizaki, S. (2006). Mining textual knowledge for writing education and research. In L. v. Waes, M. Leijten, & C. Neuwirth (Eds.), Writing and Digital Media (pp. 115-130). Oxford, UK: Elsevier Science.

[3] Kaufer, D., Ishizaki, S., Collins, J., & Vlachos, P. (2004). Teaching language awareness in rhetorical choice using IText and visualization in classroom genre assignments. Journal of Business and Technical Communication, 18(3), 361-40
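
The second research question in Section 3 asks how salient composing choices in one student's text might be identified relative to a historical cohort. The Python sketch below illustrates one simple possibility under loudly stated assumptions; it is not the method used in this project, which the paper describes as still under exploration. The two category names echo the axes in Figure 1, but the word lists in `TOY_DICTIONARY` are invented for illustration and are not DocuScope's dictionary. The function names, the per-100-words rate, and the z-score threshold are likewise hypothetical choices.

```python
import re
from statistics import mean, stdev

# Hypothetical toy dictionary (category -> single-word entries).
# These entries are invented for illustration; they are NOT DocuScope's dictionary.
TOY_DICTIONARY = {
    "first_person": ["i", "me", "my", "we", "our"],
    "description": ["bright", "loud", "red", "quiet", "dark"],
}


def category_rates(text, dictionary=TOY_DICTIONARY):
    """Return, for each category, the number of dictionary hits per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words) or 1  # avoid division by zero on empty input
    return {
        category: 100.0 * sum(words.count(term) for term in terms) / total
        for category, terms in dictionary.items()
    }


def salient_categories(student_text, cohort_texts, threshold=2.0):
    """Flag categories where the student's rate differs from the cohort mean
    by at least `threshold` sample standard deviations (an assumed criterion).

    Requires at least two cohort texts, because a sample standard deviation
    is undefined for a single observation.
    """
    student = category_rates(student_text)
    cohort = [category_rates(text) for text in cohort_texts]
    flagged = {}
    for category, rate in student.items():
        values = [rates[category] for rates in cohort]
        mu, sigma = mean(values), stdev(values)
        if sigma == 0:
            continue  # no variation in the cohort, so a z-score is undefined
        z = (rate - mu) / sigma
        if abs(z) >= threshold:
            flagged[category] = round(z, 2)
    return flagged


if __name__ == "__main__":
    cohort = [
        "The room was dark and quiet.",
        "We saw the bright red door.",
        "The loud street was dark.",
    ]
    student = "I think my essay shows me as I am."
    print(salient_categories(student, cohort))
```

A positive score marks a category the student uses more than the cohort, and a negative score one the student uses less. A real dictionary would also need to match multi-word strings, and a real cohort would be far larger than three sentences; both are simplified away here.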