=Paper= {{Paper |id=Vol-1633/ws2-paper3 |storemode=property |title=Corpus Methods and Textual Visualization to Enhance Learning in Core Writing Courses |pdfUrl=https://ceur-ws.org/Vol-1633/ws2-paper3.pdf |volume=Vol-1633 |authors=David Kaufer,Suguru Ishizaki |dblpUrl=https://dblp.org/rec/conf/edm/KauferI16 }} ==Corpus Methods and Textual Visualization to Enhance Learning in Core Writing Courses== https://ceur-ws.org/Vol-1633/ws2-paper3.pdf
Corpus Methods and Textual Visualization
To Enhance Learning in Core Writing Courses

David Kaufer
Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh, PA 15213
+1 412-268-1074
kaufer@andrew.cmu.edu

Suguru Ishizaki
Carnegie Mellon University
5000 Forbes Ave.
Pittsburgh, PA 15213
+1 412-268-4013
suguru@cmu.edu


ABSTRACT
Writing tasks require countless composing decisions that are typically beyond the conscious grasp of writers. Much of the skill of being "text-aware" inheres in understanding that texts produced from classroom assignments are not just composed of words and sentences but of highly structured and often highly predictive composing decisions. However, the decision-making underlying writing is an extremely abstract idea that is hard to make tangible for students. Although a significant number of pedagogical approaches have been investigated in the past three decades, the means to help students acquire a more tangible understanding and control of their composing decisions have not been addressed.

We propose to address this gap by developing a corpus-based learning tool to help students notice and reflect on composition decisions in their writing and thereby become more self-aware and reflective writers. This approach builds on an existing corpus-based text analysis tool called DocuScope, which for over a decade was successfully used for these purposes in a graduate pilot course. The goal of this project is to extend this approach to support the core writing courses at our university.

Keywords
Textual Awareness, Textual Visualization, Corpus-Based Instruction

1. INTRODUCTION
Writing tasks require countless composing decisions that are typically beyond the conscious grasp of writers. Much of the skill of being "text-aware" inheres in understanding that texts produced from classroom assignments are not just composed of words and sentences but of highly structured and often highly predictive composing decisions. A fundamental goal of Carnegie Mellon's core writing courses is to help students develop this textual awareness so that they are able to make appropriate compositional decisions for different text types. Unfortunately, the decision-making underlying writing is an extremely abstract notion and hard to make tangible for students. While various pedagogical approaches have been investigated over the past 30+ years, making tangible the decision-making underlying writing has eluded these approaches.

The goal of our project is to develop a suite of corpus-based learning tools that will help students notice hidden structures and composing decisions in writing, and become more self-aware and reflective writers.

2. OUR APPROACH
Our approach builds on a graduate-level writing course developed and taught by Kaufer over a decade, in collaboration with Ishizaki. In the course, students used DocuScope [1], a dictionary-based tool for rhetorical text analysis with a suite of tools for interactive visualization. The tool allowed students to visualize differences in the rhetorical strategies underlying their drafts and across the different genres they were assigned to write. DocuScope transformed the writing classroom into a design studio-like environment for writing, where, unlike in a typical writing course, students could compare their writing at a glance as if they were comparing posters on a wall (Figure 1). DocuScope then allowed students to select a specific text and view how certain rhetorical strategies are implemented in terms of composing decisions (Figure 2).

We informally observed that the visualizations helped enhance students' awareness of (a) their composing decisions and (b) the relationship of their decision-making to their writing context and the genre of text they were seeking to produce. Although we have no definitive understanding of how this works, we suspect that allowing students to see their composing decisions visualized after the fact creates grounded evidence for claiming ownership of those decisions and for using those decisions to explain their situated goals of composing with sharpened clarity.

In our current project, our goal is to extend the use of DocuScope to a much larger scale by embedding it in a freshman writing course and a popular professional writing course. Each student will receive feedback based on a text analysis that compares and situates his or her writing against the historical student data. Students of any cohort on any assignment will be able to compare their writing against a historical cohort writing on the same assignment.

More specifically, we are developing a tool for automatically generating visual reports that highlight salient structures and composition decisions in the students' own writing in relation to the historical data as well as writing by other students in class. We hypothesize that enhancing students' awareness of their low-level composition choices can enhance their overall metacognitive awareness as writers.
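
To make the idea of situating one text against a historical cohort concrete, the following is a minimal sketch only, not the project's actual method. It assumes each text has already been reduced to per-category rates (for example, occurrences per 100 words) and flags categories whose rate departs from the cohort mean by a chosen number of standard deviations. The function name salient_categories, the threshold of 1.5, and all numbers are hypothetical; the category names are borrowed from Figure 1.

```python
from statistics import mean, stdev


def salient_categories(student_rates, cohort_rates, threshold=1.5):
    """Return (category, z) pairs where one text departs from the cohort.

    student_rates: dict mapping category name -> rate for one student's text.
    cohort_rates: list of such dicts, one per historical text on the same
                  assignment (hypothetical data layout, assumed for this sketch).
    """
    salient = []
    for category, rate in student_rates.items():
        history = [text.get(category, 0.0) for text in cohort_rates]
        if len(history) < 2:
            continue  # a sample standard deviation needs at least two values
        spread = stdev(history)
        if spread == 0:
            continue  # no variation in the cohort, so a z-score is undefined
        z = (rate - mean(history)) / spread
        if abs(z) >= threshold:
            salient.append((category, round(z, 2)))
    # Largest departures from the cohort come first.
    return sorted(salient, key=lambda pair: abs(pair[1]), reverse=True)


# Made-up rates for three historical texts and one new student text.
cohort = [
    {"first_person": 4.0, "description": 2.0},
    {"first_person": 5.0, "description": 3.0},
    {"first_person": 6.0, "description": 4.0},
]
student = {"first_person": 5.0, "description": 9.0}
report = salient_categories(student, cohort)
print(report)
```

In this toy data the student's use of first person matches the cohort average and is not reported, while the rate of description is far above it and is flagged. A z-score is only one simple choice; which statistical method is most appropriate is an open question for the project.
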
Figure 1. LEFT: Multi-Text Visualization (MTV)—This screenshot shows three genres of a writing course. Yellow dots indicate a
single discrete student writer's text on the self-portrait assignment. Red dots indicate a single discrete student writer's text on the
observer-portrait assignment. Orange dots indicate a single discrete student writer's text on the scenic writing assignment. The X-
axis represents the amount of "first person" in each text. The Y-axis represents the amount of "description" (writing for the eyes
and ears) in each text. Notice that the self-portraits are separated from the other genres on first person. Notice that the scenic texts
are separated from the other genres on description.
RIGHT: Single-Text Visualization (STV)—In this screenshot, we see how a student writer or teacher can drill down from MTV and
see how DocuScope categories tag individual words and word strings. A number of categories are highlighted. Notice how the word
"suggested" is tied to the facilitating category through color-coding. To suggest something is to help another facilitate action.

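The kind of dictionary-based tagging shown in the Single-Text Visualization can be illustrated with a toy sketch. This is an assumption-laden illustration, not DocuScope's implementation: the dictionary below is invented, contains only four entries, and uses a simple longest-match-first lookup over lowercased word strings. The category names echo those mentioned in Figure 1.

```python
import re

# Hypothetical toy dictionary mapping word strings to categories.
# It is NOT DocuScope's dictionary; entries are invented for illustration.
TOY_DICTIONARY = {
    ("i",): "first_person",
    ("my",): "first_person",
    ("suggested",): "facilitating",
    ("bright", "red"): "description",
}
MAX_LEN = max(len(key) for key in TOY_DICTIONARY)


def tag_text(text):
    """Return (matched string, category) pairs, preferring the longest match."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tags = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate string first, then shorter ones.
        for length in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + length])
            if key in TOY_DICTIONARY:
                tags.append((" ".join(key), TOY_DICTIONARY[key]))
                i += length
                break
        else:
            i += 1  # no dictionary entry starts at this token; move on
    return tags


tags = tag_text("I suggested my bright red coat.")
for matched, category in tags:
    print(f"{matched!r} -> {category}")
```

Counting the tags per category and dividing by the number of words would give the kind of per-text rates that a plot like the Multi-Text Visualization places on its axes.
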
3. CHALLENGES
While the course taught by Kaufer was successful [2, 3], the text analysis tool was not fully automated. Running DocuScope therefore required a manual process that had to be handled by the instructor (Kaufer). This original context worked as well as it did because (1) the instructor was extremely familiar with the tool and (2) he was able to assist students in interpreting the analysis.

In order to scale the use of this environment to core writing courses with many sections and different instructors, we must make it highly user-friendly and capable of presenting results clearly to people who are not writing experts, i.e., students. Accordingly, we are currently addressing the following specific research questions.

  • What are optimal ways to integrate automated reporting into undergraduate writing instruction? We are exploring how these reports can be integrated meaningfully for students in our core writing classes. We are also examining the extent to which these reports can positively impact student understanding of structures and composition decisions in their own writing.

  • What are the optimal statistical methods for uncovering the most salient composing choices from data generated by DocuScope? In order to fully automate the analysis and report generation, we are exploring statistical methods for uncovering salient features in a student's writing.

  • What are optimal ways to visualize the results of statistical analysis? We are exploring optimal ways in which students' composing decisions can be visualized.

4. DEMO
In this demonstration, we will provide an overview of the technology we have developed so far, including the tool to mine the corpus and the visualizations (i.e., reports) we are experimenting with to provide feedback to students.

We are currently working with a team of statistics professors and students to help us answer some of these questions. By the time of the workshop, we should have more concrete results about helpful visual feedback to students. We will also discuss our pedagogical philosophy for the way students can productively use this feedback, as well as some of the challenges of getting this ambitious project off the ground.

5. ACKNOWLEDGMENTS
Our thanks to Danielle Wetzel, Necia Werner, Xizhen Cai, Ann Lee, Joel Greenhouse, Arianna Garofalo, Chushan Chen and Binghui Ouyang for vital help on this project.

6. REFERENCES
[1] Ishizaki, S., & Kaufer, D. (2011). Computer-aided rhetorical analysis. In P. McCarthy & C. Boonthum (Eds.), Applied Natural Language Processing and Content Analysis: Advances in Identification, Investigation, and Resolution. Hershey, PA: IGI Global.

[2] Kaufer, D., Geisler, C., Vlachos, P., & Ishizaki, S. (2006). Mining textual knowledge for writing education and research. In L. v. Waes, M. Leijten, & C. Neuwirth (Eds.), Writing and Digital Media (pp. 115-130). Oxford, UK: Elsevier Science.

[3] Kaufer, D., Ishizaki, S., Collins, J., & Vlachos, P. (2004). Teaching Language Awareness in Rhetorical Choice Using IText and Visualization in Classroom Genre Assignments. Journal for Business and Technical Communication, 18:3 361-40