Using Natural Language Processing Tools in Educational Data Mining

Scott Crossley
Georgia State Univ.
Atlanta, GA 30303
scrossley@gsu.edu

Laura K. Allen
Arizona State Univ.
Tempe, AZ 85287
LauraKAllen@asu.edu

Danielle S. McNamara
Arizona State Univ.
Tempe, AZ 85287
dsmcnama@asu.edu

ABSTRACT
The workshop will cover the development, use, and educational data mining applications of a number of freely available natural language processing (NLP) tools, such as Coh-Metrix, the Writing Assessment Tool (WAT), the Simple NLP (SiNLP) tool, the Tool for the Automatic Analysis of Lexical Sophistication (TAALES), and the Tool for the Automatic Analysis of Cohesion (TAACO). The workshop will provide participants with an overview of the types of linguistic features that can be measured with these NLP tools. It will also describe how these features have been (and can be) used in text analyses that are important to the educational data mining community. Participants will receive hands-on training with the tools using data from computerized learning environments, and they will be shown how the output from these tools can be used to develop machine learning algorithms that help predict educational outcomes.

Keywords
Natural language processing, data mining, machine learning

1. Description of the Tutorial Content and Themes
In this workshop, participants will be introduced to a number of freely available natural language processing tools. The primary focus of the tutorial will be to familiarize participants with the linguistic constructs measured by the tools, including lexical sophistication, text cohesion, rhetorical style, syntactic complexity, and text organization. All of the constructs discussed have been shown to correlate with student success in various educational settings. The remainder of the tutorial will focus on how these constructs can be applied in educational data mining research to predict educational outcomes. The tutorial is organized as follows: an introduction to linguistic constructs, an overview of available NLP tools, a description of how the linguistic constructs are calculated in the tools, and a discussion of the links between these linguistic constructs and educational outcomes (e.g., performance and attitudes).

1.1 Justification for Importance of Topic
Natural language processing tools have been widely used to inform research in a number of fields, including cognitive science, medical discourse, literary studies, language learning, and the social sciences. However, large-scale use of NLP tools in educational data mining is much less common. Recently, the value of NLP for educational research has begun to be recognized, as evidenced by a number of funding opportunities and by research findings in fields related to EDM (e.g., MOOCs and ITSs). This suggests that NLP is poised to become an important element of educational research, particularly when used in combination with more traditional measurements of student success (e.g., psychometric data, system interaction data, and sensor data). The convergence of readily available NLP tools, large-scale educational data sets, and data mining techniques can provide EDM researchers with new approaches to better understand and predict variables related to educational success.