=Paper= {{Paper |id=Vol-1446/EDM_NLP_tutorial_paper |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-1446/EDM_NLP_tutorial_paper.pdf |volume=Vol-1446 }} ==None== https://ceur-ws.org/Vol-1446/EDM_NLP_tutorial_paper.pdf
Using Natural Language Processing Tools in Educational Data Mining
Scott Crossley, Georgia State U., Atlanta, GA 30303, scrossley@gsu.edu
Laura K Allen, Arizona State Univ., Tempe, AZ, 85287, LauraKAllen@asu.edu
Danielle S. McNamara, Arizona State Univ., Tempe, AZ, 85287, dsmcnama@asu.edu


ABSTRACT
The workshop will cover the development, use, and educational data mining applications of a number of freely available natural language processing (NLP) tools such as Coh-Metrix, the Writing Assessment Tool (WAT), the Simple NLP (SiNLP) tool, the Tool for the Automatic Analysis of Lexical Sophistication (TAALES), and the Tool for the Automatic Analysis of Cohesion (TAACO). The workshop will provide the participants with an overview of the types of linguistic features that can be measured with these NLP tools. Additionally, it will describe how these features have been (and can be) used in text analyses that are of importance to the educational data mining community. Participants will receive hands-on training with the tools using data from computerized learning environments. Participants will also be shown how the output from these tools can be used to develop machine learning algorithms that can aid in predicting educational outcomes.

Keywords
Natural language processing, Data mining, machine learning

1. Description of the Tutorial Content and Themes
In this workshop, participants will be introduced to a number of freely available natural language processing tools. The primary focus of the tutorial will be to familiarize participants with the linguistic constructs measured by the tools, including lexical sophistication, text cohesion, rhetorical style, syntactic complexity, and text organization. The constructs that will be discussed have all been shown to correlate with student success in various educational settings. The remainder of the tutorial will focus on how these constructs can be applied in educational data mining research to predict educational outcomes. The basic outline for the tutorial is an introduction to linguistic constructs, an overview of available NLP tools, a description of how the linguistic constructs are calculated in the tools, and a discussion of the links between these linguistic constructs and educational outcomes (e.g., performance, attitudes, etc.).

1.1 Justification for Importance of Topic
Natural language processing tools have been widely used to better inform research in a number of fields including cognitive science, medical discourse, literary studies, language learning, and the social sciences. However, large-scale use of NLP tools in educational data mining is much less common. Recently, the strength of NLP has begun to be recognized, as evidenced by a number of funding opportunities, and research findings in fields related to EDM (e.g., MOOCS, ITSs, etc.). This suggests that NLP is poised to become an important element of educational research, particularly when used in combination with more traditional measurements of student success (i.e., psychometric data, system interaction data, and sensor data). The convergence of readily available NLP tools, large-scale educational data sets, and data mining techniques can provide EDM researchers with new approaches to better understand and predict variables related to educational success.
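
As a rough illustration of the kind of linguistic features this tutorial discusses, the sketch below computes two simple indices from a student text: a type-token ratio (a crude proxy for lexical diversity) and the mean word overlap between adjacent sentences (a crude proxy for cohesion). This is not how Coh-Metrix, WAT, SiNLP, TAALES, or TAACO compute their indices; the function names, the regular-expression tokenizer, and the choice of Jaccard overlap are all assumptions made for the example. The resulting feature dictionary is the sort of per-text output that could, in principle, be passed to a machine learning model alongside other student data.

```python
import re


def tokenize(text):
    """Return lowercase word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?' (an assumption;
    real tools use far more careful sentence segmentation)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def type_token_ratio(text):
    """Unique word types divided by total word tokens; 0.0 for empty text."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def adjacent_sentence_overlap(text):
    """Mean Jaccard overlap of word sets between consecutive sentences.

    Returns 0.0 when the text has fewer than two sentences.
    """
    sentences = [set(tokenize(s)) for s in split_sentences(text)]
    scores = []
    for current, following in zip(sentences, sentences[1:]):
        union = current | following
        scores.append(len(current & following) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def extract_features(text):
    """Bundle the hypothetical indices into one feature dictionary."""
    return {
        "type_token_ratio": type_token_ratio(text),
        "adjacent_sentence_overlap": adjacent_sentence_overlap(text),
    }


if __name__ == "__main__":
    sample = "The cat sat. The cat ran."
    print(extract_features(sample))
```

Both indices are sensitive to text length and to the tokenizer used, so values from this sketch should not be compared with indices reported by the tools named above.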