SAS Tools for Educational Data Mining

Jennifer Sabourin
SAS Institute
100 SAS Campus Dr.
Cary, NC 27513
1.919.531.3313
Jennifer.Sabourin@sas.com

Scott McQuiggan
SAS Institute
100 SAS Campus Dr.
Cary, NC 27513
1.919.531.1119
Scott.McQuiggan@sas.com

Andre de Waal
SAS Institute
100 SAS Campus Dr.
Cary, NC 27513
1.919.531.6575
Andre.DeWaal@sas.com

ABSTRACT
Researchers in the EDM community have always relied on sophisticated tools to analyze data and build models. As the amount of data that can be collected and stored grows, the need for tools capable of handling "big data" becomes ever more pressing. SAS® Analytics U is a new initiative that makes SAS data analysis and mining tools available for free to educational researchers and instructors. These tools are designed to handle very large data sets and can be run in the cloud, saving researchers valuable time and resources. Furthermore, SAS Analytics U provides a community where SAS educators and learners can share resources and information about SAS tools and techniques. This tutorial aims to introduce researchers to the tools available through SAS Analytics U and to show how they can be applied to the field of Educational Data Mining. We will provide an overview of the SAS architecture and instruction on the key features of each tool in the suite. We will guide participants through examples that use relevant educational data sources to help researchers understand how the tools can be applied to their own work.

1. TUTORIAL OVERVIEW
This tutorial will introduce SAS to participants and guide them through the suite of tools using relevant educational data sets. The tools that will be covered include:

SAS® Programming Language. The SAS programming language is a powerful language designed specifically for intensive data analysis. This highly flexible and extensible fourth-generation programming language has a clear syntax and hundreds of language elements and functions. It supports everything from data extraction, formatting, and cleansing to data analysis, building sophisticated models, and generating reports. The SAS programming language is at the heart of the SAS University Edition tools.

SAS® Studio. SAS Studio is the development environment for SAS University Edition. It runs in the web browser as well as in the cloud. It offers a powerful graphical user interface (GUI) that allows novice programmers to interact with data and perform analyses without writing any SAS code themselves. The corresponding SAS code is still generated behind the scenes and remains visible, which helps users learn the language.

SAS® Enterprise Miner™. SAS Enterprise Miner helps users streamline the data mining process to create highly accurate predictive and descriptive models based on analysis of vast amounts of data. It includes innovative algorithms in the areas of statistics and machine learning that enhance the stability and accuracy of predictions. These predictions can be verified easily through visual model assessment and validation. Users build process flow diagrams that serve as self-documenting procedures. These diagrams can be updated easily or applied to new problems without starting over from scratch. In addition to process flow diagrams, Enterprise Miner provides a programming interface for advanced users. Enterprise Miner also allows integration with open-source software for data manipulation and model comparison, with the open standard PMML, and with databases for scoring models without data movement.

Additional SAS tools may be covered if they are of interest to participants. These include tools for time series analysis, forecasting, matrix manipulations, and advanced statistics.

2. JUSTIFICATION
Educational data miners rely on computational tools to understand and explore their data. These tools must be robust and flexible to allow for innovation. They must be able to handle ever-increasing amounts of data. Ideally, they are easy for programmers and non-programmers alike to use, given the interdisciplinary nature of this research area. Finally, most researchers rely on tools that are freely available and do not require excessive resources.

SAS University Edition is a new option that addresses many of these needs. This suite of powerful SAS software was made available to all learners for free in May 2014. SAS Enterprise Miner, Text Miner, and Forecast Server have been available through SAS OnDemand for Academics since late 2010. However, the biggest barrier to adopting new tools is learning how to use them. SAS Analytics U is a community centered around these free offerings and is designed to support SAS learners and educators. This tutorial seeks to introduce participants to these resources and this suite of tools, and to demonstrate how they can be applied to EDM research. The goal is for participants to add another set of tools to their ever-growing toolbox for conducting EDM research.

3. TUTORIAL FORMAT
This tutorial will be divided into four sessions, each covering a specific topic:

Session 1 – SAS University Edition
Session 2 – SAS Studio
Session 3 – SAS Enterprise Miner
Session 4 – Participant requests and questions