=Paper=
{{Paper
|id=Vol-1819/edudm2017-paper6
|storemode=property
|title=A Higher Education Predictive Model Using Data Mining Techniques
|pdfUrl=https://ceur-ws.org/Vol-1819/edudm2017-paper6.pdf
|volume=Vol-1819
|authors=Subhalaxmi Panda,P. A Pattanaik,Tripti Swarnkar
|dblpUrl=https://dblp.org/rec/conf/indiaSE/PandaPS17
}}
====
A Higher Education Predictive Model Using Data Mining Techniques

Subhalaxmi Panda, Department of Computer Science & Engineering, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, India. Ph: +91-9439057546, jinisubha.666@gmail.com

P. A Pattanaik, Department of Computer Science & Engineering, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, India. Ph: +91-9692998983, priyadarshiniadyashapattanaik@soauniversity.ac.in

Tripti Swarnkar, Department of Computer Application, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, India. Ph: +91-9437130794, triptiswarnakar@soauniversity.ac.in

Copyright ©2017 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

===ABSTRACT===
The main objective of a higher educational organization is to provide high-quality and necessary education to its students. Data mining in the Indian education system has two goals: the first is to analyze and enhance the chronicle of recent advances in educational data mining; the second is to preserve, organize and discuss the content of the results produced by a data mining approach. The use of various data mining techniques such as random forest, decision tree, etc. in Indian education processes will help to improve students' performance and provide a broad management skill in the selection of courses as per their retention rate. This paper focuses on a model representation for analyzing different data mining techniques in the Indian education system. The paper also presents a comparative study of the ID3, K-Means, Naïve Bayes and Random Forest algorithms. In this paper, we propose the Random Forest approach to predict the career decision of students passing out of class 12. The use of Random Forest has helped the students to take a correct, appropriate decision as per their interests and skills, and it acts as a career counselor toolbox.

===Keywords===
Indian Education System; Data Mining; Random Forest

===1. INTRODUCTION===
Education is an attempt or effort of the senior people to spread their knowledge to the younger people of society. It is thus an institution which plays a vital role in maintaining the perpetuation of culture by integrating an individual with his society. But in India, the education system has some serious lacunae [3]. Nowadays, the important challenge for educational organizations is the lack of efficient, effective and accurate educational processes.

There is a lack of efficient and sufficient knowledge in the Indian educational system, which hampers the system management in achieving its quality objectives. Thus, data mining is considered the most suitable technology, as it provides additional insight into the industrial as well as the educational sector, helping them take better decisions and motivating them to perform effectively. Data mining technology acts as a bridge between the lacunae and the Indian educational system. The data mining approach leads to data mining techniques which help to improve decision effectiveness, efficiency and the accuracy of the processes. As a result, this development helps in improving the Indian educational system by increasing educational system efficiency, minimizing the student drop-out rate, and gradually increasing the student promotion rate, student retention rate, educational improvement rate, student success and student learning rate [6]. So, to achieve overall quality improvement, we need data mining techniques in the system that help the decision makers to act smartly. Random Forest is one of the dynamic ensemble learning techniques; it helps students take the correct decision about an appropriate career choice after board exams. This data mining technique guides the student along a particular pathway towards a brighter career in an effective manner.

===2. METHODS===
Data mining in the Indian education system has to some extent overcome the lacunae through various techniques. It is gaining popularity because it is effective, efficient and accurate for the Indian education system. The dataset used in this study contains career counseling records of class 10 and class 12 students. The data set is used to improve performance and to predict or focus on the skills of students by using different classification techniques.

Figure 1 demonstrates the whole working of the proposed model, to give students a broad understanding of their career counseling. In the first stage, information about students of class 10 and class 12 is collected; this is named the data pre-processing stage. In the second stage, unnecessary information is removed and only relevant data is fed to the database. After addressing the students' information, the dataset is tried with different algorithms such as ID3, K-Means, Naïve Bayes and Random Forest [1]. K-Means is one of the oldest and most widely used algorithms for clustering large information-based databases. Naïve Bayes is a statistical classifier that acts as a hypothesis for estimating a set of attributes in a database. This technique helps to detect the effectiveness of a specific attribute for a given class and its relationship with other classes [2][5]. The other algorithm is Random Forest, which first introduces randomization through bagging. Using Random Forest helps in handling problems involving missing values and categorical predictors. In the third and last stage, the Random Forest algorithm is applied to the training data set, giving better output, and the performance of each student is evaluated [6].

Figure 1. Paradigm of the proposed model

The training data set, shown in Table 1, contains detailed information about each student, such as Student ID, Gender, etc. The whole student information detail is used as the input dataset.

{| class="wikitable"
|+ Table 1. Student related variables
! Attributes !! Variables
|-
| Student ID || Student ID
|-
| Gender || Male / Female
|-
| Students category || Unreserved / OBC / SC / ST
|-
| Medium of Teaching || Hindi / English / Local
|-
| Stream || Science / Arts / Commerce
|-
| 10th Grade || Excellent / Average / Poor
|-
| 12th Grade || Excellent / Average / Poor
|-
| Type of coaching || Online / Offline
|-
| Scholarship || Yes / No
|-
| Admission type || Entrance exam / Management
|-
| Type of coaching || Yes / No
|-
| Material || Text book / Online / Both
|-
| Extra curriculum || NCC / Scout / Guide / Sports & heritage activities / Both
|-
| Efficiency || Good / Average / Poor
|-
| Father's occupation || Service, Business, Agriculture, Retired, NA
|-
| Mother's occupation || House-wife (HW), Service, Retired, NA
|-
| Parental income status || High / Medium / Low
|}

====2.1 RANDOM FOREST====
The random forest concept was first introduced by Tin Kam Ho. Random forests, or random decision forests, are a learning technique for classification and regression. The technique constructs decision trees at training time and outputs either the class chosen by the individual trees (classification) or the mean of their predictions (regression) [1].

Basic random forest algorithm: let Nstudent be the number of trees to create and mtry the number of predictors to try at each split. For each of the Nstudent iterations:

# Choose a new bootstrap sample from the training set.
# Develop an un-pruned tree on this bootstrap sample.
# At each internal node, randomly choose mtry predictors and find the best split using only these predictors.
# Grow the tree to the largest extent possible, with no pruning.

Figure 2. Description of the working of Random Forest [1]

===3. RESULTS & DISCUSSION===
For this experiment, 200 samples were taken into consideration. Table 2 shows the accuracy, in percent, of the different classifiers with increasing data set size. To predict the change in behavior, the Random Forest technique is used on the student database. The technique distinguishes between slow learners and keen learners, helps recover from failure as soon as possible, and takes appropriate action to improve the weaker section of students in a correct manner. The comparison of students' performance using classifier algorithms such as decision tree clustering, decision tree, Naïve Bayes and Random Forest concluded that, as the size of the data set increases, Random Forest gives better accuracy.

{| class="wikitable"
|+ Table 2. Prediction accuracy (%)
! Dataset size !! ID3 !! K-Means !! Naïve Bayes !! Random Forest
|-
| 20 || 62 || 40 || 40 || 60
|-
| 80 || 64 || 55 || 62 || 78
|-
| 160 || 72 || 43 || 81 || 79
|-
| 200 || 75 || 54 || 59 || 80
|}

===CONCLUSION===
This paper offers students a high scope to decide on a brighter future with specific and accurate analysis. As efficiency, accuracy and effectiveness play a vital role in the processes of the Indian education system, the Random Forest technique provides an optimal solution for real-world student education. In this paper, we have used the Random Forest approach to predict the career decision of students passing out of class 12. The use of Random Forest has helped the students to take a correct, appropriate decision as per their interests and skills. The final goal is to give better insight for designing a better Indian education system with effective outcomes for Indian students. This review may be extended to a larger set of features to solve complex decision databases in an efficient manner.

===REFERENCES===
[1] Rao, K. Prasada, MVP. Chandra Sekhara, and B. Ramesh. "Predicting Learning Behavior of Students using Classification Techniques." International Journal of Computer Applications, Volume 139, Issue 7, pp. 0975-8887, April 2016.

[2] P. Veeramuthu. "Analysis of Student Result Using Clustering Techniques." International Journal of Computer Science and Information Technologies, Volume 5, Issue 4, pp. 5092-5094, 2014.

[3] Goyal, Monika, and Rajan Vohra. "Applications of data mining in higher education." International Journal of Computer Science, Volume 9, Issue 2, pp. 113, March 2012.

[4] Hijazi and Naive. "Factors Affecting Students' Performance." e-Journal of Sociology, Volume 3, Issue 1, pp. 2, January 2006.

[5] Dutt and Ashish. "Clustering algorithms applied in educational data mining." International Journal of Information and Electronics Engineering, Volume 5, Issue 2, pp. 112, March 2015.

[6] Yadav, Surjeet Kumar, Brijesh Bharadwaj, and Saurabh Pal. "Data mining applications: A comparative study for predicting student's performance." arXiv preprint arXiv:1202.4815, February 2012.
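As an illustration of the basic random forest procedure outlined in Section 2.1 (bootstrap sample, un-pruned tree, mtry randomly chosen predictors at each split, majority vote over the trees), the following is a minimal sketch in plain Python. It is not the authors' implementation, which the paper does not describe. The attribute names are taken from Table 1, but the student records, the career labels, the function names, the use of Gini impurity with "attribute equals value" splits, and the parameter values (`n_trees`, which plays the role of Nstudent, `mtry` and `seed`) are all assumptions made for this example.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, features):
    """Best 'feature == value' split among the given features, or None."""
    best, best_score = None, float("inf")
    for feature in features:
        for value in sorted({row[feature] for row in rows}):
            yes = [y for row, y in zip(rows, labels) if row[feature] == value]
            no = [y for row, y in zip(rows, labels) if row[feature] != value]
            if not yes or not no:
                continue  # this split does not separate the rows
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
            if score < best_score:
                best, best_score = (feature, value), score
    return best


def grow_tree(rows, labels, mtry, rng):
    """Grow an un-pruned tree: a leaf is a label, an internal node is a dict."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    names = sorted(rows[0])
    # Randomly choose mtry predictors to try at this node.
    candidates = rng.sample(names, min(mtry, len(names)))
    split = best_split(rows, labels, candidates)
    if split is None:
        return majority  # the chosen predictors cannot separate these rows
    feature, value = split
    yes = [i for i, row in enumerate(rows) if row[feature] == value]
    no = [i for i, row in enumerate(rows) if row[feature] != value]
    return {
        "feature": feature,
        "value": value,
        "yes": grow_tree([rows[i] for i in yes], [labels[i] for i in yes], mtry, rng),
        "no": grow_tree([rows[i] for i in no], [labels[i] for i in no], mtry, rng),
    }


def grow_forest(rows, labels, n_trees, mtry, seed=0):
    """Grow n_trees trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # draw with replacement
        forest.append(
            grow_tree([rows[i] for i in sample], [labels[i] for i in sample], mtry, rng)
        )
    return forest


def predict(forest, row):
    """Classify one record by majority vote over all trees."""
    votes = []
    for node in forest:
        while isinstance(node, dict):
            node = node["yes"] if row[node["feature"]] == node["value"] else node["no"]
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]


# Hypothetical records: attribute names from Table 1, values and labels invented.
students = [
    {"Stream": "Science", "10th Grade": "Excellent", "Medium of Teaching": "English"},
    {"Stream": "Science", "10th Grade": "Average", "Medium of Teaching": "Hindi"},
    {"Stream": "Commerce", "10th Grade": "Average", "Medium of Teaching": "English"},
    {"Stream": "Commerce", "10th Grade": "Poor", "Medium of Teaching": "Local"},
    {"Stream": "Arts", "10th Grade": "Excellent", "Medium of Teaching": "Hindi"},
    {"Stream": "Arts", "10th Grade": "Poor", "Medium of Teaching": "Local"},
]
careers = ["Engineering", "Engineering", "Accounting", "Accounting", "Humanities", "Humanities"]

forest = grow_forest(students, careers, n_trees=25, mtry=2, seed=1)
print(predict(forest, students[0]))
```

In this sketch a node becomes a leaf only when its labels are pure or when the sampled predictors cannot separate its rows, which is one way to read "no pruning". A production library would normally replace this code; it is shown only to make the four steps concrete.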