=Paper= {{Paper |id=Vol-1819/edudm2017-paper6 |storemode=property |title= |pdfUrl=https://ceur-ws.org/Vol-1819/edudm2017-paper6.pdf |volume=Vol-1819 |authors=Subhalaxmi Panda,P. A Pattanaik,Tripti Swarnkar |dblpUrl=https://dblp.org/rec/conf/indiaSE/PandaPS17 }} ==== https://ceur-ws.org/Vol-1819/edudm2017-paper6.pdf
A Higher Education Predictive Model Using Data Mining Techniques
Subhalaxmi Panda
Department of Computer Science & Engineering, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, INDIA.
Ph: +91-9439057546
jinisubha.666@gmail.com

P. A Pattanaik
Department of Computer Science & Engineering, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, INDIA.
Ph: +91-9692998983
priyadarshiniadyashapattanaik@soauniversity.ac.in

Tripti Swarnkar
Department of Computer Application, Siksha 'O' Anusandhan University, Bhubaneswar, Odisha, INDIA.
Ph: +91-9437130794
triptiswarnakar@soauniversity.ac.in


ABSTRACT
The main objective of a higher educational organization is to provide high-quality, necessary education to its students. Data mining in the Indian education system has two goals: the first is to analyze and extend the record of recent developments in educational data mining; the second is to preserve, organize and discuss the results produced by a data mining approach. Applying data mining techniques such as random forest and decision tree to Indian education processes can help improve students' performance and support better decision making in the selection of courses according to their retention rate. This paper focuses on a model for analyzing different data mining techniques in the Indian education system, and presents a comparative study of the ID3, K-Means, Naïve Bayes and Random Forest algorithms. We propose a Random Forest approach to predict career decisions for students who have passed class 12. Random Forest has helped students make appropriate decisions according to their interests and skills, and acts as a career counseling toolbox.

Keywords
Indian Education System; Data Mining; Random Forest

Copyright ©2017 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

1. INTRODUCTION
Education is the effort of the senior members of a society to pass on their knowledge to its younger members. It is thus an institution that plays a vital role in perpetuating culture by integrating individuals with their society. In India, however, the education system has some serious lacunae[3]. A major challenge for educational organizations today is the lack of efficient, effective and accurate educational processes.

The Indian educational system lacks sufficient knowledge, and the knowledge it has is not used efficiently; this prevents its management from meeting its quality objectives. Data mining is therefore considered a highly suitable technology: it provides additional insight in both the industrial and the educational sectors, helping decision makers take better decisions and perform effectively. Data mining acts as a bridge between these lacunae and the Indian educational system, and data mining techniques can improve the effectiveness, efficiency and accuracy of its processes. As a result, data mining helps improve the Indian educational system by increasing its efficiency, reducing student drop-out rates, and gradually increasing student promotion, retention, success and learning rates as well as overall educational improvement[6]. Achieving overall quality improvement therefore requires data mining techniques that help decision makers act effectively. Random Forest is an ensemble learning technique that can help students make correct decisions about their career choices after board exams, by guiding each student along a particular pathway towards a brighter career.

2. METHODS
Data mining techniques have, to some extent, overcome these lacunae in the Indian education system, and data mining is gaining popularity because it makes the system more effective, efficient and accurate. The dataset used in this study contains career counseling records of class 10 and class 12 students. It is used to predict and improve students' performance and to identify their skills by means of different classification techniques.

Figure 1 shows the overall working of the proposed model, which aims to give students a broad understanding of their career options. In the first stage, the data pre-processing stage, information about class 10 and class 12 students is collected. In the second stage, unnecessary information is removed and only relevant data is fed into the database. The resulting dataset is then tested with different algorithms: ID3, K-Means, Naïve Bayes and Random Forest[1]. K-Means is one of the oldest and most widely
used algorithms for clustering large databases. Naïve Bayes is a statistical classification technique that evaluates a hypothesis (a class) from a set of attributes in a database, treating the attributes as conditionally independent given the class. It estimates how strongly each attribute is associated with a given class and how that class relates to the other classes[2][5]. The remaining algorithm, Random Forest, introduces its first level of randomization through bagging (bootstrap aggregating). Random Forest handles missing values and categorical predictors well. In the third and final stage, the Random Forest algorithm is applied to the training dataset, and the performance of each student is evaluated[6].

Figure 1. Paradigm of the proposed model

The training dataset, shown in Table 1, contains detailed information about each student, such as Student ID and Gender. The complete set of student details is used as the input dataset.

Table 1. Student related variables
ATTRIBUTES                VARIABLES
Student ID                Student ID
Gender                    Male / Female
Students category         Unreserved / OBC / SC / ST
Medium of Teaching        Hindi / English / Local
Stream                    Science / Arts / Commerce
10th Grade                Excellent / Average / Poor
12th Grade                Excellent / Average / Poor
Type of coaching          Online / Offline
Scholarship               Yes / No
Admission type            Entrance exam / Management
Type of coaching          Yes / No
Material                  Text book / Online / Both
Extra curriculum          NCC / Scout / Guide / Sports & heritage activities
Efficiency                Good / Average / Poor
Father's occupation       Service, Business, Agriculture, Retired, NA
Mother's occupation       House-wife (HW), Service, Retired, NA
Parental income status    High / Medium / Low

2.1 RANDOM FOREST
The random forest concept was first introduced by Tin Kam Ho. Random forests, or random decision forests, are an ensemble learning technique for classification and regression. A random forest constructs many decision trees at training time and outputs the class that is the mode of the classes predicted by the individual trees (classification), or the mean of their predictions (regression)[1].

Basic Random Forest algorithm:
Let Nstudent be the number of trees to create, one per iteration, and let mtry be the number of predictors to try at each split. For each of the Nstudent iterations:
- Choose a new bootstrap sample from the training set.
- Grow an un-pruned tree on this bootstrap sample.
- At each internal node, randomly choose mtry predictors and find the best split using only these predictors.
- Grow the tree to the largest extent possible, with no pruning.
The predictions of the Nstudent trees are then aggregated, by majority vote for classification.

Figure 2. Working of the Random Forest algorithm[1]

3. RESULTS & DISCUSSION
For this experiment, 200 samples were considered. Table 2 shows the accuracy (in percent) of the different classifiers as the dataset size increases. To predict changes in student behavior, the Random Forest technique is applied to the student database. The technique distinguishes slow learners from keen learners, helps recover from failures as early as possible, and supports appropriate action to improve weaker students. Comparing student performance predictions across the classifier algorithms (K-Means clustering, ID3 decision tree, Naïve Bayes and Random Forest) shows that Random Forest gives better accuracy as the dataset grows: its accuracy rises steadily from 60% with 20 samples to 80% with 200 samples, the highest value among the four classifiers at that size.
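The Random Forest procedure of Section 2.1 (a bootstrap sample per tree, a random choice of mtry predictors at each node, un-pruned trees and a majority vote) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes every predictor is categorical, as in Table 1, and uses ID3-style information gain to pick the best split among the randomly drawn predictors. The function names, the toy `students` records and the `Career` label are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def info_gain(rows, attr, target):
    """ID3-style information gain of a multiway split on a categorical attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def majority(rows, target):
    """Most frequent class label among the rows."""
    return Counter(row[target] for row in rows).most_common(1)[0][0]


def build_tree(rows, attrs, target, mtry, rng):
    """Grow an un-pruned tree; each node tries only `mtry` randomly chosen attributes."""
    if len({row[target] for row in rows}) == 1 or not attrs:
        return {"leaf": majority(rows, target)}
    candidates = rng.sample(attrs, min(mtry, len(attrs)))
    best = max(candidates, key=lambda a: info_gain(rows, a, target))
    remaining = [a for a in attrs if a != best]  # categorical: each attribute used once per path
    children = {}
    for value in sorted({row[best] for row in rows}):  # sorted so RNG use is reproducible
        subset = [row for row in rows if row[best] == value]
        children[value] = build_tree(subset, remaining, target, mtry, rng)
    return {"attr": best, "children": children, "default": majority(rows, target)}


def predict_tree(node, row):
    """Follow the row's attribute values down to a leaf."""
    while "leaf" not in node:
        child = node["children"].get(row.get(node["attr"]))
        if child is None:  # value not seen at this node: use the node's majority class
            return node["default"]
        node = child
    return node["leaf"]


def random_forest(rows, attrs, target, n_trees, mtry, seed=0):
    """Train n_trees trees (Nstudent in Section 2.1), each on its own bootstrap sample."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(rows) for _ in rows]  # sampling with replacement
        forest.append(build_tree(bootstrap, attrs, target, mtry, rng))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual trees' predictions by majority vote."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


# Hypothetical toy data (not the paper's dataset): the career label depends only
# on Stream, while Efficiency is an uninformative attribute.
CAREER_BY_STREAM = {"Science": "Engineering", "Commerce": "Finance", "Arts": "Humanities"}
students = [
    {"Stream": stream, "Efficiency": eff, "Career": career}
    for stream, career in CAREER_BY_STREAM.items()
    for eff in ("Good", "Average", "Poor")
    for _ in range(3)
]

forest = random_forest(students, ["Stream", "Efficiency"], "Career", n_trees=51, mtry=1, seed=42)
print(predict_forest(forest, {"Stream": "Commerce", "Efficiency": "Poor"}))
```

Setting mtry equal to the number of predictors removes the per-node predictor sampling and reduces the sketch to bagged ID3-style trees; smaller values make the individual trees less correlated, which is the purpose of the random predictor choice at each split. Falling back to the node's majority class for unseen attribute values is one simple choice among several.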
Table 2. Prediction accuracy
                              Accuracy (%)
Dataset size    ID3    K-Means    Naïve Bayes    Random Forest
20              62     40         40             60
80              64     55         62             78
160             72     43         81             79
200             75     54         59             80

4. CONCLUSION
This paper shows considerable scope for helping students choose a brighter future through specific and accurate analysis. Since efficiency, accuracy and effectiveness play a vital role in the Indian education system, the Random Forest technique offers a well-suited solution for real-world student education. In this paper, we used Random Forest to predict career decisions for students who have passed class 12. Random Forest helped students make appropriate decisions according to their interests and skills. The ultimate goal is to provide insight for designing a better Indian education system with effective outcomes for Indian students. This work may be extended to larger feature sets to handle complex decision databases efficiently.

REFERENCES
[1] Rao, K. Prasada, MVP Chandra Sekhara, and B. Ramesh. "Predicting Learning Behavior of Students using Classification Techniques." International Journal of Computer Applications, Volume 139, Issue 7, ISSN 0975-8887, April 2016.
[2] Veeramuthu, P. "Analysis of Student Result Using Clustering Techniques." International Journal of Computer Science and Information Technologies, Volume 5, Issue 4, pp. 5092-5094, 2014.
[3] Goyal, Monika, and Rajan Vohra. "Applications of data mining in higher education." International Journal of Computer Science, Volume 9, Issue 2, p. 113, March 2012.
[4] Hijazi and Naqvi. "Factors Affecting Students' Performance." e-Journal of Sociology, Volume 3, Issue 1, p. 2, January 2006.
[5] Dutt, Ashish. "Clustering algorithms applied in educational data mining." International Journal of Information and Electronics Engineering, Volume 5, Issue 2, p. 112, March 2015.
[6] Yadav, Surjeet Kumar, Brijesh Bharadwaj, and Saurabh Pal. "Data mining applications: A comparative study for predicting student's performance." arXiv preprint arXiv:1202.4815, February 2012.