IJCSNS International Journal of Computer Science and Network Security, VOL.11 No.6, June 2011
    Prediction of Students' Educational Status Using CART
Algorithm, Neural Network, and Increase in Prediction Precision
                  Using Combinational Model
Mohammad Tari, Tehran Payame Noor University
Behrouz Minaei, Iran University of Science and Technology
Ahmad Farahi, Payame Noor University
Mohammad Niknam Pirzadeh, Tehran Payame Noor University
Summary
In this paper, we use the CART and Neural Network data mining algorithms to study the factors that affect the rate at which students of Payame Noor University of Qom Province become unqualified. After building a model with each algorithm separately, we combined the models in Clementine 12.0 through the Ensemble node to create the combinational model, and finally evaluated the models and compared their results with the Analysis node. The neural network model has 44 neurons in the input layer, 3 neurons in the hidden layer, and 1 neuron in the output layer; the CART model is a decision tree of depth 5. Because the data bank contains a large number of fields, we used the Feature Selection node with the chosen target to delete the fields that had less influence on the target, which reduced model complexity to some extent.
Key words:
Data mining, neural network, CART, Payame Noor University of Qom Province, Ensemble

1. Introduction
Education continually generates large amounts of data and information about universities, students, faculty members, personnel, financial resources, etc., and this data often contains valuable information and patterns, so higher education appears to be one of the most important application areas of data mining. Nowadays, there are large data banks of students' characteristics, including family, educational, and … characteristics. Finding the patterns and knowledge concealed in this data can considerably help decision-makers in higher education advance and improve educational processes such as planning, enrollment, evaluation, and consultation.
Predicting the future has always been interesting and attractive to human beings. Predicting future trends and transformations is one of the basic and constant concerns of senior and mid-level managers in every area. However, numerous difficulties have always made precise and reliable prediction almost impossible. The existence of too many parameters, in many cases hidden ones, turns such cases into highly complicated problems for which even very large mathematical algorithms cannot provide an appropriate technique for building an efficient prediction model.
In the two most recent decades, the emergence of artificial intelligence and its combination with the well-established science of statistics, along with advanced and innovative algorithms such as genetic algorithms, metaheuristic methods, artificial neural networks, and …, has brought about a wide revolution in this arena.
Data mining, as a new and flourishing area in the field of predictive modeling, applies and combines different kinds of statistical techniques, artificial intelligence, and innovative algorithms so that it can discover reliable algorithms and models in very large data stores to predict the intended parameters.

2. CRISP Methodology
Data mining techniques are among the modern scientific techniques used in the description, explanation, prediction, and control of phenomena. These techniques measure, explain, and predict the degree of correlation among variables. Data mining methods influence not only the analytic aspects of studies but also the design and the tools of data collection for decision-making and problem solving. The most successful data mining projects are implemented within the framework of a standard process, presented by a work team at SPSS under the name CRISP-DM. According to CRISP-DM, a data mining project has a six-stage life cycle that shows the succession of stages. Each stage in the
  Manuscript received June 5, 2011
  Manuscript revised June 20, 2011
stages' succession often depends on the outcome of the previous stages. The most important dependencies among stages are shown by arrows. The iterative nature of CRISP is indicated by the outer cycle: the solution to a research or business problem often leads to further questions of interest. Figure 1 shows the stages of this methodology [1].
  Figure (1): Stages of CRISP-DM Methodology

3. Table Applied
Table (1): Basic information
 Student Number   First name and surname   Kind of admission in university   Admission Term   Birth Date   Native
Some unnecessary fields have been deleted.
Table (2): Lessons
 Lesson's code   Lesson's name   Term   Score   Theoretical credit   Practical credit   Explanations

4. Data preparation
Preprocessing: Data preparation is important because "lack of quality data is equal to lack of quality in the results of mining": improper input leads to improper output. Table 2 compares the importance of data preparation with that of the other steps of knowledge discovery using data mining [2].
 Data mining step     Percent of time spent from total work   Percent of importance in work's final success
 Data preparation     75                                      75
 Data investigation   20                                      15
 Data modeling        5                                       10
  Table (2): comparison of data preparation importance

4.1. Tables Combination
In this stage, which is part of data preparation, we combine the tables. The result is a new table with the fields listed in Table 3, which is used as the data source for the algorithms. We then apply some changes to this data source through the nodes dedicated to data preparation.
Table (3): List of created combinational table fields
 Field name       Field type             Field name                 Field type
 Gender           Male/female            unqualified                numerical
 Native           Yes/no                 successively unqualified   numerical
 Field of study   Fields of study list   Passed credits             numerical
 Course           Formal/comprehensive   Kind of admission          ration
 Total average    numerical              age                        numerical

4.2. Data anomaly recognition
One method of preparing data is to find anomalous records in the tables and delete them so that the models yield more precise results. To do this, we attach the data source to the Anomaly node, add the generated model to the project, and then, by adjusting the model, delete the anomalous data when the models are created. The figure shows how the model is created.
  Figure (2): Data Anomaly Model
As Figure 2 shows, we first attached the data source to the Anomaly algorithm and, after creating the model, added it to the project. The anomaly recognition technique is based on clustering methods.

4.3. Feature Selection Technique
Feature selection is used to decrease the number of variables before the data mining algorithm is run. Irrelevant
variables can have a negative effect on the prediction task or can complicate the calculations.
To run the algorithms on the data banks, we decreased the number of required fields through this technique.
To do this, we first create a source node (Figure 3) and then link the data source, to which the Anomaly technique has already been applied, to the Feature Selection node.
  Figure (3): implementation of Feature Selection model
For this model, the successively unqualified field was selected as the target field. Of the 28 fields supplied as inputs, the algorithm deleted 3, leaving 25 fields as inputs to the neural network and CART algorithms.
Logistic regression is a supervised learning technique. When the outcome has two classes, linear regression is not very effective, and logistic regression is more appropriate. It is also a nonlinear regression technique, so the data need not follow a linear relationship. The reason for using logistic regression is that linear regression requires not only the outcome but also the input variables to be numerical, so categorical values must be converted to numerical form.

4.4. Filtration and deleting wide data
To make a better model, in addition to the fields deleted by the Feature Selection algorithm, we use a Filter node to delete the fields that do not influence the models' output. Therefore, from 28 total fields, 8 fields were deleted, and finally 15 fields were passed to the algorithms for model building.
In addition, using a Select node, we deleted from the table the records whose values were null.

5. Neural Network
A neural network, or ANN, is a processing system inspired by biological neural networks such as the brain. The key element of this structure is a data processing system made up of a large number of processing elements that work together as a complex, like the neurons of the brain, to solve specific problems such as pattern detection or data classification through a learning process. [3]

6. Clementine software
Breiman, Friedman, Olshen, and Stone designed the CART algorithm for classification and regression trees in 1984. The operating method of this algorithm is named Surrogate Splitting. The algorithm is recursive. At every stage, CART divides the training records into two subsets so that the records of each subset are more homogeneous than those of the previous subset. These divisions continue until the stopping conditions are met. In CART, the best split point is determined according to the value of the impurity parameter. If the best split for a branch reduces impurity by less than the defined threshold, that split is not made. Here, impurity refers to how similar the target field values of the records arriving at a node are. If 100 percent of the samples in a node fall into a single category of the target field, that node is called pure. Notably, in CART a predictor field may be used repeatedly at different levels of the decision tree. In addition, the algorithm supports both categorical and continuous predictor and target fields. [3]

7. Model Creation
Because the data preparation part is lengthy, we do not repeat its stages in this section.
To create the model, we first add a data source node to the project and then add a Partition node to divide the data into a test part and a training part: 30 percent of the data are test data and 70 percent are training data.
The next stage after data conversion is determining the data types with a Type node. After this stage, feature selection is used to determine the importance of the fields in prediction, which leads to the selection of the fields that matter most.
In the subsequent stage, we add the models, CART and Neural Net, to the project.
Once the models are attached, we run them, link their outputs to each other, and then combine them with the Ensemble node. Figure 4 shows the created model.
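The 70/30 partition used in model creation and the split search described for CART can be sketched in plain Python. This is a minimal illustration, not Clementine's implementation: the function names (`partition`, `variance`, `best_split`), the fixed random seed, the use of variance as the impurity of a numeric target, and the `min_gain` parameter standing in for the defined threshold on impurity reduction are all assumptions made for the example.

```python
import random


def partition(records, train_fraction=0.7, seed=0):
    """Shuffle the records and cut them into (training, test) lists.

    Assumption: a simple random 70/30 cut with a fixed seed for repeatability.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def variance(values):
    """Population variance, used here as the impurity of a numeric target."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs, ys, min_gain=0.0):
    """Find the best binary split of one numeric predictor.

    Returns (threshold, impurity_reduction), or None when no candidate
    split reduces impurity by more than min_gain (the split is not made).
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    parent_impurity = variance([y for _, y in pairs])
    best = None
    for i in range(1, n):
        # A threshold can only fall between two distinct predictor values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / n
        gain = parent_impurity - weighted
        if gain > min_gain and (best is None or gain > best[1]):
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, gain)
    return best


# Hypothetical toy data: one predictor and a numeric target.
train, test = partition(list(range(10)))
split = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 5, 5, 5])
```

A full tree would apply `best_split` to every predictor at every node and recurse on the two resulting subsets until the stopping conditions are met; the sketch shows only a single split on a single field.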
       Figure (4): the created combinational model

7.1. Evaluation of Neural Network Model
To evaluate the neural network model, we first add an Analysis node to the project, attach it to the created model, and run it.
Minimum Error                    -1.68
Maximum Error                    2.722
Mean Error                      -0.045
Mean Absolute Error               0.28
Standard Deviation                0.35
Linear correlation               0.987
Occurrences                      1,678

Minimum Error: the smallest difference between predicted and observed data
Maximum Error: the largest difference between predicted and observed data
Mean Error: the average of the errors over all records
Mean Absolute Error: the average of the absolute values of the errors over all records
Standard Deviation: the standard deviation of the errors over all records
Linear Correlation: the linear correlation between predicted and observed data
Occurrences: the number of records used in prediction
Figure 6 shows the importance of each field in predicting the target field: the successively unqualified and number of unqualified fields are the most important, and the term type field is the least important.
       Figure (5): Neural Network's Gain Graph
       Figure (6): the importance of variables in neural network model

7.2. Evaluation of CART Model
To evaluate the CART model, we first add an Analysis node to the created model and run it.
Minimum Error                   -3.531
Maximum Error                    4.469
Mean Error                         -0.0
Mean Absolute Error              0.276
Standard Deviation               0.516
Linear correlation               0.971
Occurrences                      1,678
       Figure (7): CART Model's Gain Graph
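The statistics reported by the Analysis node, as defined in Section 7.1, can be computed for any pair of observed and predicted columns. The sketch below is a hedged illustration in plain Python: the function names, the sign convention (error = predicted - observed), the use of the sample standard deviation, and the simple averaging used to combine two models' predictions are assumptions, since the paper does not state how Clementine computes them.

```python
import math


def analysis_report(observed, predicted):
    """Summarise prediction errors for a numeric target.

    Assumption: error = predicted - observed for each record.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 records")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_error = sum(errors) / n
    # Sample standard deviation of the errors (n - 1 in the denominator).
    std_dev = math.sqrt(sum((e - mean_error) ** 2 for e in errors) / (n - 1))
    # Pearson correlation between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    # The correlation is undefined when either column is constant.
    corr = cov / math.sqrt(ss_o * ss_p) if ss_o > 0 and ss_p > 0 else float("nan")
    return {
        "minimum_error": min(errors),
        "maximum_error": max(errors),
        "mean_error": mean_error,
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "standard_deviation": std_dev,
        "linear_correlation": corr,
        "occurrences": n,
    }


def average_predictions(*model_predictions):
    """Combine models by averaging their predictions record by record.

    Assumption: plain averaging; the Ensemble node's actual rule may differ.
    """
    return [sum(values) / len(values) for values in zip(*model_predictions)]


# Hypothetical toy values, not the paper's data.
observed = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.0, 2.5, 4.0]
model_b = [0.5, 2.5, 3.0, 4.5]
combined = average_predictions(model_a, model_b)
for name, preds in (("A", model_a), ("B", model_b), ("combined", combined)):
    print(name, analysis_report(observed, preds))
```

Running the same report on each single model and on the averaged predictions gives the kind of side-by-side comparison the paper makes between the neural network, CART, and combinational models.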
Figure (8): the importance of variables in CART Model
In this stage, using table No.1, which holds almost one million records logging students' use of the internet over 5 successive years, we can find the maximum, minimum, and average values, in kilobytes, of students' monthly use.

7.3. Evaluation of Combinational Model
The Ensemble node combines 2 or more models to gain more precision than the models achieve when run separately. By combining the predictions of several prediction models, the limitations of the single models are offset and higher precision can be achieved. Most of the time, combinational models give a better result than separate models. After using the Ensemble node, we can use the Analysis node to compare the results of the combinational model with those of the separate models.
Figure (9): Models' combination through ensemble node
As Figure 9 shows, after creating the 2 models we link them to each other and then combine them through the Ensemble node. After running the combinational model, we attach it to an Analysis node in order to evaluate it.
Minimum Error                     -1.818
Maximum Error                      3.596
Mean Error                        -0.022
Mean Absolute Error                 0.24
Standard Deviation                 0.348
Linear correlation                 0.992
Occurrences                        1,678
As we see, the linear correlation has increased to 0.992 (from 0.987 for the neural network model and 0.971 for the CART model), and the mean absolute error has decreased to 0.24 (from 0.28 and 0.276), so the precision of the model has increased and its errors have decreased.
Figure (10): Combinational model's Gain Graph

8. Conclusion
Contemporary universities and higher education institutions are drowning in a mass of data and information whose use, in most cases, is limited to current affairs; the data are not yet used in strategic decision-making. Data mining, whose use is growing day by day, can bring the existing information of higher education institutions and centers into strategic decision-making. In addition, models created from educational data with data mining algorithms can be used as a decision support system in educational systems and can play an important role in raising the scientific level of universities.

REFERENCES
[1] http://www.crisp-dm.org
[2] Andrew W. Moore, "Regression and Classification with Neural Networks", School of Computer Science, Carnegie Mellon University, 2001.
[3] Mosavi, M.R., "A Comparative Study between Performance of Recurrent Neural Network and Kalman Filter for DGPS Corrections Prediction", IEEE Conference on Signal Processing (ICSP 2004), China, Vol.1, pp.356-359, August 31 - September 4, 2004.
[4] Ramos, V. and Abraham, A., "Evolving a Stigmergic Self-Organized Data-Mining", Conf. on Intelligent Systems, Design and Applications (ISDA-04), 2004, pp.725-730.