Advances in
Decision Tree Construction
Johannes Gehrke
Cornell University
johannes@cs.cornell.edu
http://www.cs.cornell.edu/johannes
Wei-Yin Loh
University of Wisconsin-Madison
loh@stat.wisc.edu
http://www.stat.wisc.edu/~loh
KDD 2001 Tutorial: Advances in Decision Trees Gehrke and Loh
Tutorial Overview
O Part I: Classification Trees
O Introduction
O Classification tree construction schema
O Split selection
O Pruning
O Data access
O Missing values
O Evaluation
O Bias in split selection
(Short Break)
O Part II: Regression Trees
Classification
Goal: Learn a function that assigns a record
to one of several predefined classes.
Classification Example
O Example training database:

    Age  Car  Class
    20   M    Yes
    30   M    Yes
    25   T    No
    30   S    Yes
    40   S    Yes
    20   T    No
    30   M    Yes
    25   M    Yes
    40   M    Yes
    20   S    No

O Two predictor attributes: Age and Car-type (Sport, Minivan, and Truck)
O Age is ordered; Car-type is a categorical attribute
O Class label indicates whether the person bought the product
O Dependent attribute is categorical
Types of Variables
O Numerical: Domain is ordered and can be
represented on the real line (e.g., age, income)
O Nominal or categorical: Domain is a finite set
without any natural ordering (e.g., occupation,
marital status, race)
O Ordinal: Domain is ordered, but absolute
differences between values are unknown (e.g.,
preference scale, severity of an injury)
Definitions
O Random variables X1, …, Xk (predictor variables)
and Y (dependent variable)
O Xi has domain dom(Xi), Y has domain dom(Y)
O P is a probability distribution on
dom(X1) × … × dom(Xk) × dom(Y)
O Training database D is a random sample from P
O A predictor d is a function
d: dom(X1) × … × dom(Xk) → dom(Y)
Classification Problem
O The dependent attribute C is called the class label; d is called a classifier.
O Let r be a record randomly drawn from P.
Define the misclassification rate of d:
RT(d,P) = P(d(r.X1, …, r.Xk) != r.C)
Problem definition: Given dataset D that is a random
sample from probability distribution P, find classifier d
such that RT(d,P) is minimized.
(More on regression problems in the second part of the
tutorial.)
Goals and Requirements
Goals:
O To produce an accurate classifier/regression function
O To understand the structure of the problem
Requirements on the model:
O High accuracy
O Understandable by humans, interpretable
O Fast construction for very large training databases
What are Decision Trees?
[Figure: a decision tree and the partition of the predictor space it induces.
Tree: root split on Age; Age >= 30 -> YES; Age < 30 -> split on Car Type: Minivan -> YES, Sports or Truck -> NO.
Plot: the (Age, Car-type) plane, with Age from 0 to 60 on the horizontal axis, divided into the corresponding rectangular regions.]
Decision Trees
O A decision tree T encodes d (a classifier or
regression function) in the form of a tree.
O A node t in T without children is called a leaf
node. Otherwise t is called an internal node.
O Each internal node has an associated splitting
predicate. Most common are binary predicates.
Example splitting predicates:
O Age <= 20
O Profession in {student, teacher}
O 5000*Age + 3*Salary – 10000 > 0
Internal and Leaf Nodes
Internal nodes:
O Binary Univariate splits:
O Numerical or ordered X: X <= c, c in dom(X)
O Categorical X: X in A, A subset dom(X)
O Binary Multivariate splits:
O Linear combination split on numerical variables:
Σ aiXi <= c
O k-ary (k>2) splits analogous
Leaf nodes:
O Node t is labeled with one class label c in dom(C)
Example
Tree: root split on Age (<30 vs. >=30); the Age >= 30 branch is a YES leaf; the Age < 30 branch splits on Car Type into Minivan (YES) and Sports, Truck (NO).
Encoded classifier:
If (age < 30 and carType = Minivan)
  Then YES
If (age < 30 and (carType = Sports or carType = Truck))
  Then NO
If (age >= 30)
  Then YES
Evaluation of Misclassification Error
Problem:
O In order to quantify the quality of a classifier d, we need
to know its misclassification rate RT(d,P).
O But unless we know P, RT(d,P) is unknown.
O Thus we need to estimate RT(d,P) as well as possible.
Approaches:
O Resubstitution estimate
O Test sample estimate
O V-fold Cross Validation
Resubstitution Estimate
The Resubstitution estimate R(d,D) estimates
RT(d,P) of a classifier d using D:
O Let D be the training database with N records.
O R(d,D) = 1/N Σ_{r in D} I(d(r.X1, …, r.Xk) != r.C)
O Intuition: R(d,D) is the fraction of training
records misclassified by d
O Problem with resubstitution estimate:
Overly optimistic; classifiers that overfit the
training dataset will have very low resubstitution
error.
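As a concrete illustration (not code from the tutorial), the resubstitution estimate can be computed in a few lines, assuming the classifier is a plain Python function and the training database is a list of (x, y) pairs:

    def resubstitution_error(classifier, data):
        """Fraction of records in `data` misclassified by `classifier`."""
        errors = sum(1 for x, y in data if classifier(x) != y)
        return errors / len(data)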
Test Sample Estimate
O Divide D into D1 and D2
O Use D1 to construct the classifier d
O Then use resubstitution estimate R(d,D2)
to calculate the estimated misclassification
error of d
O Unbiased and efficient, but removes D2
from training dataset D
V-fold Cross Validation
Procedure:
O Construct classifier d from D
O Partition D into V datasets D1, …, DV
O Construct classifier di using D \ Di
O Calculate the estimated misclassification error
R(di,Di) of di using test sample Di
Final misclassification estimate:
O Weighted combination of individual
misclassification errors:
R(d,D) = 1/V Σ R(di,Di)
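A matching sketch of V-fold cross-validation under the same record representation; the `train` argument (any function that builds a classifier from data) and the `resubstitution_error` helper from the earlier sketch are assumptions for illustration:

    import random

    def cross_validation_error(train, data, V=10, seed=0):
        """Average held-out-fold error over V folds."""
        data = list(data)
        random.Random(seed).shuffle(data)
        folds = [data[i::V] for i in range(V)]           # V disjoint partitions D1, ..., DV
        errors = []
        for i in range(V):
            test = folds[i]
            training = [r for j, f in enumerate(folds) for r in f if j != i]
            d_i = train(training)                         # classifier built on D \ Di
            errors.append(resubstitution_error(d_i, test))
        return sum(errors) / V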
Cross-Validation: Example
[Diagram: D is partitioned into folds; classifiers d1, d2, and d3 are each constructed with one fold held out and evaluated on that held-out fold]
Cross-Validation
O Misclassification estimate obtained through
cross-validation is usually nearly unbiased
O Costly computation (we need to compute d, and
d1, …, dV); computation of di is nearly as
expensive as computation of d
O Preferred method to estimate quality of learning
algorithms in the machine learning literature
Decision Tree Construction
O Top-down tree construction schema:
O Examine training database and find best splitting
predicate for the root node
O Partition training database
O Recurse on each child node
BuildTree(Node t, Training database D, Split Selection Method S)
(1) Apply S to D to find splitting criterion
(2) if (t is not a leaf node)
(3) Create children nodes of t
(4) Partition D into children partitions
(5) Recurse on each partition
(6) endif
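The schema can be written as a short recursive sketch in Python; `find_split` stands in for the split selection method S and is assumed to return either a (predicate, description) pair or None when t should become a leaf (all names here are illustrative, not from any of the systems discussed):

    def build_tree(data, find_split):
        """Top-down construction schema. data: list of (x, y) records;
        find_split(data) returns (predicate, description) or None (-> leaf)."""
        split = find_split(data)                          # step (1): apply S to D
        if split is None:                                 # t is a leaf node
            labels = [y for _, y in data]
            return {'label': max(set(labels), key=labels.count)}
        predicate, description = split
        left  = [r for r in data if predicate(r[0])]      # step (4): partition D
        right = [r for r in data if not predicate(r[0])]
        return {'split': description, 'predicate': predicate,   # step (3): children
                'left':  build_tree(left, find_split),          # step (5): recurse
                'right': build_tree(right, find_split)}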
Decision Tree Construction (Contd.)
O Three algorithmic components:
O Split selection (CART, C4.5, QUEST, CHAID,
CRUISE, …)
O Pruning (direct stopping rule, test dataset
pruning, cost-complexity pruning, statistical
tests, bootstrapping)
O Data access (CLOUDS, SLIQ, SPRINT,
RainForest, BOAT, UnPivot operator)
Split Selection Methods
O Multitude of split selection methods in the
literature
O In this tutorial:
O Impurity-based split selection: CART (most
common in today’s data mining tools)
O Model-based split selection: QUEST
Split Selection Methods: CART
O Classification And Regression Trees
(Breiman, Friedman, Olshen, Stone, 1984;
considered “the” reference on decision tree
construction)
O Commercial version sold by Salford Systems
(www.salford-systems.com)
O Many other, slightly modified implementations
exist (e.g., IBM Intelligent Miner implements the
CART split selection method)
CART Split Selection Method
Motivation: We need a way to choose
quantitatively between different splitting
predicates
O Idea: Quantify the impurity of a node
O Method: Select splitting predicate that
generates children nodes with minimum
impurity from a space of possible splitting
predicates
Intuition: Impurity Function
    X1  X2  Class
    1   1   Yes
    1   2   Yes
    1   2   Yes
    1   2   Yes
    1   2   Yes
    1   1   No
    2   1   No
    2   1   No
    2   2   No
    2   2   No

Candidate split X1 <= 1 on the root (50% Yes, 50% No):
  left child (X1 <= 1): (83% Yes, 17% No);  right child: (0% Yes, 100% No)
Candidate split X2 <= 1 on the root (50% Yes, 50% No):
  left child (X2 <= 1): (25% Yes, 75% No);  right child: (66% Yes, 33% No)
Impurity Function
Let p(j|t) be the proportion of class j training records at
node t. Then the node impurity measure at node t:
i(t) = phi(p(1|t), …, p(J|t))
Properties:
O phi is symmetric in its arguments
O Maximum value at the uniform distribution (1/J, …, 1/J)
O phi(1,0,…,0) = … = phi(0,…,0,1) = 0
The reduction in impurity through splitting predicate s (t splits into a left child tL and a right child tR that receive fractions pL and pR of the records) is:
∆phi(s,t) = phi(t) – pL phi(tL) – pR phi(tR)
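For concreteness, a small sketch of computing this impurity reduction from record lists, with gini and entropy given as example impurity functions (illustrative helper code, assuming two-element (x, y) records as before; the two functions are the ones defined on the concavity slide below):

    import math
    from collections import Counter

    def class_proportions(records):
        counts = Counter(y for _, y in records)
        n = len(records)
        return [c / n for c in counts.values()]

    def gini(p):    return 1.0 - sum(q * q for q in p)
    def entropy(p): return -sum(q * math.log(q) for q in p if q > 0)

    def impurity_reduction(phi, node, left, right):
        """Delta phi(s,t) = phi(t) - pL*phi(tL) - pR*phi(tR)."""
        p_left = len(left) / len(node)
        p_right = len(right) / len(node)
        return (phi(class_proportions(node))
                - p_left * phi(class_proportions(left))
                - p_right * phi(class_proportions(right)))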
Example
Consider the split X1 <= 1 from the previous slide.
Root node t:     p(1|t) = 0.5,  p(2|t) = 0.5
Left child tL:   p(1|tL) = 0.83,  p(2|tL) = 0.17   (6 of the 10 records)
Right child tR:  p(1|tR) = 0.0,   p(2|tR) = 1.0    (4 of the 10 records)
Impurity of the root node:        phi(0.5, 0.5)
Impurity of the left child node:  phi(0.83, 0.17)
Impurity of the right child node: phi(0.0, 1.0)
Impurity of the whole tree:       0.6*phi(0.83, 0.17) + 0.4*phi(0, 1)
Impurity reduction:               phi(0.5, 0.5) - 0.6*phi(0.83, 0.17) - 0.4*phi(0, 1)
Error Reduction as Impurity Function
O Possible impurity function: the resubstitution error R(T, D)
O Example (the two candidate splits from the previous slides):
  T1: split X1 <= 1, children (83%,17%) and (0%,100%)
  T2: split X2 <= 1, children (25%,75%) and (66%,33%)
  R(no tree, D) = 0.5
  R(T1, D) = 0.6*0.17 = 0.1
  R(T2, D) = 0.4*0.25 + 0.6*0.33 = 0.3
Problems with Resubstitution Error
O Obvious problem: there are situations where no split can decrease impurity
O Example: a node with class distribution (80%, 20%); the split X3 <= 1 produces a left child with 6 records and distribution (83%, 17%) and a right child with 4 records and distribution (75%, 25%), both labeled Yes
  R(no tree, D) = 0.2
  R(T1, D) = 0.6*0.17 + 0.4*0.25 = 0.2
O More subtle problems exist
Remedy: Concavity
Concave Impurity Functions
Use impurity functions phi that are strictly concave (phi'' < 0).
Example concave impurity functions:
O Entropy: phi(t) = - Σj p(j|t) log p(j|t)
O Gini index: phi(t) = 1 - Σj p(j|t)^2

Nonnegative Decrease in Impurity
Theorem: Let phi(p1, …, pJ) be a strictly concave function on {(p1, …, pJ): Σj pj = 1}.
Then for any split s: ∆phi(s,t) >= 0,
with equality if and only if p(j|tL) = p(j|tR) = p(j|t) for j = 1, …, J.
CART Univariate Split Selection
O Use gini-index as impurity function
O For each numerical or ordered attribute X,
consider all binary splits s of the form
X <= x
where x in dom(X)
O For each categorical attribute X, consider all
binary splits s of the form
X in A, where A subset dom(X)
O At a node t, select split s* such that
∆phi(s*,t) is maximal over all s considered
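A sketch of the exhaustive search over binary splits X <= x on one numerical attribute, reusing the `impurity_reduction` and `gini` helpers sketched above (illustrative, not CART's actual implementation):

    def best_numerical_split(records, phi=gini):
        """records: list of (x, y) with numerical x.
        Returns (best threshold, best impurity reduction)."""
        records = sorted(records, key=lambda r: r[0])
        best_x, best_gain = None, 0.0
        for i in range(1, len(records)):
            if records[i - 1][0] == records[i][0]:
                continue                                  # split only between distinct values
            left, right = records[:i], records[i:]
            gain = impurity_reduction(phi, records, left, right)
            if gain > best_gain:
                best_x, best_gain = records[i - 1][0], gain
        return best_x, best_gain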
CART: Shortcut for Categorical Splits
Computational shortcut if the class label has two values (|dom(Y)| = 2).
O Theorem: Let X be a categorical attribute with
dom(X) = {b1, …, bk}, |dom(Y)| = 2, let phi be a concave
impurity function, and order the categories so that
p(Y=1 | X=b1) <= … <= p(Y=1 | X=bk).
Then the best split is of the form:
X in {b1, b2, …, bl} for some l < k
O Benefit: We only need to check k-1 subsets of
dom(X) instead of 2^(k-1) - 1 subsets
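A sketch of the shortcut: order the categories by their class proportion and evaluate only the k-1 prefix subsets (again illustrative and built on the helpers above):

    def best_categorical_split(records, phi=gini):
        """records: list of (x, y) with categorical x and two class labels."""
        by_cat = {}
        for x, y in records:
            by_cat.setdefault(x, []).append(y)
        classes = sorted({y for _, y in records})
        # order categories b1, ..., bk by the proportion of one class
        order = sorted(by_cat,
                       key=lambda b: by_cat[b].count(classes[0]) / len(by_cat[b]))
        best_subset, best_gain = None, 0.0
        for l in range(1, len(order)):                    # k-1 prefixes, not 2^(k-1)-1 subsets
            subset = set(order[:l])
            left  = [r for r in records if r[0] in subset]
            right = [r for r in records if r[0] not in subset]
            gain = impurity_reduction(phi, records, left, right)
            if gain > best_gain:
                best_subset, best_gain = subset, gain
        return best_subset, best_gain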
Problems with CART Split Selection
O Biased towards variables with more splits:
an M-category variable has 2^(M-1) - 1 possible
binary splits, while an M-valued ordered variable has
only M-1 possible splits
(Explanation and remedy later)
O Computationally expensive for categorical
variables with large domains
QUEST: Model-based split selection
“The purpose of models is not to fit the data
but to sharpen the questions.”
Karlin, Samuel (1923 - )
(11th R. A. Fisher Memorial Lecture, Royal Society, 20 April 1983)
Split Selection Methods: QUEST
O Quick, Unbiased, Efficient, Statistical Tree
(Loh and Shih, Statistica Sinica, 1997)
Freeware, available at www.stat.wisc.edu/~loh
Also implemented in SPSS.
O Main new ideas:
O Separate splitting predicate selection into variable
selection and split point selection
O Use statistical significance tests instead of impurity
function
QUEST Variable Selection
Let X1, …, Xl be numerical predictor variables, and
let Xl+1, …, Xk be categorical predictor
variables.
1. Find the p-value from an ANOVA F-test for each numerical
variable.
2. Find the p-value from a chi-squared test for each categorical
variable.
3. Choose the variable Xk' with the overall smallest p-value pk'.
(Actual algorithm is more complicated.)
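A much-simplified sketch of this variable-selection step using scipy.stats and pandas (assumed available); it computes the two kinds of p-values and picks the smallest, and it ignores the refinements of the real QUEST algorithm:

    import numpy as np
    import pandas as pd
    from scipy import stats

    def select_variable(numerical, categorical, y):
        """numerical: dict name -> numpy array; categorical: dict name -> label array;
        y: numpy array of class labels. Returns the name with the smallest p-value."""
        p_values = {}
        for name, x in numerical.items():
            groups = [x[y == c] for c in np.unique(y)]
            p_values[name] = stats.f_oneway(*groups).pvalue          # ANOVA F-test
        for name, x in categorical.items():
            table = pd.crosstab(pd.Series(x), pd.Series(y))          # contingency table
            chi2, p, dof, expected = stats.chi2_contingency(table)   # chi-squared test
            p_values[name] = p
        return min(p_values, key=p_values.get)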
QUEST Split Point Selection
CRIMCOORD transformation of categorical
variables into numerical variables:
1. Take categorical variable X with domain
dom(X)={x1, …, xl}
2. For each record in the training database, create
vector (v1, …, vl) where vi = I(X=xi)
3. Find principal components of set of vectors V
4. Project the dimensionality-reduced data onto the
largest discriminant coordinate dxi
5. Replace X with the numerical value dxi in the rest of the
algorithm
CRIMCOORDs: Examples
O Values(X|Y=1) = {4c1, c2, 5c3}, values(X|Y=2) = {2c1, 2c2, 6c3}
  => dx1 = 1, dx2 = -1, dx3 = -0.3
O Values(X|Y=1) = {5c1, 5c3}, values(X|Y=2) = {5c1, 5c3}
  => dx1 = 1, dx2 = 0, dx3 = 1
O Values(X|Y=1) = {5c1, 5c3}, values(X|Y=2) = {5c1, c2, 5c3}
  => dx1 = 1, dx2 = -1, dx3 = 1
Advantages
O Avoid exponential subset search from CART
O Each dxi has the form Σ bi I(X=xi) for some b1, …, bl,
thus there is a 1-1 correspondence between subsets of
X and a dxi
QUEST Split Point Selection
O Assume X is the selected variable (either numerical, or categorical
transformed to CRIMCOORDS)
O Group J>2 classes into two superclasses
O Now problem is reduced to one-dimensional two-class problem
O Use exhaustive search for the best split point (like in CART)
O Use quadratic discriminant analysis (QDA, next few bullets)
QUEST Split Point Selection: QDA
O Let x1, x2 and s1^2, s2^2 be the means and variances of X for the two
superclasses
O Make a normal distribution assumption, and find the intersections of the
two normal densities N(x1, s1^2) and N(x2, s2^2)
O QDA splits the X-axis into three intervals
O Select as split point the root that is closer to the sample means
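A sketch of the QDA split-point computation: equating the two normal log-densities gives a quadratic whose roots are the candidate split points (illustrative code, not QUEST's implementation; s1 and s2 are standard deviations):

    import numpy as np

    def qda_split_point(x1, s1, x2, s2):
        """Intersection of N(x1, s1^2) and N(x2, s2^2) densities;
        returns the root closest to the sample means (assumes they intersect)."""
        # Equate log densities: a*x^2 + b*x + c = 0
        a = 1.0 / s1**2 - 1.0 / s2**2
        b = 2.0 * (x2 / s2**2 - x1 / s1**2)
        c = x1**2 / s1**2 - x2**2 / s2**2 + 2.0 * np.log(s1 / s2)
        if abs(a) < 1e-12:                      # equal variances: single root at the midpoint
            return -c / b
        roots = np.roots([a, b, c])
        roots = roots[np.isreal(roots)].real
        midpoint = 0.5 * (x1 + x2)
        return roots[np.argmin(np.abs(roots - midpoint))]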
Illustration: QDA Splits
[Plot: density curves of N(0,1) and N(2, 2.25); the QDA split point is taken at an intersection of the two curves]
QUEST Linear Combination Splits
O Transform all categorical variables to
CRIMCOORDS
O Apply PCA to the correlation matrix of the data
O Drop the smallest principal components, and
project the remaining components onto the
largest CRIMCOORD
O Group J>2 classes into two superclasses
O Find the split on the largest CRIMCOORD using
exhaustive search (ES) or QDA
Key Differences CART/QUEST
Feature                                       QUEST               CART
Variable selection                            Statistical tests   Exhaustive search (ES)
Split point selection                         QDA or ES           ES
Categorical variables                         CRIMCOORDS          ES
Monotone transformations of numerical vars    Not invariant       Invariant
Ordinal variables                             No                  Yes
Variable selection bias                       No                  Yes (No)
Pruning Methods
O Test dataset pruning
O Direct stopping rule
O Cost-complexity pruning (not covered)
O MDL pruning
O Pruning by randomization testing
Stopping Policies
A stopping policy indicates when further growth of
the tree at a node t is counterproductive.
O All records are of the same class
O The attribute values of all records are identical
O All records have missing values
O At most one class has a number of records
larger than a user-specified number
O All records go to the same child node if t is split
(only possible with some split selection
methods)
Test Dataset Pruning
O Use an independent test sample D’ to
estimate the misclassification cost using
the resubstitution estimate R(T,D’) at
each node
O Select the subtree T’ of T with the
smallest expected cost
Reduced Error Pruning
(Quinlan, C4.5, 1993)
O Assume observed misclassification rate at
a node is p
O Replace p (pessimistically) with the upper
75% confidence bound p’, assuming a
binomial distribution
O Then use p’ to estimate error rate of the
node
Pruning Using the MDL Principle
(Mehta, Rissanen, Agrawal, KDD 1996)
Also used before by Fayyad, Quinlan, and others.
O MDL: Minimum Description Length Principle
O Idea: Think of the decision tree as encoding the
class labels of the records in the training
database
O MDL Principle: The best tree is the tree that
encodes the records using the fewest bits
How To Encode a Node
Given a node t, we need to encode the following:
O Nodetype: One bit to encode the type of each node
(leaf or internal node)
For an internal node:
O Cost(P(t)): The cost of encoding the splitting
predicate P(t) at node t
For a leaf node:
O n*E(t): The cost of encoding the records in leaf node
t with n records from the training database (E(t) is
the entropy of t)
How To Encode a Tree
Recursive definition of the minimal cost of a node:
O Node t is a leaf node:
cost(t)= n*E(t)
O Node t is an internal node with children nodes t1
and t2. Choice: Either make t a leaf node, or
take the best subtrees, whatever is cheaper:
cost(t) =
min( n*E(t), 1+cost(P(t))+cost(t1)+cost(t2))
How to Prune
1. Construct decision tree to its maximum
size
2. Compute the MDL cost for each node of
the tree bottom-up
3. Prune the tree bottom-up:
If cost(t)=n*E(t), make t a leaf node.
Resulting tree is the final tree output by
the pruning algorithm.
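A sketch of this bottom-up MDL computation on a tree built with the `build_tree` sketch from earlier; `split_cost` is an assumed callback for cost(P(t)), since that cost depends on the predicate type:

    import math

    def leaf_cost(records):
        """n * E(t): number of records times the entropy of their class labels."""
        n = len(records)
        counts = {}
        for _, y in records:
            counts[y] = counts.get(y, 0) + 1
        return n * -sum(c / n * math.log2(c / n) for c in counts.values())

    def mdl_prune(node, records, split_cost):
        """Bottom-up MDL pruning; returns (pruned node, MDL cost of the node)."""
        as_leaf = leaf_cost(records)
        if 'split' not in node:                                   # already a leaf
            return node, as_leaf
        pred = node['predicate']
        left,  cl = mdl_prune(node['left'],
                              [r for r in records if pred(r[0])], split_cost)
        right, cr = mdl_prune(node['right'],
                              [r for r in records if not pred(r[0])], split_cost)
        keep = 1 + split_cost(node['split']) + cl + cr
        if as_leaf <= keep:                                       # cost(t) = n*E(t): prune
            labels = [y for _, y in records]
            return {'label': max(set(labels), key=labels.count)}, as_leaf
        return dict(node, left=left, right=right), keep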
Performance Improvements: PUBLIC
(Rastogi and Shim, VLDB 1998)
O MDL bottom-up pruning requires construction of
a complete tree before the bottom-up pruning
can start
O Idea: Prune the tree during (not after) the tree
construction phase
O Why is this possible?
O Calculate a lower bound on cost(t) and compare it
with n*E(t)
PUBLIC Lower Bound Theorem
O Theorem: Consider a classification problem with
k predictor attributes and J classes. Let Tt be a
subtree with s internal nodes, rooted at node t,
let ni be the number of records with class label i.
Then
cost(Tt) >= 2*s+1+s*log k + Σ ni
O Lower bound on cost(Tt) is thus the minimum
of:
O n*E(t) + 1 (t becomes a leaf node)
O 2*s+1+s*log k + Σ ni (subtree at t remains)
Large Datasets Lead to Large Trees
O Oates and Jensen (KDD 1998)
O Problem: Constant probability distribution P,
datasets D1, D2, …, Dk with
|D1| < |D2| < … < |Dk|,  |Dk| = c*|Dk-1| = … = c^(k-1)*|D1|
O Observation: Trees grow
|T1| < |T2| < … < |Tk|,  |Tk| = c'*|Tk-1| = … = c'^(k-1)*|T1|
O But: No gain in accuracy from the larger trees
R(T1,D1) ~ R(T2,D2) ~ … ~ R(Tk,Dk)
Pruning By Randomization Testing
O Reduce the pruning decision at each node to a hypothesis test
O Generate the empirical distribution of the pruning statistic under the null
hypothesis for that node
Node n with subtree T(n) and pruning statistic S(n)
For (i=0; i<K; i++)
1. Randomize class labels of the data at n
2. Build and prune a tree rooted at n
3. Calculate pruning statistic Si(n)
Compare S(n) to empirical distribution of Si(n) to estimate significance
of S(n)
If S(n) is not significant enough compared to a significance level alpha,
then prune T(n) to n
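A sketch of this randomization test at one node; `grow_and_prune` (any routine that builds and prunes a subtree from data) and `statistic` (the pruning statistic S) are placeholders, not names from the paper:

    import random

    def randomization_prune_decision(records, grow_and_prune, statistic,
                                     K=100, alpha=0.05, seed=0):
        """Return True if the subtree rooted at this node should be pruned away."""
        rng = random.Random(seed)
        observed = statistic(grow_and_prune(records))
        labels = [y for _, y in records]
        count_ge = 0
        for _ in range(K):
            rng.shuffle(labels)                            # break any X-Y association
            permuted = [(x, y) for (x, _), y in zip(records, labels)]
            if statistic(grow_and_prune(permuted)) >= observed:
                count_ge += 1
        p_value = (count_ge + 1) / (K + 1)                 # empirical significance of S(n)
        return p_value > alpha                             # not significant -> prune T(n) to n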
SLIQ
Mehta, Agrawal, Rissanen (EDBT 1996)
O Motivation:
O Scalable data access method for CART
O To find the best split we need to evaluate the impurity function
at all possible split points for each numerical attribute, at each
node of the tree
O Idea: Avoid re-sorting at each node of the tree through pre-
sorting and maintenance of sort orders
O Ideas:
O Uses vertical partitioning to avoid re-sorting
O Main-memory resident data structure with schema (class label,
leaf node index)
Very likely to fit in-memory for nearly all training databases
SLIQ: Pre-Sorting
The example training database is pre-sorted into one attribute list per attribute plus a class list:

Training database (Age, Car, Class):
  (20,M,Yes) (30,M,Yes) (25,T,No) (30,S,Yes) (40,S,Yes)
  (20,T,No) (30,M,Yes) (25,M,Yes) (40,M,Yes) (20,S,No)
Age attribute list (Age, Ind), sorted on Age:
  (20,1) (20,6) (20,10) (25,3) (25,8) (30,2) (30,4) (30,7) (40,5) (40,9)
Class list (Ind, Class, Leaf), all records initially in leaf 1:
  (1,Yes,1) (2,Yes,1) (3,No,1) (4,Yes,1) (5,Yes,1)
  (6,No,1) (7,Yes,1) (8,Yes,1) (9,Yes,1) (10,No,1)
SLIQ: Evaluation of Splits
Age attribute list (Age, Ind):
  (20,1) (20,6) (20,10) (25,3) (25,8) (30,2) (30,4) (30,7) (40,5) (40,9)
Class list (Ind, Class, Leaf):
  (1,Yes,2) (2,Yes,2) (3,No,2) (4,Yes,3) (5,Yes,3)
  (6,No,2) (7,Yes,2) (8,Yes,2) (9,Yes,2) (10,No,3)
Class histograms maintained per leaf while scanning the Age list:
  Node 2:  Left  2 Yes, 0 No    Right  3 Yes, 2 No
  Node 3:  Left  0 Yes, 1 No    Right  2 Yes, 0 No
SLIQ: Splitting of a Node
After installing the splits, only the Leaf entries of the class list are updated in place; the pre-sorted Age attribute list is unchanged:

Age attribute list (Age, Ind):
  (20,1) (20,6) (20,10) (25,3) (25,8) (30,2) (30,4) (30,7) (40,5) (40,9)
Class list (Ind, Class, Leaf):
  (1,Yes,4) (2,Yes,5) (3,No,5) (4,Yes,7) (5,Yes,7)
  (6,No,4) (7,Yes,7) (8,Yes,7) (9,Yes,7) (10,No,6)
Tree (node ids):        1
                      /   \
                     2     3
                    / \   / \
                   4   5 6   7
SLIQ: Summary
O Uses vertical partitioning to avoid re-
sorting
O Main-memory resident data structure with
schema (class label, leaf node index)
Very likely to fit in-memory for nearly all
training databases
SPRINT
Shafer, Agrawal, Mehta (VLDB 1996)
O Motivation:
O Scalable data access method for CART
O Improvement over SLIQ to avoid main-memory data structure
O Ideas:
O Create vertical partitions called attribute lists for each attribute
O Pre-sort the attribute lists
Recursive tree construction:
1. Scan all attribute lists at node t to find the best split
2. Partition current attribute lists over children nodes while
maintaining sort orders
3. Recurse
SPRINT Attribute Lists
Training database (Age, Car, Class):
  (20,M,Yes) (30,M,Yes) (25,T,No) (30,S,Yes) (40,S,Yes)
  (20,T,No) (30,M,Yes) (25,M,Yes) (40,M,Yes) (20,S,No)
Age attribute list (Age, Class, Ind), pre-sorted on Age:
  (20,Yes,1) (20,No,6) (20,No,10) (25,No,3) (25,Yes,8)
  (30,Yes,2) (30,Yes,4) (30,Yes,7) (40,Yes,5) (40,Yes,9)
Car attribute list (Car, Class, Ind):
  (M,Yes,1) (M,Yes,2) (T,No,3) (S,Yes,4) (S,Yes,5)
  (T,No,6) (M,Yes,7) (M,Yes,8) (M,Yes,9) (S,No,10)
SPRINT: Evaluation of Splits
Age attribute list (Age, Class, Ind):
  (20,Yes,1) (20,No,6) (20,No,10) (25,No,3) (25,Yes,8)
  (30,Yes,2) (30,Yes,4) (30,Yes,7) (40,Yes,5) (40,Yes,9)
Class histogram at node 1 with the scan cursor after the three Age = 20 entries:
  Left   1 Yes, 2 No
  Right  6 Yes, 1 No
SPRINT: Splitting of a Node
1. Scan all attribute lists to find the best
split
2. Partition the attribute list of the splitting
attribute X
3. For each attribute Xi != X
Perform the partitioning step of a hash-join
between the attribute list of X and the
attribute list of Xi
SPRINT: Hash-Join Partitioning
Here the splitting attribute is Car, and the Minivan records (ids 1, 2, 7, 8, 9) go to one child.
The partitioned Car attribute list for that child is:
  (M,Yes,1) (M,Yes,2) (M,Yes,7) (M,Yes,8) (M,Yes,9)
Probing a hash table built on these record ids while scanning the Age attribute list routes each
entry to the correct child; the remaining entries (ids 3, 4, 5, 6, 10) go to the right child:
  (20,Yes,1)  (20,No,6)->right  (20,No,10)->right  (25,No,3)->right  (25,Yes,8)
  (30,Yes,2)  (30,Yes,4)->right  (30,Yes,7)  (40,Yes,5)->right  (40,Yes,9)
SPRINT: Summary
O Scalable data access method for CART
split selection method
O Completely scalable, can be (and has
been) implemented “inside” a database
system
O Hash-join partitioning step expensive
(each attribute, at each node of the tree)
RainForest
(Gehrke, Ramakrishnan, Ganti, VLDB 1998)
Training Database              AVC-set for Age          AVC-set for Car
 Age  Car  Class                Age   Yes  No            Car      Yes  No
 20   M    Yes                  20    1    2             Sport    2    1
 30   M    Yes                  25    1    1             Truck    0    2
 25   T    No                   30    3    0             Minivan  5    0
 30   S    Yes                  40    2    0
 40   S    Yes
 20   T    No
 30   M    Yes
 25   M    Yes
 40   M    Yes
 20   S    No
Refined RainForest Top-Down Schema
BuildTree(Node n, Training database D,
Split Selection Method S)
[ (1) Apply S to D to find splitting criterion ]
(1a) for each predictor attribute X
(1b) Call S.findSplit(AVC-set of X)
(1c) endfor
(1d) S.chooseBest();
(2) if (n is not a leaf node) ...
S: C4.5, CART, CHAID, FACT, ID3, GID3, QUEST, etc.
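As an illustration of what step (1b) consumes, a sketch of building the AVC-group of a node in a single sequential scan of its partition (not the RainForest implementation; the record layout and names are assumptions):

    from collections import defaultdict

    def build_avc_group(partition, predictor_names):
        """partition: iterable of records given as dicts {attribute: value, 'Class': label}.
        Returns {attribute: {attribute value: {class label: count}}}."""
        avc = {a: defaultdict(lambda: defaultdict(int)) for a in predictor_names}
        for record in partition:                      # one sequential scan of the partition
            label = record['Class']
            for a in predictor_names:
                avc[a][record[a]][label] += 1         # AVC entry: (attribute value, class, count)
        return avc

    # e.g. avc = build_avc_group(rows, ['Age', 'Car']); avc['Car']['Minivan']['Yes'] -> 5
    # (using the example training database above)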
RainForest Data Access Method
Assume the data partition at a node is D. Then the
following steps are carried out:
1. Construct AVC-group of the node
2. Choose splitting attribute and splitting predicate
3. Partition D across the children
RainForest Algorithms: RF-Write
First scan: [Diagram: one scan of the database builds the AVC-sets of the root node in main memory]
RainForest Algorithms: RF-Write
Second scan: [Diagram: the database is scanned again and, using the chosen root split Age<30, written out as Partition 1 and Partition 2]
RainForest Algorithms: RF-Write
Analysis:
O Assumes that the AVC-group of the root node fits into main
memory
O Two database scans per level of the tree
O Usually more main memory is available than a single AVC-group needs
[Diagram: the two database scans per level; a single AVC-group leaves most of main memory unused]
RainForest Algorithms: RF-Read
First scan: [Diagram: one scan of the database builds the AVC-sets of the root node in main memory]
RainForest Algorithms: RF-Read
Second scan: [Diagram: the entire database is scanned again; the AVC-sets of both children of the root (split Age<30) are built in main memory, with no partitions written]
RainForest Algorithms: RF-Read
Third scan: [Diagram: another full scan builds the AVC-sets of the nodes at the next level (below the splits Sal<20k and Car==S), still without partitioning the database]
RainForest Algorithms: RF-Hybrid
First scan: [Diagram: as in RF-Read, one scan builds the AVC-sets of the root node in main memory]
RainForest Algorithms: RF-Hybrid
Second scan: [Diagram: RF-Read style; the AVC-sets for the children of the root (split Age<30) are built in one scan without partitioning]
RainForest Algorithms: RF-Hybrid
Third scan: [Diagram: when the AVC-groups of the new nodes no longer all fit in memory, the algorithm switches to RF-Write behavior and partitions the database into Partitions 1-4, one per node below the splits Sal<20k and Car==S]
RainForest Algorithms: RF-Hybrid
Further optimization: While writing partitions,
concurrently build AVC-groups of as many nodes
as possible in-memory
[Diagram: while Partitions 1-4 are written out, AVC-groups for some of the new nodes are built concurrently in main memory]
BOAT
(Gehrke, Ganti, Ramakrishnan, Loh; SIGMOD 1999)
[Diagram: the training database is split by the root predicate Age (<30 vs. >=30) into a left partition and a right partition]
BOAT: Algorithm Overview
[Diagram: an in-memory sample goes through a sampling phase that produces an approximate tree with bounds; a cleanup phase over all the data then produces the final tree]
Missing Values
O What is the problem?
O During computation of the splitting predicate,
we can selectively ignore records with missing
values (note that this has some problems)
O But if a record r is missing the value of the
splitting attribute, r cannot participate further
in tree construction
Algorithms for missing values address this
problem.
Mean and Mode Imputation
Assume record r has missing value r.X, and
splitting variable is X.
O Simplest algorithm:
O If X is numerical (categorical), impute the
overall mean (mode)
O Improved algorithm:
O If X is numerical (categorical), impute the
class-conditional mean(X | r.C) (the class-conditional mode(X | r.C))
Surrogate Splits (CART)
Assume record r has missing value r.X, and
splitting predicate is PX.
O Idea: Find splitting predicate QX’ involving
another variable X’ != X that is most similar to
PX.
O Similarity sim(Q,P|D) between splits Q and P on a dataset D:
sim(Q,P|D) = |{r in D: Q(r) = P(r)}| / |D|
(the fraction of records that Q and P send to the same side)
O 0 <= sim(Q,P|D) <= 1
O sim(P,P|D) = 1
Surrogate Splits: Example
Consider the splitting predicate X1 <= 1 on the dataset below.
  sim((X1 <= 1), (X2 <= 1) | D) = (3+4)/10 = 0.7
  sim((X1 <= 1), (X2 <= 2) | D) = (6+3)/10 = 0.9
(X2 <= 2) is the preferred surrogate split.

    X1  X2  Class
    1   1   Yes
    1   1   Yes
    1   1   Yes
    1   2   Yes
    1   2   Yes
    1   2   No
    2   2   No
    2   3   No
    2   3   No
    2   3   No
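A sketch of surrogate-split selection using this similarity measure; the record representation (dicts keyed by attribute name) is an assumption for illustration:

    def similarity(P, Q, data):
        """Fraction of records on which predicates P and Q agree."""
        return sum(1 for r in data if P(r) == Q(r)) / len(data)

    def best_surrogate(P, candidates, data):
        """candidates: list of (description, predicate) pairs on other variables."""
        return max(candidates, key=lambda c: similarity(P, c[1], data))

    # Example from the slide above:
    D = ([{'X1': 1, 'X2': 1}] * 3 + [{'X1': 1, 'X2': 2}] * 3 +
         [{'X1': 2, 'X2': 2}] + [{'X1': 2, 'X2': 3}] * 3)
    P = lambda r: r['X1'] <= 1
    print(similarity(P, lambda r: r['X2'] <= 1, D))   # 0.7
    print(similarity(P, lambda r: r['X2'] <= 2, D))   # 0.9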
Choice of Classification Algorithm?
O Example study: (Lim, Loh, and Shih, Machine
Learning 2000)
O 33 classification algorithms
O 16 (small) data sets (UC Irvine ML Repository)
O Each algorithm applied to each data set
O Experimental measurements:
O Classification accuracy
O Computational speed
O Classifier complexity
Experimental Setup
Algorithms:
O Tree-structure classifiers (IND, S-Plus Trees, C4.5, FACT,
QUEST, CART, OC1, LMDT, CAL5, T1)
O Statistical methods (LDA, QDA, NN, LOG, FDA, PDA, MDA, POL)
O Neural networks (LVQ, RBF)
Setup:
O 16 primary data sets, created 16 more data sets by adding noise
O Converted categorical predictor variables to 0-1 dummy
variables if necessary
O Error rates for 6 data sets estimated from supplied test sets, 10-
fold cross-validation used for the other data sets
Results
Rank Algorithm Mean Error Time
1 Polyclass 0.195 3 hours
2 Quest Multivariate 0.202 4 min
3 Logistic Regression 0.204 4 min
6 LDA 0.208 10 s
8 IND CART 0.215 47 s
12 C4.5 Rules 0.220 20 s
16 Quest Univariate 0.221 40 s
…
O Number of leaves for tree-based classifiers varied widely (median
number of leaves between 5 and 32 (removing some outliers))
O Mean misclassification rates of the top 26 algorithms are not
statistically significantly different; the bottom 7 algorithms have
significantly higher error rates
Bias in Split Selection for ES
Assume: neither variable is correlated with the class label.

    Age  Yes  No          Car      Yes  No
    20   15   15          Sport    20   20
    25   15   15          Truck    20   20
    30   15   15          Minivan  20   20
    40   15   15

O Question: Should we choose Age or Car?
O Answer: We should choose both of them equally likely!
Formal Definition of the Bias
O Bias: “odds of choosing X1 over X2 as the split
variable when neither X1 nor X2 is correlated
with the class label”
O Formally:
Bias(X1,X2) = log10( P(X1,X2) / (1 - P(X1,X2)) ),
where P(X1,X2) is the probability of choosing variable X1 over X2
We would like: Bias(X1,X2) = 0 in the Null Case
Formal Definition of the Bias (Contd.)
O Example: synthetic data with two categorical
predictor variables
O X1: 10 categories (Car1, …, Car10)
O X2: 2 categories (State = CA or NY)
O For each category: the same probability of
class “Yes” (no correlation with the class label)
[Tables: Yes/No counts per category of X1 (Car1 … Car10) and of X2 (CA, NY)]
Evidence of the Bias
[Plots: empirical bias of split selection under the null case for the Gini index, entropy, and gain-ratio criteria]
One Explanation
Theorem: (Expected Value of the Gini Gain)
Assume:
O Two class labels
O n: number of categories
O N: number of records
O p: probability of having class label “Yes”
Then: E(Gini gain) = 2p(1-p)*(n-1)/N
The expected Gini gain increases linearly with the number of categories!
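The linear dependence on the number of categories can be checked with a small Monte Carlo simulation of the null case (labels drawn independently of the categories); this illustrative check uses the gain of the full multiway split on all n categories:

    import random

    def null_gini_gain(n_categories, N=120, p=0.5, trials=2000, seed=0):
        """Average Gini gain of splitting on an uninformative n-category variable
        (two classes, labels independent of the categories)."""
        rng = random.Random(seed)

        def gini(labels):
            if not labels:
                return 0.0
            q = sum(labels) / len(labels)
            return 2 * q * (1 - q)

        total = 0.0
        for _ in range(trials):
            cats = [rng.randrange(n_categories) for _ in range(N)]
            ys = [1 if rng.random() < p else 0 for _ in range(N)]
            gain = gini(ys)
            for c in range(n_categories):
                subset = [y for cat, y in zip(cats, ys) if cat == c]
                gain -= len(subset) / N * gini(subset)
            total += gain
        return total / trials

    # null_gini_gain(3) is roughly twice null_gini_gain(2), etc.,
    # in line with E(gain) = 2p(1-p)(n-1)/N.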
Bias Correction: Intuition
O The value of the splitting criterion is biased under
the Null Hypothesis.
O Idea: Use p-value of the criterion:
Probability that the value of the criterion under
the Null Case is as extreme as the observed
value
Method:
1. Compute criterion (gini, entropy, etc.)
2. Compute p-value
3. Choose splitting variable
Correction Through P-Value
O New p-value criterion:
O Maintains “good” properties of your favorite
splitting criterion
O Theorem: The correction through the p-value is
nearly unbiased.
Computation:
1. Exact (randomization statistic; very expensive to compute)
2. Bootstrapping (Monte Carlo simulations; computationally
expensive; works only for small p-values)
3. Asymptotic approximations (G2 for entropy, Chi2 distribution for
Chi2 test; don’t work well in boundary conditions)
4. Tight approximations (cheap, often work well in practice)
Tight Approximation
O Experimental evidence
shows that a Gamma distribution
approximates the distribution of the
Gini gain very well.
O We can calculate:
O Expected gain:
  E(gain) = 2p(1-p)*(n-1)/N
O Variance of gain:
  Var(gain) = 4p(1-p)/N^2 * [ (1 - 6p - 6p^2)*(Σ 1/Ni - (2n-1)/N) + 2(n-1)*p*(1-p) ]
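Given the two moments above, a gamma approximation of the p-value amounts to a method-of-moments fit; a sketch with scipy.stats (assumed available; illustrative only, not the tutorial's procedure):

    from scipy import stats

    def gini_gain_p_value(observed_gain, mean_gain, var_gain):
        """Approximate P(gain >= observed) under the null with a Gamma(shape, scale)
        matched to the first two moments."""
        scale = var_gain / mean_gain            # theta = Var / E
        shape = mean_gain / scale               # k = E^2 / Var
        return stats.gamma.sf(observed_gain, a=shape, scale=scale)

    # Variable selection then compares these p-values across predictors
    # instead of the raw gains.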
Problem: ES and Missing Value
Consider a training database with the following
schema: (X1, …, Xk, C)
O Assume the projection onto (X1, C) is the
following:
{(1, Class1), (2, Class2), (NULL, Class3), …, (NULL, ClassN)}
(X1 has missing values for all records except
the first two)
O Exhaustive search will very likely split on X1!
Concluding Remarks Part I
O There are many algorithms available for:
O Split selection
O Pruning
O Data access
O Handling missing values
O Challenges: Performance, getting the “right” model,
data streams, new applications