HR Metrics and Analytics
Session 9 to 12
© Santhanam P
Compensation System - Strategic Choices
A strategic perspective focuses on those compensation choices that help the
organization gain and sustain competitive advantage.
[Figure: strategic choices, from corporate strategy down to competitive advantage]
• What business should we be in? (Corporate objectives, strategic plans, vision and values)
• How do we win? (Business unit strategies)
• How should HR help us win? (HR strategies)
• How should total compensation help us to win? (Strategic compensation decisions)
• Compensation system → Employee attitudes and behaviors → Competitive advantages
• Surrounding context: social, competitive and regulatory environment
Tailor the Compensation System to Strategy
Strategy: Innovator (increase product complexity and shorten product life cycle)
  Business Responses
  • Product leadership
  • Shift to mass customization
  • Cycle time
  HR Program Alignment
  • Committed to agile, risk-taking, innovative people
  Compensation System
  • Reward innovation in products and process
  • Market based pay
  • Flexible generic job description
Strategy: Cost Cutter (focus on efficiency)
  Business Responses
  • Operational excellence
  • Pursue cost-effective solutions
  HR Program Alignment
  • Do more with less
  Compensation System
  • Focus on competitors' labor cost
  • Increase variable pay
  • Emphasize productivity
  • Focus on system control and work specification
Strategy: Customer Focused (increase customer expectations)
  Business Responses
  • Deliver solutions to customers
  • Speed to market
  HR Program Alignment
  • Delight customer, exceed expectations
  Compensation System
  • Customer satisfaction incentives
  • Value of job and skills based on customer contact
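Competitive pay policy – Comp objectives
[Table: each pay policy (rows) rated against each compensation objective (columns); the cell ratings were graphical and are not reproduced here]
Compensation objectives (columns): Attract, Retain, Labour Cost, Reduce Pay Dissatisfaction, Increase Productivity
Pay policies (rows):
• Pay above market (Lead)
• Pay with market (Match)
• Pay below market (Lag)
• Hybrid policy
• Employer of choice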
Critical workforce segmentation
• Typically, a small percentage of roles in an
organization make a disproportionately large
contribution to the achievement of business
strategy
• Talent solutions designed around critical workforce segmentation (CWS)
requirements create competitive advantage:
• Maximize the attraction, development and
retention of the workers who create the most
value
Critical workforce segmentation – Infra 7
• Need to retain
High Specialist Critical Workforce and enhance
• Technical team • Project development team their
• Legal • Project finance competencies
Difficulty of Replacing Skills
• Internal audit • Strategic sourcing & Vendor
/ part development
Flexible Labor Core Workforce
• Admin • Project Engineers
• Security • Operations team
• Payroll • HR team
• Front office • Accounts
• Secretary • Finance
• Procurement
• IT
• CS
Low High
Impact on value chain
• Differentiated HR strategy for each quadrant – Hiring, Training and retaining
© Santhanam P • Example- Hire using competency matrix incase of critical workforce and effective training
The infamous “Bell curve”
Types of Distributions
◼ Discrete theoretical distributions
  ◼ Binomial distribution
  ◼ Poisson distribution
  ◼ Rectangular distribution
  ◼ Multinomial distribution, etc.
◼ Continuous theoretical distributions
  ◼ Normal distribution
  ◼ Student's t-distribution
  ◼ Chi-square distribution
  ◼ F-distribution
Distribution of data
Let us plot the heights (in cm) of 100 children aged 4 to 5 years.
[Figure: plots of height, horizontal axis from 80 to 120 cm]
The individual values vary, but they form a pattern that, if stable, can be described as a distribution.
Histogram
Normal Distribution – Excel exercise
[Chart: frequency histogram. Class intervals: 52-60, 60-68, 68-76, 76-84, 84-92, 92-100, 100-108. Frequency axis: 0 to 12.]
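The class intervals used in this exercise (52-60 through 100-108, width 8) can also be tabulated outside Excel. The sketch below is only an illustration: the scores are invented, and it assumes that a value sitting exactly on a boundary (for example 60) is counted in the upper interval.

```python
from collections import Counter

# Hypothetical scores, invented purely for illustration.
scores = [55, 58, 63, 66, 67, 70, 72, 73, 74, 75, 77, 78, 79,
          80, 81, 82, 83, 85, 86, 88, 90, 91, 94, 97, 103]

BIN_START, BIN_WIDTH, N_BINS = 52, 8, 7  # 52-60, 60-68, ..., 100-108


def bin_label(value):
    """Return the class interval for a value (boundary values go to the upper bin)."""
    index = int((value - BIN_START) // BIN_WIDTH)
    if not 0 <= index < N_BINS:
        raise ValueError(f"{value} is outside the range 52 to 108")
    low = BIN_START + index * BIN_WIDTH
    return f"{low}-{low + BIN_WIDTH}"


def frequency_table(values):
    """Count how many values fall in each class interval, in interval order."""
    counts = Counter(bin_label(v) for v in values)
    labels = [f"{BIN_START + i * BIN_WIDTH}-{BIN_START + (i + 1) * BIN_WIDTH}"
              for i in range(N_BINS)]
    return [(label, counts.get(label, 0)) for label in labels]


# Text histogram: one '#' per observation in the interval.
for label, count in frequency_table(scores):
    print(f"{label:>8} | {'#' * count}")
```

If the bars rise towards the middle intervals and fall away on both sides, the data are at least roughly bell shaped.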
Normal Distribution
What is Normal Distribution?
• Widely used distribution
• Many processes occurring in nature are assumed to approximate the
normal distribution, hence the name “Normal”
• Characterized by the familiar bell-shaped curve
• Characterized by two parameters: mean (μ) and standard deviation (σ, sigma)
Normal Distribution
When the top points of each rectangle of the histogram are joined
by a smooth curve, the curve often follows a bell-shaped pattern.
If this curve is bell shaped and symmetrical about the mean, it is said to be
a normal curve.
In general, processes that operate only under chance causes
tend to follow the normal distribution.
Normal Distribution
Properties of Normal Distribution:
[Figure: normal curve with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]
1) Normal Distribution is completely designated by two
parameters (μ and σ)
2) μ is used for location and σ for spread.
3) Normal curve is bell shaped.
Normal Distribution
Properties of Normal Distribution (Contd):
4) Normal distribution is symmetric around μ.
5) Normal distribution extends from -∞ to ∞. But for all practical
purposes the region μ - 3σ to μ + 3σ covers most of the distribution.
6) The mean, mode and median will be the same for this curve.
7) Area under the normal curve is equal to 1.
8) About 68.27% of values lie within ±1σ limits.
9) About 95.45% of values lie within ±2σ limits.
10) About 99.73% of values lie within ±3σ limits.
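The three coverage figures in properties 8 to 10 can be checked numerically. A minimal sketch, using the standard result that for a normal variable P(μ − kσ < X < μ + kσ) = erf(k / √2); the function name is ours, not part of any package:

```python
import math


def coverage_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k}σ: {coverage_within(k):.2%}")
```

The result does not depend on the particular values of μ and σ, which is why the same three percentages apply to every normal distribution.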
Normal Distribution
Forced distribution Method
▪ In this technique the rater is required to distribute ratings
in the form of a normal distribution
▪ It is similar to grading on a curve
▪ It places predetermined percentages of ratees into
several performance categories
▪ The proportion in each category need not be
symmetrical
▪ For instance, GE used top 20%, middle 70% and bottom
10% for its managers.
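A forced distribution can be sketched as a ranking exercise. The example below is an illustration only, not GE's actual procedure: it assumes a single numeric rating per person, uses the 20 / 70 / 10 split mentioned above as defaults, rounds category sizes to the nearest whole person, and breaks ties by input order. The names and ratings are invented.

```python
def forced_distribution(ratings, top_share=0.20, bottom_share=0.10):
    """Place ratees into Top / Middle / Bottom by rank, using fixed shares."""
    ranked = sorted(ratings, key=ratings.get, reverse=True)  # best rating first
    n = len(ranked)
    n_top = round(n * top_share)
    n_bottom = round(n * bottom_share)
    categories = {}
    for position, name in enumerate(ranked):
        if position < n_top:
            categories[name] = "Top"
        elif position >= n - n_bottom:
            categories[name] = "Bottom"
        else:
            categories[name] = "Middle"
    return categories


# Hypothetical ratings for ten managers (invented for illustration).
ratings = {"A": 91, "B": 78, "C": 85, "D": 60, "E": 72,
           "F": 88, "G": 66, "H": 74, "I": 81, "J": 69}
for name, category in forced_distribution(ratings).items():
    print(name, category)
```

Note that the method fixes the share of people in each category regardless of how close the underlying ratings are, which is one reason the approach is debated.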
Non Normal Distribution
Asymmetry
◼ Measures of asymmetry
◼ When the distribution of items in a series happens to be perfectly
symmetrical, as in the normal distribution, the curve is perfectly bell
shaped. But if the curve is distorted (whether on the right side or on the
left side) we have an asymmetrical distribution, which indicates that there is
skewness.
◼ If the curve is stretched towards the left (a longer left tail) we have negative skewness;
if it is stretched towards the right we have positive skewness
Asymmetry
◼ Skewness is, thus, a measure of asymmetry and shows the manner in
which the items are clustered around the average
◼ The differences among the mean, median and mode provide an
easy way of expressing skewness in a series
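Asymmetry
An important measure of the shape of a distribution is called skewness.
The formula for the skewness of sample data is
$$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^{3}$$
where n is the sample size, x̄ is the sample mean and s is the sample standard deviation.
Skewness can be easily computed using statistical software.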
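As an illustration of the sample skewness formula, here is a minimal sketch in plain Python. The function name and the data are ours; the calculation uses the sample standard deviation (divisor n − 1), which is also the convention behind Excel's SKEW function.

```python
import math


def sample_skewness(values):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


# Hypothetical data sets (invented for illustration).
symmetric = [1, 2, 3, 4, 5]
long_right_tail = [1, 1, 1, 2, 10]

print(sample_skewness(symmetric))        # symmetric data: zero skewness
print(sample_skewness(long_right_tail))  # long right tail: positive skewness
```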
Asymmetry
Symmetric (not skewed)
• Skewness is zero.
• Mean and median are equal.
[Figure: relative frequency histogram with Skewness = 0; vertical axis "Relative Frequency" from 0 to .35]
Asymmetry
◼ Moderately Skewed Left
• Skewness is negative.
• Mean will usually be less than the median.
[Figure: relative frequency histogram with Skewness = −.31; vertical axis "Relative Frequency" from 0 to .35]
◼ Moderately Skewed Right
• Skewness is positive.
• Mean will usually be more than the median.
[Figure: relative frequency histogram with Skewness = .31; vertical axis "Relative Frequency" from 0 to .35]
Kurtosis
Calculation in Excel
=KURT(data range)
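For readers working outside Excel, the sketch below computes sample excess kurtosis with the bias-adjusted formula that, to our understanding, matches what Excel's KURT function reports (a value near 0 for normally distributed data, positive for heavier tails, negative for lighter tails). The function name and data are ours.

```python
import math


def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (0 corresponds to a normal distribution)."""
    n = len(values)
    if n < 4:
        raise ValueError("at least four values are needed")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("all values are identical; kurtosis is undefined")
    fourth_moment_sum = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_moment_sum - correction


# Evenly spread values have lighter tails than a normal curve: negative result.
print(sample_excess_kurtosis([1, 2, 3, 4, 5]))
```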
Skewness & Kurtosis
Parametric Test
Parametric statistics make assumptions (such as normality) about the population
values (called parameters).
For example, one assumption of the one-way ANOVA is that the data come from a
normal distribution. If your data are not normally distributed, that assumption is violated,
but you can run the nonparametric alternative, the Kruskal-Wallis test.
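To show what the Kruskal-Wallis alternative actually computes, here is a minimal sketch of its H statistic in plain Python. It ranks all observations together (ties get the average rank) and compares the rank sums of the groups. It omits the tie correction that statistical packages usually apply, and the three departments and their scores are invented for illustration.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), without tie correction."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)


# Hypothetical satisfaction scores for three departments.
departments = [[2, 3, 3, 4], [3, 4, 5, 5], [1, 2, 2, 3]]
print(round(kruskal_wallis_h(departments), 3))
```

H is then compared with a chi-square distribution with (number of groups − 1) degrees of freedom; a large H suggests that at least one group tends to have higher or lower values than the others.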
Non parametric Test
A non parametric test (sometimes called a distribution free test) does not assume
anything about the underlying distribution (for example, that the data come from
a normal distribution). By contrast, a parametric test makes assumptions
about a population’s parameters (for example, the mean or standard deviation).
When the word “non parametric” is used in stats, it doesn’t quite mean that you
know nothing about the population. It usually means that you know the population
data does not have a normal distribution.
Non parametric Test
Non parametric tests are used when your data isn’t normal. Therefore the key is to figure
out if you have normally distributed data. For example, you could look at a graph of the distribution
of your data. If your data is approximately normal, then you can use parametric statistical
tests.
Q. If you don’t have a graph, how do you figure out if your data is normally distributed?
A. Check the skewness and kurtosis of the distribution using software like Excel.
A normal distribution has no skew: it is centered and symmetrical in shape.
Kurtosis refers to how much of the data is in the tails and the center.
Non parametric Test
Does your data allow for a parametric test, or do you have to use a non parametric
test like chi-square? The rule of thumb is:
• For nominal scales or ordinal scales, use non parametric statistics.
• For interval scales or ratio scales, use parametric statistics.
Other reasons to run nonparametric tests:
• One or more assumptions of a parametric test have been violated.
• Your sample size is too small to run a parametric test.
• Your data has outliers that cannot be removed.
• You want to test for the median rather than the mean (you might want to do this if
you have a very skewed distribution)
Non parametric vs Parametric Test
NONPARAMETRIC TEST                    PARAMETRIC ALTERNATIVE
1-sample sign test                    One-sample Z-test, one-sample t-test
1-sample Wilcoxon Signed Rank test    One-sample Z-test, one-sample t-test
Friedman test                         Two-way ANOVA
Kruskal-Wallis test                   One-way ANOVA
Mann-Whitney test                     Independent samples t-test
Mood’s Median test                    One-way ANOVA
Spearman Rank Correlation             Correlation Coefficient
Logistic Regression
What is Logistic Regression
Linear vs Logistic Regression 40
Attrition Case:
Logistic Regression
SPSS output discussion
NAME       X1   X2   X3   X4   X5   X6   X7   Y
Kumar       2    6    3    2    8    2   15   1
Rajesh      3    3    3    3   25    5   12   0
Santhosh    4    4    3    3   30    4   18   0
Ajesh       5    7    3    3    9    3   19   1
Column legend:
X1, X2 = TOTAL EXPERIENCE IN YEARS and LENGTH OF SERVICE IN COMPANY IN YEARS (the order of these two columns is not clear from the slide)
X3 = LAST YEAR APPRAISAL RATING
X4 = SATISFACTION SCORE
X5 = LAST INCREMENT %
X6 = NUMBER OF TRAINING MANDAYS LAST YEAR
X7 = COMMUTE DISTANCE IN KMS
Y = LEAVER STATUS
PREDICTED PROBABILITIES = output column (blank in the sample rows)
Omnibus Test
Block 1: Method = Enter
The omnibus test gives us an index of how good our model is at accounting for turnover across our organization.
Here we see that our model chi-square is 77.814 and this is deemed to be significant (p < 0.001).
Thus, importantly in this case, our seven independent variables significantly account for variation in
the leaver versus stayer variable in our data set. In other words, we can identify features that may
well account for whether or not people have left.
R square
The Nagelkerke R-square is the figure that most closely equates with the multiple R-square figure in
multiple linear regression. The Nagelkerke R-square is referred to as a pseudo R-square in that it gives
an estimated indication of the percentage of variation in our dependent variable (whether people leave or not) that we
account for with our proposed model. Here this indicates that just from
looking at these seven variables in our data set, we can account for 73.9 per cent of turnover
across our organization. Whilst this may seem like a moderate percentage, we need to consider all
of the possible causes of people leaving an organization and be impressed that these few
predictors actually account for a not-insubstantial amount of variation.
Hosmer and Lemeshow (HL) Test 46
The Hosmer-Lemeshow test (HL test) is a goodness of fit test for logistic regression, especially for risk
prediction models. A goodness of fit test tells you how well your data fits the model. Specifically, the
HL test calculates if the observed event rates match the expected event rates in population
subgroups. A significant test indicates that the model is not a good fit and a non-significant test
indicates a good fit.
The Hosmer-Lemeshow statistic indicates a poor fit if the significance value is less than 0.05. Here,
the model adequately fits the data. This statistic is the most reliable test of model fit for IBM® SPSS®
Statistics binary logistic regression, because it aggregates the observations into groups of "similar"
Classification table / Confusion matrix
The classification table is also known as the prediction accuracy table. From the table we can understand the
following:
• Out of the 198 people used in the model, the model correctly predicted whether somebody stayed or left
91.4% of the time
• Stayer: 148 people who stayed in the company were predicted correctly; 7 people who stayed in the
company were wrongly predicted as leavers
• Leaver: 10 people who left the company were wrongly predicted as stayers; 33 people who left the company were
predicted correctly
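The 91.4% figure can be reproduced from the four counts in the classification table. The sketch below reads the counts as 148 and 7 for actual stayers and 33 and 10 for actual leavers (which is the only reading that adds up to 198 people); the function and key names are ours.

```python
def classification_summary(stay_correct, stay_wrong, leave_wrong, leave_correct):
    """Overall and per-class accuracy from the four cells of a 2 x 2 classification table."""
    total = stay_correct + stay_wrong + leave_wrong + leave_correct
    return {
        "total": total,
        "overall_accuracy": (stay_correct + leave_correct) / total,
        "stayer_accuracy": stay_correct / (stay_correct + stay_wrong),
        "leaver_accuracy": leave_correct / (leave_wrong + leave_correct),
    }


summary = classification_summary(stay_correct=148, stay_wrong=7,
                                 leave_wrong=10, leave_correct=33)
for name, value in summary.items():
    print(name, value if name == "total" else f"{value:.1%}")
```

Looking at the per-class rates as well as the overall rate matters here: leavers are the smaller group, so a model can score well overall while still missing a noticeable share of the people who actually leave.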
Variables in the equation: Wald Statistic
The Wald statistic (Wald’s test) is the equivalent of the t-statistic (t-test) in linear regression.
This statistic is used to check the significance of each individual explanatory variable. If
the p-value corresponding to the Wald statistic is below 0.05, the
explanatory variable helps explain the variation in the dependent variable (that is, the
variable is statistically significant).
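As a rough illustration of how such a p-value arises, the sketch below computes a Wald statistic as (B / SE)² and refers it to a chi-square distribution with 1 degree of freedom, the form usually reported for binary logistic regression. The coefficient and standard error are invented, not taken from the case output.

```python
import math


def wald_test(coefficient, standard_error):
    """Return the Wald chi-square (1 df) and its p-value for one coefficient."""
    wald = (coefficient / standard_error) ** 2
    # For 1 degree of freedom, P(chi-square > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2))
    return wald, p_value


# Hypothetical coefficient (B) and standard error (S.E.) for one predictor.
wald, p_value = wald_test(coefficient=0.315, standard_error=0.12)
print(f"Wald = {wald:.3f}, p = {p_value:.4f}")
print("significant at 0.05" if p_value < 0.05 else "not significant at 0.05")
```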
Variables in the equation: Wald Statistic
It is clear from the analysis that Experience, Satisfaction score, Last increment percentage,
Training mandays and Commute distance are the five factors (of those we have looked at
here) that significantly account for individual staff turnover. Looking down the significance
column, these five factors have p-values below 0.05 (p < 0.05). To interpret what this
means we need to look further at something called odds ratios.
Variables in the equation: Multi-collinearity
Multi-collinearity in logistic regression is detected by examining the
standard errors of the beta coefficients of the independent variables. A standard error
greater than 2.0 indicates a numerical problem, such as multi-collinearity among the independent variables
or zero cells for a dummy-coded independent variable. From the table, we
can infer that none of the independent variables has a multi-collinearity problem.
Predicted probability is of Member?
Variables in the equation: Odds Ratio
Example for Odds Ratio:
Exp(B) for commute distance is 1.370. This is the odds ratio: if commute distance increases by 1 unit,
the odds of the employee leaving are multiplied by 1.37. In other words,
the odds of the employee leaving increase by 37% (1.370 − 1 = 0.370) for each 1-unit
increase in commute distance.
With an Exp(B) of 0.215, the odds of the employee leaving decrease by 78.5% (0.215 − 1 = −0.785) if training
increases by 1 unit
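The arithmetic behind both odds ratio examples can be written as a small helper. This is a sketch only: the function names are ours, and the two Exp(B) values are the ones quoted in the examples (1.370 and 0.215).

```python
import math


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio - 1) * 100


def coefficient_from_odds_ratio(odds_ratio):
    """Recover the logistic coefficient B, since Exp(B) = e ** B."""
    return math.log(odds_ratio)


for label, exp_b in [("commute distance", 1.370), ("training", 0.215)]:
    print(f"{label}: Exp(B) = {exp_b}, "
          f"B = {coefficient_from_odds_ratio(exp_b):+.3f}, "
          f"change in odds = {percent_change_in_odds(exp_b):+.1f}%")
```

An Exp(B) above 1 goes with a positive B and raises the odds of leaving; an Exp(B) below 1 goes with a negative B and lowers them.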
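Variables in the equation: Predicted Probabilities
Odds can be converted into a probability using the formula below.
Example: odds of 1.37
$$P = \frac{\text{Odds}}{1 + \text{Odds}} = \frac{1.37}{1 + 1.37} \approx 0.58$$
That is, odds of 1.37 correspond to about a 58% chance of falling into our target group
(leavers). Note that the formula applies to odds, not to an odds ratio: the Exp(B) of 1.37 for commute
distance multiplies an employee's existing odds of leaving, so the resulting change in probability
depends on where that employee's odds start.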
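The conversion between odds and probability, and the effect of an odds ratio on an employee's probability of leaving, can be sketched as follows. The function names and the baseline probabilities are our own illustrative assumptions; only the 1.37 value comes from the example.

```python
def odds_to_probability(odds):
    """P = odds / (1 + odds)."""
    return odds / (1 + odds)


def probability_to_odds(probability):
    """odds = P / (1 - P)."""
    return probability / (1 - probability)


def probability_after_unit_increase(baseline_probability, odds_ratio):
    """Multiply the baseline odds by the odds ratio, then convert back to a probability."""
    new_odds = probability_to_odds(baseline_probability) * odds_ratio
    return odds_to_probability(new_odds)


print(f"odds of 1.37 -> probability {odds_to_probability(1.37):.3f}")

# Hypothetical baseline probabilities of leaving, before one extra unit of commute distance.
for baseline in (0.10, 0.50, 0.80):
    updated = probability_after_unit_increase(baseline, odds_ratio=1.37)
    print(f"baseline {baseline:.0%} -> {updated:.1%}")
```

The same odds ratio moves a 50% probability much further than a 10% or 80% one, which is why an odds ratio cannot be read directly as a probability.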
Predicted Probabilities
We can also see the values of the
probabilities under Data View once
the analysis is completed.
PRE_1: Predicted probability; a value closer to
one means the employee is more attrition prone
PGR_1: Predicted group; 1 is
predicted leaver and 0 is predicted
stayer.
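The link between the two saved columns can be sketched as a simple cut-off rule. This assumes a classification cut value of 0.5 and assumes that a probability exactly at the cut value is classed as a leaver; check the cut value shown in your own output. The probabilities below are invented.

```python
def predicted_group(probability, cut_value=0.5):
    """Return 1 (predicted leaver) or 0 (predicted stayer) for a predicted probability."""
    return 1 if probability >= cut_value else 0


# Hypothetical predicted probabilities (PRE_1) for four employees.
pre_1 = {"Employee A": 0.91, "Employee B": 0.12, "Employee C": 0.47, "Employee D": 0.63}
for name, probability in pre_1.items():
    print(f"{name}: PRE_1 = {probability:.2f}, PGR_1 = {predicted_group(probability)}")
```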
Attrition model
Based on the model results, four levels of employee risk buckets were identified and
are shown below
Extreme High Risk (Score > 80%)
High Risk (Score between 60% and 80%)
Moderate Risk (Score between 40% and 60%)
Low Risk (Score < 40%)
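Assigning employees to the four risk buckets is a straightforward threshold lookup on the predicted probability. A minimal sketch: the slide does not say where a score of exactly 80%, 60% or 40% belongs, so this version assumes such scores fall into the lower bucket. The employee scores are invented.

```python
def risk_bucket(score):
    """Map a predicted attrition probability (0 to 1) to one of the four risk buckets."""
    if not 0 <= score <= 1:
        raise ValueError("score must be a probability between 0 and 1")
    if score > 0.80:
        return "Extreme High Risk"
    if score > 0.60:
        return "High Risk"
    if score > 0.40:
        return "Moderate Risk"
    return "Low Risk"


# Hypothetical predicted probabilities for four employees.
scores = {"Employee A": 0.91, "Employee B": 0.72, "Employee C": 0.47, "Employee D": 0.12}
for name, score in scores.items():
    print(f"{name}: {score:.0%} -> {risk_bucket(score)}")
```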
Attrition model
The retention plan charted using the identified risk buckets for anonymous Inc. is presented below
Zone 1 (Safe)
• No action required
Zone 2 (Dormant)
• Action to be taken on long term basis
• Apply group interventions
• HR to track demographic data of these individuals to see if the risk
profiles are changing or if the equation needs to be altered
Zone 3 (Active)
• Action to be taken on medium term basis
• HR to keep close watch on the behavioral status changing from
“Medium” to “High Risk”, and to analyze demographic data to identify
high risk supervisors / units and point them out to the BU heads
• Apply group interventions
Zone 4 (Actuate)
• Action to be taken on immediate basis
• HR to send the list to concerned managers for immediate action
• HR to validate the risk for consistency with the identified clusters
• Managers to have one-on-one conversations with identified employees
Contact
Phone : 9600066166
Web : www.transbizconsulting.com
Email : santhanam@transbizconsulting.com /
transbizconsulting@gmail.com
Twitter : Transbiz1
LinkedIn : linkedin.com/company/transbizconsulting
Facebook : facebook.com/transbizconsulting
Thank You