
Hair Salon PCA & Regression Analysis

The document describes a case study involving principal component analysis of a dataset containing variables related to a hair salon chain. It lists 8 questions to answer regarding exploratory data analysis, scaling of variables, checking for outliers, building the covariance matrix, determining the number of principal components, and discussing business implications. The respondent is asked to perform the listed analyses and provide inferences and discussion of the results.

Uploaded by

rishit

Problem Statement:

The ‘Hair Salon.csv’ dataset contains various variables used for the context of
Market Segmentation. This particular case study is based on various parameters
of a salon chain of hair products. You are expected to do Principal Component
Analysis for this case study according to the instructions given in the following
rubric.

Note: This particular dataset contains the target variable satisfaction as well.
Please do drop this variable before doing Principal Component Analysis.

Questions:

1) Perform Exploratory Data Analysis [both univariate and multivariate
analysis to be performed]. The inferences drawn from this should be
properly documented. – 5 points

2) Scale the variables and write the inference for using the type of scaling
function for this case study. - 3 points

3) Comment on the comparison between covariance and the correlation matrix
after scaling. - 2 points

4) Check the dataset for outliers before and after scaling. Draw your
inferences from this exercise. - 3 points

5) Build the covariance matrix, eigenvalues and eigenvector. - 4 points

6) Write the explicit form of the first PC (in terms of Eigen Vectors) – 5 points

7) Discuss the cumulative values of the eigenvalues. How does it help you to
decide on the optimum number of principal components? What do the
eigenvectors indicate? Perform PCA and export the data of the Principal
Component scores into a data frame. – 10 points

8) Mention the business implication of using the Principal Component Analysis
for this case study. – 5 points

Answer:
Correlations:
Simple Linear Models:

Satisfaction = 3.6759 + 0.4151 * ProdQual

1. The beta-naught (intercept) coefficient is equal to 3.6759.

2. The beta-slope, i.e. the coefficient of the variable Product Quality, is 0.4151.

3. For every one-unit increase in product quality, the Satisfaction rating would improve by 0.4151,
keeping other things constant, as explained by the model.

Satisfaction = 5.1516 + 0.4811 * Ecom


Satisfaction = 6.44757 + 0.08768 * TechSup

Satisfaction = 3.680 + 0.595 * CompRes

Satisfaction = 5.6259 + 0.3222 * Advertising

Satisfaction = 4.0220 + 0.4989 * ProdLine

Satisfaction = 4.070 + 0.556 * SalesFImage


Satisfaction = 8.0386 + (-0.1607) * ComPricing

Satisfaction = 5.3581 + 0.2581 * WartyClaim

Satisfaction = 4.0541 + 0.6695 * OrdBilling

Satisfaction = 3.2791 + 0.9364 * DelSpeed
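
Each equation above is a simple least-squares line of Satisfaction on one predictor. As a minimal sketch of how such coefficients can be obtained, the following fits one model per predictor. The data here are synthetic stand-ins (the generated values and the helper name fit_simple are assumptions for illustration, not the Hair Salon data), so the printed coefficients will not match the ones listed above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 100 ratings for two of the predictors.
n = 100
predictors = {
    "ProdQual": rng.uniform(5, 10, n),
    "DelSpeed": rng.uniform(1, 6, n),
}
# Synthetic response built from made-up coefficients plus noise.
satisfaction = 3.7 + 0.4 * predictors["ProdQual"] + rng.normal(0, 0.5, n)


def fit_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    intercept = y.mean() - slope * x.mean()
    return intercept, slope


# One simple linear model per predictor, as in the list of equations.
models = {name: fit_simple(x, satisfaction) for name, x in predictors.items()}
for name, (b0, b1) in models.items():
    print(f"Satisfaction = {b0:.4f} + {b1:.4f} * {name}")
```

The slope is the sample covariance of predictor and response divided by the predictor's variance, and the intercept makes the line pass through the point of means.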

Principal Component Analysis:


A Bartlett sphericity test is conducted to check whether Principal Component Analysis can be done on
the predictor variables of the dataset:
Since the p-value for the test is far below the significance level alpha = 0.001, we reject the null
hypothesis H0 (that there is no correlation amongst the predictor variables, in which case PCA could
not usefully be conducted).
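
For reference, Bartlett's statistic can be computed directly from the correlation matrix R of the predictors as -(n - 1 - (2p + 5)/6) * ln|R|, with p(p - 1)/2 degrees of freedom. The sketch below is illustrative only: it uses synthetic data (an assumption, not the Hair Salon dataset) and the hypothetical helper name bartlett_sphericity. It returns the statistic and degrees of freedom; the p-value would then come from the chi-square distribution.

```python
import numpy as np


def bartlett_sphericity(data):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet is numerically safer than log(det(...)) for near-singular matrices.
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof


rng = np.random.default_rng(1)
# Hypothetical data: 100 rows, 4 predictors driven by 2 shared factors.
factors = rng.normal(size=(100, 2))
loadings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
data = factors @ loadings.T + 0.3 * rng.normal(size=(100, 4))

stat, dof = bartlett_sphericity(data)
print(f"chi-square = {stat:.1f}, df = {dof}")
```

A large statistic relative to the chi-square distribution with the given degrees of freedom indicates that the predictors are correlated enough for PCA to be worthwhile.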

PCA workout
Using varimax rotation, we conduct the PCA with 4 factors. The dataset hair.corr has
all 11 predictor variables (minus the ID column and the dependent variable, Satisfaction rating).
PCA Explained
The 4 RCs explain about 80% of the cumulative variation in the dataset, which is a good number.
After studying the PCA results on the hair dataset, an arbitrary cutoff (0.6) was chosen to
check whether the variability of each predictor can be explained by a single component. It worked:
every input variable can be explained by a single one of the components (RCs).
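
The rotation and cutoff check described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis actually run for the case study: the data are synthetic (7 variables in two groups), the varimax function is a common SVD-based formulation written out by hand, and only the 0.6 cutoff is taken from the text.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables x components) loading matrix by the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break  # criterion stopped improving
        criterion = new_criterion
    return loadings @ rotation


rng = np.random.default_rng(2)
# Hypothetical data: variables 0-3 share one factor, variables 4-6 another.
factors = rng.normal(size=(200, 2))
pattern = np.array(
    [[0.9, 0], [0.8, 0], [0.85, 0], [0.9, 0], [0, 0.9], [0, 0.8], [0, 0.85]]
)
data = factors @ pattern.T + 0.4 * rng.normal(size=(200, 7))

# Unrotated loadings: top eigenvectors scaled by the square root of their eigenvalues.
corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:2]
unrotated = eigvecs[:, order] * np.sqrt(eigvals[order])
rotated = varimax(unrotated)

cutoff = 0.6  # same arbitrary cutoff as in the text
strong = np.abs(rotated) > cutoff
for i, row in enumerate(strong):
    print(f"var{i}: loads on component(s) {np.flatnonzero(row).tolist()}")
```

Because the rotation is orthogonal, it redistributes variance between components without changing how much of each variable is explained in total; the cutoff then shows which single component each variable belongs to.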

Scores for the individual IDs (rows of observations) were extracted from the PCA and rounded off
to two decimal places for ease of computation:
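
As a rough sketch of where such scores come from, the steps of standardizing, eigen-decomposing the covariance matrix, reading off the cumulative explained variance and projecting the rows can be written as below. The input is a synthetic stand-in for the 11 predictor columns (an assumption), the 80% threshold mirrors the figure quoted above, and the scores are unrotated, so they would differ from rotated RC scores.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical stand-in for the 11 predictor columns (100 rows).
latent = rng.normal(size=(100, 4))
mixing = rng.normal(size=(4, 11))
X = latent @ mixing + 0.5 * rng.normal(size=(100, 11))

# Standardize so the covariance matrix of Z equals the correlation matrix of X.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
cov = np.cov(Z, rowvar=False)

# Eigen-decomposition, sorted from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative share of variance; keep the smallest k that reaches 80%.
cumulative = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(cumulative, 0.80) + 1)

# Principal component scores for each row, rounded to two decimals.
scores = np.round(Z @ eigvecs[:, :k], 2)
print("cumulative variance:", np.round(cumulative, 3))
print("components kept:", k, "scores shape:", scores.shape)
```

Each column of eigvecs holds the weights of one principal component, so the first column gives the explicit form of the first PC as a weighted sum of the standardized variables.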

Table for Meaningful Names of Principal Components

Components   Meaningful Names        Column Name
RC1          Purchasing Experience   Pchexp
RC2          Brand Recognition       Bdrecog
RC3          After Sales Service     Aftsvc
RC4          Product                 Prodt

Explanation

1. RC1 - Purchasing Experience covers the variables Complaint Resolution, Order and
Billing, and Delivery Speed to customers.

2. RC2 - Brand Recognition covers E-commerce, Sales Force Image and Advertising, which are the face of
the product.

3. RC3 - After Sales Service gives information about Technical Support and Warranty and Claims, which
matter if the customer has a problem after buying the item.

4. RC4 - Product covers the qualities of the product, such as its varieties and types, its prices and
its quality, i.e. all tangible aspects underlying the very existence of the company.

The score matrix was converted into a data frame, and its variables, which are simply the PCA
components, were given meaningful names for further analysis. We achieved a dimensionality
reduction in which just 4 components stand in for all 11 predictor variables of the hair dataset
through PCA.

Score head

The score data frame was combined with a smaller subset (the extracted data frame hair_new) having ID
and Satisfaction ratings as columns, to form a meaningful dataset that is free of multicollinearity
and has a manageable number of predictor variables (just 4) for building the regression model.

Multiple Linear Regression Model Validity:


Summary Explained
1. Looking at the Pr(>|t|) values of the coefficients, we see that the intercept (the constant
beta-naught) is significant even at the 0.001 level, so it is definitely not zero and contributes to
the regression model.

2. Similarly, the predictor variables Purchase Experience, Brand Recognition and Product have
significant betas, implying that the response variable Satisfaction is linearly associated with them.

3. After Sales Service is the only variable with a high p-value, implying that its beta
coefficient may not contribute significantly to the model, or may be zero.

4. Taken together, the adjusted R^2 shows that these predictors explain 64.6% of the variability in
Satisfaction, which is still good enough (though it may not fall in the excellent category).

5. The overall p-value of the model given by the F-statistic is extremely small (of the order of
1e-16), which is evidence against the null hypothesis. The model is significantly valid at this point.
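
The quantities discussed in this summary (coefficients, t-statistics, adjusted R^2 and the overall F-statistic) can be reproduced by hand from an ordinary least-squares fit. The sketch below uses synthetic component scores and made-up coefficients (all assumptions for illustration), so its numbers are unrelated to the 64.6% reported above.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 4
# Hypothetical component scores and a synthetic response;
# the third predictor is given a true coefficient of zero.
scores = rng.normal(size=(n, k))
true_beta = np.array([0.6, 0.5, 0.0, 0.5])
y = 6.9 + scores @ true_beta + rng.normal(0, 0.7, n)

X = np.column_stack([np.ones(n), scores])  # prepend the intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

fitted = X @ beta
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))

# Standard errors and t-statistics for each coefficient.
sigma2 = ss_res / (n - k - 1)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
t_stats = beta / se

print("coefficients:", np.round(beta, 3))
print("t-statistics:", np.round(t_stats, 2))
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}, F = {f_stat:.1f}")
```

A coefficient whose t-statistic is small in absolute value corresponds to a high p-value, which is the pattern described for After Sales Service; the adjusted R^2 penalizes R^2 for the number of predictors used.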

Using the newly built multiple regression model, new Satisfaction scores were predicted
(pred.Satisfn) to check the validity of the model. A new data frame hair_new was formed with the
columns 1. IDs, 2. Satisfaction ratings, 3. Purchase Experience, 4. Brand Recognition, 5. After Sales
Service, 6. predicted satisfaction (from the multiple linear model).

Predicted v/s Actual Satisfactions


Plot analysis revealed that our new MLR model is quite good: its predictions are close to the actual
Satisfaction scores. Blue dots represent actual Satisfaction ratings; red dots represent predicted
satisfaction scores derived from the multiple linear regression model.
Conclusion
Based on the market segmentation data set for the consumer goods product (Hair), we can conclude
that, due to multicollinearity among the independent variables, we cannot apply a regression model
directly to the data set.

So, we created a new data set (New hair) based on Principal Component Analysis. We have also
recommended subjective new variable names for the components: ServDesk, MktDesk, SuppDesk and
RechDesk. Then, based on the Factor Analysis study, we performed multiple linear regression.

Based on the regression model, we have concluded that the Sales Service Desk plays the most
significant role in customer satisfaction. That means the company should be extra cautious on the
Complaint Resolution, Order & Billing, and Delivery Speed fronts. A late delivery or a complaint that
is not resolved in time may lead to a decline in the company's revenue. However, the Brand Marketing
Desk and the Strategic Research Desk also play an important role, with weights of 0.509 and 0.540
respectively in the regression model.

From the study, we have also concluded that, because this is a consumer goods product, customers do
not give significance to Technical Support and Warranty & Claims, and hence the SuppDesk variable
does not play a significant role in the customer satisfaction index.

Over the whole study, we removed multicollinearity from the data, built a regression model, tested
it and, based on BackTrack data, plotted actual vs. predicted customer satisfaction scores in a
line chart.

In product- or service-based companies, a customer or prospect who is satisfied with a product will
purchase that product again and again, which works as a revenue multiplier for the
company. High customer satisfaction can also lead to cross-selling of products.

Hence, we suggest that management conduct customer surveys on a regular basis to identify trends and
relationships for a higher customer satisfaction experience.
