Hair Salon PCA & Regression Analysis

The document describes a case study involving principal component analysis of a dataset containing variables related to a hair salon chain. It lists 8 questions to answer regarding exploratory data analysis, scaling of variables, checking for outliers, building the covariance matrix, determining the number of principal components, and discussing business implications. The respondent is asked to perform the listed analyses and provide inferences and discussion of the results.

Uploaded by

rishit


Problem Statement:

The ‘Hair Salon.csv’ dataset contains various variables used for the context of
Market Segmentation. This particular case study is based on various parameters
of a salon chain of hair products. You are expected to do Principal Component
Analysis for this case study according to the instructions given in the following
rubric.

Note: This particular dataset contains the target variable satisfaction as well.
Please do drop this variable before doing Principal Component Analysis.

Questions:

1) Perform Exploratory Data Analysis [both univariate and multivariate
analysis to be performed]. The inferences drawn from this should be
properly documented. - 5 points

2) Scale the variables and write the inference for using the type of scaling
function for this case study. - 3 points

3) Comment on the comparison between covariance and the correlation matrix
after scaling. - 2 points

4) Check the dataset for outliers before and after scaling. Draw your
inferences from this exercise. - 3 points

5) Build the covariance matrix, eigenvalues and eigenvectors. - 4 points

6) Write the explicit form of the first PC (in terms of eigenvectors). - 5 points

7) Discuss the cumulative values of the eigenvalues. How does it help you to
decide on the optimum number of principal components? What do the
eigenvectors indicate? Perform PCA and export the data of the Principal
Component scores into a data frame. - 10 points

8) Mention the business implication of using the Principal Component Analysis
for this case study. - 5 points

Answer:
Correlations:
Simple Linear Models:

Satisfaction = 3.6759 + 0.4151 * ProdQual

1. The intercept (beta-naught) is 3.6759.

2. The slope (the coefficient of Product Quality) is 0.4151.

3. For each one-unit increase in Product Quality, the model predicts that the Satisfaction rating
improves by 0.4151, other things being equal.

Satisfaction = 5.1516 + 0.4811 * Ecom

Satisfaction = 6.44757 + 0.08768 * TechSup

Satisfaction = 3.680 + 0.595 * CompRes

Satisfaction = 5.6259 + 0.3222 * Advertising

Satisfaction = 4.0220 + 0.4989 * ProdLine

Satisfaction = 4.070 + 0.556 * SalesFImage

Satisfaction = 8.0386 + (-0.1607) * ComPricing

Satisfaction = 5.3581 + 0.2581 * WartyClaim

Satisfaction = 4.0541 + 0.6695 * OrdBilling

Satisfaction = 3.2791 + 0.9364 * DelSpeed
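
As a minimal sketch of how such a single-predictor model is fitted and read, the Python snippet below runs ordinary least squares on synthetic data. The data are generated here purely for illustration (they are not the Hair Salon data), using the reported ProdQual intercept and slope as assumed true values; the variable names and the predict helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the ProdQual column (NOT the real dataset).
prod_qual = rng.uniform(5.0, 10.0, size=100)
# Generate Satisfaction from the reported line plus random noise.
satisfaction = 3.6759 + 0.4151 * prod_qual + rng.normal(0.0, 0.5, size=100)

# Least-squares fit of Satisfaction = b0 + b1 * ProdQual.
# polyfit returns the highest-degree coefficient first.
b1, b0 = np.polyfit(prod_qual, satisfaction, deg=1)


def predict(x, intercept=3.6759, slope=0.4151):
    """Predicted Satisfaction from the reported ProdQual model."""
    return intercept + slope * x


# A one-unit increase in ProdQual changes the prediction by the slope.
delta = predict(8.0) - predict(7.0)
print(f"fitted intercept={b0:.3f}, slope={b1:.3f}, one-unit effect={delta:.4f}")
```

The fitted values differ slightly from the generating ones because of the added noise; the one-unit effect always equals the slope.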

Principal Component Analysis:

Conducting Bartlett's test of sphericity to check whether Principal Component Analysis can be done on
the predictor variables of the dataset:
Since the p-value for the test is far below the significance level alpha = 0.001, we reject the null
hypothesis H0 (that there is no correlation amongst the predictor variables, in which case PCA
could not usefully be conducted).
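
The report does not show how the test was computed. A minimal sketch, assuming the standard chi-square form of Bartlett's statistic and using NumPy and SciPy on synthetic data (the function name bartlett_sphericity and the data are illustrative, not taken from the report):

```python
import numpy as np
from scipy.stats import chi2


def bartlett_sphericity(X):
    """Bartlett's test of sphericity for an (n_samples, n_features) array.

    H0: the correlation matrix is the identity (variables uncorrelated).
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # slogdet is numerically safer than log(det(R)).
    _, logdet = np.linalg.slogdet(R)
    statistic = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) / 2
    p_value = chi2.sf(statistic, dof)
    return statistic, dof, p_value


# Synthetic correlated data standing in for the predictors (NOT the real data).
rng = np.random.default_rng(1)
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(100, 6))

stat, dof, p = bartlett_sphericity(X)
print(f"chi2={stat:.1f}, dof={dof:.0f}, p={p:.3g}")
```

A small p-value means the variables are correlated enough for PCA to be worthwhile.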

PCA workout
Using varimax rotation, we conduct the PCA with 4 factors. The dataset hair.corr has
all 11 predictor variables (excluding the ID column and the dependent variable, Satisfaction).
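
The source does not show its code for the rotation. The function below is one common SVD-based formulation of varimax, offered as a sketch rather than as what the report's software does; the small loading matrix is hypothetical.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a (n_variables, n_components) loading matrix with varimax."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion, mapped back to an
        # orthogonal matrix through the SVD.
        grad = loadings.T @ (
            rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        )
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop once the criterion no longer improves meaningfully.
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation


# Hypothetical unrotated loadings for 4 variables on 2 components.
L = np.array([[0.7, 0.5], [0.8, 0.4], [0.6, -0.6], [0.5, -0.7]])
L_rot, R = varimax(L)
print(np.round(L_rot, 2))
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the way the variance is shared across components changes.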
PCA Explained
The 4 RCs (rotated components) explain about 80% of the cumulative variation in the dataset, which is a good share.
After studying the PCA results on the hair dataset, a cutoff of 0.6 was chosen (arbitrarily) to
check whether the variability of each predictor can be explained by a single component. It worked:
every input variable is explained by exactly one of the components (RCs).
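
The cumulative-variance reasoning can be sketched as follows, assuming PCA on standardised variables (i.e. on the correlation matrix) and assuming the 0.6 cutoff is applied to the loadings. The data are synthetic and the 80% target mirrors the figure quoted above; no rotation is applied in this sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: 3 latent factors driving 8 observed variables.
latent = rng.normal(size=(200, 3))
X = latent @ rng.normal(size=(3, 8)) + 0.4 * rng.normal(size=(200, 8))

# Standardise (z-score) so the covariance matrix equals the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = np.cov(Z, rowvar=False)

# eigh returns ascending eigenvalues for symmetric matrices; sort descending.
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative share of variance explained by the leading components.
cum_var = np.cumsum(eigvals) / eigvals.sum()
# Smallest number of components reaching the 80% target.
k = int(np.searchsorted(cum_var, 0.80) + 1)

# Loadings = eigenvector * sqrt(eigenvalue); flag those at or above 0.6.
loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
strong = np.abs(loadings) >= 0.6

print("cumulative variance:", np.round(cum_var, 3))
print("components kept:", k)
print("variables with a strong loading per component:", strong.sum(axis=0))
```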

Scores for individual IDs (rows of observations) were extracted from the PCA and rounded off
to two decimal places for ease of computation:

Table for Meaningful names of Principal Components

Component   Meaningful Name         Column Name
RC1         Purchasing Experience   Pchexp
RC2         Brand Recognition       Bdrecog
RC3         After Sales Service     Aftsvc
RC4         Product                 Prodt

Explanation

1. RC1 - Purchasing Experience covers the variables Complaint Resolution, Order and
Billing, and Delivery Speed.

2. RC2 - Brand Recognition covers E-commerce, Sales Force Image and Advertising, which are the face of
the product.

3. RC3 - After Sales Service covers Technical Support and Warranty and Claims, which matter if
the customer has a problem after buying the item.

4. RC4 - Product covers the qualities of the product, such as its variety (product line), pricing and quality, i.e.
all the tangible aspects on which the company's existence rests.

The score matrix was converted into a data frame, and its variables (the PCA
components) were given meaningful names for further analysis. We achieved a dimensionality
reduction in which just 4 factors summarise the 11 predictor variables of the hair dataset
(about 80% of their variation, as noted above).

Score head

The score data frame was combined with a smaller subset (the extracted data frame hair_new) having ID
and Satisfaction ratings as columns, forming a dataset that is free of multicollinearity and has a
manageable number of predictor variables (just 4) for building the regression model.
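
The two steps above (naming the score columns, then joining them to ID and Satisfaction) can be sketched with pandas. This is an illustration on synthetic data using unrotated principal components, so assigning the report's column names to particular components is arbitrary here; model_data is a hypothetical name.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 100
# Synthetic stand-in for the 11 predictors (NOT the real data).
latent = rng.normal(size=(n, 4))
X = latent @ rng.normal(size=(4, 11)) + 0.5 * rng.normal(size=(n, 11))
hair_new = pd.DataFrame({
    "ID": np.arange(1, n + 1),
    "Satisfaction": rng.uniform(4.0, 10.0, size=n).round(1),
})

# Standardise, then project onto the 4 leading eigenvectors.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
top4 = eigvecs[:, np.argsort(eigvals)[::-1][:4]]
scores = pd.DataFrame(
    Z @ top4, columns=["Pchexp", "Bdrecog", "Aftsvc", "Prodt"]
).round(2)

# Column-wise join; both frames share the default 0..n-1 row index.
model_data = pd.concat([hair_new, scores], axis=1)

# Component scores are (near) uncorrelated, which is what removes
# multicollinearity from the regression inputs.
score_corr = np.corrcoef(scores.to_numpy(), rowvar=False)
max_offdiag = np.abs(score_corr - np.eye(4)).max()

print(model_data.head())
print(f"largest off-diagonal correlation: {max_offdiag:.3f}")
```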

Multiple Linear Regression Model Validity:

Summary Explained
1. Looking at the Pr(>|t|) values of the coefficients, the intercept (constant beta-naught) is
significant even at the 0.001 level, so it is clearly non-zero and contributes to the regression model.

2. Similarly, the predictors Purchase Experience, Brand Recognition and Product have
significant betas, implying that the response variable Satisfaction is linearly associated with them.

3. After Sales Service is the only variable with a high p-value, implying that its beta
coefficient may be zero, i.e. it may not contribute significantly to the model.

4. The adjusted R^2 indicates that these predictors together explain 64.6% of the variability in
Satisfaction, which is good enough (though not excellent).

5. The overall p-value of the model given by the F-statistic is extremely small (of the order of e-16), which is strong evidence
against the null hypothesis. The model is statistically significant.
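
The quantities read off the summary (coefficients, t-statistics, adjusted R^2 and the overall F-statistic) can be reproduced by hand. The sketch below uses NumPy on synthetic component scores; the intercept and betas used to generate the data are made-up values for illustration, not the report's estimates.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 4
# Synthetic, uncorrelated component scores and a response (illustrative only).
scores = rng.normal(size=(n, k))
true_beta = np.array([1.0, 0.8, 0.0, 0.6])  # third predictor has no effect
y = 5.0 + scores @ true_beta + rng.normal(0.0, 0.7, size=n)

# Design matrix with an intercept column; solve by least squares.
A = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

ss_res = float(resid @ resid)
ss_tot = float(((y - y.mean()) ** 2).sum())
r2 = 1.0 - ss_res / ss_tot
# Adjusted R^2 penalises the number of predictors k.
adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
# Overall F-statistic: explained vs. residual variance per degree of freedom.
f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))

# t-statistic for each coefficient (intercept first).
sigma2 = ss_res / (n - k - 1)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
t_stats = beta / se

print("coefficients:", np.round(beta, 3))
print("t-statistics:", np.round(t_stats, 2))
print(f"adjusted R^2={adj_r2:.3f}, F={f_stat:.1f}")
```

A predictor whose true effect is zero, like the third one here, will typically show a small t-statistic, which is the pattern the summary reports for After Sales Service.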

Using the newly built multiple regression model, new Satisfaction scores were predicted
(pred.Satisfn) to check the validity of the model. A new data frame hair_new was formed with
the columns: 1. IDs, 2. Satisfaction ratings, 3. Purchase Experience, 4. Brand Recognition, 5. After Sales
Service, 6. predicted Satisfaction (from the multiple linear model).

Predicted v/s Actual Satisfactions

The plot shows that the predictions of our new MLR model are quite close to the actual
Satisfaction scores. Blue dots represent actual Satisfaction ratings; red dots represent predicted
Satisfaction scores derived from the multiple linear regression model.

Conclusion
Based on the market segmentation data set for the consumer goods product (Hair), we can conclude that,
due to multicollinearity among the independent variables, we cannot apply a regression model directly to
the data set.
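
One common way to quantify multicollinearity is the variance inflation factor (VIF), which equals the corresponding diagonal entry of the inverse correlation matrix. The report does not state how multicollinearity was assessed, so the following is only an illustrative sketch on synthetic data; the vif helper is hypothetical.

```python
import numpy as np


def vif(X):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))


rng = np.random.default_rng(5)
a = rng.normal(size=200)
b = a + 0.2 * rng.normal(size=200)   # nearly a copy of `a`
c = rng.normal(size=200)             # independent of both
X = np.column_stack([a, b, c])

vifs = vif(X)
print(np.round(vifs, 2))
```

Values well above 1 for the first two columns flag them as collinear, while the independent third column stays close to 1.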

So, we created a new data set (New hair) based on Principal Component Analysis. We have also
recommended subjective new variable names, ServDesk, MktDesk, SuppDesk and RechDesk, for the
components. Then, based on the Factor Analysis study, we performed multiple linear regression.

Based on the regression model, we have concluded that the Sales Service Desk plays the most
significant role in customer satisfaction. That means the company should be extra cautious on the Complaint
Resolution, Order & Billing, and Delivery Speed fronts. A late delivery or a complaint not resolved
in time may lead to a decline in the company's revenue. However, the Brand Marketing Desk and Strategic
Research Desk also play an important role, with weights of 0.509 and 0.540 respectively in the
regression model.

From the study, we have also concluded that, because this is a consumer goods product, customers do not
attach much importance to Technical Support and Warranty & Claims, and hence the SuppDesk variable does not
play a significant role in the customer satisfaction index.

Overall, we removed multicollinearity from the data, built a regression model, tested
it, and, based on the BackTrack data, plotted actual vs. predicted customer
satisfaction scores in a line chart.

In product- or service-based companies, a customer or prospect who is satisfied with a product will
buy that product again and again, which works as a revenue multiplier for the
company. High customer satisfaction can also lead to cross-selling of products.

Hence, we suggest that management conduct customer surveys on a regular basis to identify trends and
relationships that lead to higher customer satisfaction.
