Data Collection

The project involved collecting gender pay gap data from reliable sources and cleaning it for analysis. A Linear Regression model was developed using Python to predict salaries based on factors like experience and education, revealing a noticeable salary gap even after controlling for these variables. The findings confirmed the existence of wage disparity and suggested that AI can help identify such patterns, with potential for further refinement of the model.

Uploaded by sahhana kumar


DATA COLLECTION

I began the project by identifying reliable sources for gender pay gap data. I used datasets from
platforms such as Kaggle and government labor databases. These datasets included features
like:

DATA ATTRIBUTE        DESCRIPTION
Gender                Indicates the gender of the employee (e.g., Male, Female)
Salary                Annual salary earned by the employee
Job Title             The designation or position held by the employee
Years of Experience   Number of years the employee has worked
Education Level       Highest qualification achieved (e.g., Bachelor's, Master's)
Location              Geographical location of the job or employee

Q: How did you clean the data?

I removed missing values and duplicates, encoded categorical variables, and normalized numerical
fields.

Data Cleaning & Preparation:

- Removed null or inconsistent records.

- Converted categorical data (e.g., gender, education level) using label encoding.

- Applied feature scaling to numeric columns for consistent model input.

- Separated data into training and test sets (80:20 split).

These steps left the dataset clean, consistently encoded, and suitable for AI modeling.
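The cleaning steps above could look roughly like the following in pandas and scikit-learn. This is a minimal sketch, not the project's actual code: the column names, the toy records, and the helper `clean_and_split` are assumptions made for illustration. One deliberate choice in the sketch is to fit the scaler on the training split only, so that no statistics from the test rows leak into training.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy records (NOT the project's data): one row has a missing
# gender and the last two rows are exact duplicates.
raw = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", None, "Female",
               "Male", "Female", "Male", "Female", "Male", "Male"],
    "education": ["Bachelor's", "Master's", "Bachelor's", "Master's",
                  "Bachelor's", "Bachelor's", "Master's", "Master's",
                  "Bachelor's", "Bachelor's", "Master's", "Master's"],
    "years_experience": [2.0, 5.0, 3.0, 10.0, 4.0, 7.0,
                         12.0, 8.0, 1.0, 6.0, 9.0, 9.0],
    "salary": [42000, 55000, 43000, 78000, 47000, 58000,
               90000, 66000, 38000, 54000, 72000, 72000],
})

CATEGORICAL = ["gender", "education"]   # assumed column names
NUMERIC = ["years_experience"]          # assumed column names


def clean_and_split(df, test_size=0.2, seed=42):
    """Drop nulls and duplicates, label-encode, split 80:20, then scale."""
    df = df.dropna().drop_duplicates().copy()
    for col in CATEGORICAL:
        # Label encoding maps each category to an integer code.
        df[col] = LabelEncoder().fit_transform(df[col])
    X = df[CATEGORICAL + NUMERIC].astype(float)
    y = df["salary"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    # Fit the scaler on the training rows only, then apply it to both splits.
    scaler = StandardScaler().fit(X_train[NUMERIC])
    X_train, X_test = X_train.copy(), X_test.copy()
    X_train[NUMERIC] = scaler.transform(X_train[NUMERIC])
    X_test[NUMERIC] = scaler.transform(X_test[NUMERIC])
    return X_train, X_test, y_train, y_test


X_train, X_test, y_train, y_test = clean_and_split(raw)
print(len(X_train), "training rows,", len(X_test), "test rows")
```

Label encoding assigns arbitrary integer codes, which a linear model reads as an ordering. For columns with more than two unordered categories (job title, location), one-hot encoding may be a safer alternative.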

BUILD YOUR PROTOTYPE

To analyze patterns and predict salary based on features like experience, job title, and
education, I developed a Linear Regression Model using Python.

Tools & Libraries Used:

Pandas for data manipulation

Scikit-learn for building and training the model

Matplotlib and Seaborn for visualization

Steps Taken:

Defined features (X) and target (y as salary).

Trained a LinearRegression model using sklearn.

Evaluated performance using R² Score and Mean Absolute Error (MAE).

Visualized actual vs predicted values to understand how the model fits.
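The steps listed above can be sketched in a few lines of scikit-learn. The data here is synthetic and generated from an assumed formula (base pay plus fixed increments for experience and education, plus noise), so the numbers it prints say nothing about the project's real results. The variable names are likewise illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data under an assumed salary formula (illustration only).
rng = np.random.default_rng(0)
n = 200
experience = rng.uniform(0, 20, n)   # years of experience
education = rng.integers(0, 3, n)    # label-encoded education level
salary = 30000 + 2000 * experience + 5000 * education + rng.normal(0, 3000, n)

# Features (X) and target (y as salary).
X = np.column_stack([experience, education])
y = salary

# 80:20 train/test split, then fit ordinary least squares.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Evaluate on the held-out 20%.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R2 = {r2:.3f}, MAE = {mae:.0f}")
```

Plotting `y_test` against `y_pred` as a scatter plot (for example with Matplotlib) would give the actual-vs-predicted view described in the last step. Points close to the diagonal indicate a good fit.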

The prototype highlighted pay disparity patterns by comparing salaries across genders for
employees with similar experience and roles.

Q: What model did you build and why?

I built a Linear Regression model to predict salary based on factors like experience, education,
and job role.

Q: What tools did you use?

Python, Pandas, Scikit-learn, Matplotlib, and Seaborn for model development and visualization.


TEST YOUR SOLUTION

To validate the effectiveness of the model, I tested it using unseen test data.

Testing Approach:

Compared predicted salaries against actual salaries.

Analyzed whether salary predictions showed consistent discrepancies by gender.

Created scatter plots and regression lines to visually compare actual vs predicted values.
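One way to check for consistent discrepancies by gender is to train the model without the gender column and then average the residuals (actual minus predicted salary) per gender on the test set. The sketch below assumes that approach. It runs on synthetic data with a pay gap injected on purpose (`injected_gap` is a made-up value), so its output illustrates the technique and is not a finding of the project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data with a deliberately injected gap (illustration only).
rng = np.random.default_rng(1)
n = 400
df = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], n),
    "experience": rng.uniform(0, 20, n),
    "education": rng.integers(0, 3, n),
})
injected_gap = 4000  # assumed value, not taken from any real dataset
df["salary"] = (
    30000
    + 2000 * df["experience"]
    + 5000 * df["education"]
    - injected_gap * (df["gender"] == "Female")
    + rng.normal(0, 2000, n)
)

train, test = train_test_split(df, test_size=0.2, random_state=1)

# The model sees experience and education only, never gender.
features = ["experience", "education"]
model = LinearRegression().fit(train[features], train["salary"])

# Residual = actual salary minus predicted salary on unseen rows.
test = test.assign(residual=test["salary"] - model.predict(test[features]))
mean_residual = test.groupby("gender")["residual"].mean()
print(mean_residual)
```

If one group's mean residual is clearly negative while the other's is positive, the model is over-predicting pay for the first group relative to people with the same experience and education. That pattern points to a gap the features do not explain.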

Q: How did you evaluate the model?

I used R² score and Mean Absolute Error (MAE) on test data to assess model performance.
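For reference, both metrics are simple to compute by hand: MAE is the mean of the absolute errors, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The short sketch below uses four made-up salaries (in thousands) and checks the hand-computed values against scikit-learn's functions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up actual and predicted salaries in thousands (illustration only).
y_true = np.array([50.0, 60.0, 70.0, 80.0])
y_pred = np.array([52.0, 58.0, 73.0, 79.0])

# MAE: average size of the prediction error, in salary units.
mae = np.mean(np.abs(y_true - y_pred))

# R^2: share of the variance in y_true explained by the predictions.
ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r2 = 1 - ss_res / ss_tot

print(mae, r2)  # 2.0 and about 0.964
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2, r2_score(y_true, y_pred))
```

MAE is reported in the same units as salary, which makes it easy to interpret. R² is unitless and is convenient for comparing models on the same test set.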

Q: What did the results show?

The model revealed a noticeable salary gap even after controlling for experience and education,
supporting our hypothesis.

Findings:

The model successfully captured general salary trends.

Predicted values exposed subtle gender-based gaps, even when controlling for other variables.
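"Controlling for other variables" has a direct reading in a linear regression: if gender is included as a 0/1 feature alongside experience and education, its coefficient estimates the average salary difference between genders at equal experience and education. The sketch below illustrates that reading on synthetic data with an injected gap (`injected_gap` and all other values are assumptions, not project results).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a deliberately injected gap (illustration only).
rng = np.random.default_rng(2)
n = 500
is_female = rng.integers(0, 2, n)    # 1 = Female, 0 = Male (assumed coding)
experience = rng.uniform(0, 20, n)
education = rng.integers(0, 3, n)
injected_gap = 3000.0                # assumed value
salary = (
    30000
    + 2000 * experience
    + 5000 * education
    - injected_gap * is_female
    + rng.normal(0, 2000, n)
)

# Gender enters the model next to the control variables.
X = np.column_stack([is_female, experience, education])
model = LinearRegression().fit(X, salary)

# The gender coefficient is the estimated gap with the controls held fixed.
adjusted_gap = -model.coef_[0]
print(f"Estimated adjusted gap: {adjusted_gap:.0f}")
```

On real data the coefficient only reflects the variables that were actually included. Factors left out of the model, such as job title or location, can still account for part of the estimated gap.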

Conclusion of Testing:

The results confirmed the existence of wage disparity and supported our hypothesis that AI can
assist in identifying such patterns. Although the model is a prototype, it can be refined further
for higher accuracy and for use in fairness audits.
