
Data Collection

The project involved collecting gender pay gap data from reliable sources and cleaning it for analysis. A Linear Regression model was developed using Python to predict salaries based on factors like experience and education, revealing a noticeable salary gap even after controlling for these variables. The findings confirmed the existence of wage disparity and suggested that AI can help identify such patterns, with potential for further refinement of the model.

Uploaded by

sahhana kumar

DATA COLLECTION

I began the project by identifying reliable sources for gender pay gap data. I used datasets from
platforms such as Kaggle and government labor databases. These datasets included features
like:

DATA ATTRIBUTE        DESCRIPTION
Gender                Indicates the gender of the employee (e.g., Male, Female)
Salary                Annual salary earned by the employee
Job Title             The designation or position held by the employee
Years of Experience   Number of years the employee has worked
Education Level       Highest qualification achieved (e.g., Bachelor's, Master's)
Location              Geographical location of the job or employee

Q: How did you clean the data?

I removed missing values and duplicates, encoded categorical variables, and normalized numerical
fields.

Data Cleaning & Preparation:

- Removed null or inconsistent records.
- Converted categorical data (e.g., gender, education level) using label encoding.
- Applied feature scaling to numeric columns for consistent model input.
- Separated data into training and test sets (80:20 split).

This preprocessing ensured the dataset was clean, balanced, and suitable for AI modeling.
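
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the toy records, the column names (`gender`, `education_level`, `years_experience`, `salary`), and the `random_state` value are all assumptions made for the example. The scaler is fitted on the training split only, which is one common ordering and may differ from what the project did.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical toy records standing in for the real dataset.
# Row 4 has a missing gender; the last row duplicates the one before it.
raw = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male", None, "Female",
               "Male", "Female", "Male", "Female", "Male", "Male"],
    "education_level": ["Bachelor's", "Bachelor's", "Master's", "Master's",
                        "Bachelor's", "Bachelor's", "Bachelor's", "Master's",
                        "Master's", "Bachelor's", "Bachelor's", "Bachelor's"],
    "years_experience": [2, 2, 5, 5, 3, 8, 8, 10, 10, 1, 1, 1],
    "salary": [52000, 48000, 64000, 70000, 50000, 66000,
               73000, 80000, 88000, 43000, 46000, 46000],
})

# 1. Remove rows with missing values, then exact duplicate rows.
clean = raw.dropna().drop_duplicates().reset_index(drop=True)

# 2. Label-encode each categorical column with its own encoder.
encoders = {}
for col in ["gender", "education_level"]:
    encoders[col] = LabelEncoder()
    clean[col] = encoders[col].fit_transform(clean[col])

# 3. Separate features and target, then split 80:20.
X = clean[["gender", "education_level", "years_experience"]].astype(float)
y = clean["salary"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4. Scale the numeric column, fitting the scaler on training rows only.
scaler = StandardScaler()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[["years_experience"]] = scaler.fit_transform(X_train[["years_experience"]])
X_test[["years_experience"]] = scaler.transform(X_test[["years_experience"]])

print(len(raw), "raw rows ->", len(clean), "clean rows")
```

Label encoding assigns integer codes in alphabetical order, which implies an ordering that nominal fields such as job title may not have. scikit-learn documents `LabelEncoder` as intended for target labels, so `OneHotEncoder` or `OrdinalEncoder` are the usual alternatives for input features.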

BUILD YOUR PROTOTYPE


To analyze patterns and predict salary based on features like experience, job title, and
education, I developed a Linear Regression Model using Python.

Tools & Libraries Used:

- Pandas for data manipulation
- Scikit-learn for building and training the model
- Matplotlib and Seaborn for visualization

Steps Taken:

- Defined features (X) and target (y as salary).
- Trained a LinearRegression model using sklearn.
- Evaluated performance using R² Score and Mean Absolute Error (MAE).
- Visualized actual vs predicted values to understand how the model fits.
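
A minimal sketch of the training and evaluation steps is shown here. It runs on synthetic data, because the original dataset is not reproduced in this report. The column names, the coefficients used to generate salaries, and the random seeds are all hypothetical; only `LinearRegression`, `r2_score`, and `mean_absolute_error` correspond to the tools named above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical); columns are already label-encoded.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "gender": rng.integers(0, 2, n),           # 0 = Female, 1 = Male
    "education_level": rng.integers(0, 2, n),  # 0 = Bachelor's, 1 = Master's
    "years_experience": rng.uniform(0, 20, n),
})
# Made-up salary formula with noise, used only so the example can run.
data["salary"] = (
    40000
    + 2500 * data["years_experience"]
    + 8000 * data["education_level"]
    + 5000 * data["gender"]
    + rng.normal(0, 3000, n)
)

# Define features (X) and target (y as salary).
X = data[["gender", "education_level", "years_experience"]]
y = data["salary"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train the model and predict on the held-out 20%.
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluate with R² and Mean Absolute Error.
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
print(f"R2: {r2:.3f}  MAE: {mae:.0f}")
```

R² reports the share of salary variance the model explains on the test set, while MAE is the average absolute prediction error in salary units, so the two are easier to interpret together than alone.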

The prototype highlighted pay disparity patterns by comparing salaries across genders for
employees with similar experience and roles.
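
One possible way to make this like-for-like comparison is to group employees by role and experience band and then put the average salaries for each gender side by side. The sketch below assumes hypothetical records, column names, and band boundaries; none of these come from the project data.

```python
import pandas as pd

# Hypothetical records: two roles, two experience levels, both genders.
df = pd.DataFrame({
    "job_title": ["Analyst"] * 4 + ["Engineer"] * 4,
    "years_experience": [1, 2, 7, 8, 1, 2, 7, 8],
    "gender": ["Female", "Male", "Female", "Male",
               "Female", "Male", "Female", "Male"],
    "salary": [45000, 48000, 60000, 65000, 60000, 63000, 82000, 90000],
})

# Bucket experience so that "similar experience" has a concrete meaning.
df["experience_band"] = pd.cut(
    df["years_experience"], bins=[0, 5, 10], labels=["0-5", "6-10"]
).astype(str)

# Mean salary per (role, band), with one column per gender.
by_group = df.pivot_table(
    index=["job_title", "experience_band"],
    columns="gender",
    values="salary",
    aggfunc="mean",
)
by_group["gap"] = by_group["Male"] - by_group["Female"]
print(by_group)
```

A positive `gap` in a row means men in that role and experience band earn more on average than women in the same group.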

Q: What model did you build and why?


I built a Linear Regression model to predict salary based on factors like experience, education,
and job role.

Q: What tools did you use?


Python, Pandas, Scikit-learn, Matplotlib, and Seaborn for model development and visualization.



TEST YOUR SOLUTION


To validate the effectiveness of the model, I tested it using unseen test data.

Testing Approach:

- Compared predicted salaries against actual salaries.
- Analyzed whether salary predictions showed consistent discrepancies by gender.
- Created scatter plots and regression lines to visually compare actual vs predicted values.
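
The testing approach above could look roughly like the sketch below. It is an illustration under stated assumptions: the data is synthetic, the column names and salary formula are hypothetical, and the model is deliberately trained without the gender column so that any systematic gap shows up in the residuals (actual minus predicted). The report does not say whether the project did it this way. The plot uses a dashed identity line as the reference rather than a fitted regression line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical) with a built-in gap of 5000.
rng = np.random.default_rng(1)
n = 400
data = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], n),
    "education_level": rng.integers(0, 2, n),
    "years_experience": rng.uniform(0, 20, n),
})
data["salary"] = (
    40000
    + 2500 * data["years_experience"]
    + 8000 * data["education_level"]
    + 5000 * (data["gender"] == "Male")
    + rng.normal(0, 3000, n)
)

train, test = train_test_split(data, test_size=0.2, random_state=42)

# Train on non-gender features only, then predict on unseen test rows.
features = ["education_level", "years_experience"]
model = LinearRegression().fit(train[features], train["salary"])
test = test.assign(predicted=model.predict(test[features]))
test["residual"] = test["salary"] - test["predicted"]

# A consistent discrepancy appears as different mean residuals by gender.
mean_residual = test.groupby("gender")["residual"].mean()
print(mean_residual)

# Scatter plot of actual vs predicted salary, colored by gender.
fig, ax = plt.subplots()
for gender, group in test.groupby("gender"):
    ax.scatter(group["salary"], group["predicted"], label=gender, alpha=0.6)
limits = [test["salary"].min(), test["salary"].max()]
ax.plot(limits, limits, linestyle="--", color="black", label="Perfect prediction")
ax.set_xlabel("Actual salary")
ax.set_ylabel("Predicted salary")
ax.legend()
```

If one gender's points sit mostly on one side of the dashed line, the model is systematically over- or under-predicting salaries for that group.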

Q: How did you evaluate the model?


I used R² score and Mean Absolute Error (MAE) on test data to assess model performance.

Q: What did the results show?


The model revealed a noticeable salary gap even after controlling for experience and education,
supporting our hypothesis.

Findings:

- The model successfully captured general salary trends.
- Predicted values exposed subtle gender-based gaps, even when controlling for other variables.
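
In a linear regression, "controlling for other variables" has a direct reading: the coefficient on the gender column estimates the salary difference between genders when experience and education are held fixed. The sketch below shows that reading on synthetic data. The column names, the `TRUE_GAP` constant, and the salary formula are hypothetical and exist only to make the example runnable; they are not results from the project.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (hypothetical) with a known built-in gap.
rng = np.random.default_rng(2)
n = 500
TRUE_GAP = 5000
df = pd.DataFrame({
    "is_male": rng.integers(0, 2, n),
    "education_level": rng.integers(0, 2, n),
    "years_experience": rng.uniform(0, 20, n),
})
df["salary"] = (
    40000
    + 2500 * df["years_experience"]
    + 8000 * df["education_level"]
    + TRUE_GAP * df["is_male"]
    + rng.normal(0, 3000, n)
)

features = ["is_male", "education_level", "years_experience"]
model = LinearRegression().fit(df[features], df["salary"])
coefficients = pd.Series(model.coef_, index=features)

# Adjusted gap: salary difference with experience and education held fixed.
adjusted_gap = coefficients["is_male"]

# Raw gap: plain difference in mean salary, with nothing held fixed.
raw_gap = (
    df.loc[df["is_male"] == 1, "salary"].mean()
    - df.loc[df["is_male"] == 0, "salary"].mean()
)
print(f"Raw gap: {raw_gap:.0f}  Adjusted gap: {adjusted_gap:.0f}")
```

The gender coefficient is in salary units only when the gender column is left as 0/1. If that column is scaled along with the numeric features, the coefficient has to be converted back before it can be read as a pay gap.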

Conclusion of Testing:
The results confirmed the existence of wage disparity in the data and supported our hypothesis
that AI can assist in identifying such patterns. Though the model is a prototype, it can be
refined further for higher accuracy and for use in fairness audits.
