CAPSTONE PROJECT
EMPLOYEE SALARY PREDICTION
   Presented By:
   1. Student Name- Balkrishna Shukla
   2. College Name- Government Engineering College, Ajmer
   3. Department- CSE Cyber Security
OUTLINE
• Problem Statement
• System Approach
• Algorithm & Deployment (Step-by-Step Procedure)
• Result
• Conclusion
• Future Scope
• References
PROBLEM STATEMENT
• In today’s data-driven world, predicting employee salaries is essential
    for companies to ensure fair compensation and workforce planning.
    Organizations deal with massive employee data that includes factors
    like education, experience, job role, and working hours.
• Manually analyzing this data to estimate salaries is time-consuming
    and prone to bias.
• A machine learning–based approach can provide accurate and
    automated salary predictions by learning from historical data.
• This project aims to build a predictive model that helps HR teams
    make data-backed salary decisions efficiently and fairly.
SYSTEM APPROACH
• Data Collection & Cleaning – Used the Adult Income dataset and removed rows with
  missing or inconsistent values.
• Data Encoding – Converted categorical values (like education, job, gender) into
  numeric form using Label Encoding.
• Model Selection – Chose a Random Forest Classifier for its strong accuracy on tabular
  data and its ability to handle a mix of numeric and encoded categorical features.
• Training & Testing – Split data (70% training, 30% testing) to build and validate
  the model.
• Evaluation – Measured accuracy, generated a classification report, and visualized
  key features affecting salary.
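The cleaning and encoding steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the rows are made up, the column names and the clean_and_encode helper are hypothetical, and it assumes missing values appear as "?" (as they do in the UCI Adult data).

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up rows standing in for the Adult Income data (not real records).
raw = pd.DataFrame({
    "age": [39, 50, 38, 53, 28, 37],
    "education": ["Bachelors", "Bachelors", "HS-grad", "HS-grad", "Bachelors", "Masters"],
    "occupation": ["Adm-clerical", "Exec-managerial", "?", "Handlers-cleaners",
                   "?", "Exec-managerial"],
    "hours_per_week": [40, 13, 40, 40, 40, 45],
    "income": ["<=50K", ">50K", "<=50K", "<=50K", "<=50K", ">50K"],
})

# Text columns to convert to integers (hypothetical names for this sketch).
CATEGORICAL_COLS = ["education", "occupation", "income"]


def clean_and_encode(df, categorical_cols):
    """Drop rows with missing values, then label-encode the text columns."""
    # Assumption: missing values are marked with "?"; treat them as NaN.
    cleaned = df.replace("?", np.nan).dropna().reset_index(drop=True)
    encoders = {}
    for col in categorical_cols:
        enc = LabelEncoder()
        cleaned[col] = enc.fit_transform(cleaned[col])
        encoders[col] = enc  # kept so codes can be mapped back to labels
    return cleaned, encoders


encoded, encoders = clean_and_encode(raw, CATEGORICAL_COLS)
print(encoded)
print(list(encoders["education"].classes_))
```

LabelEncoder assigns integer codes in sorted label order, so the codes carry no real ranking. Tree-based models such as Random Forest are usually tolerant of this, which is one reason the pairing is common.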
ALGORITHM & DEPLOYMENT
1. Problem Defined – Predict whether an employee earns more than $50K per year from attributes such as education, occupation, and working hours.
2. Data Collected – Used the Adult Income dataset.
3. Data Cleaned & Encoded – Removed missing values and converted text to numbers.
4. Data Split – 70% training, 30% testing.
5. Model Built – Used a Random Forest Classifier.
6. Model Trained & Tested – Learned from training data and predicted on test data.
7. Evaluation – Checked accuracy, classification report, and feature importance.
8. Result & Conclusion – Analyzed predictions and suggested future improvements.
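Steps 4 to 7 can be sketched with scikit-learn as shown below. This is an illustrative sketch under stated assumptions: the data is synthetic (not the Adult Income dataset), the feature names and the labeling rule are invented, and n_estimators and random_state are example settings rather than the project's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
feature_names = ["age", "education_num", "hours_per_week", "noise"]

# Synthetic, already-encoded features (NOT the real Adult data).
X = np.column_stack([
    rng.integers(18, 65, n),   # age
    rng.integers(1, 17, n),    # education level as a number
    rng.integers(10, 80, n),   # hours worked per week
    rng.normal(size=n),        # column unrelated to the label
])
# Invented rule: the label depends only on education and hours.
y = ((X[:, 1] >= 10) & (X[:, 2] >= 40)).astype(int)  # 1 means ">50K"

# Step 4: 70% training, 30% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)

# Steps 5 and 6: build, train, and predict.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Step 7: accuracy, classification report, feature importance.
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, target_names=["<=50K", ">50K"]))

importances = dict(zip(feature_names, model.feature_importances_))
for name, score in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Because the synthetic label follows a simple rule, accuracy here will be far higher than on real data. The point is the workflow, and that feature_importances_ ranks the columns the rule actually uses above the noise column.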
RESULT
GITHUB LINK:
https://github.com/bkshukla91/Employee-Salary-Prediction-using-Machine-Learning
ID: bkshukla91
CONCLUSION
Findings
• The model achieved high accuracy in predicting whether an employee earns more than $50K per year.
• Key factors influencing salary included education, occupation, hours worked, and age.
Effectiveness
• Automates salary prediction, saving time and reducing bias in manual assessment.
• Provides insights into which factors impact salaries the most, supporting better HR decision-making.
Challenges Faced
• Dataset contained missing and inconsistent values that required cleaning.
• Model performance depends on the quality and variety of available data.
Potential Improvements
• Use advanced models like XGBoost or Neural Networks for higher accuracy.
• Include more real-world employee data for better generalization.
FUTURE SCOPE
• Integration with HR Systems – Can be connected to HR management software for real-time salary
  prediction.
• Advanced Algorithms – Implement models like XGBoost, LightGBM, or Neural Networks for
  improved accuracy.
• Larger and Live Datasets – Use real-time company data or government salary surveys to make the
  model more robust.
• Web and Mobile Dashboard – Build a user-friendly platform for HR teams to input employee data
  and get instant salary insights.
• Explainable AI (XAI) – Add explainability features to show why the model predicts a certain salary,
  improving trust.
• Global Expansion – Adapt the model for different countries, currencies, and industry standards for
  broader use.
REFERENCES
• UCI Machine Learning Repository – Adult Income Dataset
   https://archive.ics.uci.edu/ml/datasets/adult
• Scikit-learn Documentation
   https://scikit-learn.org/stable/
• Pandas & NumPy Official Documentation
   Used for data cleaning, preprocessing, and manipulation.
   https://pandas.pydata.org/
   https://numpy.org/
• Research Paper: “Income Classification using Machine Learning Algorithms” (IJERT, 2020)
   Provided insights into model selection for income prediction tasks.
• Seaborn & Matplotlib Documentation
   Used for data visualization and plotting feature importance.
   https://seaborn.pydata.org/
   https://matplotlib.org/
THANK YOU