
INTERNSHIP REPORT

An internship report submitted in partial fulfillment of the requirements of II B. Tech II Semester


of

BACHELOR OF TECHNOLOGY
in

COMPUTER SCIENCE AND ENGINEERING

by

Dokala Bhargav
22ME1A0512
Under Supervision of
Mrs. D. Nagamalika
Assistant Professor
Department of CSE

Department of Computer Science and Engineering


RAMACHANDRA COLLEGE OF
ENGINEERING (AUTONOMOUS)
(Approved by AICTE, Affiliated to JNTUK, Kakinada) Accredited by NBA, NAAC A+
NH-16 Bypass, Vatluru (V), Eluru -534007, Eluru Dist., A.P

2024-2025
RAMACHANDRA COLLEGE OF ENGINEERING
(AUTONOMOUS)
(Approved by AICTE, Affiliated to JNTUK, Kakinada) Accredited by NBA, NAAC A+

NH-16 Bypass, Vatluru (V), Eluru -534007, Eluru Dist., A.P


Department of Computer Science and Engineering

CERTIFICATE

This is to certify that the “Internship Report” submitted by Dokala Bhargav (22ME1A0512) is work done by him/her and submitted during the 2023-2024 academic year in partial fulfilment of the requirements for the award of the degree of BACHELOR OF TECHNOLOGY in COMPUTER SCIENCE AND ENGINEERING, at Edunet, APSSDC, IBM SkillsBuild.

Mrs. D. Nagamalika, Assistant Professor, Dept. of CSE
Dr. G. Chamundeswari, Professor & HOD, Dept. of CSE

External Examiner
ACKNOWLEDGEMENT

I would like to take this opportunity to express my deep gratitude to all the people who
have extended their cooperation in various ways during my internship. It is my pleasure and
responsibility to acknowledge the help of all those individuals.

I extend my sincere thanks to Mrs. D. Nagamalika, Assistant Professor in the
Department of CSE, for helping me in the successful completion of my internship.

I am very grateful to Dr. G. Chamundeswari, Head of the Department, Department of
Computer Science & Engineering, for her encouragement in all respects throughout
my internship.

I would like to express my deepest gratitude to Dr. M. Muralidhara Rao, Principal, and
Dr. S. Subrahmanya Sarma, Dean Academics, Ramachandra College of Engineering, Eluru,
for their valuable suggestions during the preparation of the draft of this document.

I express my deepest gratitude to the Management of Ramachandra College of
Engineering, Eluru, for their support and encouragement in completing my internship and
for providing me the necessary facilities.

I sincerely thank all the faculty members and staff of the Department of CSE for their
valuable advice, suggestions, and constant encouragement, which played a vital role in carrying
out my internship.

Finally, I thank one and all who directly or indirectly helped me to complete my internship
successfully.

Dokala Bhargav
22ME1A0512
Abstract
The IBM SkillsBuild Artificial Intelligence and Machine Learning - 2024 internship program,
conducted at APSSDC, offered a comprehensive learning experience in the realms of Artificial
Intelligence (AI) and Machine Learning (ML). This immersive program aimed to equip me with
hands-on expertise in designing, developing, and deploying AI and ML models.

As part of the program, I undertook a project focused on Employee Burnout Analysis and
Prediction, where I developed a machine learning model to predict the burn rate for employees
in a company. This involved a range of tasks, including data preprocessing, data visualization,
feature extraction, and model training. I leveraged various machine learning algorithms and
techniques to achieve accurate burnout-rate prediction.

Through this program, participants gained invaluable experience in AI and ML, enhancing their
skills in data analysis, model development, and problem-solving. The internship provided a
unique opportunity for me to apply theoretical concepts to real-world problems, fostering a
deeper understanding of AI and ML applications in industry and commerce.
Programs and Opportunities at the Organization
1. APSSDC (Andhra Pradesh State Skill Development Corporation)
APSSDC enhances employability by offering skill development programs in emerging fields.
Key opportunities include:
• Skill Development Courses: Courses in AI, ML, Data Science, Cloud
Computing, and Digital Marketing.
• Industry Partnerships: Collaborations with industries to align training with
market demands.
• Internship and Placement Support: Connecting trained individuals with
companies for internships and job opportunities.
• Certification Programs: Certifications in partnership with industry leaders to
validate skills.
2. Edunet Foundation
Edunet Foundation focuses on cutting-edge technology education, providing opportunities like:
• Technology Training: Specialized programs in AI, ML, and Data Science, with both
theoretical and practical experience.
• Workshops and Bootcamps: Hands-on exposure through organized workshops and
bootcamps.
• Mentorship and Career Support: Guidance and mentorship for transitioning from
education to employment.
• Collaborations with Industry Leaders: Access to tools, resources, and certifications
through industry partnerships.
3. IBM
IBM offers a wide range of learning programs through SkillsBuild, helping individuals develop
skills in technology. Key offerings include:
• IBM SkillsBuild: Free learning resources and certifications in AI, Data Science, Cloud
Computing, and Cybersecurity.
• Internship and Job Opportunities: Opportunities for real-world experience in tech
domains.
• Research and Innovation Projects: Involvement in cutting-edge projects in AI, quantum
computing, and blockchain.
• Global Learning Initiatives: Access to global learning programs and industry-recognized
certifications.
Methodologies
Learning AI and ML methodologies involves a structured approach that combines
theoretical understanding with practical application. Beginning with the foundational
concepts of machine learning and artificial intelligence, learners progress from data
preprocessing and visualization to model building, including supervised learning.
Hands-on projects and guided tutorials provide opportunities to apply these
methodologies to real-world problems. Additionally, engaging with online courses,
documentation, and community forums fosters continuous learning and skill
development in the rapidly evolving field of AI and ML.

Key parts of the report


➢ Analyzed and explored diverse datasets using relevant tools and techniques.
➢ Leveraged the power of Alter for data cleaning, transformation, and advanced analytics.
➢ Automated routine tasks and workflows to improve efficiency and productivity.
➢ Embraced the principles of agile methodologies for project management
and development.

Benefits of the Company/Institution through my report


Edu Skills offers diverse education and skill programs, while NEAT Cell and AICTE focus on
technology-driven learning and quality technical education. The collaboration ensures a
dynamic, industry-aligned ecosystem, enhancing practical skills and preparing individuals for
the evolving workforce, fostering a well-rounded and future-ready skill set.

➢ Industry relevant Skill Development


➢ Technology Driven Learning
➢ Bridging Academia and Industry
➢ Hands on Training
➢ Inclusive Education
➢ Empowering Talent
➢ Future Ready Workforce
INDEX

S. No   Contents                                           Page No
1       Abstract                                           1
2       Introduction to Company/Institution                2-3
3       Internship Certificate                             4
4       Learning Objectives/Internship Objectives          6
5       Weekly overview of internship activities           7-9
6       Introduction to Internship Topic                   10
6.1     Module 1: Data Preprocessing and Visualization;
        Module 2: Feature Encoding and Scaling;
        Module 3: Train-Test Split for Model Development;
        Module 4: Linear Regression Model Development;
        Module 5: Model Evaluation Metrics;
        Module 6: Deployment and Future Enhancements       13-21
7       Analysis                                           21-22
8       Software Requirement Specification                 22
9       Technology                                         22
10      Coding                                             22-25
11      Conclusion                                         25
12      Bibliography                                       25-29
Learning Objectives/Internship Objectives

• Internships are generally thought to be reserved for college students
looking to gain experience in a particular field. However, a wide array of
people can benefit from training internships in order to receive real-world
experience and develop their skills.

• An objective for this position should emphasize the skills you already possess
in the area and your interest in learning more.

• Internships are utilized in a number of different career fields, including
architecture, engineering, healthcare, economics, advertising and many more.

• Some internships are used to allow individuals to perform scientific research,
while others are specifically designed to allow people to gain first-hand
working experience.

• Utilizing internships is a great way to build your resume and develop skills
that can be emphasized in your resume for future jobs. When you are applying
for a Training Internship, make sure to highlight any special skills or talents
that can make you stand apart from the rest of the applicants so that you have
an improved chance of landing the position.
WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES

Week Date Day Name of topic / Module completed


27-05-2024 Monday Preparation for the project
28-05-2024 Tuesday Research on Employee Burnout Analysis techniques and tools
29-05-2024 Wednesday Dataset selection and exploration
I 30-05-2024 Thursday Literature review on Employee Burnout Analysis
31-05-2024 Friday Dataset preparation and validation
01-06-2024 Saturday Summary of initial findings and planning next steps

Week Date Day Name of topic / Module completed


03-06-2024 Monday Summary of initial findings and planning next steps
04-06-2024 Tuesday Basics of Natural Language Processing (NLP)
05-06-2024 Wednesday Data collection: sourcing text data
II 06-06-2024 Thursday Data preprocessing: cleaning text
07-06-2024 Friday Organizing and tokenizing text data
08-06-2024 Saturday Feature extraction basics for text data

Week Date Day Name of topic / Module completed


10-06-2024 Monday Exploratory Data Analysis (EDA): Text patterns
11-06-2024 Tuesday Identifying employee burnout trends in data
12-06-2024 Wednesday Implementing a baseline Naive Bayes model
III 13-06-2024 Thursday Testing and validating the baseline model
14-06-2024 Friday Introduction to Support Vector Machines (SVM) for NLP
15-06-2024 Saturday Regression

Week Date Day Name of topic / Module completed


17-06-2024 Monday Introduction to Deep Learning for Employee Burnout Analysis
18-06-2024 Tuesday Basics of Recurrent Neural Networks (RNNs)
19-06-2024 Wednesday Implementing LSTM for Employee Burnout Analysis
IV 20-06-2024 Thursday Evaluating LSTM performance
21-06-2024 Friday Introduction to GRU for text analysis
22-06-2024 Saturday Implementing GRU Employee Burnout Analysis model

Week Date Day Name of topic / Module completed


24-06-2024 Monday Introduction to pre-trained models: BERT, RoBERTa
25-06-2024 Tuesday Fine-tuning BERT for Employee Burnout classification
26-06-2024 Wednesday Fine-tuning RoBERTa for Employee Burnout classification
V
27-06-2024 Thursday Comparing BERT and RoBERTa performances
28-06-2024 Friday Optimization techniques for pre-trained models
29-06-2024 Saturday Creating a pipeline for pre-trained model inference
Week Date Day Name of topic / Module completed
01-07-2024 Monday Model evaluation metrics: accuracy, precision, recall
02-07-2024 Tuesday Optimizing models for better performance
03-07-2024 Wednesday Visualization techniques: Employee Burnout results charts
VI 04-07-2024 Thursday Creating dashboards for Employee Burnout Analysis visualization
05-07-2024 Friday Finalizing optimized models
06-07-2024 Saturday Integration of visual tools into the workflow

Week Date Day Name of topic / Module completed


08-07-2024 Monday Model deployment strategies
09-07-2024 Tuesday Building a user interface for the Employee Burnout Analysis
model
VII 10-07-2024 Wednesday Testing the model with real-world data
11-07-2024 Thursday Debugging and refining the deployment pipeline
12-07-2024 Friday Final presentation preparation
13-07-2024 Saturday Rehearsal of the project presentation

Week Date Day Name of topic / Module completed


15-07-2024 Monday Compilation of project documentation
16-07-2024 Tuesday Writing the final report: methodology
17-07-2024 Wednesday Writing the final report: results and analysis
VIII 18-07-2024 Thursday Writing the final report: conclusions and recommendations
19-07-2024 Friday Reviewing and refining the final report
20-07-2024 Saturday Compilation of project documentation
Introduction to Internship Topic:

Artificial Intelligence (AI) and Machine Learning (ML) represent cutting-edge technologies
transforming the landscape of computer science and beyond.

Artificial Intelligence (AI): AI involves the creation of computer systems capable of
performing tasks that typically require human intelligence. This encompasses a range of
capabilities such as learning, reasoning, problem-solving, understanding natural language, and
perception. AI can be categorized into Narrow AI, designed for specific tasks, and General AI,
aspiring to human-like intelligence across diverse activities.

Machine Learning (ML): ML is a subset of AI that focuses on enabling computers to learn
from data and experiences, improving performance without explicit programming. Three main
types of ML include:

1. Supervised Learning: Learning from labelled data.
2. Unsupervised Learning: Identifying patterns in unlabelled data.
3. Reinforcement Learning: Learning from interaction with an environment, guided by rewards
or penalties.
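As a toy illustration of supervised learning (a sketch with made-up data, not part of the internship project), a model can recover the mapping y = 2x purely from labelled examples:

```python
from sklearn.linear_model import LinearRegression

# Labelled training data: each input x is paired with its known output y = 2x
X_train = [[1], [2], [3]]
y_train = [2, 4, 6]

model = LinearRegression()
model.fit(X_train, y_train)      # "learning from labelled data"

prediction = model.predict([[4]])
print(prediction)                # close to 8.0 for the unseen input x = 4
```

Unsupervised learning, by contrast, would be given only the inputs and asked to find structure (e.g. clusters) without any y values.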

Google AI/ML refers to a suite of tools, frameworks, and services provided by Google to enable
developers and businesses to harness the power of artificial intelligence (AI) and machine
learning (ML). It encompasses various products and platforms designed to facilitate tasks
such as data analysis, pattern recognition, natural language processing, image recognition, and
more.

Some key components of Google AI/ML include:

1. TensorFlow: An open-source machine learning framework developed by Google for
building and training ML models.

2. Google Cloud AI: A set of AI services offered on Google Cloud Platform (GCP),
including Vision AI, Speech-to-Text, Natural Language Processing (NLP), Translation API,
and more.

3. Google AI Platform: A managed service that enables developers and data scientists to
build, deploy, and scale ML models efficiently.

4. AutoML: Google's suite of tools for automating the process of building and deploying
custom machine learning models without requiring extensive expertise in ML.

5. Google AI Research: Google's research division dedicated to advancing the field of
artificial intelligence through fundamental research and development of new AI techniques
and algorithms.

Modules in AI- ML Internship:

There are 6 modules present in the AI-ML Internship.

➢ Data Preprocessing and Visualization


➢ Feature Encoding and Scaling
➢ Train-Test Split for Model Building
➢ Linear Regression Model Development
➢ Model Evaluation Metrics
➢ Deployment and Future enhancement
Module – 1 Data Preprocessing and Visualisation:

This module involves preparing the dataset, handling missing values, and performing exploratory
data analysis (EDA) using visualizations.

1. Data Loading

Objective: Load the dataset and understand its structure

Code:

import pandas as pd

df = pd.read_excel('employee_burnout_analysis-AI.xlsx')

print(df.head())              # Display the first few rows
print(df.tail())              # Display the last few rows
print(df.describe())          # Summary statistics
print(df.columns.to_list())   # List column names
print(df.nunique())           # Count of unique values in each column
df.info()                     # Data types and memory usage
print(df.isnull().sum())      # Check for missing values

2. Correlation Analysis
Objective: Analyze relationships between features and target variable.

Code:

correlation = df.corr(numeric_only=True)['Burn Rate'][:-1]

print("Correlation with Burn Rate:")

print(correlation)
3. Handling Missing Values and Dropping Unnecessary Columns

• Objective: Clean the dataset by removing null values and irrelevant columns, then
visualize the gender split across company types.

• Code:

df.dropna(inplace=True)                              # Remove rows with missing values
df.drop('EmployeeID', axis='columns', inplace=True)  # Drop the identifier column

from matplotlib import pyplot as plt

plt.pie(
    [
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Male')].shape[0],
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Female')].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Male')].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Female')].shape[0]
    ],
    labels=['Male in Service', 'Female in Service', 'Male in Product', 'Female in Product'],
    colors=["#ffa600", "#53777a", "#c02942", "#d95b43"]
)
plt.title('Share of Male and Female in Different Company Types')
plt.show()


4. Burn Rate Visualization

Objective: Analyze employees with a burn rate greater than 50% across gender and company
types.

Code:

plt.pie(
    [
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Male') & (df['Burn Rate'] > 0.5)].shape[0],
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Female') & (df['Burn Rate'] > 0.5)].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Male') & (df['Burn Rate'] > 0.5)].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Female') & (df['Burn Rate'] > 0.5)].shape[0]
    ],
    labels=['Male in Service > 50%', 'Female in Service > 50%', 'Male in Product > 50%', 'Female in Product > 50%'],
    colors=["#ffa600", "#53777a", "#c02942", "#d95b43"]
)
plt.title('Burn Rate > 50% by Gender and Company Type')
plt.show()
Module 2: Feature Encoding and Scaling
1. One-Hot Encoding of Categorical Features
Objective: Convert categorical features into numerical ones using one-hot encoding.
Code:
if all(col in df.columns for col in ['Company Type', 'WFH Setup Available', 'Gender']):
    df = pd.get_dummies(
        df,
        columns=['Company Type', 'WFH Setup Available', 'Gender'],
        drop_first=True  # Avoid multicollinearity
    )
    print("One-Hot Encoding Completed. Encoded Columns:")
    print(df.head())
else:
    print("One or more specified columns are missing in the DataFrame.")
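To see what drop_first does, here is a tiny standalone example (on a hypothetical two-row frame, separate from the project dataset):

```python
import pandas as pd

toy = pd.DataFrame({'Gender': ['Male', 'Female']})
encoded = pd.get_dummies(toy, columns=['Gender'], drop_first=True)

# The first category ('Female', alphabetically) is dropped, leaving a single
# indicator column 'Gender_Male'; its value alone determines the gender,
# which avoids the perfect collinearity of keeping both dummy columns.
print(encoded.columns.tolist())   # ['Gender_Male']
```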

2. Define Target and Feature Variables


Objective: Separate the target variable ('Burn Rate') and the feature variables to prepare for
training.
Code:

y = df['Burn Rate']                        # Target variable
X = df.drop('Burn Rate', axis='columns')   # Feature variables
print("Target and Features separated successfully.")

3. Scaling the Features


Objective: Standardize the feature variables to improve model performance by bringing them
to the same scale.
Code:

from sklearn.preprocessing import StandardScaler
import pandas as pd

scaler = StandardScaler()
scaler.fit(X_train)  # Fit on the training data only to avoid data leakage
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)
print("Feature scaling completed for training and testing datasets.")
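The effect of standardization can be checked on a toy column (illustrative values only, not the project data): after fitting, the transformed training data has mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

toy = np.array([[1.0], [2.0], [3.0]])   # a single hypothetical feature

scaler = StandardScaler()
scaled = scaler.fit_transform(toy)      # subtract the mean, divide by the std

print(scaled.mean())   # ~0.0
print(scaled.std())    # ~1.0 (StandardScaler uses the population std)
```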
Module 3: Train-Test Split for Model Development
1. Importance of Splitting

Objective: Ensure that the machine learning model is trained on a subset of the data (training set)
and evaluated on unseen data (testing set).

Key Benefits:

➢ Prevents overfitting by testing the model on unseen data.

➢ Helps estimate the generalization ability of the model.

➢ Ensures fairness by maintaining the independence of training and testing datasets.

2. Code Implementation

Step 1: Separation of features and Target

Objective: Divide the dataset into


X: Features (independent variables)
y: Target variable (dependent variable, 'Burn Rate' in this case)

Code:

# Separate the target and features
y = df['Burn Rate']                        # Target variable
X = df.drop('Burn Rate', axis='columns')   # Feature variables
print(f"Target Variable Shape: {y.shape}")
print(f"Feature Variables Shape: {X.shape}")
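Step 2, the split itself, can be performed with scikit-learn's train_test_split; the parameters below (70/30 split, shuffling, fixed seed) match those used later in the Coding section, while X_demo and y_demo are hypothetical stand-ins for the real X and y:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the real feature matrix and target vector
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo,
    train_size=0.7,    # 70% of rows go to training
    shuffle=True,      # randomize row order before splitting
    random_state=1     # fixed seed for reproducibility
)
print(X_train.shape, X_test.shape)   # (7, 2) (3, 2)
```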

3. Why shuffle data?

• Shuffling ensures the data is randomly distributed, preventing bias caused by patterns in
the dataset.

• Randomization is crucial, especially when the data is ordered (e.g., based on time
or categories).
4. Insights

➢ Balanced Training and Testing Sets: The split ensures the model is trained on a
representative subset of the data and evaluated on unseen data for unbiased performance
evaluation.

➢ Prevention of Data leakage: The separation guarantees that no information from the
test set is used during training.

➢ Reproducibility: The use of a fixed random seed (random_state) ensures consistent


results across different runs.
Module 4: Linear Regression Model Development

In this module, we develop a Linear Regression model to predict the burnout rate of
employees. Linear Regression is a foundational machine learning algorithm suitable for
problems where the target variable is continuous, such as our burnout rate prediction task.

1. What is Linear Regression?


Linear Regression finds a linear relationship between the independent variables (features)
and the dependent variable (target).
The equation for the model is:
y = β₀ + β₁X₁ + β₂X₂ + ... + βnXn + ε
where:

• y is the target variable.


• β₀ is the intercept.
• β₁, β₂, ..., βn are the coefficients of the features.
• ε is the error term.


2. Code Implementation
Objective: Fit the Linear Regression model to the training data.
Code:
from sklearn.linear_model import LinearRegression

# Initialize the Linear Regression model
linear_regression_model = LinearRegression()

# Train the model on the training dataset
linear_regression_model.fit(X_train, y_train)
print("Model Training Completed.")
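Once fitted, the β coefficients from the equation above are exposed as model attributes (intercept_ for β₀, coef_ for β₁…βn); a sketch on toy data that follows y = 3x + 0.5 exactly, not the project dataset:

```python
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 3*x + 0.5 with no noise
X_toy = [[0], [1], [2], [3]]
y_toy = [0.5, 3.5, 6.5, 9.5]

model = LinearRegression().fit(X_toy, y_toy)

print(model.intercept_)   # β0, ~0.5
print(model.coef_)        # β1, ~[3.0]
```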

Module 5: Model Evaluation Metrics

This module delves into evaluating the performance of the Linear Regression model using
quantitative metrics. These metrics help us understand the accuracy, reliability, and limitations
of the model in predicting employee burnout rates.

1. Importance of Model evaluation Metrics

Evaluation metrics are critical in determining the effectiveness of the model and its ability to
generalize to unseen data.

They provide insights into areas where the model performs well and where it may require
improvement.

2. Key Metrics for Linear Regression

The following metrics are commonly used to evaluate regression models:

1. Mean Squared Error (MSE):

➢ Measures the average squared difference between actual and predicted values.
➢ Interpretation: A smaller MSE value indicates better model performance.

2. Root Mean Squared Error (RMSE):

➢ Square root of the MSE, providing the error in the same unit as the target variable.
➢ Interpretation: Easier to interpret than MSE due to its unit equivalence.
3. Mean Absolute Error (MAE):

➢ Measures the average absolute difference between actual and predicted values.
➢ Interpretation: Provides a straightforward measure of prediction error.

4. R-Squared (R²):
Indicates the proportion of variance in the target variable explained by the model.

➢ Interpretation: R² typically ranges between 0 and 1; higher values indicate that the model
explains more of the variability in the data.
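The four metrics above can be computed directly from their definitions; a NumPy sketch with hypothetical actual and predicted values:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])   # hypothetical actual burn rates
y_pred = np.array([1.5, 2.0, 2.5, 4.0])   # hypothetical predictions

errors = y_true - y_pred
mse  = np.mean(errors ** 2)               # average squared difference
rmse = np.sqrt(mse)                       # same unit as the target
mae  = np.mean(np.abs(errors))            # average absolute difference
r2   = 1 - np.sum(errors ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mse, rmse, mae, r2)   # 0.125, ~0.354, 0.25, 0.9
```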

3. Code Implementation

Objective: Calculate the evaluation metrics for the predictions made by the
model.
Code:

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Generate predictions on the test set
y_pred = linear_regression_model.predict(X_test)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
rmse = mean_squared_error(y_test, y_pred, squared=False)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display metrics
print("Model Evaluation Metrics:")
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")
print(f"Mean Absolute Error (MAE): {mae:.4f}")
print(f"R-Squared (R²): {r2:.4f}")
Module 6: Deployment and Future Enhancements

Deployment
The project can be deployed using cloud platforms like Google Colab or local servers.
Deployment steps include hosting models on the cloud, building a user interface with Flask
or Django, and integrating with HR databases for automated data collection and analysis.

Future Enhancements
1. Model Optimization: Use advanced algorithms like Random Forest or deep learning for better accuracy.
2. Expanded Data Sources: Include additional datasets and real-time data for improved predictions.
3. Improved User Experience: Add dashboards and multilingual support for wider usability.
4. HR System Integration: Connect with HR tools for actionable insights.
5. Scalability: Use cloud platforms like AWS or Google Cloud for large-scale deployment.
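Whichever deployment route is chosen, the trained model must first be persisted so a web app can load it at startup; a minimal sketch using the standard-library pickle module (the filename 'burnout_model.pkl' and the stand-in model are hypothetical):

```python
import pickle
from sklearn.linear_model import LinearRegression

# Train a stand-in model (in practice, linear_regression_model from Module 4)
model = LinearRegression().fit([[0], [1]], [0.0, 1.0])

# Save the fitted model to disk
with open('burnout_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Later, e.g. inside a Flask/Django request handler, load and reuse it
with open('burnout_model.pkl', 'rb') as f:
    loaded = pickle.load(f)

print(loaded.predict([[0.5]]))   # same predictions as the original model
```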
Analysis:
Google AI/ML represents a pivotal force in modern technology, fundamentally
altering how businesses innovate and individuals interact with digital platforms.
Through a diverse array of capabilities ranging from cutting-edge machine
learning models and frameworks like TensorFlow to accessible tools such as
AutoML and Google Cloud AI APIs, Google empowers developers and
enterprises to harness the potential of artificial intelligence. The impact of Google
AI/ML spans various sectors, from healthcare and finance to e-commerce
and beyond, revolutionizing processes with applications like image analysis,
language translation, and recommendation systems. However, alongside its
transformative power, Google AI/ML also presents challenges, including ethical
considerations surrounding data privacy and algorithmic bias. Yet ongoing
research initiatives and collaborations continue to fuel innovation in addressing
these concerns.

Software Requirement Specification:


The Software Requirements Specification (SRS) for AI and ML systems
provides a concise yet comprehensive blueprint for the development,
deployment, and management of intelligent software. In outlining the system's
scope, functionalities, and constraints, the document establishes a clear
understanding of project boundaries and objectives. The SRS delves into the
intricacies of AI/ML models, detailing their training data requirements, expected
performance metrics, and deployment processes.
Technology:
Programming Languages: Python (for implementation and analysis)
Libraries and Frameworks:
NumPy and Pandas: for data manipulation and preprocessing.
Scikit-learn: for implementing machine learning algorithms like Linear
Regression and evaluation metrics.
Matplotlib: for data visualization.
Development Environment: Jupyter Notebook or PyCharm for writing and testing
Python code.
Dataset: Tabular datasets in Excel/CSV format for model construction.
Hardware Requirements: A system with at least 4 GB RAM and an up-to-date
Python environment.
Others: Online resources and APIs for dataset access, along with cloud-based
libraries where necessary.

Coding:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from matplotlib import pyplot as plt

df = pd.read_excel('employee_burnout_analysis-AI.xlsx')
df.head()
df.tail()
df.describe()
df.columns.to_list()
df.nunique()
df.info()
df.isnull().sum()
df.corr(numeric_only=True)['Burn Rate'][:-1]

df.dropna(inplace=True)
df.drop('EmployeeID', axis='columns', inplace=True)

plt.pie(
    [
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Male')].shape[0],
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Female')].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Male')].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Female')].shape[0]
    ],
    labels=['Male in Service based company', 'Female in Service based company',
            'Male in Product based company', 'Female in Product based company'],
    colors=["#ffa600", "#53777a", "#c02942", "#d95b43"],
)
plt.title('Share of Male and Female Working in Different Companies')
plt.show()

plt.pie(
    [
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Male') & (df['Burn Rate'] > 0.5)].shape[0],
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Female') & (df['Burn Rate'] > 0.5)].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Male') & (df['Burn Rate'] > 0.5)].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Female') & (df['Burn Rate'] > 0.5)].shape[0]
    ],
    labels=['Male in Service based company', 'Female in Service based company',
            'Male in Product based company', 'Female in Product based company'],
    colors=["#ffa600", "#53777a", "#c02942", "#d95b43"],
)
plt.title('Share of Male and Female in Types of Companies with Burn Rate Greater than 50 Percent')
plt.show()

if all(col in df.columns for col in ['Company Type', 'WFH Setup Available', 'Gender']):
    df = pd.get_dummies(df, columns=['Company Type', 'WFH Setup Available', 'Gender'], drop_first=True)
    df.head()
    encoded_columns = df.columns
else:
    print('One or more specified columns are missing in the present DataFrame')

y = df['Burn Rate']
X = df.drop('Burn Rate', axis='columns')

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)
X_train.dropna(inplace=True)
y_train.dropna(inplace=True)

linear_regression_model = LinearRegression()
linear_regression_model.fit(X_train, y_train)

y_pred = linear_regression_model.predict(X_test)
print("Model Evaluation Metrics")
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
rmse = mean_squared_error(y_test, y_pred, squared=False)
print("Root Mean Squared Error:", rmse)
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error:", mae)
r2 = r2_score(y_test, y_pred)
print("R-squared:", r2)
Conclusion:
This internship provided a comprehensive understanding of Machine Learning, covering both
theoretical concepts and practical implementation. Through the step-by-step exploration of
data preprocessing techniques, machine learning algorithms, and data visualization tools, we
successfully developed an Employee Burnout Prediction and Analysis model capable of
predicting an employee's burnout rate. We gained hands-on experience in model construction
and evaluation. The internship not only enhanced our technical skills but also deepened our
understanding of real-world applications of Machine Learning, preparing us to handle similar
challenges in the future.

Online Resources
1. Google AI website: https://ai.google/
Provides a wealth of information on AI research, tools, and
techniques developed by Google.
2. TensorFlow Documentation: https://www.tensorflow.org/
Official documentation for TensorFlow, an open-source machine
learning framework for building neural networks, which is widely used
for sentiment analysis tasks.
3. Google Cloud AI Documentation: https://cloud.google.com/ai
Information about Google’s cloud-based AI tools, including APIs for
Natural Language Processing (NLP), which can be used for sentiment
analysis.
Bibliography
1. Google AI website: https://ai.google/
2. TensorFlow documentation: https://www.tensorflow.org/
3. Google Cloud AI documentation: https://cloud.google.com/ai
Books:
1. "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
2. "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow"
by Aurélien Géron.
3. "Machine Learning Yearning" by Andrew Ng (available for free on
his website).
Academic Papers:
1. "TensorFlow: A System for Large-Scale Machine Learning" by
Martín Abadi et al
(https://www.tensorflow.org/about/bib)
2. "Rethinking the Inception Architecture for Computer Vision" by
Christian Szegedy et al.
3. "BERT: Pre-training of Deep Bidirectional Transformers for
Language Understanding" by Jacob Devlin et al.
Industry Reports and Articles:
1. "The Current State of Machine Learning" by Forbes
(https://www.forbes.com/sites/louiscolumbus/2021/09/05/the-current-state-of-machine-learning)
2. "The State of Artificial Intelligence in 2021" by McKinsey & Company
(https://www.mckinsey.com/featured-insights/artificial-intelligence/the-state-of-ai-in-2021)
