BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
by
Dokala Bhargav
22ME1A0512
Under the Supervision of
Mrs.D. Nagamalika
Assistant Professor
Department of CSE
2024-2025
RAMACHANDRA COLLEGE OF ENGINEERING
(AUTONOMOUS)
(Approved by AICTE, Affiliated to JNTUK, Kakinada) Accredited by NBA, NAAC A+
CERTIFICATE
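This is to certify that the internship report submitted by Dokala Bhargav, 22ME1A0512, is work done by him and submitted during the 2023-2024 academic year in partial fulfilment of the requirements for the award of the degree of BACHELOR OF TECHNOLOGY in COMPUTER SCIENCE AND ENGINEERING. The internship was carried out with Edunet Foundation, APSSDC, and IBM SkillsBuild.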
External Examiner
ACKNOWLEDGEMENT
I would like to take this opportunity to express my deep gratitude to all the people who have extended their cooperation in various ways during my internship. It is my pleasure and responsibility to acknowledge the help of all those individuals.
I extend my sincere thanks to Mrs. D. Nagamalika, Assistant Professor in the Department of CSE, for helping me in the successful completion of my internship.
I would like to express my deepest gratitude to Dr. M. Muralidhara Rao, Principal, and Dr. S. Subrahmanya Sarma, Dean Academics, Ramachandra College of Engineering, Eluru, for their valuable suggestions during the preparation of this report.
I sincerely thank all the faculty members and staff of the Department of CSE for their valuable advice, suggestions and constant encouragement, which played a vital role in carrying out my internship.
Finally, I thank one and all who directly or indirectly helped me to complete my internship
successfully.
Dokala Bhargav
22ME1A0512
Abstract
The IBM SkillsBuild Artificial Intelligence and Machine Learning - 2024 internship program, conducted at APSSDC, offered a comprehensive learning experience in Artificial Intelligence (AI) and Machine Learning (ML). This immersive program aimed to equip me with hands-on expertise in designing, developing, and deploying AI and ML models.
As part of the program, I undertook a project on Employee Burnout Analysis and Prediction, in which I developed a machine learning model to predict the burn rate of employees in a company. This involved a range of tasks, including data preprocessing, data visualization, feature encoding and scaling, and model training. I applied machine learning techniques, centred on a Linear Regression model, to predict employee burn rates accurately.
Through this program, participants gained invaluable experience in AI and ML, enhancing their skills in data analysis, model development, and problem-solving. The internship provided a unique opportunity for me to apply theoretical concepts to real-world problems, fostering a deeper understanding of AI and ML applications in industry and commerce.
Programs and Opportunities at the Organization
1. APSSDC (Andhra Pradesh State Skill Development Corporation)
APSSDC enhances employability by offering skill development programs in emerging fields.
Key opportunities include:
• Skill Development Courses: Courses in AI, ML, Data Science, Cloud
Computing, and Digital Marketing.
• Industry Partnerships: Collaborations with industries to align training with
market demands.
• Internship and Placement Support: Connecting trained individuals with
companies for internships and job opportunities.
• Certification Programs: Certifications in partnership with industry leaders to
validate skills.
2. Edunet Foundation
Edunet Foundation focuses on cutting-edge technology education, providing opportunities like:
• Technology Training: Specialized programs in AI, ML, and Data Science, with both
theoretical and practical experience.
• Workshops and Bootcamps: Hands-on exposure through organized workshops and
bootcamps.
• Mentorship and Career Support: Guidance and mentorship for transitioning from
education to employment.
• Collaborations with Industry Leaders: Access to tools, resources, and certifications
through industry partnerships.
3. IBM
IBM offers a wide range of learning programs through SkillsBuild, helping individuals develop
skills in technology. Key offerings include:
• IBM SkillsBuild: Free learning resources and certifications in AI, Data Science, Cloud
Computing, and Cybersecurity.
• Internship and Job Opportunities: Opportunities for real-world experience in tech
domains.
• Research and Innovation Projects: Involvement in cutting-edge projects in AI, quantum
computing, and blockchain.
• Global Learning Initiatives: Access to global learning programs and industry-recognized
certifications.
Methodologies
Learning AI and ML methodologies involves a structured approach that combines theoretical understanding with practical application. Beginning with the foundational concepts of machine learning and artificial intelligence, learners progress from data preprocessing and visualization to model building. Hands-on projects and guided tutorials provide opportunities to apply these methodologies, such as supervised learning, to real-world problems. Additionally, engaging with online courses, documentation, and community forums fosters continuous learning and skill development in the rapidly evolving field of AI and ML.
S. No    Contents
1        Abstract
2        Introduction to Company/Institution
3        Internship Certificate
4        Learning Objectives/Internship Objectives
5        Weekly Overview of Internship Activities
6        Introduction to Internship Topic
6.1      Modules:
         Module 1: Data Preprocessing and Visualization
         Module 2: Feature Encoding and Scaling
         Module 3: Train-Test Split for Model Development
         Module 4: Linear Regression Model Development
         Module 5: Model Evaluation Metrics
         Module 6: Deployment and Future Enhancements
7        Analysis
8        Software Requirement Specification
9        Technology
10       Coding
11       Conclusion
12       Bibliography
Learning Objectives/Internship Objectives
• An objective for this position should emphasize the skills you already possess
in the area and your interest in learning more.
• Utilizing internships is a great way to build your resume and develop skills
that can be emphasized in your resume for future jobs. When you are applying
for a Training Internship, make sure to highlight any special skills or talents
that make you stand out from other applicants and improve your chances of
landing the position.
WEEKLY OVERVIEW OF INTERNSHIP ACTIVITIES
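Artificial Intelligence (AI) and Machine Learning (ML) represent cutting-edge technologies that enable machines to perform tasks that typically require human intelligence. This encompasses a range of capabilities such as learning, reasoning, problem-solving, and perception. AI can be categorized into Narrow AI, designed for specific tasks, and General AI, which would be able to perform any intellectual task that a human can. Machine Learning, a subset of AI, enables systems to learn from data and experiences, improving performance without explicit programming. Three main types of ML include:
1. Supervised Learning: Models learn from labelled examples to predict known outputs.
2. Unsupervised Learning: Models discover patterns or groupings in unlabelled data.
3. Reinforcement Learning: Agents learn by interacting with an environment and receiving rewards or penalties.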
Google AI and ML (AIML) refers to a suite of tools, frameworks, and services provided by Google to enable developers and businesses to harness the power of artificial intelligence (AI) and machine learning (ML). AIML encompasses various products and platforms designed to facilitate tasks such as data analysis, pattern recognition, natural language processing, image recognition, and more.
Some key components of AIML include:
1. TensorFlow: An open-source machine learning framework, developed by Google, for building and training models such as neural networks.
2. Google Cloud AI: A set of AI services offered on Google Cloud Platform (GCP), including Vision AI, Speech-to-Text, Natural Language Processing (NLP), Translation API, and more.
3. Google AI Platform: A managed service that enables developers and data scientists to build, train, and deploy machine learning models.
4. AutoML: Google's suite of tools for automating the process of building and deploying machine learning models.
5. Google AI research: Google's work on advancing artificial intelligence through fundamental research and the development of new AI techniques and algorithms.
Module 1: Data Preprocessing and Visualization
This module involves preparing the dataset, handling missing values, and performing exploratory data analysis (EDA) using visualizations.
1. Data Loading
Code:
import pandas as pd
df = pd.read_excel('employee_burnout_analysis-AI.xlsx')
2. Correlation Analysis
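Objective: Analyze relationships between the features and the target variable (Burn Rate).
Code:
# Correlation of every numeric column with Burn Rate, excluding Burn Rate itself
correlation = df.corr(numeric_only=True)['Burn Rate'].drop('Burn Rate')
print(correlation)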
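To see what the correlation step produces, here it is on a small hypothetical DataFrame (the values are invented for illustration, not taken from the real dataset). Non-numeric columns such as `Gender` are skipped because of `numeric_only=True`, and sorting the result ranks features by how strongly they move with the burn rate.

```python
import pandas as pd

# Hypothetical toy data standing in for the burnout dataset
df = pd.DataFrame({
    "Resource Allocation": [2.0, 4.0, 6.0, 8.0, 5.0],
    "Mental Fatigue Score": [3.1, 5.0, 6.8, 8.9, 5.5],
    "Burn Rate": [0.20, 0.40, 0.55, 0.80, 0.45],
    "Gender": ["Male", "Female", "Male", "Female", "Male"],  # non-numeric, ignored
})

# Pearson correlation of each numeric feature with the target,
# dropping the target's correlation with itself by label
correlation = df.corr(numeric_only=True)["Burn Rate"].drop("Burn Rate")
print(correlation.sort_values(ascending=False))
```

Values close to +1 indicate features that rise together with the burn rate; values near 0 indicate little linear relationship.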
3. Handling Missing Values and Dropping Unnecessary Columns
• Objective: Clean the dataset by removing null values and irrelevant columns.
• Code:
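df.dropna(inplace=True)  # remove rows that contain missing values
df.drop('EmployeeID', axis='columns', inplace=True)  # the ID column carries no predictive information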
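A minimal sketch of the cleaning step on hypothetical toy data: count the missing values per column, drop incomplete rows, then drop the identifier column. Dropping rows is the approach used in this project; imputing missing values (for example with a column mean) is an alternative that keeps more data but is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values
df = pd.DataFrame({
    "EmployeeID": ["E1", "E2", "E3", "E4"],
    "Resource Allocation": [3.0, np.nan, 5.0, 7.0],
    "Burn Rate": [0.30, 0.50, np.nan, 0.70],
})

print(df.isnull().sum())  # number of missing values in each column

df.dropna(inplace=True)  # drop every row with at least one NaN
df.drop("EmployeeID", axis="columns", inplace=True)  # identifier is not a feature
print(df)
```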
4. Burn Rate Analysis by Gender and Company Type
Objective: Analyze employees with a burn rate greater than 50% across gender and company types.
Code:
from matplotlib import pyplot as plt

# Count employees with Burn Rate > 0.5 in each company type and gender group
plt.pie(
    [
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Male') & (df['Burn Rate'] > 0.5)].shape[0],
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Female') & (df['Burn Rate'] > 0.5)].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Male') & (df['Burn Rate'] > 0.5)].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Female') & (df['Burn Rate'] > 0.5)].shape[0],
    ],
    labels=['Male in Service > 50%', 'Female in Service > 50%',
            'Male in Product > 50%', 'Female in Product > 50%'],
    colors=["#ffa600", "#53777a", "#c02942", "#d95b43"],
)
plt.title('Burn Rate > 50% by Gender and Company Type')
plt.show()
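The four boolean filters above can also be expressed with `pd.crosstab`, which counts every company-type and gender combination in a single call. This is an alternative formulation rather than the code used in the project, shown on hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has many more rows
df = pd.DataFrame({
    "Company Type": ["Service", "Service", "Product", "Product", "Service", "Product"],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Male"],
    "Burn Rate": [0.62, 0.40, 0.71, 0.55, 0.30, 0.20],
})

# Keep only employees whose burn rate exceeds 50%
high_burn = df[df["Burn Rate"] > 0.5]

# Rows: company type, columns: gender, cells: number of employees
counts = pd.crosstab(high_burn["Company Type"], high_burn["Gender"])
print(counts)
```

`counts.stack()` flattens the table into one count per (company type, gender) pair, which can then be passed to `plt.pie`.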
Module 2: Feature Encoding and Scaling
1. One-Hot Encoding of Categorical Features
Objective: Convert categorical features into numerical ones using one-hot encoding.
Code:
if all(col in df.columns for col in ['Company Type', 'WFH Setup Available', 'Gender']):
    df = pd.get_dummies(
        df,
        columns=['Company Type', 'WFH Setup Available', 'Gender'],
        drop_first=True  # Avoid multicollinearity (the dummy variable trap)
    )
    print("One-Hot Encoding Completed. Encoded Columns:")
    print(df.head())
else:
    print("One or more specified columns are missing in the DataFrame.")
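The sketch below shows on hypothetical toy data what one-hot encoding with `drop_first=True` produces, followed by the standardization step performed with `StandardScaler` in the full program. Passing `dtype=float` (an addition not in the project code) stores the dummy columns as 0.0/1.0 instead of booleans.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with the three categorical columns used in the project
df = pd.DataFrame({
    "Company Type": ["Service", "Product", "Service", "Product"],
    "WFH Setup Available": ["Yes", "No", "No", "Yes"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Mental Fatigue Score": [5.0, 7.0, 3.0, 9.0],
})

# drop_first=True keeps k-1 dummy columns per feature; the first category
# in sorted order (Product, No, Female) becomes the implicit baseline
encoded = pd.get_dummies(
    df,
    columns=["Company Type", "WFH Setup Available", "Gender"],
    drop_first=True,
    dtype=float,
)
print(encoded.columns.to_list())
# ['Mental Fatigue Score', 'Company Type_Service', 'WFH Setup Available_Yes', 'Gender_Male']

# Standardize every column to zero mean and unit variance
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(encoded), columns=encoded.columns)
print(scaled.round(3))
```

Scaling puts features with different units on a comparable footing, which makes the Linear Regression coefficients easier to compare.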
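Module 3: Train-Test Split for Model Development
Objective: Ensure that the machine learning model is trained on one subset of the data (the training set) and evaluated on unseen data (the testing set).
2. Code Implementation
Code:
from sklearn.model_selection import train_test_split

# Separate the target (Burn Rate) from the features
y = df['Burn Rate']
X = df.drop('Burn Rate', axis='columns')

# 70% of the rows for training, 30% for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)
3. Shuffling and Randomization
• Shuffling ensures the data is randomly distributed between the two sets, preventing bias caused by patterns in the order of the dataset.
• Randomization is crucial, especially when the data is ordered (e.g., by time or by category).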
4. Insights
➢ Balanced Training and Testing Sets: The split ensures the model is trained on a
representative subset of the data and evaluated on unseen data for unbiased performance
evaluation.
➢ Prevention of Data Leakage: Keeping the sets separate, and fitting preprocessing steps such as scaling on the training set only, ensures that no information from the test set is used during training.
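A minimal sketch of the split and the leakage safeguard on randomly generated toy data (the feature names are borrowed from the dataset, but the values are synthetic). It uses the same `train_test_split` settings as the project and fits `StandardScaler` on the training rows only, so the test rows are transformed with the training mean and standard deviation.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic toy data: 10 employees, 2 features
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(10, 2)),
                 columns=["Resource Allocation", "Mental Fatigue Score"])
y = pd.Series(rng.uniform(0, 1, size=10), name="Burn Rate")

# Same settings as the project: 70% training data, shuffled, fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, shuffle=True, random_state=1
)

# Fit the scaler on the training split only; the test split reuses the
# training statistics, so nothing about the test rows leaks into training
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(len(X_train), len(X_test))  # 7 3
```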
Module 4: Linear Regression Model Development
In this module, we develop a Linear Regression model to predict the burnout rate of employees. Linear Regression is a foundational machine learning algorithm suited to problems where the target variable is continuous, such as our burnout rate prediction task.
2. Code Implementation
Objective: Fit the Linear Regression model to the training data.
Code:
from sklearn.linear_model import LinearRegression

# Initialize the Linear Regression model
linear_regression_model = LinearRegression()

# Train the model on the training dataset
linear_regression_model.fit(X_train, y_train)
print("Model Training Completed.")
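Linear Regression fits a prediction of the form ŷ = w·x + b by choosing w and b to minimize the sum of squared errors. As a quick sanity check, the sketch below fits the model to synthetic, noise-free data generated from an assumed relationship burn rate = 0.08 × fatigue + 0.05 (invented for illustration, not derived from the dataset) and confirms that the fitted coefficient and intercept recover those values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data: burn_rate = 0.08 * fatigue + 0.05
fatigue = np.array([[1.0], [3.0], [5.0], [7.0], [9.0]])
burn_rate = 0.08 * fatigue.ravel() + 0.05

model = LinearRegression()
model.fit(fatigue, burn_rate)

print(model.coef_, model.intercept_)     # approximately [0.08] and 0.05
print(model.predict(np.array([[6.0]])))  # approximately [0.53]
```

With real data the points do not lie exactly on a line, so the fitted values are a least-squares compromise rather than an exact recovery.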
Module 5: Model Evaluation Metrics
This module evaluates the performance of the Linear Regression model using quantitative metrics. These metrics help us understand the accuracy, reliability, and limitations of the model in predicting employee burnout rates.
Evaluation metrics are critical in determining the effectiveness of the model and its ability to
generalize to unseen data.
They provide insights into areas where the model performs well and where it may require
improvement.
1. Mean Squared Error (MSE):
➢ Measures the average squared difference between actual and predicted values.
➢ Interpretation: A smaller MSE value indicates better model performance.
2. Root Mean Squared Error (RMSE):
➢ Square root of the MSE, expressing the error in the same unit as the target variable.
➢ Interpretation: Easier to interpret than MSE because it is in the same unit as the target.
3. Mean Absolute Error (MAE):
➢ Measures the average absolute difference between actual and predicted values.
➢ Interpretation: Provides a straightforward measure of prediction error.
4. R-Squared (R²):
➢ Indicates the proportion of variance in the target variable explained by the model.
➢ Interpretation: R² is at most 1 and usually lies between 0 and 1. Higher values indicate that the model explains more of the variability in the data; on test data, R² can be negative when the model performs worse than always predicting the mean.
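For reference, the four metrics can be written as follows, where $y_i$ is the actual burn rate of employee $i$, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the actual values, and $n$ the number of test samples.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left\lvert y_i - \hat{y}_i\right\rvert

R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
```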
3. Code Implementation
Objective: Calculate the evaluation metrics for the predictions made by the model.
Code:
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Predict burn rates for the test set
y_pred = linear_regression_model.predict(X_test)

# Calculate metrics
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # replaces squared=False, which is deprecated in recent scikit-learn versions
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Display metrics
print("Model Evaluation Metrics:")
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"Root Mean Squared Error (RMSE): {rmse:.4f}")
print(f"Mean Absolute Error (MAE): {mae:.4f}")
print(f"R-Squared (R²): {r2:.4f}")
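A small worked example helps connect the formulas to the library calls. The actual and predicted burn rates below are hypothetical values for four employees; the sketch computes each metric by hand with NumPy and checks it against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical actual and predicted burn rates for four employees
y_test = np.array([0.30, 0.50, 0.70, 0.90])
y_pred = np.array([0.35, 0.45, 0.75, 0.80])

errors = y_test - y_pred
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)
mae = np.mean(np.abs(errors))
r2 = 1 - np.sum(errors ** 2) / np.sum((y_test - y_test.mean()) ** 2)

# The hand-written formulas agree with scikit-learn
assert np.isclose(mse, mean_squared_error(y_test, y_pred))
assert np.isclose(mae, mean_absolute_error(y_test, y_pred))
assert np.isclose(r2, r2_score(y_test, y_pred))

print(f"MSE={mse:.6f} RMSE={rmse:.4f} MAE={mae:.4f} R2={r2:.4f}")
```

By hand: the errors are -0.05, 0.05, -0.05 and 0.10, so MSE = 0.0175 / 4 = 0.004375, MAE = 0.25 / 4 = 0.0625, and with a mean actual value of 0.6 the total sum of squares is 0.2, giving R² = 1 - 0.0175 / 0.2 = 0.9125.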
Module 6: Deployment and Future Enhancements
Deployment
The project can be developed and run in Google Colab and deployed on local servers or cloud platforms. Deployment steps include hosting the trained model on a server or in the cloud, building a user interface with Flask or Django, and integrating with HR databases for automated data collection and analysis.
Future Enhancements
1. Model Optimization: Try more advanced algorithms such as Random Forest or deep learning models to potentially improve accuracy.
2. Expanded Data Sources: Include additional datasets and real-time data for improved predictions.
3. Improved User Experience: Add dashboards and multilingual support for wider usability.
4. HR System Integration: Connect with HR tools to provide actionable insights.
5. Scalability: Use cloud platforms such as AWS or Google Cloud for large-scale deployment.
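One common way to prepare a model like this for deployment, assumed here rather than taken from the project, is to bundle the preprocessing and the model in a scikit-learn `Pipeline` and save it with `joblib`. A Flask or Django view would then load the saved file once at startup and call `predict` for each request. The file name `burnout_model.joblib` and the toy data are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data (two numeric features) standing in for the encoded dataset
X_train = np.array([[2.0, 3.0], [4.0, 5.0], [6.0, 6.5], [8.0, 9.0], [5.0, 5.5]])
y_train = np.array([0.20, 0.40, 0.55, 0.80, 0.45])

# Bundle scaling and regression so the server applies exactly the same preprocessing
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

# Save the fitted pipeline (hypothetical file name) to a temporary directory
model_path = os.path.join(tempfile.mkdtemp(), "burnout_model.joblib")
joblib.dump(pipeline, model_path)

# Later, for example inside a web request handler: load once, then predict
loaded = joblib.load(model_path)
new_employee = np.array([[5.0, 6.0]])
print(loaded.predict(new_employee))
```

Saving the scaler and model together avoids a common deployment bug where new inputs are not scaled the same way as the training data. Files created with joblib should only be loaded from trusted sources, since loading them can execute arbitrary code.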
Analysis:
Google AI ML represents a pivotal force in modern technology, fundamentally
altering how businesses innovate and individuals interact with digital platforms.
Through a diverse array of capabilities ranging from cutting-edge machine
learning models and frameworks like TensorFlow to accessible tools such as
AutoML and Google Cloud AI APIs, Google empowers developers and
enterprises to harness the potential of artificial intelligence. The impact of Google AI ML spans various sectors, from healthcare and finance to e-commerce and beyond, transforming processes with applications like image analysis, language translation, and recommendation systems. However, alongside its transformative power, Google AI ML also presents challenges, including ethical considerations surrounding data privacy and algorithmic bias. Ongoing research initiatives and collaborations continue to fuel innovation while addressing these challenges.
Coding:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Load and inspect the dataset
df = pd.read_excel('employee_burnout_analysis-AI.xlsx')
print(df.head())
print(df.tail())
print(df.describe())
print(df.columns.to_list())
print(df.nunique())
df.info()
print(df.isnull().sum())

# Correlation of each numeric feature with the target
print(df.corr(numeric_only=True)['Burn Rate'].drop('Burn Rate'))

# Remove rows with missing values and the identifier column
df.dropna(inplace=True)
df.drop('EmployeeID', axis='columns', inplace=True)

labels = ['Male in Service based company', 'Female in Service based company',
          'Male in Product based company', 'Female in Product based company']
colors = ["#ffa600", "#53777a", "#c02942", "#d95b43"]

# Share of male and female employees in each company type
plt.pie(
    [
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Male')].shape[0],
        df[(df['Company Type'] == 'Service') & (df['Gender'] == 'Female')].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Male')].shape[0],
        df[(df['Company Type'] == 'Product') & (df['Gender'] == 'Female')].shape[0],
    ],
    labels=labels,
    colors=colors,
)
plt.title('Share of Male and Female Employees in Different Company Types')
plt.show()

# Same breakdown for employees with Burn Rate greater than 0.5
high_burn = df[df['Burn Rate'] > 0.5]
plt.pie(
    [
        high_burn[(high_burn['Company Type'] == 'Service') & (high_burn['Gender'] == 'Male')].shape[0],
        high_burn[(high_burn['Company Type'] == 'Service') & (high_burn['Gender'] == 'Female')].shape[0],
        high_burn[(high_burn['Company Type'] == 'Product') & (high_burn['Gender'] == 'Male')].shape[0],
        high_burn[(high_burn['Company Type'] == 'Product') & (high_burn['Gender'] == 'Female')].shape[0],
    ],
    labels=labels,
    colors=colors,
)
plt.title('Share of Male and Female Employees by Company Type with Burn Rate Greater than 50 Percent')
plt.show()

# One-hot encode the categorical columns
categorical_columns = ['Company Type', 'WFH Setup Available', 'Gender']
if all(col in df.columns for col in categorical_columns):
    df = pd.get_dummies(df, columns=categorical_columns, drop_first=True)
    print(df.head())
    encoded_columns = df.columns
else:
    print('One or more specified columns are missing in the present DataFrame')

# Separate the target from the features
y = df['Burn Rate']
X = df.drop('Burn Rate', axis='columns')

# Train-test split: 70% training, 30% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7, shuffle=True, random_state=1)

# Standardize the features; the scaler is fitted on the training set only.
# StandardScaler requires numeric columns, so any remaining non-numeric
# columns (such as dates) must be dropped or converted before this step.
scaler = StandardScaler()
scaler.fit(X_train)
X_train = pd.DataFrame(scaler.transform(X_train), index=X_train.index, columns=X_train.columns)
X_test = pd.DataFrame(scaler.transform(X_test), index=X_test.index, columns=X_test.columns)

# Train the Linear Regression model
# (missing values were already removed by df.dropna(), so no further dropna is needed)
linear_regression_model = LinearRegression()
linear_regression_model.fit(X_train, y_train)

# Evaluate on the test set
print("Model Evaluation Metrics")
y_pred = linear_regression_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
rmse = np.sqrt(mse)  # replaces squared=False, which is deprecated in recent scikit-learn versions
print("Root Mean Squared Error:", rmse)
mae = mean_absolute_error(y_test, y_pred)
print("Mean Absolute Error:", mae)
r2 = r2_score(y_test, y_pred)
print("R-squared:", r2)
Conclusion:
This internship provided a comprehensive understanding of Machine Learning, covering both theoretical concepts and practical implementation. Through a step-by-step exploration of data preprocessing techniques, machine learning algorithms and data visualization tools, we successfully developed an Employee Burnout Analysis and Prediction model capable of predicting an employee's burn rate. We gained hands-on experience in model construction and evaluation. The internship not only enhanced our technical skills but also deepened our understanding of real-world applications of Machine Learning, preparing us to handle similar challenges in the future.
Online Resources
1. Google AI website: https://ai.google/
Provides a wealth of information on AI research, tools, and
techniques developed by Google.
2. TensorFlow Documentation: https://www.tensorflow.org/
Official documentation for TensorFlow, an open-source machine
learning framework for building neural networks, which is widely used
for sentiment analysis tasks.
3. Google Cloud AI Documentation: https://cloud.google.com/ai
Information about Google’s cloud-based AI tools, including APIs for
Natural Language Processing (NLP), which can be used for sentiment
analysis.
Bibliography
1. Google AI website: https://ai.google/
2. TensorFlow documentation: https://www.tensorflow.org/
3. Google Cloud AI documentation: https://cloud.google.com/ai
Books:
1. "Deep Learning" by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.
2. "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow"
by Aurélien Géron.
3. "Machine Learning Yearning" by Andrew Ng (available for free on
his website).
Academic Papers:
1. "TensorFlow: A System for Large-Scale Machine Learning" by
Martín Abadi et al.
(https://www.tensorflow.org/about/bib)
2. "Rethinking the Inception Architecture for Computer Vision" by
Christian Szegedy et al.
3. "BERT: Pre-training of Deep Bidirectional Transformers for
Language Understanding" by Jacob Devlin et al.
Industry Reports and Articles:
1. "The Current State of Machine Learning" by Forbes
(https://www.forbes.com/sites/louiscolumbus/2021/09/05/the-current-state-of-machine-learning)
2. "The State of Artificial Intelligence in 2021" by McKinsey & Company
(https://www.mckinsey.com/featured-insights/artificial-intelligence/the-state-of-ai-in-2021)