SUMMER TRAINING REPORT
On
DATA ANALYTICS INTERNSHIP
YBI FOUNDATION
Submitted in partial fulfillment of the requirement of
B.Tech (CSE) – Seventh Semester
Session: 2024 – 2025
Guru Gobind Singh Indraprastha University, Delhi
Under the Guidance of: Ms. Niharika Chaudhary, Assistant Professor
Submitted by: Richa Anand, B.Tech – VII Sem, Enrollment No. 05825502721
Department of Computer Science & Engineering
JIMS ENGINEERING MANAGEMENT TECHNICAL CAMPUS
48/4 Knowledge Park III, Greater Noida-201306 (U.P.)
TABLE OF CONTENTS
S.No. CONTENTS
1. Declaration
2. Bonafide Certificate
3. Acknowledgment
4. Offer Letter
5. Completion Certificate
6. Abstract
7. Objectives
8. Company Profile
   A. Introduction
   B. Activities and Features
   C. Roles and Responsibilities of Individuals
9. About Internship
10. Domain: Data Analytics
11. Project 1: Big Sales Prediction
    ● About
    ● Source Code
    ● Outcome
12. Project 2: Digit Recognition
    ● About
    ● Source Code
    ● Outcome
13. Project 3: Mileage Prediction
    ● About
    ● Source Code
    ● Outcome
14. Project 4: Financial Market News - Sentiment Analysis
    ● About
    ● Source Code
    ● Outcome
15. Reflection Notes
16. Conclusion
17. Bibliography
DECLARATION
I, Richa Anand, hereby declare that this Summer Training Report II, submitted by me
to JEMTEC, Greater Noida, is a bonafide work undertaken by me during the period from
July 2024 to September 2024, and that it has not been submitted to any other university
or institution for the award of any degree or diploma certificate, nor published at any
time before.
Name: Richa Anand Date: / / 24
Enroll. No.: 05825502721
Course: B.Tech (CSE) (VII- Sem)
BONAFIDE CERTIFICATE
This is to certify that, to the best of my belief, the project entitled “DATA
ANALYTICS INTERNSHIP” is the bonafide work carried out by RICHA
ANAND, a student of B.Tech (CSE), JEMTEC, Greater Noida, in partial fulfillment
of the requirements of Summer Training Report II for the degree of Bachelor
of Technology.
She has worked under my guidance.
I wish her success in all her future career endeavors.
Industry Guide
Name: Mr. Alok Yadav
Designation: Program Director
(Signature with Date)
Faculty Guide
Name: Ms. Bhumika Pal
Designation: Assistant Professor
(Signature with Date)
ACKNOWLEDGEMENT
I offer my sincere thanks and humble regards to JEMTEC, Greater Noida, for
imparting very valuable professional training to us in B.Tech (CSE).
I pay my gratitude and sincere regards to Ms. Niharika Chaudhary, my project
mentor, for generously sharing her knowledge with me. I am thankful to her as she
has been a constant source of advice, motivation, and inspiration. I am also thankful
to her for providing suggestions and encouragement throughout the project work.
I take the opportunity to express my gratitude and thanks to our computer lab
staff and library staff for allowing me to utilize their resources for the completion
of the project.
I am also thankful to my family and friends for constantly motivating me to
complete the project and for providing an environment that enhanced my
knowledge.
Name: Richa Anand
Enroll. No.: 05825502721
Course: B.Tech (CSE) (VII- Sem)
Date: / / 24
OFFER LETTER
COMPLETION CERTIFICATE
ABSTRACT
Programming is a fundamental skill expected from every software professional. This
internship project aimed to provide hands-on experience in data analytics and machine
learning, focusing on predictive analysis and data-driven decision-making. The project
introduced key concepts in data processing, data visualization, model training, and
evaluation using machine learning techniques. We worked on real-world datasets,
including the Big Sales Data and the MPG (Mileage Prediction) dataset, leveraging
Python libraries such as pandas, seaborn, and scikit-learn to implement machine
learning algorithms.
The project involved applying regression analysis techniques, data preprocessing steps,
and evaluating model performance through accuracy checks and error metrics.
Additionally, the internship offered a platform to gain practical experience in
industry-relevant technologies while improving problem-solving skills. I chose this
internship to deepen my understanding of data science, machine learning, and
predictive analytics, all of which have vast applications in various fields such as
business, healthcare, and finance.
This internship presented an opportunity to not only apply theoretical knowledge but
also gain proficiency in real-world data analytics, preparing me for future career
challenges in the ever-evolving tech landscape.
OBJECTIVES
● Internships are generally thought to be reserved for college students looking to
gain experience in a particular field; however, a wide range of people can benefit
from training internships to gain real-world experience and develop their skills.
● An objective for this position should emphasize the skills you already possess in the
area and your interest in learning more.
● Internships are utilized in a number of different career fields, including architecture,
engineering, healthcare, economics, advertising, and many more.
● Some internships allow individuals to perform scientific research, while others are
specifically designed to provide first-hand working experience.
COMPANY PROFILE
A. INTRODUCTION
At YBI Foundation, the team is dedicated to empowering learners by providing a one-stop
learning and upskilling platform designed to shape a skilled workforce. YBI
Foundation offers a range of online internships and certification programs in various
technical and non-technical domains.
Connecting students with internship opportunities: YBI Foundation connects students and
professionals with internships in various fields, providing them the chance to gain
hands-on experience, develop skills, and expand their professional networks.
Experience, skills, and networking: Internships at YBI Foundation help
participants gain valuable industry experience, develop new skills, and build
connections that are crucial for their career growth.
Simplified internships:
Virtual internships in data science, artificial intelligence, machine learning,
business analytics, cloud computing, big data, and Python programming are tailored for
focused and immersive learning experiences.
Build your network:
YBI Foundation emphasizes networking among interns and professionals, creating
opportunities for peer-to-peer learning and collaboration in shared areas of interest.
Building student trust:
YBI Foundation strives to create a trusted platform that offers valuable and structured
learning experiences with a focus on student success.
Structured learning:
Participants enjoy growth-oriented internships with a curriculum that prioritizes skill
development and practical experience, preparing them for future challenges in the
workforce.
B. ACTIVITIES AND FEATURES
Email notifications:
YBI Foundation provides email notifications about important events, including
deadlines, new assignments, and schedule changes, to keep interns updated and
organized.
Simple interface:
The platform’s simple and user-friendly interface ensures a smooth navigation
experience, making learning accessible and enjoyable for everyone.
Large community:
YBI Foundation fosters a large community of learners and interns who are passionate
about similar fields, encouraging collaboration and knowledge sharing.
Deadline reminders:
YBI Foundation sends timely reminders through its Telegram channel and email,
helping interns manage their tasks effectively and submit completed projects on
time.
DOMAINS OFFERED:
● Data Science:
Learn to extract insights from structured and unstructured data using scientific
methods, programming, and machine learning techniques. Data science
internships cover the entire data lifecycle, from data collection to
communication of results.
● Artificial Intelligence and Machine Learning (AIML):
Focuses on creating systems that learn from data to perform tasks without
explicit programming. AIML internships cover a wide range of applications,
including classification, clustering, and natural language processing.
● Python Programming:
Aimed at teaching programming fundamentals using Python, one of the most
versatile and widely used programming languages in the tech industry.
● Business Analytics:
Equips participants with the skills to analyze business data, make strategic
decisions, and drive business growth through data-driven insights.
● Cloud Computing and Big Data:
Learn about managing, processing, and storing large data sets on cloud
platforms, focusing on practical applications in real-world scenarios.
C. ROLES AND RESPONSIBILITIES OF INDIVIDUALS
YBI Foundation's remote internships are designed to ensure smooth onboarding and
training of interns, supported by a dedicated team.
The Operations and Strategy Head is responsible for ensuring that interns face no
onboarding issues, arranging expert mentors, and conducting doubt-clearing sessions.
Program goals include:
● Providing opportunities for leadership growth,
● Encouraging academic achievement,
● Promoting student involvement in shared interests,
● Offering challenging real-world projects,
● Enabling participants to gain practical hands-on experience.
ABOUT INTERNSHIP
Seminar/Guidance:
The internship provides you with the opportunity to work under guidance and the
chance to attend several interactive sessions.
Achievement Certificate:
Once you successfully complete the internship, a completion certificate will be awarded.
Open Work Opportunity:
Once you receive the completion certificate, you may also get the chance to work in the
multicultural and diverse working environment available for interns.
Basic Requirements:
You’ll use your own computers with internet connections and coordinate with your
team members through Telegram and LinkedIn.
Unpaid Internship:
An unpaid internship is an opportunity for students or individuals to gain valuable work
experience in their field of interest without receiving a salary. It usually lasts for a
specific period, and the intern is expected to learn new skills and gain practical
experience under the supervision of mentors.
Flexible working hours:
Working from home during an internship requires self-discipline, time management
skills, and a strong work ethic. While the tasks assigned may be easy, it is essential to
approach them with diligence and professionalism to make the most out of the
internship experience.
Requisite task for Intern:
You are required to work on a MINIMUM of 1 PROJECT from the list of assigned
tasks provided within your internship program.
DOMAIN: DATA ANALYTICS
Data Analytics is the science of analyzing raw data to draw meaningful insights and
inform decision-making. It involves examining data sets to uncover patterns, trends,
relationships, and anomalies that help in understanding complex information. Data
analytics is widely used across various industries, including business, healthcare,
finance, marketing, and more, to make data-driven decisions.
Key Skills in Data Analytics
1. Statistical Analysis:
Understanding statistical methods and techniques is crucial in data analytics. It
helps in interpreting data, testing hypotheses, and drawing meaningful
conclusions from data sets.
2. Data Visualization:
Data visualization involves creating graphical representations of data, such as
charts, graphs, and dashboards, to communicate findings clearly. Tools like
Tableau, Power BI, and Matplotlib are commonly used to make data visually
understandable.
3. Programming Languages:
Proficiency in programming languages such as Python, R, or SQL is essential
for data manipulation, analysis, and automation. Python, for instance, offers
libraries like pandas and NumPy for data manipulation, while SQL is used for
querying and managing databases.
4. Data Wrangling:
Data wrangling is the process of cleaning, transforming, and organizing raw
data into a format suitable for analysis. This step involves handling missing
values, correcting errors, and ensuring data consistency; a minimal pandas
sketch of this step appears after this list.
5. Machine Learning:
Machine learning uses algorithms and statistical models to enable systems to
learn from data for tasks like predictive analytics, classification, clustering, and
advanced data analysis without explicit programming.
6. Problem-Solving:
Analytical skills are necessary to identify patterns, trends, and anomalies in
data. This involves understanding the problem at hand, analyzing the data, and
proposing solutions based on the findings.
7. Attention to Detail:
Being meticulous with data ensures accuracy and reliability. Small errors in data
can lead to incorrect conclusions, making attention to detail a critical skill.
8. Communication Skills:
Data analysts must be able to present complex data insights in a clear and
concise manner to stakeholders, often using visual aids to make data-driven
recommendations understandable.
9. Software Proficiency:
Familiarity with data analytics software and tools like Excel, Tableau, Power
BI, and SAS is important for performing analyses, visualizing data, and
reporting results.
10. Domain Knowledge:
Understanding the specific industry or business context where data analytics is
applied is crucial. It helps in interpreting the data correctly and making relevant
recommendations.
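To make the data wrangling step above concrete, the following is a minimal sketch using pandas on a small, made-up table; the column names and values are purely illustrative assumptions and do not come from any of the internship datasets.
import pandas as pd

# Hypothetical raw data with a missing value and inconsistent labels
raw = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D'],
    'category': ['low fat', 'LF', 'Regular', 'reg'],
    'sales': [250.0, None, 310.5, 120.0]
})

# Handle missing values: fill the numeric gap with the column median
raw['sales'] = raw['sales'].fillna(raw['sales'].median())

# Ensure consistency: normalize the category labels to a single spelling
raw['category'] = raw['category'].replace(
    {'low fat': 'Low Fat', 'LF': 'Low Fat', 'reg': 'Regular'})

# The table is now in an analysis-ready format
print(raw)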
Applications of Data Analytics
● Business Intelligence: Used to analyze market trends, customer behavior, and
business performance to optimize operations and strategies.
● Healthcare: Helps in diagnosing diseases, predicting outbreaks, and
personalizing patient care.
● Finance: Assists in fraud detection, risk management, and investment
strategies.
● Marketing: Enables targeted advertising, customer segmentation, and
campaign effectiveness analysis.
Steps in Data Analytics:
1. Data Collection: Gathering data from various sources, including databases,
spreadsheets, and real-time feeds.
2. Data Cleaning: Removing inconsistencies, handling missing values, and
preparing data for analysis.
3. Data Exploration: Examining the data to identify patterns, correlations, and
key metrics.
4. Data Modeling: Applying statistical models or machine learning algorithms to
analyze data and make predictions.
5. Data Interpretation: Drawing insights from the analysis to inform
decision-making.
6. Data Reporting: Presenting findings through reports, visualizations, or
dashboards to communicate results to stakeholders; a minimal end-to-end sketch
of these steps follows this list.
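The following is a minimal, self-contained sketch of how these steps can map onto Python code. It uses a tiny synthetic table in place of a real data source, so the column names and numbers are illustrative assumptions rather than actual project data.
import pandas as pd
from sklearn.linear_model import LinearRegression

# 1. Data Collection: a small synthetic table stands in for a real source
df = pd.DataFrame({'ad_spend': [10, 20, 30, 40, None],
                   'sales':    [105, 210, 290, 410, 380]})

# 2. Data Cleaning: drop rows with missing values
df = df.dropna()

# 3. Data Exploration: inspect summary statistics and correlations
print(df.describe())
print(df.corr())

# 4. Data Modeling: fit a simple regression model
model = LinearRegression().fit(df[['ad_spend']], df['sales'])

# 5. Data Interpretation: the coefficient estimates sales gained per unit of spend
print('Sales per unit of ad spend:', model.coef_[0])

# 6. Data Reporting: results would normally be presented in a chart or dashboard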
PROJECT 1: BIG SALES PREDICTION
The Big Sales Prediction project is an advanced machine learning solution designed to
forecast retail sales by leveraging sophisticated data analysis and predictive modeling
techniques. By integrating complex data preprocessing, feature engineering, and
ensemble learning algorithms, the project aims to provide accurate sales estimates for
different product categories and outlet types.
Key Objectives:
● Develop a robust predictive model for retail sales
● Analyze complex relationships between product attributes and sales
● Implement advanced machine learning techniques
● Provide actionable insights for business decision-making
Technical Approach:
Utilizing Random Forest Regression and Linear Regression, the project processes and
analyzes retail sales data. The methodology includes comprehensive data
preprocessing, categorical variable encoding, feature selection, and model training. The
Random Forest algorithm captures non-linear relationships, while Linear Regression
provides a baseline comparative analysis.
Primary Challenges Addressed:
● Handling diverse and complex retail datasets
● Managing categorical and numerical variables
● Predicting sales with minimal error margins
● Providing scalable and adaptable predictive models
Practical Applications:
● Inventory management optimization
● Strategic product placement
● Sales forecasting
● Resource allocation planning
SOURCE CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
# Load Dataset
data = pd.read_csv("https://github.com/YBI-Foundation/Dataset/raw/main/Big%20Sales%20Data.csv", encoding='ISO-8859-1')

# Data Preprocessing: drop rows with missing values
data = data.dropna()

# Encode Categorical Variables: normalize the fat-content labels, then map them to numbers
data['Item_Fat_Content'] = data['Item_Fat_Content'].replace(
    {'LF': 'LowFat', 'reg': 'Regular', 'lowfat': 'LowFat'})
data['Item_Fat_Content'] = data['Item_Fat_Content'].replace({'LowFat': 0, 'Regular': 1})

# Prepare Features and Target
y = data['Item_Outlet_Sales']
X = data.drop(['Item_Identifier', 'Item_Outlet_Sales'], axis=1)

# One-hot encode any remaining non-numeric columns so the regressor receives numeric input
X = pd.get_dummies(X, drop_first=True)

# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2529, train_size=0.7)

# Model Training and Evaluation
rf_model = RandomForestRegressor(random_state=2529)
rf_model.fit(X_train, y_train)
y_pred = rf_model.predict(X_test)

# Performance Metrics: accuracy reported as 100% minus the mean absolute percentage error
mape = mean_absolute_percentage_error(y_test, y_pred)
accuracy = 100 - (mape * 100)
print(f"Accuracy: {accuracy:.2f}%")
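The technical approach for this project also mentions a Linear Regression baseline, and the import is already present in the code above. Assuming the same train-test split is still in scope, the baseline comparison could be run with a short addition such as the following sketch:
# Baseline: Linear Regression on the same train-test split for comparison
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
lr_pred = lr_model.predict(X_test)
lr_mape = mean_absolute_percentage_error(y_test, lr_pred)
print(f"Linear Regression Accuracy: {100 - (lr_mape * 100):.2f}%")
Comparing the two figures indicates how much the non-linear Random Forest model improves on the linear baseline.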
OUTCOME:
PROJECT 2: DIGIT RECOGNITION
The Digit Recognition project demonstrates an advanced machine learning approach to
image classification using the scikit-learn digits dataset. By implementing Random
Forest classification, the project showcases the potential of machine learning in
automated image recognition and pattern detection.
Key Objectives:
● Develop an accurate digit recognition model
● Demonstrate image preprocessing techniques
● Implement ensemble learning for classification
● Evaluate model performance through comprehensive metrics
Technical Approach:
The project employs Random Forest Classifier to process and classify handwritten digit
images. Key steps include image data preprocessing, feature extraction through image
flattening, model training, and performance evaluation.
Primary Challenges Addressed:
● Image feature extraction
● Handling diverse handwritten digit variations
● Achieving high classification accuracy
● Implementing efficient machine learning algorithms
Practical Applications:
● Automated document processing
● Postal service digit recognition
● Handwriting recognition systems
● Optical character recognition (OCR) technologies
SOURCE CODE:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load the digits dataset
df = load_digits()
# Data Preprocessing
# Flatten the images
n_samples = len(df.images)
data = df.images.reshape((n_samples, -1))
# Scale the image data
data = data / 16 # Scale pixel values to the range [0, 1]
# Create training and testing split
X_train, X_test, y_train, y_test = train_test_split(data, df.target, test_size=0.2, random_state=42)
# Random Forest Model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict on the test data
y_pred = model.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy:.2%}")
# Visualization of Predictions
def plot_predictions(X_test, y_test, y_pred, num_images=5):
    # Randomly select images to display
    indices = np.random.choice(len(X_test), num_images, replace=False)
    # Create a figure with subplots
    fig, axes = plt.subplots(1, num_images, figsize=(15, 3))
    fig.suptitle('Prediction vs Actual', fontsize=16)
    for i, idx in enumerate(indices):
        # Reshape the flattened image back to 2D
        image = X_test[idx].reshape(8, 8)
        # Plot the image
        axes[i].imshow(image, cmap='gray')
        # Set title with prediction and actual label
        if y_pred[idx] == y_test[idx]:
            # Correct prediction in green
            axes[i].set_title(f'Predicted: {y_pred[idx]}\nActual: {y_test[idx]}',
                              color='green')
        else:
            # Incorrect prediction in red
            axes[i].set_title(f'Predicted: {y_pred[idx]}\nActual: {y_test[idx]}',
                              color='red')
        axes[i].axis('off')
    plt.tight_layout()
    plt.show()
# Plot prediction comparisons
plot_predictions(X_test, y_test, y_pred)
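The objectives above mention evaluating the model through comprehensive metrics. As a hedged extension of the code, assuming y_test and y_pred are still in scope, a per-class report and confusion matrix could be printed as follows:
from sklearn.metrics import classification_report, confusion_matrix

# Per-class precision, recall, and F1-score for the ten digits
print(classification_report(y_test, y_pred))

# Confusion matrix: rows are actual digits, columns are predicted digits
print(confusion_matrix(y_test, y_pred))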
OUTCOME:
PROJECT 3: MILEAGE PREDICTION
The Mileage Prediction project focuses on developing a regression analysis model to
estimate vehicle fuel efficiency (MPG) based on various vehicle characteristics. By
applying Linear Regression and comprehensive data analysis, the project provides
insights into factors influencing vehicle mileage.
Key Objectives:
● Predict vehicle miles per gallon (MPG)
● Analyze correlations between vehicle attributes
● Develop an accurate predictive regression model
● Provide insights into fuel efficiency factors
Technical Approach:
Utilizing Linear Regression, the project processes vehicle data including weight,
displacement, horsepower, and acceleration. The methodology involves data
exploration, correlation analysis, feature selection, model training, and performance
evaluation.
Primary Challenges Addressed:
● Identifying key factors influencing fuel efficiency
● Managing multiple numerical variables
● Developing accurate predictive models
● Handling complex automotive datasets
Practical Applications:
● Automotive industry research
● Vehicle design optimization
● Fuel efficiency analysis
● Environmental impact assessment
SOURCE CODE:
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
# Load Dataset
data = pd.read_csv("https://github.com/YBI-Foundation/Dataset/raw/main/MPG.csv", encoding='ISO-8859-1')
data = data.dropna()

# Data Visualization: pairwise scatter plots of key attributes against mpg
sns.pairplot(data, x_vars=['weight', 'displacement', 'acceleration', 'mpg', 'horsepower'],
             y_vars=['mpg'])
# Prepare Features and Target
y = data['mpg']
X = data[['weight', 'displacement', 'horsepower', 'acceleration']]
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2529, train_size=0.7)
# Model Training
model = LinearRegression()
model.fit(X_train, y_train)
# Prediction and Evaluation
y_pred = model.predict(X_test)
mape = mean_absolute_percentage_error(y_test, y_pred)
accuracy = 100 - (mape * 100)
print(f"Model Accuracy: {accuracy:.2f}%")
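The technical approach mentions correlation analysis alongside the pairplot. Assuming the columns used for the regression are numeric in the loaded DataFrame (as the model fit above requires), a minimal sketch of that analysis could look like this:
# Correlation of each numeric attribute with mpg
numeric_cols = ['mpg', 'weight', 'displacement', 'horsepower', 'acceleration']
corr = data[numeric_cols].corr()
print(corr['mpg'].sort_values())

# Optional heatmap of the full correlation matrix (renders in Colab/Jupyter)
sns.heatmap(corr, annot=True, cmap='coolwarm')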
OUTCOME:
PROJECT 4: FINANCIAL MARKET NEWS SENTIMENT ANALYSIS
The Financial Market News Sentiment Analysis project implements a machine learning
approach to classify and predict sentiment in financial news articles. By utilizing
Random Forest Classification and text vectorization techniques, the project
demonstrates advanced natural language processing capabilities.
Key Objectives:
● Analyze sentiment in financial market news
● Develop an accurate text classification model
● Implement advanced feature extraction techniques
● Provide insights into market sentiment prediction
Technical Approach:
The project uses Count Vectorization to transform text data and Random Forest
Classifier for sentiment prediction. Key steps include text preprocessing, feature
extraction, model training, and comprehensive performance evaluation.
Primary Challenges Addressed:
● Processing unstructured textual data
● Capturing sentiment nuances
● Implementing efficient text classification
● Handling complex financial language
Practical Applications:
● Financial market analysis
● Investment decision support
● Sentiment-based trading strategies
● Media and news sentiment tracking
SOURCE CODE:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Load Dataset
data = pd.read_csv("https://github.com/YBI-Foundation/Dataset/raw/main/Financial%20Market%20News.csv", encoding='ISO-8859-1')
# Prepare Text Features
news = [' '.join(str(x) for x in data.iloc[row, 2:27]) for row in range(len(data.index))]
# Vectorize Text
vectorizer = CountVectorizer(lowercase=True, ngram_range=(1,1))
X = vectorizer.fit_transform(news)
y = data['Label']
# Train-Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2529, train_size=0.8, stratify=y)
# Model Training
model = RandomForestClassifier(n_estimators=200)
model.fit(X_train, y_train)
# Prediction and Evaluation
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
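The description above notes comprehensive performance evaluation, and accuracy alone can hide class imbalance in the sentiment labels. Assuming y_test and y_pred from the code above are in scope, a per-class breakdown could be added like this:
from sklearn.metrics import classification_report

# Precision, recall, and F1-score for each sentiment class
print(classification_report(y_test, y_pred))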
OUTCOME:
REFLECTION NOTES
Based on the internship experience, YBI Foundation provides an exceptional learning
and upskilling platform with a highly supportive work culture. The staff, including
approachable and knowledgeable mentors, fosters a positive environment that
encourages growth and skill development. The training sessions are well-organized,
and the internship structure provides a clear path for building practical expertise in
real-world applications.
Interns at YBI Foundation are supported through effective communication channels,
with inquiries and concerns addressed promptly via email and the Telegram channel.
This setup ensures a smooth learning experience, allowing interns to focus on skill
acquisition and project completion without hurdles.
In summary, YBI Foundation is an outstanding platform for learners seeking to enhance
their knowledge, build relevant skills, and gain valuable hands-on experience. Being
part of such a well-regarded organization has been an incredibly rewarding experience,
providing both personal and professional growth.
Technical Outcomes:
Hardware Requirements:
● Processor: x86 or x64
● Hard Disk: 256 GB or more
● RAM: 4 GB or more
Operating System:
● Windows 10 or Windows 11
Tools Used:
● Google Colab
● VS Code
● Python Libraries (Pandas, Seaborn, Matplotlib, Scikit-Learn)
CONCLUSION
I have gained proficiency in key technologies essential for data analysis, predictive
modeling, and sentiment analysis. My primary focus was on developing data-driven
solutions across various projects, such as sales prediction, digit recognition, and
financial sentiment analysis. Besides the technical learning, this internship provided me
with the opportunity to expand my professional network through meaningful
connections on LinkedIn.
Throughout this internship, I acquired valuable knowledge and hands-on experience,
which I am confident will be beneficial in my future career. While I faced challenges in
certain complex areas, overcoming these difficulties has given me invaluable insights
and resilience in problem-solving.
This experience has not only helped me refine my existing skills but also allowed me to
acquire new competencies in machine learning, data preprocessing, and visualization. I
gained practical experience in using analytical tools, conducting research, and applying
theoretical knowledge to real-world data.
In essence, this internship has transformed academic knowledge into practical
expertise, enriching my professional foundation. Overall, the diverse experiences and
projects during this internship have significantly contributed to my professional growth
and readiness for future endeavors.
BIBLIOGRAPHY
● www.google.com
● https://www.ybifoundation.org/
● YBI Foundation. (n.d.). Big Sales Data [Data set]. Retrieved from
https://github.com/YBI-Foundation/Dataset/raw/main/Big_Sales_Data.csv
● YBI Foundation. (n.d.). MPG Data for Mileage Prediction [Data set]. Retrieved
from https://github.com/YBI-Foundation/Dataset/raw/main/MPG.csv
● pandas Development Team. (n.d.). pandas Documentation. Retrieved from
https://pandas.pydata.org/docs/
● DataCamp. (n.d.). Learning Data Science and Machine Learning. Retrieved
from https://datacamp.com
● YBI Foundation. (n.d.). Internship Programs Overview. Retrieved from YBI
Foundation internal training materials and program guide.