0% found this document useful (0 votes)

23 views41 pages

My Internship Document

The Summer Internship Report by Shaik Shahid focuses on the training received in Data Science using Python at R.V.R & J.C. College of Engineering. It outlines the objectives, processes, and learnings from the internship, including data science concepts, Python programming, and statistical analysis. The report also highlights the importance of data science in various applications such as predictive modeling, recommendation systems, and big data analysis.

Uploaded by

shahidss743886

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

23 views41 pages

My Internship Document

Uploaded by

shahidss743886

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 41

Summer Internship Report

“DATA SCIENCE”

IT 353 – SUMMER INTERNSHIP

Submitted By

SHAIK SHAHID(Y21IT117)

NOVEMBER-2023

R.V.R & J.C. COLLEGE OF ENGINEERING (Autonomous)

NAAC A+ Grade, NBA Accredited (Approved by A.I.C.T.E)

(AFFILIATED TO ACHARYA NAGARJUNA UNIVERSITY)

DEPARTMENT OF INFORMATION TECHNOLOGY
R.V.R & J.C COLLEGE OF ENGINEERING (Autonomous)
GUNTUR-522019

R.V.R & J.C COLLEGE OF ENGINEERING

i
DEPARTMENT OF INFORMATION TECHNOLOGY

BONAFIDE CERTIFICATE

This is to certify that this internship report “DATA SCIENCE USING PYTHON”

is the Bonafede work of “SHAIK SHAHID (Y21IT117)” who have carried out the
work under my supervision and submitted in partial fulfillment for the award of
INTERNSHIP (IT-353) during the year 2023-2024.

Smt. B. Manasa Dr. A. SriKrishna

Assistant Professor, IT Prof.& HOD, Department of IT
INTERNSHIP CERTIFICATION

ii
INDEX
Sr. Topic Page No.
No.
1 Title 1
2 Index 3
3 Training Certificate 4
4 Declaration 6
5 Acknowledgement 7
6 About Training 8
7 About Internshala 8
8 Objectives 8
9 Data Science 8-10
10 My Learnings 10-25
11 Final Project 26-32
12 Reason for choosing Data Science 33
13 Learning Outcome 34
14 Scope in Data Science 35
15 Results 36

iii
The successful completion of any task would be incomplete without proper
suggestions, guidance, and environment. The combination of these three factors acts like
backbone to our internship “DATA SCIENEC USING PYTHON”.
iv
I would like to express our gratitude to the Management of R.V.R&J.C.
COLLEGE OF ENGINEERING for providing us with a pleasant environment and
excellent lab facility.

I regard my sincere thanks to our Principal, Dr. Kolla Srinivas, for providing
support and a stimulating environment.

I am indebted to Dr. A. Srikrishna, Professor, and Head of the Department

Information Technology, for her valuable suggestions during the internship.

I would like to express our special thanks to our guide Smt. B. Manasa,
Assistant Professor who helped us in doing the internship successfully.

SHAIK SHAHID
(Y21IT117)

v
DECLARATION

I hereby certify that the work which is being presented in the report entitled
“Data Science” in fulfilment of the requirement for completion of industrial
training in Department of

SHAIK SHAHID
Y21IT117

vi
ACKNOWLEDGEMENT

The work in this report is an outcome of continuous work over a period and drew
intellectual support from Internshala and other sources. I would like to articulate
our profound gratitude and indebtedness to Internshala helped us in completion of
the training. I am thankful to Internshala Training Associates for teaching and
assisting me in making the training successful.

SHAIK SHAHID
Y21IT117

1. ABOUT TRAINING

o NAME OF TRAINING: DATA SCIENCE

o HOSTING INSTITUTION: INTERNSHALA
vii
2. ABOUT INTERNSHALA
Internshala is an internship and online training platform, based in Gurgaon, India. Founded in
2011 by Sarvesh Agrawal, an IIT Madras alumni. The site offers searching and posting
internships, and other career services such as counselling, cover-letter writing, resume building
and training programs to students.

3. OBJECTIVES
To explore, sort and analyse mega data from various sources to take advantage of them and
reach conclusions to optimize business processes and for decision support.
Examples include machine maintenance or (predictive maintenance), in the fields of marketing
and sales with sales forecasting based on weather.

4. DATA SCIENCE
Data Science as a multi-disciplinary subject that uses mathematics, statistics, and computer
science to study and evaluate data. The key objective of Data Science is to extract valuable
information for use in strategic decision making, product development, trend analysis, and
forecasting.
Data Science concepts and processes are mostly derived from data engineering, statistics,
programming, social engineering, data warehousing, machine learning, and natural language
processing. The key techniques in use are data mining, big data analysis, data extraction and
data retrieval.
Data science is the field of study that combines domain expertise, programming skills, and
knowledge of mathematics and statistics to extract meaningful insights from data. Data science
practitioners apply machine learning algorithms to numbers, text, images, video, audio, and
more to produce artificial intelligence (AI) systems to perform tasks that ordinarily require
human intelligence. In turn, these systems generate insights which analysts and business users
can translate into tangible business value.

DATA SCIENCE PROCESS:

1. The first step of this process is setting a research goal. The main purpose here is making
sure all the stakeholders understand the what, how, and why of the project.

viii
2. The second phase is data retrieval. You want to have data available for analysis, so this
step includes finding suitable data and getting access to the data from the data owner.
The result is data in its raw form, which probably needs polishing and transformation
before it becomes usable.
3. Now that you have the raw data, it’s time to prepare it. This includes transforming the data
from a raw form into data that’s directly usable in your models. To achieve this, you’ll
detect and correct different kinds of errors in the data, combine data from different data
sources, and transform it. If you have successfully completed this step, you can progress
to data visualization and modeling.
4. The fourth step is data exploration. The goal of this step is to gain a deep understanding
of the data. You’ll look for patterns, correlations, and deviations based on visual and
descriptive techniques. The insights you gain from this phase will enable you to start
modeling.
5. Finally, we get to the sexiest part: model building (often referred to as “data modeling”
throughout this book). It is now that you attempt to gain the insights or make the
predictions stated in your project charter. Now is the time to bring out the heavy guns,
but remember research has taught us that often (but not always) a combination of simple
models tends to outperform one complicated model. If you’ve done this phase right,
you’re almost done.
6. The last step of the data science model is presenting your results and automating the
analysis, if needed. One goal of a project is to change a process and/or make better
decisions. You may still need to convince the business that your findings will indeed
change the business process as expected. This is where you can shine in your influencer
role. The importance of this step is more apparent in projects on a strategic and tactical
level. Certain projects require you to perform the business process over and over again,
so automating the project will save time.

ix
5. MY LEARNINGS
1) INTRODUCTION TO DATA SCIENCE
• Overview & Terminologies in Data Science •
Applications of Data Science
➢ Unfamiliar detection (fraud, disease, etc.)
➢ Automation and decision-making (credit worthiness, etc.)
➢ Classifications (classifying emails as “important” or “junk”)
➢ Forecasting (sales, revenue, etc.)
➢ Pattern detection (weather patterns, financial market patterns, etc.)
➢ Recognition (facial, voice, text, etc.)
➢ Recommendations (based on learned preferences, recommendation engines can
refer you to movies, restaurants and books you may like)

2) PYTHON FOR DATA SCIENCE

Introduction to Python, Understanding Operators, Variables and Data Types, Conditional
Statements, Looping Constructs, Functions, Data Structure, Lists, Dictionaries, Understanding
Standard Libraries in Python, reading a CSV File in Python, Data Frames and basic operations
with Data Frames, Indexing Data Frame.

3) UNDERSTANDING THE STATISTICS FOR DATA SCIENCE

Introduction to Statistics, Measures of Central Tendency, Understanding the spread of data,
Data Distribution, Introduction to Probability, Probabilities of Discrete and Continuous Variables,
Normal Distribution, Introduction to Inferential Statistics, Understanding the Confidence Interval
and margin of error, Hypothesis Testing, Various Tests, Correlation.
4) PREDICTIVE MODELING AND BASICS OF MACHINE LEARNING
Introduction to Predictive Modeling, Types and Stages of Predictive Models, Hypothesis
Generation, Data Extraction and Exploration, Variable Identification, Univariate Analysis for
Continuous Variables and Categorical Variables, Bivariate Analysis, Treating Missing Values and
Outliers, Transforming the Variables, Basics of Model Building, Linear and Logistic Regression,
Decision Trees, K-means Algorithms in Python.
Summary of Procedure of Analyzing Data:
Data science generally has a five-stage life cycle that consists of:

x
• Capture: data entry, signal reception, data extraction
• Maintain: Data cleansing, data staging, data processing.
• Process: Data mining, clustering/classification, data modelling
• Communicate: Data reporting, data visualization
• Analyse: Predictive analysis, regression

Introduction to Data Science

Data Science

The field of bringing insights from data using scientific techniques is called data science.

Applications

Amazon Go – No checkout lines

Computer Vision - The advancement in recognizing an image by a computer involves

processing large sets of image data from multiple objects of same category. For example,
Face recognition.

xi
Spectrum of Business Analysis

What can happen?

Given data is
collected and used.

Big Data

What is likely to
Complexity happen?

Predictive Analysis

What’s happening
now?

Dashboards

Why did it
happen?

Detective Analysis

What happened?

Reporting

Value added to organization

• Reporting / Management Information System

• To track what is happening in organization.
• Detective Analysis
• Asking questions based on data we are seeing, like. Why something happened?
• Dashboard / Business Intelligence
• Utopia of reporting. Every action about business is reflected in front of screen.
• Predictive Modelling
• Using past data to predict what is happening at granular level.

xii
Big Data

 Stage where complexity of handling data gets beyond the traditional system.
 Can be caused because of volume, variety or velocity of data. Use specific tools to
analyse such scale data.

Application of Data Science

 Recommendation System
Example-In Amazon recommendations are different for different users according to their past
search.

• Social Media
1. Recommendation Engine

2. Ad placement

3. Sentiment Analysis

• Deciding the right credit limit for credit card customers.

• Suggesting right products from e-commerce companies
1. Recommendation System

2. Past Data Searched

3. Discount Price Optimization

• How google and other search engines know what are the more relevant results for our
search query?
1. Apply ML and Data Science

2. Fraud Detection

3. AD placement

4. Personalized search results

xiii
Python Introduction
Python is an interpreted, high-level, general-purpose programming language. It has efficient
high-level data structures and a simple but effective approach to object-oriented programming.
Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an
ideal language for scripting and rapid application development in many areas on most
platforms.

Python for Data science:

Why Python???

1. Python is an open source language.

2. Syntax as simple as English.

3. Very large and Collaborative developer community.

4. Extensive Packages.

• UNDERSTANDING OPERATORS:
Theory of operators: - Operators are symbolic representation of Mathematical tasks.
• VARIABLES AND DATATYPES:
Variables are named bounded to objects. Data types in python are int (Integer), Float,
Boolean and strings.
• CONDITIONAL STATEMENTS:
If-else statements (Single condition)
If- elif- else statements (Multiple Condition)
• LOOPING CONSTRUCTS:
For loop

xiv
• FUNCTIONS:
Functions are re-usable piece of code. Created for solving specific problem.
Two types: Built-in functions and User- defined functions.
Functions cannot be reused in python.
• DATA STRUCTURES:

Two types of Data structures:

LISTS: A list is an ordered data structure with elements separated by comma and

enclosed within square brackets.

DICTIONARY: A dictionary is an unordered data structure with elements separated by

comma and stored as key: value pair, enclosed with curly braces {}.

Statistics
Descriptive Statistic
Mode
It is a number which occurs most frequently in the data series.
It is robust and is not generally affected much by addition of couple of
new values. Code import pandas as pd data=pd.read_csv( "Mode.csv")
//reads data from csv file
data.head() //print first five lines
mode_data=data['Subject'].mode() //to take mode of
subject column print(mode_data) Mean
import pandas as pd data=pd.read_csv( "mean.csv")
//reads data from csv file
data.head() //print first five lines
mean_data=data[Overallmarks].mean() //to take mode of subject column
print(mean_data)

xv
Median
Absolute central value of data set.
import pandas as pd data=pd.read_csv( "data.csv")
//reads data from csv file
data.head() //print first five lines
median_data=data[Overallmarks].median() //to take mode of subject column
print(median_data)
Types of variables
• Continous – Which takes continuous numeric values. Eg-marks
• Categorial-Which have discrete values. Eg- Gender
• Ordinal – Ordered categorial variables. Eg- Teacher feedback
• Nominal – Unorderd categorial variable. Eg- Gender

Outliers
Any value which will fall outside the range of the data is termed as a outlier. Eg- 9700 instead
of 97.

Reasons of Outliers
• Types-During collection. Eg-adding extra zero by mistake.
• Measurement Error-Outliers in data due to measurement operator being faulty.
• Intentional Error-Errors which are induced intentionally. Eg-claiming smaller amount of
alcohol consumed then actual.
• Legit Outlier—These are values which are not actually errors but in data due to
legitimate reasons.
Eg - a CEO’s salary might actually be high as compared to other employees.
Interquartile Range (IQR)
Is difference between third and first quartile from last. It is robust to outliers.

xvi
Histograms
Histograms depict the underlying frequency of a set of discrete or continuous data that are
measured on an interval scale.
import pandas as pd
histogram=pd.read_csv(histogram.csv)
import matplotlib.pyplot as plt
%matplot inline plt.hist(x= 'Overall
Marks',data=histogram) plt.show()

Inferential Statistics
Inferential statistics allows to make inferences about the population from the sample data.

Hypothesis Testing
Hypothesis testing is a kind of statistical inference that involves asking a question, collecting
data, and then examining what the data tells us about how to proceed. The hypothesis to be
tested is called the null hypothesis and given the symbol Ho. We test the null hypothesis
against an alternative hypothesis, which is given the symbol Ha.

T Tests
When we have just a sample not population statistics.
Use sample standard deviation to estimate population standard deviation.
T test is more prone to errors, because we just have samples.

xvii
Z Score
The distance in terms of number of standard deviations, the observed value is away from
mean, is standard score or z score.

+Z – value is above mean.

-Z – value is below mean.
The distribution once converted to z- score is always same as that of shape of original

distribution.

Chi Squared Test

To test categorical variables.
Correlation
Determine the relationship between two variables.
It is denoted by r. The value ranges from -1 to +1. Hence, 0 means no
relation. Syntax import pandas as pd
import numpy as np
data=pd.read_csv("data.csv")
data.corr()

Predictive Modelling

Making use of past data and attributes we predict future using this
data. Eg-
Past Horror Movies
Future Unwatched Horror Movies

xviii
Predicting stock price movement
1. Analysing past stock prices.

2. Analysing similar stocks.

3. Future stock price required.

Types
1. Supervised Learning
Supervised learning is a type algorithm that uses a known dataset (called the training
dataset) to make predictions. The training dataset includes input data and response
values.
• Regression-which have continuous possible values. Eg-Marks
• Classification-which have only two values. Eg-Cancer prediction is either 0 or 1.
2. Unsupervised Learning
Unsupervised learning is the training of machine using information that is neither
classified nor. Here the task of machine is to group unsorted information according to
similarities, patterns and differences without any prior training of data.
• Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behaviour.
• Association: An association rule learning problem is where you want to discover
rules that describe large portions of your data, such as people that buy X also tend
to buy Y.

Stages of Predictive Modelling

1. Problem definition

2. Hypothesis Generation

3. Data Extraction/Collection

4. Data Exploration and Transformation

5. Predictive Modelling

6. Model Development/Implementation

xix
Problem Definition
Identify the right problem statement, ideally formulate the problem mathematically.

Hypothesis Generation

List down all possible variables, which might influence problem objective. These variables
should be free from personal bias and preferences.
Quality of model is directly proportional to quality of hypothesis.
Data Extraction/Collection
Collect data from different sources and combine those for exploration and model building.
While looking at data we might come across new hypothesis.
Data Exploration and Transformation
Data extraction is a process that involves retrieval of data from various sources for further data
processing or data storage.
Steps of Data
Extraction • Reading
the data Eg- From csv
file
• Variable identification
• Univariate Analysis
• Bivariate Analysis
• Missing value treatment
• Outlier treatment
• Variable Transformation

Variable Treatment
It is the process of identifying whether variable is
1. Independent or dependent variable
xx
2. Continuous or categorical variable

Why do we perform variable identification?

1. Techniques like supervised learning require identification of dependent variable.

2. Different data processing techniques for categorical and continuous data.

Categorical variable- Stored as object.

Continuous variable-Stored as int or float.
Univariate Analysis
1. Explore one variable at a time.

2. Summarize the variable.

3. Make sense out of that summary to discover insights, anomalies, etc.

Bivariate Analysis
• When two variables are studied together for their empirical relationship.
• When you want to see whether the two variables are associated with each other.
• It helps in prediction and detecting anomalies.
Missing Value Treatment
Reasons of missing value
1. Non-response – Eg-when you collect data on people’s income and many choose not to
answer. 2. Error in data collection. Eg- Faculty data
3. Error in data reading.
Types
1. MCAR (Missing completely at random): Missing values have no relation to the variable in

which missing value exist and other variables in dataset.

2. MAR (Missing at random): Missing values have no relation to the in which missing value

exist and the variables other than the variables in which missing values exist.
3. MNAR (Missing not at random): Missing values have relation to the variable in which

missing value exists

Identifying
Syntax: - 1.
describe()

xxi
2. Isnull()
Output will we in True or False
Different methods to deal with missing values
1. Imputation

Continuous-Impute with help of mean, median or regression mode.

Categorical-With mode, classification model.
2. Deletion

Row wise or column wise deletion. But it leads to loss of data.

Outlier Treatment
Reasons of Outliers
1. Data entry Errors

2. Measurement Errors

3. Processing Errors

4. Change in underlying population

Types of Outlier
Univariate
Analysing only one variable for outlier.
Eg – In box plot of height and weight.
Weight will we analysed for outlier

Bivariate
Analysing both variables for outlier.
Eg- In scatter plot graph of height and weight. Both will we analysed.

Identifying Outlier
xxii
Graphical Method
• Box Plot

 Scatter Plot

Formula Method
Using Box Plot
< Q1 - 1.5 * IQR or > Q3+1.5 * IQR
Where IQR= Q3 – Q1
Q3=Value of 3rd quartile
Q1=Value of 1st quartile
Treating Outlier
1. Deleting observations
2. Transforming and binning values
3. Imputing outliers like missing values

xxiii
4. Treat them as separate Variable Transformation Is the process by which-
1. We replace a variable with some function of that variable. Eg – Replacing a
variable x with its log.
2. We change the distribution or relationship of a variable with others. Used to

–
1. Change the scale of a variable

2. Transforming non linear relationships into linear relationship

3. Creating symmetric distribution from skewed distribution.

Common methods of Variable Transformation – Logarithm, Square root, Cube root, Binning,
etc.

Model Building

It is a process to create a mathematical model for estimating / predicting the future based on
past data.
Eg-
A retail wants to know the default behaviour of its credit card customers. They want to predict
the probability of default for each customer in next three months.
• Probability of default would lie between 0 and 1.
• Assume every customer has a 10% default rate.
Probability of default for each customer in next 3 months=0.1
It moves the probability towards one of the extremes based on attributes of past information.
A customer with volatile income is more likely (closer to) to default.
A customer with healthy credit history for last years has low chances of default (closer to 0).

Steps in Model Building

1. Algorithm Selection

2. Training Model

3. Prediction / Scoring
xxiv
Algorithm Selection
Example-

Eg- Predict the customer will buy product or not.

Algorithms
• Logistic Regression
• Decision Tree
• Random Forest

Training Model

It is a process to learn relationship / correlation between independent and dependent

variables.
We use dependent variable of train data set to
predict/estimate. Dataset

xxv
• Train
Past data (known dependent variable).
Used to train model.

• Test
Future data (unknown dependent
variable) Used to score.

Prediction / Scoring
It is the process to estimate/predict dependent variable of train data set by applying model
rules.
We apply training learning to test data set for prediction/estimation.

Algorithm of Machine Learning

Linear Regression
Linear regression is a statistical approach for modelling relationship between a dependent
variable with a given set of independent variables.
It is assumed that the wo variables are linearly related. Hence, we try to find a linear function.
That predicts the response value(y) as accurately as possible as a function of the feature or
independent variable(x).
xxvi
The equation of regression line is
Y-Values
14 represented as:
12

6 The squared error or cost function, J as:

0
0 1 2 3 4 5 6 7 8 9

Logistic Regression

Logistic regression is a statistical model that in its basic form uses a logistic function to model a
binary dependent variable, although many more complex extensions exist.

C = -y (log(y) – (1-y) log(1-y))

K-Means Clustering (Unsupervised learning)

K-means clustering is a type of unsupervised learning, which is used when you have unlabelled
data (i.e., data without defined categories or groups). The goal of this algorithm is to find
groups in the data, with the number of groups represented by the variable K. The algorithm

xxvii
works iteratively to assign each data point to one of K groups based on the features that are
provided. Data points are clustered based on feature similarity.

6. FINAL PROJECT
PREDICTING IF CUSTOMER BUYS TERM DEPOSIT
Problem Statement:

Your client is a retail banking institution. Term deposits are a major source of income for a bank.
A term deposit is a cash investment held at a financial institution. Your money is invested for
an agreed rate of interest over a fixed amount of time, or term. The bank has various outreach

xxviii
plans to sell term deposits to their customers such as email marketing, advertisements,
telephonic marketing and digital marketing.
Telephonic marketing campaigns still remain one of the most effective ways to reach out to
people. However, they require huge investment as large call centers are hired to actually
execute these campaigns. Hence, it is crucial to identify the customers most likely to convert
beforehand so that they can be specifically targeted via call.
You are provided with the client data such as: age of the client, their job type, their marital
status, etc. Along with the client data, you are also provided with the information of the call such
as the duration of the call, day and month of the call, etc. Given this information, your task is to
predict if the client will subscribe to term deposit.

Data Dictionary: -

xxix
Prerequisites:
We have the following files:
• train.csv: This dataset will be used to train the model. This file contains all the client and
call details as well as the target variable “subscribed”.
• test.csv: The trained model will be used to predict whether a new set of clients will
subscribe the term deposit or not for this dataset. TEST.csv file: -

xxx
TRAIN.csv file: -

xxxi
Problem Description
Provided with following files: train.csv and test.csv.
Use train.csv dataset to train the model. This file contains all the client and call details as well
as the target variable “subscribed”. Then use the trained model to predict whether a new set
of clients will subscribe the term deposit.

xxxii
xxxiii
xxxiv
xxxv
Reason for choosing data science

Data Science has become a revolutionary technology that everyone seems to talk about.
Hailed as the ‘sexiest job of the 21st century’. Data Science is a buzzword with very few people
knowing about the technology in its true sense.
While many people wish to become Data Scientists, it is essential to weigh the pros and cons of
data science and give out a real picture. In this article, we will discuss these points in detail and
provide you with the necessary insights about Data Science.

Advantages: - 1.
It’s in Demand
2. Abundance of Positions

3. A Highly Paid Career

4. Data Science is Versatile

Disadvantages: -
1. Mastering Data Science is near to impossible

2. A large Amount of Domain Knowledge Required

3. Arbitrary Data May Yield Unexpected Results

The problem of Data Privacy

xxxvi
Learning Outcome
After completing the training, I am able to:
• Develop relevant programming abilities.
• Demonstrate proficiency with statistical analysis of data.
• Develop the skill to build and assess data-based model.
• Execute statistical analysis with professional statistical software.
• Demonstrate skill in data management.
• Apply data science concepts and methods to solve problem in real-world contexts and
will communicate these solutions effectively.

xxxvii
7. SCOPE IN DATA SCIENCE FIELD
Few factors that point out to data science’s future, demonstrating compelling reasons why it is
crucial to today’s business needs are listed below:

• Companies’ Inability to handle data

Data is being regularly collected by businesses and companies for transactions and through
website interactions. Many companies face a common challenge – to analyze and categorize the
data that is collected and stored. A data scientist becomes the savior in a situation of mayhem
like this. Companies can progress a lot with proper and efficient handling of data, which results
in productivity.

• Revised Data Privacy Regulations

Countries of the European Union witnessed the passing of the General Data Protection
Regulation (GDPR) in May 2018. A similar regulation for data protection will be passed by
California in 2020. This will create co-dependency between companies and data scientists for
the need of storing data adequately and responsibly. In today’s times, people are generally more
cautious and alert about sharing data to businesses and giving up a certain amount of control to
them, as there is rising awareness about data breaches and their malefic consequences.
Companies can no longer afford to be careless and irresponsible about their data. The GDPR will
ensure some amount of data privacy in the coming future.

• Data Science is constantly evolving

Career areas that do not carry any growth potential in them run the risk of stagnating. This
indicates that the respective fields need to constantly evolve and undergo a change for
opportunities to arise and flourish in the industry. Data science is a broad career path that is
undergoing developments and thus promises abundant opportunities in the future. Data
science job roles are likely to get more specific, which in turn will lead to specializations in the
field. People inclined towards this stream can exploit their opportunities and pursue what suits
them best through these specifications and specializations.

• An astonishing incline in data growth

xxxviii
Data is generated by everyone on a daily basis with and without our notice. The interaction we
have with data daily will only keep increasing as time passes. In addition, the amount of data
existing in the world will increase at lightning speed. As data production will be on the rise, the
demand for data scientists will be crucial to help enterprises use and manage it well.

• Virtual Reality will be friendlier

In today’s world, we can witness and are in fact witnessing how Artificial Intelligence is
spreading across the globe and companies’ reliance on it. Big data prospects with its current
innovations will flourish more with advanced concepts like Deep Learning and neural
networking. Currently, machine learning is being introduced and implemented in almost every
application. Virtual Reality (VR) and Augmented Reality (AR) are undergoing monumental
modifications too. In addition, human and machine interaction, as well as dependency, is likely
to improve and increase drastically.

• Blockchain updating with Data science

The main popular technology dealing with cryptocurrencies like Bitcoin is referred to as
Blockchain. Data security will live true to its function in this aspect as the detailed transactions
will be secured and made note of. If big data flourishes, then Iot will witness growth too and
gain popularity. Edge computing will be responsible for dealing with data issues and address
them.

8. RESULTS
In this complete 6 weeks training I successfully learnt about DATA SCIENCE. Also, now I’m
able to perform data analysis using python. I also attempted various quizzes and assignments
provided for periodic evaluation during 6 weeks and completed this training with 100% score in
Final Test.

xxxix
9. TRAINING CERTIFICATE

***

xl
xli

Data Science Report - Compress
No ratings yet
Data Science Report - Compress
31 pages
Harsh Synopsis
No ratings yet
Harsh Synopsis
21 pages
Internship
No ratings yet
Internship
28 pages
Internship Report
No ratings yet
Internship Report
9 pages
Final Industrial Report
No ratings yet
Final Industrial Report
34 pages
Data Science Report
No ratings yet
Data Science Report
32 pages
Data-Science-Report - Priyesh
No ratings yet
Data-Science-Report - Priyesh
32 pages
Data Science Report
No ratings yet
Data Science Report
32 pages
Data Science Report
No ratings yet
Data Science Report
32 pages
School of Engineering and Technology: Data Science"
No ratings yet
School of Engineering and Technology: Data Science"
18 pages
File
No ratings yet
File
27 pages
Data Science Report
No ratings yet
Data Science Report
32 pages
Manoj Intern Data Science
No ratings yet
Manoj Intern Data Science
37 pages
C0 Report
No ratings yet
C0 Report
50 pages
Internship Report: T.J.Instituteoftechnology
No ratings yet
Internship Report: T.J.Instituteoftechnology
29 pages
Skill Report
No ratings yet
Skill Report
36 pages
PDF 1
No ratings yet
PDF 1
20 pages
Data
No ratings yet
Data
36 pages
Data Science Training Insights
No ratings yet
Data Science Training Insights
32 pages
Training Report On Data Sciencep
No ratings yet
Training Report On Data Sciencep
80 pages
Data Science Training Report.
100% (1)
Data Science Training Report.
73 pages
Dsa Report
No ratings yet
Dsa Report
24 pages
E.venkatasai Ir
No ratings yet
E.venkatasai Ir
204 pages
Data Science Internship Report
No ratings yet
Data Science Internship Report
40 pages
Full Data Science Internship Report
No ratings yet
Full Data Science Internship Report
15 pages
Internshala Summer Training Report On Data Science
No ratings yet
Internshala Summer Training Report On Data Science
70 pages
It Report
No ratings yet
It Report
24 pages
Ap Internship Last
No ratings yet
Ap Internship Last
30 pages
Internshala Summer Training Report On Data Science
77% (22)
Internshala Summer Training Report On Data Science
70 pages
Data Science Training Report 2023
No ratings yet
Data Science Training Report 2023
32 pages
Zidio Development
No ratings yet
Zidio Development
36 pages
Internship Report
No ratings yet
Internship Report
61 pages
Seminar Report Maddu Ravindra 19103335 - Ravindra Babu
No ratings yet
Seminar Report Maddu Ravindra 19103335 - Ravindra Babu
21 pages
Data Science and Big Data by IBM CE Allsoft Summer Training Final Report
100% (1)
Data Science and Big Data by IBM CE Allsoft Summer Training Final Report
41 pages
Naveen Python - For - Data-Science-Report
100% (1)
Naveen Python - For - Data-Science-Report
24 pages
Inbound 8735677009415310819
No ratings yet
Inbound 8735677009415310819
17 pages
An Industrial Training Report On Data Science
No ratings yet
An Industrial Training Report On Data Science
36 pages
Sushil 7th (1 PDF
No ratings yet
Sushil 7th (1 PDF
29 pages
Bca Ctis Sem-5 Introduction To Data Science
No ratings yet
Bca Ctis Sem-5 Introduction To Data Science
14 pages
5th Sem Internship Eport
No ratings yet
5th Sem Internship Eport
83 pages
Internship Report
No ratings yet
Internship Report
73 pages
Data Science Intern
No ratings yet
Data Science Intern
19 pages
Unit 1 Notes
No ratings yet
Unit 1 Notes
25 pages
Anjali It Presentation 2024
No ratings yet
Anjali It Presentation 2024
25 pages
Data Science-Logbook
No ratings yet
Data Science-Logbook
101 pages
Data Science Lifecycle Explained
No ratings yet
Data Science Lifecycle Explained
9 pages
File of ML
No ratings yet
File of ML
42 pages
Ayush Cse Synopsis2
No ratings yet
Ayush Cse Synopsis2
11 pages
Himadev
No ratings yet
Himadev
37 pages
Unit I - Notes
No ratings yet
Unit I - Notes
15 pages
Sindhu Internship Report
No ratings yet
Sindhu Internship Report
38 pages
Altair Data Science Internship Report
No ratings yet
Altair Data Science Internship Report
47 pages
Data Science Report
No ratings yet
Data Science Report
46 pages
Editor Document
No ratings yet
Editor Document
27 pages
Sameer111 PDF
No ratings yet
Sameer111 PDF
20 pages
DATA SCIENCE Basics
No ratings yet
DATA SCIENCE Basics
6 pages
BBA Info System Lab Guide
No ratings yet
BBA Info System Lab Guide
21 pages
3 Researches
No ratings yet
3 Researches
9 pages
Odam Buma: Dao Ll2O22
No ratings yet
Odam Buma: Dao Ll2O22
18 pages
Bayaua, Loui Mark B. AC 301 Asset Management System
No ratings yet
Bayaua, Loui Mark B. AC 301 Asset Management System
2 pages
2021 KASSU COMPUTER STUDIES P2 QNS Teacher - Co - .Ke - 2
No ratings yet
2021 KASSU COMPUTER STUDIES P2 QNS Teacher - Co - .Ke - 2
5 pages
SAS Overview Short
No ratings yet
SAS Overview Short
30 pages
JPA-ORM-OGM Lab
No ratings yet
JPA-ORM-OGM Lab
59 pages
10 SQL TOP, LIMIT or ROWNUM Clause
No ratings yet
10 SQL TOP, LIMIT or ROWNUM Clause
2 pages
Your Research 3
No ratings yet
Your Research 3
40 pages
Summit Learning Rubric
No ratings yet
Summit Learning Rubric
11 pages
Data Science: Concepts and Practice: Course Slides
No ratings yet
Data Science: Concepts and Practice: Course Slides
9 pages
Cost-Sensitive Cache Replacement Algorithms: Jaeheon Jeong and Michel Dubois
No ratings yet
Cost-Sensitive Cache Replacement Algorithms: Jaeheon Jeong and Michel Dubois
11 pages
Final Research Paper
No ratings yet
Final Research Paper
15 pages
Sample Data Science Resume
No ratings yet
Sample Data Science Resume
1 page
Ayush's Resume-1
No ratings yet
Ayush's Resume-1
1 page
Leica GeoMoS Monitor DS
No ratings yet
Leica GeoMoS Monitor DS
2 pages
RCI M.A. Clinical Psychology Syllabus 2025-26
100% (1)
RCI M.A. Clinical Psychology Syllabus 2025-26
40 pages
DEN0001C Principles of Arm Memory Maps
No ratings yet
DEN0001C Principles of Arm Memory Maps
25 pages
Aspiring Engineer for Rapyuta Robotics
No ratings yet
Aspiring Engineer for Rapyuta Robotics
1 page
B1 I FW01 Dev Env
No ratings yet
B1 I FW01 Dev Env
28 pages
PHOTOMOD. General Information PDF
100% (1)
PHOTOMOD. General Information PDF
88 pages
Ml-Data Wrangling-Assignment 01
No ratings yet
Ml-Data Wrangling-Assignment 01
2 pages
A Concept paper-WPS Office
No ratings yet
A Concept paper-WPS Office
15 pages
Quiz 1
0% (1)
Quiz 1
5 pages
Data Comm Error Correction Lab
No ratings yet
Data Comm Error Correction Lab
2 pages
ISPRS Journal Editorial Team
No ratings yet
ISPRS Journal Editorial Team
1 page
Psychological Statistics
No ratings yet
Psychological Statistics
8 pages
ATHENA+Vol+1+Issue+4 2
No ratings yet
ATHENA+Vol+1+Issue+4 2
12 pages
ML Deployment
No ratings yet
ML Deployment
25 pages

My Internship Document

Uploaded by

My Internship Document

Uploaded by

Summer Internship Report

IT 353 – SUMMER INTERNSHIP

R.V.R & J.C. COLLEGE OF ENGINEERING (Autonomous)

(AFFILIATED TO ACHARYA NAGARJUNA UNIVERSITY)

R.V.R & J.C COLLEGE OF ENGINEERING

Smt. B. Manasa Dr. A. SriKrishna

I am indebted to Dr. A. Srikrishna, Professor, and Head of the Department

o NAME OF TRAINING: DATA SCIENCE

DATA SCIENCE PROCESS:

2) PYTHON FOR DATA SCIENCE

3) UNDERSTANDING THE STATISTICS FOR DATA SCIENCE

Introduction to Data Science

Amazon Go – No checkout lines

Computer Vision - The advancement in recognizing an image by a computer involves

What can happen?

Value added to organization

• Reporting / Management Information System

Application of Data Science

• Deciding the right credit limit for credit card customers.

2. Past Data Searched

3. Discount Price Optimization

4. Personalized search results

Python for Data science:

1. Python is an open source language.

2. Syntax as simple as English.

3. Very large and Collaborative developer community.

Two types of Data structures:

enclosed within square brackets.

DICTIONARY: A dictionary is an unordered data structure with elements separated by

+Z – value is above mean.

Chi Squared Test

2. Analysing similar stocks.

3. Future stock price required.

Stages of Predictive Modelling

4. Data Exploration and Transformation

Why do we perform variable identification?

2. Different data processing techniques for categorical and continuous data.

Categorical variable- Stored as object.

2. Summarize the variable.

3. Make sense out of that summary to discover insights, anomalies, etc.

which missing value exist and other variables in dataset.

missing value exists

Continuous-Impute with help of mean, median or regression mode.

Row wise or column wise deletion. But it leads to loss of data.

4. Change in underlying population

2. Transforming non linear relationships into linear relationship

3. Creating symmetric distribution from skewed distribution.

Steps in Model Building

Eg- Predict the customer will buy product or not.

It is a process to learn relationship / correlation between independent and dependent

Algorithm of Machine Learning

6 The squared error or cost function, J as:

C = -y (log(y) – (1-y) log(1-y))

K-Means Clustering (Unsupervised learning)

3. A Highly Paid Career

4. Data Science is Versatile

2. A large Amount of Domain Knowledge Required

3. Arbitrary Data May Yield Unexpected Results

The problem of Data Privacy

• Companies’ Inability to handle data

• Revised Data Privacy Regulations

• Data Science is constantly evolving

• An astonishing incline in data growth

• Virtual Reality will be friendlier

• Blockchain updating with Data science

You might also like