
Data Analysis with Python Tools

The document loads a dataset of employees' years of experience and salaries, computes descriptive statistics, and uses a pairplot, a correlation heatmap, and a salary distribution plot to explore relationships in the data. A logistic regression model is then trained to predict whether a salary exceeds a threshold based on years of experience, using a train/test split, and is evaluated with accuracy and a confusion matrix.


3/6/24, 9:59 AM Untitled27.ipynb - Colaboratory

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline

Data = pd.read_csv('Salary_Data - Salary_Data.csv')

# Display basic info about the dataset


Data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -------
 0   YearsExperience  30 non-null     float64
 1   Salary           30 non-null     int64
dtypes: float64(1), int64(1)
memory usage: 608.0 bytes

# Display descriptive statistics of the dataset


Data.describe()

        YearsExperience         Salary
count         30.000000      30.000000
mean           5.313333   76003.000000
std            2.837888   27414.429785
min            1.100000   37731.000000
25%            3.200000   56720.750000
50%            4.700000   65237.000000
75%            7.700000  100544.750000
max           10.500000  122391.000000

# Plot pairwise relationships in the dataset


sns.pairplot(Data)

https://colab.research.google.com/drive/1kvXbQLsxeB40qzEbiBRoPPq-tqCGiUU3#scrollTo=Haca6ahBBeB3&printMode=true 1/5

<seaborn.axisgrid.PairGrid at 0x78cced27a410>

# Plot heatmap of correlations


sns.heatmap(Data.corr(), annot=True)


<Axes: >
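The heatmap visualizes the same numbers `Data.corr()` returns. As a minimal sketch on synthetic data (not the salary CSV), a perfectly linear relationship between the two columns yields a Pearson correlation of exactly 1.0:

```python
import pandas as pd

# Synthetic stand-in for the salary data: Salary here is an exact linear
# function of YearsExperience, so the Pearson correlation is 1.0.
demo = pd.DataFrame({
    "YearsExperience": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Salary": [40000, 50000, 60000, 70000, 80000],
})
corr = demo.corr()
print(corr.loc["YearsExperience", "Salary"])  # 1.0 for an exactly linear relation
```

Real data will land somewhere below 1.0; the heatmap's `annot=True` prints these values inside each cell.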

# Plot distribution of Salary


sns.histplot(Data["Salary"], kde=True)

<Axes: xlabel='Salary', ylabel='Count'>

# Assuming you have a threshold to classify whether salary is above a certain level
threshold = 70000 # Example threshold

# Creating a binary target variable based on the threshold


Data['AboveThreshold'] = (Data['Salary'] > threshold).astype(int)
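The `(Data['Salary'] > threshold).astype(int)` idiom converts the boolean comparison into a 0/1 label column. A quick sketch with made-up salary values (not the real CSV):

```python
import pandas as pd

demo = pd.DataFrame({"Salary": [45000, 70000, 98000, 122000]})
threshold = 70000

# Strictly greater than: a salary exactly at the threshold maps to 0.
demo["AboveThreshold"] = (demo["Salary"] > threshold).astype(int)
print(demo["AboveThreshold"].tolist())  # [0, 0, 1, 1]
```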

# Splitting the dataset into train and test sets


X = Data[['YearsExperience']]
y = Data['AboveThreshold']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=21)

# Training the logistic regression model


from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)


LogisticRegression()
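Beyond hard 0/1 predictions, the fitted model can also report class probabilities via `predict_proba`. A self-contained sketch on synthetic data (not the salary CSV; the 5-year decision point is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: the label is 1 once experience exceeds ~5 years.
rng = np.random.default_rng(0)
X_demo = rng.uniform(1, 10, size=(60, 1))
y_demo = (X_demo[:, 0] > 5).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)

# predict_proba returns one row per sample: [P(class 0), P(class 1)].
proba = clf.predict_proba([[2.0], [9.0]])
print(proba.round(3))
```

Each row sums to 1, and the 9-year sample should receive a higher probability of class 1 than the 2-year sample.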

# Making predictions
predictions = log_reg.predict(X_test)

# Evaluating the model


from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
accuracy = accuracy_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)
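The notebook cuts off before the metrics are displayed. A self-contained sketch (on synthetic data, not the salary CSV) of computing and printing the evaluation results with the same split parameters:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in: label is 1 once experience exceeds ~5 years.
rng = np.random.default_rng(42)
X = rng.uniform(1, 10, size=(100, 1))
y = (X[:, 0] > 5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=21)
model = LogisticRegression().fit(X_train, y_train)
preds = model.predict(X_test)

accuracy = accuracy_score(y_test, preds)
conf_matrix = confusion_matrix(y_test, preds)
print(f"Accuracy: {accuracy:.2f}")
print("Confusion matrix:\n", conf_matrix)
print(classification_report(y_test, preds))
```

The confusion matrix rows are true classes and columns are predicted classes, so its entries sum to the number of test samples.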
