Presentation

Uploaded by

harithmsylhy3

Eng: Mahmoud Yahia

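Recap
Create Dataframe
import pandas as pd

# From a dict that maps each column name to its list of values
df = pd.DataFrame({'Name':['John','Smith','Paul','Mark'],'Age':[25,30,50,45]})
# From a list of rows, with the column names given separately
df = pd.DataFrame([['John',25],['Smith',30],['Paul',50],['Mark',45]],columns=['Name','Age'])
# From a list of records (one dict per row)
df = pd.DataFrame([{'Name':'John','Age':25},{'Name':'Smith','Age':30},{'Name':'Paul','Age':50},{'Name':'Mark','Age':45}])

Features (columns)

Observations (rows)

Create series

Ser = pd.Series([1, 2, 3, 4])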
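As a quick check on the recap above, the following sketch builds the same table in the three ways shown and compares the results. The variable names (from_columns, from_rows, from_records) are chosen here for illustration and are not part of the slides.

```python
import pandas as pd

names = ['John', 'Smith', 'Paul', 'Mark']
ages = [25, 30, 50, 45]

# 1) dict of columns, 2) list of rows + column names, 3) list of records
from_columns = pd.DataFrame({'Name': names, 'Age': ages})
from_rows = pd.DataFrame([[n, a] for n, a in zip(names, ages)], columns=['Name', 'Age'])
from_records = pd.DataFrame([{'Name': n, 'Age': a} for n, a in zip(names, ages)])

# All three constructions should describe the same table
same = from_columns.equals(from_rows) and from_columns.equals(from_records)
print(same)

# shape is (number of observations, number of features)
print(from_columns.shape)

# A Series is a single labelled column; by default the labels are 0, 1, 2, ...
ser = pd.Series([1, 2, 3, 4])
print(ser.index.tolist(), ser.tolist())
```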

DataFrame Index
Concat Dataframes
Values Count
Shape/Len
nunique()
describe()
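The recap items above can be sketched on the small Name/Age table used earlier. The second frame (extra) and its rows are made up for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Smith', 'Paul', 'Mark'], 'Age': [25, 30, 50, 45]})
# Hypothetical second batch of rows, used only to demonstrate concat
extra = pd.DataFrame({'Name': ['John', 'Sara'], 'Age': [25, 30]})

# DataFrame index: row labels, 0..n-1 by default
print(df.index)

# Concat: stack the frames vertically and renumber the index
combined = pd.concat([df, extra], ignore_index=True)

# Shape / len: (rows, columns) and number of rows
print(combined.shape, len(combined))

# Value counts: how often each value occurs in a column
age_counts = combined['Age'].value_counts()
print(age_counts)

# nunique: number of distinct values
n_names = combined['Name'].nunique()
print(n_names)

# describe: summary statistics (count, mean, std, min, quartiles, max)
summary = combined['Age'].describe()
print(summary)
```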
drop_duplicates()
df.drop_duplicates()
df.drop_duplicates(subset=['Name'])
df.drop_duplicates(keep='first')
df.drop_duplicates(subset=['Name'], keep='last')
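To see how the subset and keep arguments change the result, here is a minimal sketch on a made-up frame that contains one fully duplicated row and one repeated name. The data is an assumption for illustration.

```python
import pandas as pd

# Hypothetical data: row 2 repeats row 0 exactly; row 3 repeats only the name
df = pd.DataFrame({'Name': ['John', 'Smith', 'John', 'John'],
                   'Age': [25, 30, 25, 40]})

# Compare whole rows, keep the first occurrence (keep='first' is the default)
full = df.drop_duplicates()

# Compare only the Name column, keep the first occurrence of each name
by_name_first = df.drop_duplicates(subset=['Name'])

# Compare only the Name column, keep the last occurrence of each name
by_name_last = df.drop_duplicates(subset=['Name'], keep='last')

# keep=False drops every row that has a duplicate, keeping none of them
no_repeats = df.drop_duplicates(keep=False)

for label, result in [('full', full), ('first', by_name_first),
                      ('last', by_name_last), ('none', no_repeats)]:
    print(label, result.index.tolist())
```

Note that the original row labels survive the operation, so the index may have gaps afterwards.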
sort_values()
rename()
sort_index()
reset_index()
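The four methods above are often used together. The sketch below applies them to the Name/Age table from the recap; the new column name FirstName is chosen here only as an example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Smith', 'Paul', 'Mark'], 'Age': [25, 30, 50, 45]})

# sort_values: order rows by a column; the original index labels travel with the rows
by_age = df.sort_values('Age', ascending=False)
print(by_age.index.tolist())

# rename: change column (or index) labels via a mapping
renamed = by_age.rename(columns={'Name': 'FirstName'})

# sort_index: put rows back in index order
restored = renamed.sort_index()

# reset_index: replace the index with 0..n-1; drop=True discards the old labels
fresh = by_age.reset_index(drop=True)
print(fresh)
```

All four return new objects and leave df unchanged unless the result is assigned back.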
Matplotlib
Matplotlib is a low-level library that provides a wide range of plotting options. It allows you to create basic charts,
such as line charts, scatter plots, and bar charts, as well as more complex visualizations, such as heatmaps, contour
plots, and 3D plots. Matplotlib provides a lot of control over the details of the plot, but it can be more difficult to
use than Seaborn.

Seaborn is a high-level library built on top of Matplotlib. It provides a simpler interface for creating common
types of statistical plots, such as scatter plots, line plots, and bar plots, as well as more specialized
statistical visualizations, such as violin plots, box plots, and heatmaps. Seaborn is designed to work well with pandas
DataFrames, and it has built-in support for many common data visualization tasks, such as grouping data by
categories and computing summary statistics.

In general, if you need fine-grained control or have very specific requirements for your plots, Matplotlib
may be the better choice. If you want to create common types of statistical plots quickly and easily, or if you are
working with pandas DataFrames, Seaborn may be the better choice.
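The difference in level can be illustrated by drawing the same bar chart of group means with each library. This is a sketch with made-up data; it assumes both matplotlib and seaborn are installed and uses the non-interactive Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical measurements in three groups
df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'value': [3, 5, 2, 6, 7, 9]})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(8, 3))

# Matplotlib: we group and average the data ourselves, then draw the bars
means = df.groupby('group')['value'].mean()
ax_mpl.bar(means.index, means.values)
ax_mpl.set_xlabel('group')
ax_mpl.set_ylabel('value')
ax_mpl.set_title('Matplotlib')

# Seaborn: pass the DataFrame and column names; grouping, averaging,
# and axis labels are handled by barplot (mean is its default estimator)
sns.barplot(data=df, x='group', y='value', ax=ax_sns)
ax_sns.set_title('Seaborn')

fig.tight_layout()
```

Because Seaborn draws onto a Matplotlib Axes, the two can be mixed: use Seaborn for the statistical plot and Matplotlib calls for fine adjustments.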
Data cleaning is the process of identifying and correcting or removing errors, inconsistencies, and inaccuracies in
data. It is an important step in the data analysis process, as it ensures that the data is accurate and reliable.

There are several techniques that can be used for data cleaning, such as removing duplicates, handling missing values,
correcting data types, and standardizing data formats. The specific techniques used will depend on the nature of the
data and the goals of the analysis.
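The techniques listed above might look like this in pandas. The raw table, its column names, and the choice to fill missing ages with the median are all assumptions made for this sketch; the right choices depend on the data.

```python
import pandas as pd

# Hypothetical raw data: stray spaces, inconsistent case,
# numbers stored as text, an unparseable value, and a repeated row
raw = pd.DataFrame({
    'name': [' John', 'smith ', 'Paul', 'Paul', 'Mark'],
    'age': ['25', '30', '50', '50', 'unknown'],
})

clean = raw.copy()

# Standardize formats: trim whitespace and use consistent capitalization
clean['name'] = clean['name'].str.strip().str.title()

# Correct data types: text -> numbers; values that cannot be parsed become missing
clean['age'] = pd.to_numeric(clean['age'], errors='coerce')

# Remove duplicates (after standardizing, so equal rows actually compare equal)
clean = clean.drop_duplicates()

# Handle missing values: here, fill with the median age (one option among several)
clean['age'] = clean['age'].fillna(clean['age'].median())

clean = clean.reset_index(drop=True)
print(clean)
```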

Categorical data is data that consists of categories or labels, rather than numerical values. Examples of categorical
data include gender, race, and occupation.

When working with categorical data in Python, it is important to encode the data in a way that can be used in machine
learning models. One common technique is one-hot encoding, which creates a binary column for each category. Another
technique is label encoding, which assigns a numerical value to each category.

The pandas library provides several tools for working with categorical data, including pd.Categorical, which
creates a categorical variable, and pd.get_dummies(), which performs one-hot encoding. The scikit-learn (sklearn)
library provides encoder classes as well, including sklearn.preprocessing.LabelEncoder, which
performs label encoding, and sklearn.preprocessing.OneHotEncoder, which performs one-hot encoding.

When working with categorical data, it is important to choose the appropriate encoding technique based on the nature
of the data and the goals of the analysis.
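Both encodings can be sketched with pandas alone. The occupation column below is made-up data, and the label encoding here uses the integer codes of pd.Categorical rather than the scikit-learn encoders named above, which would be used in a similar way.

```python
import pandas as pd

# Hypothetical categorical column
df = pd.DataFrame({'occupation': ['engineer', 'teacher', 'doctor', 'teacher']})

# Label encoding: one integer per category.
# pd.Categorical sorts the categories, so the codes follow alphabetical order.
cat = pd.Categorical(df['occupation'])
df['occupation_code'] = cat.codes
print(list(cat.categories))
print(df)

# One-hot encoding: one 0/1 column per category
one_hot = pd.get_dummies(df['occupation'], prefix='occupation', dtype=int)
print(one_hot)
```

Label codes imply an order (0 < 1 < 2) that may not exist in the data, which is one reason one-hot encoding is often preferred for unordered categories.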
A scatter plot displays the values of two variables along two dimensions. Each dot represents an observation, and its
position on the X (horizontal) and Y (vertical) axes gives the values of the two variables. It is useful for studying
the relationship between the two variables. Color or marker shape is often used to add more information (to
show groups or a third variable). Mapping another variable to the size of each dot produces a
bubble plot. If you have many dots and struggle with overplotting, consider using a 2D density plot.
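A minimal Matplotlib sketch of the ideas above, using randomly generated data: position encodes two variables, color a third (group), and dot size a fourth (weight). All variable names and values are illustrative.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2 * x + rng.normal(scale=0.5, size=50)   # y depends on x, plus noise
group = rng.integers(0, 3, size=50)          # third variable, shown as color
weight = rng.uniform(10, 200, size=50)       # fourth variable, shown as dot size

fig, ax = plt.subplots()
# c= maps values to colors, s= sets marker areas (this makes it a bubble plot),
# alpha= adds transparency, which helps a little with overplotting
points = ax.scatter(x, y, c=group, s=weight, cmap='viridis', alpha=0.7)
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.colorbar(points, ax=ax, label='group')
```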
A line plot is a type of chart that displays data as a series of points connected by straight lines. It is useful for
showing trends over time or for comparing two or more sets of data. Each point on the line represents a data value,
and the line shows how the values change over time or across different categories. Line plots are commonly used in
scientific research, finance, and other fields where data analysis is important.
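As an illustration, the sketch below plots two made-up monthly series on the same axes so their trends can be compared. The product names and numbers are invented for the example.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Hypothetical monthly sales for two products
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
product_a = [10, 12, 15, 14, 18, 21]
product_b = [8, 9, 9, 13, 12, 16]

fig, ax = plt.subplots()
# Each call draws one line; markers show the individual data points
ax.plot(months, product_a, marker='o', label='Product A')
ax.plot(months, product_b, marker='s', label='Product B')
ax.set_xlabel('Month')
ax.set_ylabel('Units sold')
ax.legend()
```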

A histogram is a graphical representation of the distribution of numerical data. It takes a single
numerical variable as input. The variable's range is cut into several bins, and the number of observations in each bin
is represented by the height of the bar.
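The binning described above can be sketched with Matplotlib, which returns the per-bin counts and the bin edges alongside the drawing. The sample (200 simulated heights) and the choice of 10 bins are assumptions for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample: 200 heights in cm
rng = np.random.default_rng(1)
heights = rng.normal(loc=170, scale=10, size=200)

fig, ax = plt.subplots()
# hist cuts the range into 10 equal-width bins and counts observations per bin
counts, bin_edges, _ = ax.hist(heights, bins=10)
ax.set_xlabel('Height (cm)')
ax.set_ylabel('Number of observations')

print(counts)      # one count per bin
print(bin_edges)   # 10 bins need 11 edges
```

The number of bins affects how the distribution looks, so it is worth trying a few values.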
A bar plot (or bar chart) is one of the most common types of plot. It shows the relationship between a numerical
variable and a categorical variable. For example, you can display the heights of several individuals using a bar chart.
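The example in the text can be sketched as follows. The names come from the recap table, while the heights are invented for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt

# Categorical variable (person) and numerical variable (hypothetical height in cm)
people = ['John', 'Smith', 'Paul', 'Mark']
heights_cm = [175, 168, 182, 171]

fig, ax = plt.subplots()
bars = ax.bar(people, heights_cm)   # one bar per category
ax.set_xlabel('Person')
ax.set_ylabel('Height (cm)')
```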

A box-and-whisker plot (or box plot) is a convenient way of visually displaying the distribution of data through
its quartiles.
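To connect the picture to the numbers, this sketch computes the quartiles with NumPy and then draws the box plot with Matplotlib. The scores are made-up data, chosen so that one value lies far from the rest.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical test scores with one unusually high value
scores = np.array([52, 55, 60, 61, 64, 67, 70, 72, 75, 98])

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(scores, [25, 50, 75])
print(q1, median, q3)

fig, ax = plt.subplots()
# By default the whiskers reach the most extreme points within 1.5 * IQR
# of the box; points beyond that are drawn individually as outliers
result = ax.boxplot(scores)
ax.set_ylabel('Score')
```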
Thank you
