Pandas plots
Pandas also provides options to visualize data directly from a DataFrame. Here are some examples:
Syntax:
    df.plot(x, y, marker, kind)
    x = column to plot on the x-axis
    y = column to plot on the y-axis
    marker = marker shape, used in specific plots such as a scatter plot
    kind = type of plot
A scatter plot to visualize the trend of acceleration in different years.
   df.plot(x = 'model_year', y = 'acceleration', marker = 'o',
           kind = 'scatter');
A bar plot to visualize mean acceleration in different years.
   df.groupby('model_year').mean()[['acceleration']].plot(kind = 'bar');
A histogram to visualize the frequency distribution of cylinders.
   df['cylinders'].plot(kind = 'hist')
A scatter plot to visualize the relationship between weight and mpg.
   df.plot(x = 'weight', y = 'mpg', kind = 'scatter')
A bar plot to visualize the sorted mean values of acceleration with respect to the number of
cylinders.
   df.groupby('cylinders').mean().sort_values('acceleration')[['acceleration']].plot(kind = 'bar')
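The snippets above assume a df already holding the auto-mpg data. As a minimal, self-contained sketch, the small made-up sample below stands in for the real dataset (the values are illustrative, not actual auto-mpg records):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the script runs headless
import pandas as pd

# Small made-up sample standing in for the auto-mpg dataset
df = pd.DataFrame({
    'model_year':   [70, 70, 71, 71, 72],
    'acceleration': [12.0, 11.5, 13.2, 14.1, 13.8],
    'cylinders':    [8, 8, 4, 6, 4],
    'weight':       [3504, 3693, 2372, 2833, 2130],
    'mpg':          [18.0, 15.0, 24.0, 20.0, 27.0],
})

# Scatter plot of acceleration against model year; df.plot returns
# the matplotlib Axes it drew on, which can be customized further
ax = df.plot(x='model_year', y='acceleration', marker='o', kind='scatter')
print(type(ax).__name__)
```

The returned Axes object can be passed to further matplotlib calls (titles, labels, saving the figure).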
Pandas Exercise
Problem Statement:
Use the rainfall dataset provided for this exercise. The data contains region (district)-wise
rainfall across India.
   1.   Import the data into Python environment as a Pandas DataFrame.
   2.   Check for missing values, if any and drop the corresponding rows.
   3.   Find the district that gets the highest annual rainfall.
   4.   Display the top 5 states that get the highest annual rainfall.
   5.   Drop the columns 'Jan-Feb', 'Mar-May', 'Jun-Sep', 'Oct-Dec'.
   6.   Display the state-wise mean rainfall for all the months using a pivot table.
   7.   Display the count of districts in each state.
   8.   For each state, display the district that gets the highest rainfall in May. Also display the
        recorded rainfall.
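A few of these tasks can be sketched on a tiny made-up frame. The column names (STATE, DISTRICT, MAY, ANNUAL) and all values below are assumptions standing in for the real rainfall dataset:

```python
import pandas as pd

# Hypothetical columns and values standing in for the rainfall dataset
df = pd.DataFrame({
    'STATE':    ['Kerala', 'Kerala', 'Goa', 'Goa'],
    'DISTRICT': ['Idukki', 'Kozhikode', 'North Goa', 'South Goa'],
    'MAY':      [180.5, 220.1, 90.3, 85.7],
    'ANNUAL':   [3200.0, 2950.5, 2800.2, 2750.9],
})

df = df.dropna()  # task 2: drop rows with missing values

# Task 3: district with the highest annual rainfall
print(df.loc[df['ANNUAL'].idxmax(), 'DISTRICT'])

# Task 4 (adapted to the dummy data): states ranked by total annual rainfall
print(df.groupby('STATE')['ANNUAL'].sum().nlargest(5))

# Task 8: for each state, the district with the highest rainfall in May
print(df.loc[df.groupby('STATE')['MAY'].idxmax(), ['STATE', 'DISTRICT', 'MAY']])
```

The same calls apply unchanged once the real dataset is loaded with pd.read_csv().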
Exercise
Problem Statement:
Given below is a DataFrame df:
    import pandas as pd

    df = pd.DataFrame([[0.23, 'f1'], [5.36, 'f2']],
                      index = list('pq'),
                      columns = list('ab'))
Do the following:
    1. Rename the column 'a' to 'A'.
    2. Add a new column 'c' filled with random values.
    3. Change the datatype of column 'A' values to complex.
    4. Display the rows in which any element matches any element of the given list:
           lst = ['f30', 'f50', 'f2', 'f0']
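One possible solution sketch for these four steps (rename, astype, and isin are standard pandas calls; the random values make column 'c' differ between runs):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0.23, 'f1'], [5.36, 'f2']],
                  index = list('pq'),
                  columns = list('ab'))

# 1. Rename column 'a' to 'A'
df = df.rename(columns={'a': 'A'})

# 2. Add a new column 'c' filled with random values
df['c'] = np.random.randn(len(df))

# 3. Change the datatype of column 'A' to complex
df['A'] = df['A'].astype(complex)

# 4. Rows in which any element matches any element of lst
lst = ['f30', 'f50', 'f2', 'f0']
print(df[df.isin(lst).any(axis=1)])   # row 'q' matches via 'f2'
```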
Merging
A few more commonly used Pandas operations are listed below:
        Merging
        Reshaping
        Pivot Tables
        Grouping
Let us understand their importance through some illustrations, starting with merging datasets.
Suppose we are given two datasets from an experiment, each holding its own set of features. Our
task is to form a single dataset that combines all the features for each observation. To do so, we
can use the concat() function.
   import pandas as pd

   data1 = pd.DataFrame([[15, 12, -3],
                         [33, 54, 21],
                         [10, 32, 22]],
                         columns = list('ABC'))
   data2 = pd.DataFrame([[10, 1, 3],
                         [33, -54, 2],
                         [10, 0.32, 2]],
                         columns = list('DEF'))
   print(data1)
   #     A   B   C
   # 0  15  12  -3
   # 1  33  54  21
   # 2  10  32  22

   print(data2)
   #     D      E  F
   # 0  10   1.00  3
   # 1  33 -54.00  2
   # 2  10   0.32  2

   print(pd.concat([data1, data2], axis = 1))
   #     A   B   C   D      E  F
   # 0  15  12  -3  10   1.00  3
   # 1  33  54  21  33 -54.00  2
   # 2  10  32  22  10   0.32  2
Suppose an individual recorded the same 3 features on two separate occasions and now wants to
combine these samples into a single dataset. We can achieve this with the same concat() function,
this time changing the axis.
   import numpy as np
   import pandas as pd

   data1 = pd.DataFrame(np.random.randn(9).reshape(3,3),
                        columns = list('ABC'))
   data2 = pd.DataFrame(np.arange(9).reshape(3,3),
                        columns = list('ABC'))
   print(data1)          # Random values
   #           A         B         C
   # 0  1.957218  0.433266  1.214950
   # 1 -0.143500 -0.092030 -0.823898
   # 2  0.481486 -0.024111 -0.769195

   print(data2)
   #    A  B  C
   # 0  0  1  2
   # 1  3  4  5
   # 2  6  7  8

   print(pd.concat([data1, data2], axis = 0))
   #           A         B         C
   # 0  1.957218  0.433266  1.214950
   # 1 -0.143500 -0.092030 -0.823898
   # 2  0.481486 -0.024111 -0.769195
   # 0  0.000000  1.000000  2.000000
   # 1  3.000000  4.000000  5.000000
   # 2  6.000000  7.000000  8.000000
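Note that after stacking along axis 0 the row labels 0, 1, 2 repeat. If a fresh index is preferred, concat() accepts an ignore_index flag; a small sketch:

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame(np.zeros((3, 3)), columns=list('ABC'))
data2 = pd.DataFrame(np.arange(9).reshape(3, 3), columns=list('ABC'))

# ignore_index=True discards the original row labels and renumbers 0..5
stacked = pd.concat([data1, data2], axis=0, ignore_index=True)
print(list(stacked.index))   # [0, 1, 2, 3, 4, 5]
```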
Pivot Tables
Reshaping a dataset is a good starting point for quick text-based visualization. Along similar lines,
pivot tables are often even more effective at producing a readable summary.
To understand pivot tables, consider the following dataframe, which adds a 'Score' feature.
    import pandas as pd

    df = pd.DataFrame([
                       ['IND', 'Gold', 'Game1', '9.9'],
                       ['IND', 'Bronze', 'Game2', '8'],
                       ['USA', 'Silver', 'Game1', '9.5'],
                       ['USA', 'Gold', 'Game2', '8.6'],
                       ], columns = ['Country', 'Medal',
                                     'Game', 'Score'],
                       index = ['Year1', 'Year2', 'Year1', 'Year2'])

    print(df)
    #       Country   Medal   Game Score
    # Year1     IND    Gold  Game1   9.9
    # Year2     IND  Bronze  Game2     8
    # Year1     USA  Silver  Game1   9.5
    # Year2     USA    Gold  Game2   8.6
Pivot tables come in handy when we have to break a large dataset (in terms of features) down to
fewer features for quick visualization, for example, finding which medal is common to both IND
and USA, or listing the game(s) in which India won Silver.
We apply a pivot table to the given dataset:
    # Listing all the features
    print(df.pivot(index = 'Country', columns = 'Medal'))
    #            Game                 Score
    # Medal    Bronze   Gold Silver  Bronze Gold Silver
    # Country
    # IND       Game2  Game1   None       8  9.9   None
    # USA        None  Game2  Game1    None  8.6    9.5

    # Listing only the Score feature
    print(df.pivot(index = 'Country', columns = 'Medal',
                   values = 'Score'))
    # Medal   Bronze Gold Silver
    # Country
    # IND          8  9.9   None
    # USA       None  8.6    9.5
Pivot table limitations
The previous dataset doesn't include any duplicate samples. Suppose, for instance, that two people
from India win gold and silver medals in Year1 of Game1. In that case, pivot() breaks because of
the resulting ambiguity in the Game column. To overcome this limitation, we have another
function, pivot_table(), whose aggfunc argument lets us resolve the ambiguity. Let's
understand it through an example.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame([
                       ['IND', 'Gold', 'Game1', '9.9'],
                       ['IND', 'Silver', 'Game1', '9.5'],
                       ['IND', 'Bronze', 'Game2', '8'],
                       ['USA', 'Bronze', 'Game1', '9.0'],
                       ['USA', 'Silver', 'Game2', '8.6'],
                       ], columns = ['Country', 'Medal',
                                     'Game', 'Score'],
                       index = ['Year1', 'Year1', 'Year2', 'Year1', 'Year2'])

    print(df)
    #       Country   Medal   Game Score
    # Year1     IND    Gold  Game1   9.9
    # Year1     IND  Silver  Game1   9.5
    # Year2     IND  Bronze  Game2     8
    # Year1     USA  Bronze  Game1   9.0
    # Year2     USA  Silver  Game2   8.6

    print(df.pivot_table(index = 'Country',
                         columns = 'Game',
                         values = 'Score',
                         aggfunc = np.max))
    # Game    Game1 Game2
    # Country
    # IND       9.9     8
    # USA       9.0   8.6
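To see the ambiguity pivot() cannot handle, try pivoting this same dataframe on the Game column: IND has two Game1 rows, so pandas raises an error, while pivot_table() resolves the clash with an aggregation. A sketch (the exact error message may vary across pandas versions):

```python
import pandas as pd

df = pd.DataFrame([
                   ['IND', 'Gold', 'Game1', '9.9'],
                   ['IND', 'Silver', 'Game1', '9.5'],
                   ['IND', 'Bronze', 'Game2', '8'],
                   ['USA', 'Bronze', 'Game1', '9.0'],
                   ['USA', 'Silver', 'Game2', '8.6'],
                   ], columns = ['Country', 'Medal', 'Game', 'Score'])

# (Country='IND', Game='Game1') occurs twice, so pivot() cannot place
# a single Score value in that cell and raises a ValueError.
try:
    df.pivot(index='Country', columns='Game', values='Score')
except ValueError as e:
    print('pivot failed:', e)

# pivot_table() resolves the clash with an aggregation function instead
print(df.pivot_table(index='Country', columns='Game',
                     values='Score', aggfunc='max'))
```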
Grouping
To understand grouping, a concept similar to the one found in databases, consider a dataset of
laptop and desktop sales. Observations for a particular category can be repeated with different
sale prices. If we need to calculate the total sales of each category, we can group similar rows
and apply an aggregation function.
To do so, we create a small dummy dataset and sum up the total sales for each category.
    import pandas as pd

    df = pd.DataFrame([["Laptop", 1000],
                       ["Laptop", 2520],
                       ["Desktop", 3000],
                       ["Desktop", 400]], columns = ['Category', 'Sales'])

    print(df)
    #   Category  Sales
    # 0   Laptop   1000
    # 1   Laptop   2520
    # 2  Desktop   3000
    # 3  Desktop    400

    print(df.groupby(['Category'], sort = False).sum())
    #           Sales
    # Category
    # Laptop     3520
    # Desktop    3400
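Beyond sum(), a grouped object can apply several aggregations at once through agg(); a small sketch on the same dummy data:

```python
import pandas as pd

df = pd.DataFrame([["Laptop", 1000],
                   ["Laptop", 2520],
                   ["Desktop", 3000],
                   ["Desktop", 400]], columns=['Category', 'Sales'])

# One pass computing total, mean, and count of sales per category
summary = df.groupby('Category')['Sales'].agg(['sum', 'mean', 'count'])
print(summary)
```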
Exercise
Problem Statement:
Given a dataframe df with three attributes: set_name (system names), spd_per_day (the day on
which the speed was recorded), and speed (network speed in MBps).
   import pandas as pd

   sys = ['s1', 's1', 's1', 's1',
          's2', 's2', 's2', 's2']
   net_day = ['d1', 'd1', 'd2', 'd2',
              'd1', 'd1', 'd2', 'd2']
   spd = [1.3, 11.4, 5.6, 12.3,
          6.2, 1.1, 20.0, 8.8]
   df = pd.DataFrame({'set_name': sys,
                      'spd_per_day': net_day,
                      'speed': spd})
Do the following:
   1. Construct a dataframe new_df in which the given dataset is grouped by system
      (s1 and s2) and day (d1 and d2), with the median speed for each day per system.
      Also, give the speed attribute a secondary name 'Median'.
   2. Sort the dataframe new_df in ascending order of the median speed.
Hints/Answers:
1. Answer:
      set_name spd_per_day  speed
                           Median
   0        s1          d1   6.35
   1        s1          d2   8.95
   2        s2          d1   3.65
   3        s2          d2  14.40
2. Answer:
      set_name spd_per_day  speed
                           Median
   2        s2          d1   3.65
   0        s1          d1   6.35
   1        s1          d2   8.95
   3        s2          d2  14.40
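One way to produce the outputs above (a sketch; building the secondary 'Median' name with a MultiIndex over the columns is one of several possibilities):

```python
import pandas as pd

sys_names = ['s1', 's1', 's1', 's1', 's2', 's2', 's2', 's2']
net_day = ['d1', 'd1', 'd2', 'd2', 'd1', 'd1', 'd2', 'd2']
spd = [1.3, 11.4, 5.6, 12.3, 6.2, 1.1, 20.0, 8.8]
df = pd.DataFrame({'set_name': sys_names,
                   'spd_per_day': net_day,
                   'speed': spd})

# 1. Median speed per system per day; as_index=False keeps the group
#    keys as ordinary columns rather than an index
new_df = df.groupby(['set_name', 'spd_per_day'], as_index=False).median()

# Attach a secondary name 'Median' to the speed column
new_df.columns = pd.MultiIndex.from_tuples(
    [('set_name', ''), ('spd_per_day', ''), ('speed', 'Median')])
print(new_df)

# 2. Sort in ascending order of the median speed
print(new_df.sort_values(('speed', 'Median')))
```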