0% found this document useful (0 votes)

55 views9 pages

Pivot Tables

Uploaded by

pavanka22072004

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

55 views9 pages

Pivot Tables

Uploaded by

pavanka22072004

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

12/6/22, 10:10 AM 03.09-Pivot-Tables.

ipynb - Colaboratory

Pivot Tables

We have seen how the GroupBy abstraction lets us explore relationships within a dataset. A
pivot table is a similar operation that is commonly seen in spreadsheets and other programs that
operate on tabular data. The pivot table takes simple column-wise data as input, and groups the
entries into a two-dimensional table that provides a multidimensional summarization of the
data. The difference between pivot tables and GroupBy can sometimes cause confusion; it helps
me to think of pivot tables as essentially a multidimensional version of GroupBy aggregation.
That is, you split-apply-combine, but both the split and the combine happen across not a one-
dimensional index, but across a two-dimensional grid.

Motivating Pivot Tables

For the examples in this section, we'll use the database of passengers on the Titanic, available
through the Seaborn library (see Visualization With Seaborn):

import numpy as np
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')

titanic.head()

survived pclass sex age sibsp parch fare embarked class who adul

0 0 3 male 22.0 1 0 7.2500 S Third man

1 1 1 female 38.0 1 0 71.2833 C First woman

2 1 3 female 26.0 0 0 7.9250 S Third woman

3 1 1 female 35.0 1 0 53.1000 S First woman

4 0 3 male 35.0 0 0 8.0500 S Third man

This contains a wealth of information on each passenger of that ill-fated voyage, including
gender, age, class, fare paid, and much more.

Pivot Tables by Hand

https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.09-Pivot-Tables.ipynb#scrollTo=tRSF… 1/10
12/6/22, 10:10 AM 03.09-Pivot-Tables.ipynb - Colaboratory

To start learning more about this data, we might begin by grouping according to gender, survival
status, or some combination thereof. If you have read the previous section, you might be
tempted to apply a GroupBy operation–for example, let's look at survival rate by gender:

titanic.groupby('sex')[['survived']].mean()

survived

sex

female 0.742038

male 0.188908

This immediately gives us some insight: overall, three of every four females on board survived,
while only one in five males survived!

This is useful, but we might like to go one step deeper and look at survival by both sex and, say,
class. Using the vocabulary of GroupBy , we might proceed using something like this: we group
by class and gender, select survival, apply a mean aggregate, combine the resulting groups, and
then unstack the hierarchical index to reveal the hidden multidimensionality. In code:

titanic.groupby(['sex', 'class'])['survived'].aggregate('mean').unstack()

class First Second Third

sex

female 0.968085 0.921053 0.500000

male 0.368852 0.157407 0.135447

This gives us a better idea of how both gender and class affected survival, but the code is
starting to look a bit garbled. While each step of this pipeline makes sense in light of the tools
we've previously discussed, the long string of code is not particularly easy to read or use. This
two-dimensional GroupBy is common enough that Pandas includes a convenience routine,
pivot_table , which succinctly handles this type of multi-dimensional aggregation.

Pivot Table Syntax

Here is the equivalent to the preceding operation using the pivot_table method of DataFrame s:

titanic.pivot_table('survived', index='sex', columns='class')

https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.09-Pivot-Tables.ipynb#scrollTo=tRSF… 2/10
12/6/22, 10:10 AM 03.09-Pivot-Tables.ipynb - Colaboratory

class First Second Third

sex

female 0.968085 0.921053 0.500000

This is male
eminently more readable than the groupby approach, and produces the same result. As
0.368852 0.157407 0.135447
you might expect of an early 20th-century transatlantic cruise, the survival gradient favors both
women and higher classes. First-class women survived with near certainty (hi, Rose!), while only
one in ten third-class men survived (sorry, Jack!).

Multi-level pivot tables

Just as in the GroupBy , the grouping in pivot tables can be specified with multiple levels, and via
a number of options. For example, we might be interested in looking at age as a third dimension.
We'll bin the age using the pd.cut function:

age = pd.cut(titanic['age'], [0, 18, 80])

titanic.pivot_table('survived', ['sex', age], 'class')

class First Second Third

sex age

female (0, 18] 0.909091 1.000000 0.511628

(18, 80] 0.972973 0.900000 0.423729

male (0, 18] 0.800000 0.600000 0.215686

(18, 80] 0.375000 0.071429 0.133663

We can apply the same strategy when working with the columns as well; let's add info on the
fare paid using pd.qcut to automatically compute quantiles:

fare = pd.qcut(titanic['fare'], 2)
titanic.pivot_table('survived', ['sex', age], [fare, 'class'])

fare [0, 14.454] (14.454, 512.329]

class First Second Third First Second Third

sex age

female (0, 18] NaN 1.000000 0.714286 0.909091 1.000000 0.318182

(18, 80] NaN 0.880000 0.444444 0.972973 0.914286 0.391304

male (0, 18] NaN 0.000000 0.260870 0.800000 0.818182 0.178571

(18, 80] 0.0 0.098039 0.125000 0.391304 0.030303 0.192308

https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.09-Pivot-Tables.ipynb#scrollTo=tRSF… 3/10
12/6/22, 10:10 AM 03.09-Pivot-Tables.ipynb - Colaboratory

The result is a four-dimensional aggregation with hierarchical indices (see Hierarchical

Indexing), shown in a grid demonstrating the relationship between the values.

Additional pivot table options

The full call signature of the pivot_table method of DataFrame s is as follows:

# call signature as of Pandas 0.18

DataFrame.pivot_table(data, values=None, index=None, columns=None,
aggfunc='mean', fill_value=None, margins=False,
dropna=True, margins_name='All')

We've already seen examples of the first three arguments; here we'll take a quick look at the
remaining ones. Two of the options, fill_value and dropna , have to do with missing data and
are fairly straightforward; we will not show examples of them here.

The aggfunc keyword controls what type of aggregation is applied, which is a mean by default.
As in the GroupBy, the aggregation specification can be a string representing one of several
common choices (e.g., 'sum' , 'mean' , 'count' , 'min' , 'max' , etc.) or a function that
implements an aggregation (e.g., np.sum() , min() , sum() , etc.). Additionally, it can be specified
as a dictionary mapping a column to any of the above desired options:

titanic.pivot_table(index='sex', columns='class',
aggfunc={'survived':sum, 'fare':'mean'})

fare survived

class First Second Third First Second Third

sex

female 106.125798 21.970121 16.118810 91.0 70.0 72.0

male 67.226127 19.741782 12.661633 45.0 17.0 47.0

Notice also here that we've omitted the values keyword; when specifying a mapping for
aggfunc , this is determined automatically.

At times it's useful to compute totals along each grouping. This can be done via the margins
keyword:

titanic.pivot_table('survived', index='sex', columns='class', margins=True)

https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.09-Pivot-Tables.ipynb#scrollTo=tRSF… 4/10
12/6/22, 10:10 AM 03.09-Pivot-Tables.ipynb - Colaboratory

class First Second Third All

sex

female 0.968085 0.921053 0.500000 0.742038

male 0.368852 0.157407 0.135447 0.188908

Here thisAllautomatically
0.629630 gives us information
0.472826 0.242363 about the class-agnostic survival rate by gender,
0.383838
the gender-agnostic survival rate by class, and the overall survival rate of 38%. The margin label
can be specified with the margins_name keyword, which defaults to "All" .

Example: Birthrate Data

As a more interesting example, let's take a look at the freely available data on births in the United
States, provided by the Centers for Disease Control (CDC). This data can be found at
https://raw.githubusercontent.com/jakevdp/data-CDCbirths/master/births.csv (this dataset has
been analyzed rather extensively by Andrew Gelman and his group; see, for example, this blog
post):

# shell command to download the data:

# !curl -O https://raw.githubusercontent.com/jakevdp/data-CDCbirths/master/births.csv

births = pd.read_csv('data/births.csv')

Taking a look at the data, we see that it's relatively simple–it contains the number of births
grouped by date and gender:

births.head()

year month day gender births

0 1969 1 1 F 4046

1 1969 1 1 M 4440

2 1969 1 2 F 4454

3 1969 1 2 M 4548

4 1969 1 3 F 4548

We can start to understand this data a bit more by using a pivot table. Let's add a decade
column, and take a look at male and female births as a function of decade:

births['decade'] = 10 * (births['year'] // 10)

births.pivot_table('births', index='decade', columns='gender', aggfunc='sum')

https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.09-Pivot-Tables.ipynb#scrollTo=tRSF… 5/10
12/6/22, 10:10 AM 03.09-Pivot-Tables.ipynb - Colaboratory

gender F M

decade

1960 1753634 1846572

1970 16263075 17121550

1980 18310351 19243452

1990 19479454 20420553

2000 18229309 19106428

We immediately see that male births outnumber female births in every decade. To see this trend
a bit more clearly, we can use the built-in plotting tools in Pandas to visualize the total number of
births by year (see Introduction to Matplotlib for a discussion of plotting with Matplotlib):

%matplotlib inline
import matplotlib.pyplot as plt
sns.set() # use Seaborn styles
births.pivot_table('births', index='year', columns='gender', aggfunc='sum').plot()
plt.ylabel('total births per year');

With a simple pivot table and plot() method, we can immediately see the annual trend in births
by gender. By eye, it appears that over the past 50 years male births have outnumbered female
births by around 5%.

Further data exploration

https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.09-Pivot-Tables.ipynb#scrollTo=tRSF… 6/10
12/6/22, 10:10 AM 03.09-Pivot-Tables.ipynb - Colaboratory

Though this doesn't necessarily relate to the pivot table, there are a few more interesting
features we can pull out of this dataset using the Pandas tools covered up to this point. We
must start by cleaning the data a bit, removing outliers caused by mistyped dates (e.g., June
31st) or missing values (e.g., June 99th). One easy way to remove these all at once is to cut
outliers; we'll do this via a robust sigma-clipping operation:

quartiles = np.percentile(births['births'], [25, 50, 75])

mu = quartiles[1]
sig = 0.74 * (quartiles[2] - quartiles[0])

This final line is a robust estimate of the sample mean, where the 0.74 comes from the
interquartile range of a Gaussian distribution (You can learn more about sigma-clipping
operations in a book I coauthored with Željko Ivezić, Andrew J. Connolly, and Alexander Gray:
"Statistics, Data Mining, and Machine Learning in Astronomy").

With this we can use the query() method (discussed further in High-Performance Pandas:
eval() and query() ) to filter-out rows with births outside these values:

births = births.query('(births > @mu - 5 * @sig) & (births < @mu + 5 * @sig)')

Next we set the day column to integers; previously it had been a string because some columns
in the dataset contained the value 'null' :

# set 'day' column to integer; it originally was a string due to nulls

births['day'] = births['day'].astype(int)

Finally, we can combine the day, month, and year to create a Date index (see Working with Time
Series). This allows us to quickly compute the weekday corresponding to each row:

# create a datetime index from the year, month, day

births.index = pd.to_datetime(10000 * births.year +
100 * births.month +
births.day, format='%Y%m%d')

births['dayofweek'] = births.index.dayofweek

Using this we can plot births by weekday for several decades:

import matplotlib.pyplot as plt

import matplotlib as mpl

births.pivot_table('births', index='dayofweek',
columns='decade', aggfunc='mean').plot()

https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.09-Pivot-Tables.ipynb#scrollTo=tRSF… 7/10
12/6/22, 10:10 AM 03.09-Pivot-Tables.ipynb - Colaboratory

plt.gca().set_xticklabels(['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun'])

plt.ylabel('mean births by day');

Apparently births are slightly less common on weekends than on weekdays! Note that the 1990s
and 2000s are missing because the CDC data contains only the month of birth starting in 1989.

Another intersting view is to plot the mean number of births by the day of the year. Let's first
group the data by month and day separately:

births_by_date = births.pivot_table('births',
[births.index.month, births.index.day])
births_by_date.head()

1 1 4009.225
2 4247.400
3 4500.900
4 4571.350
5 4603.625
Name: births, dtype: float64

The result is a multi-index over months and days. To make this easily plottable, let's turn these
months and days into a date by associating them with a dummy year variable (making sure to
choose a leap year so February 29th is correctly handled!)

births_by_date.index = [pd.datetime(2012, month, day)

for (month, day) in births_by_date.index]
births_by_date.head()

2012-01-01 4009.225
2012-01-02 4247.400
2012-01-03 4500.900
2012-01-04 4571.350

https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.09-Pivot-Tables.ipynb#scrollTo=tRSF… 8/10
12/6/22, 10:10 AM 03.09-Pivot-Tables.ipynb - Colaboratory

2012-01-05 4603.625
Name: births, dtype: float64

Focusing on the month and day only, we now have a time series reflecting the average number
of births by date of the year. From this, we can use the plot method to plot the data. It reveals
some interesting trends:

# Plot the results

fig, ax = plt.subplots(figsize=(12, 4))
births_by_date.plot(ax=ax);

In particular, the striking feature of this graph is the dip in birthrate on US holidays (e.g.,
Independence Day, Labor Day, Thanksgiving, Christmas, New Year's Day) although this likely
reflects trends in scheduled/induced births rather than some deep psychosomatic effect on
natural births. For more discussion on this trend, see the analysis and links in Andrew Gelman's
blog post on the subject. We'll return to this figure in Example:-Effect-of-Holidays-on-US-Births,
where we will use Matplotlib's tools to annotate this plot.

Looking at this short example, you can see that many of the Python and Pandas tools we've
seen to this point can be combined and used to gain insight from a variety of datasets. We will
see some more sophisticated applications of these data manipulations in future sections!

< Aggregation and Grouping | Contents | Vectorized String Operations >

Open in Colab

https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.09-Pivot-Tables.ipynb#scrollTo=tRSF… 9/10

Pivot Tables - Python Data Science Handbook
No ratings yet
Pivot Tables - Python Data Science Handbook
10 pages
Pivot Tables
No ratings yet
Pivot Tables
11 pages
Pandas - Data Manipulation and Analysis Library - Educative
No ratings yet
Pandas - Data Manipulation and Analysis Library - Educative
7 pages
Lecture 14
No ratings yet
Lecture 14
33 pages
Ahamed 123
100% (1)
Ahamed 123
7 pages
EDA Module 3-1
No ratings yet
EDA Module 3-1
40 pages
Pandas Dataframe Assignment No 3 - Answerkey
No ratings yet
Pandas Dataframe Assignment No 3 - Answerkey
10 pages
Class 12 DataFrame Operations
No ratings yet
Class 12 DataFrame Operations
21 pages
Titanic
No ratings yet
Titanic
22 pages
Pandas 12 Pivot Table and Drop - A
No ratings yet
Pandas 12 Pivot Table and Drop - A
18 pages
Python Data Cleaning
100% (1)
Python Data Cleaning
20 pages
Week 3 Laboratory Activity
No ratings yet
Week 3 Laboratory Activity
7 pages
Pandas Plots
No ratings yet
Pandas Plots
14 pages
???? ???????????? ???? ??????
No ratings yet
???? ???????????? ???? ??????
63 pages
DVA Practical
No ratings yet
DVA Practical
19 pages
Python Pandas - II
No ratings yet
Python Pandas - II
28 pages
Python For DS Cheat Sheet
100% (2)
Python For DS Cheat Sheet
6 pages
Pandas Pivot Tables Guide
No ratings yet
Pandas Pivot Tables Guide
14 pages
IP Assgn 3
No ratings yet
IP Assgn 3
12 pages
Pivot Tables and Grouping in Pandas
50% (2)
Pivot Tables and Grouping in Pandas
20 pages
From Arrays From Tuples From Product From Levels and Codes
No ratings yet
From Arrays From Tuples From Product From Levels and Codes
22 pages
Titanic Survival Prediction
No ratings yet
Titanic Survival Prediction
14 pages
Unit 5 Analysis With Pandas in Python
No ratings yet
Unit 5 Analysis With Pandas in Python
26 pages
Pandas DataFrame Cheat Sheet
No ratings yet
Pandas DataFrame Cheat Sheet
6 pages
Pandas Cheat Sheet
No ratings yet
Pandas Cheat Sheet
2 pages
Learn Pandas
No ratings yet
Learn Pandas
37 pages
EDS - Python Cheat Sheet
0% (1)
EDS - Python Cheat Sheet
3 pages
Unit 5 2
No ratings yet
Unit 5 2
6 pages
Fundamental - Python
No ratings yet
Fundamental - Python
3 pages
Pandas and Numpy
No ratings yet
Pandas and Numpy
9 pages
Pandas Cheat Sheet CN
No ratings yet
Pandas Cheat Sheet CN
4 pages
Pandas Cheat Sheet
100% (4)
Pandas Cheat Sheet
2 pages
Pandas Cheat Sheet
85% (13)
Pandas Cheat Sheet
2 pages
Pandas CheatSheet 1
No ratings yet
Pandas CheatSheet 1
15 pages
CS1010S Lecture 11 - Visualising Data
No ratings yet
CS1010S Lecture 11 - Visualising Data
68 pages
Pandas Cheat Sheet
100% (1)
Pandas Cheat Sheet
2 pages
4 PythonPandas
No ratings yet
4 PythonPandas
8 pages
Pandas Data Wrangling Cheat Sheet
100% (2)
Pandas Data Wrangling Cheat Sheet
6 pages
MLT Lab Prep Guide
No ratings yet
MLT Lab Prep Guide
37 pages
Untitled26 1
No ratings yet
Untitled26 1
15 pages
Data Aggregation and Group Operations
No ratings yet
Data Aggregation and Group Operations
34 pages
1-Pandas Cheat Sheet
No ratings yet
1-Pandas Cheat Sheet
7 pages
Data Mining Lab Manaul
No ratings yet
Data Mining Lab Manaul
32 pages
The Titanic Dataset
No ratings yet
The Titanic Dataset
6 pages
Python Cheat Sheet Code Academy
100% (1)
Python Cheat Sheet Code Academy
1 page
Pandas Cheat Sheet PDF
67% (3)
Pandas Cheat Sheet PDF
1 page
Program EDA
No ratings yet
Program EDA
6 pages
SQL To Pandas - Group Aggregations
No ratings yet
SQL To Pandas - Group Aggregations
6 pages
Pandas
No ratings yet
Pandas
26 pages
Brush: History
No ratings yet
Brush: History
4 pages
Independence and Departure Exercises For Separation Anxiety
No ratings yet
Independence and Departure Exercises For Separation Anxiety
6 pages
Hiral Rathod1
No ratings yet
Hiral Rathod1
3 pages
Before Time Certificate Diploma Form 1
75% (4)
Before Time Certificate Diploma Form 1
2 pages
Faculty Profile Sample
No ratings yet
Faculty Profile Sample
7 pages
Updated Price List
No ratings yet
Updated Price List
2 pages
LIFT
No ratings yet
LIFT
14 pages
D&T OctoberNovember 2007
No ratings yet
D&T OctoberNovember 2007
19 pages
DMT Supported Entities and Advanced Template Packages
No ratings yet
DMT Supported Entities and Advanced Template Packages
18 pages
Mill Lines by FlSmidth
100% (1)
Mill Lines by FlSmidth
4 pages
TRF 9 10
No ratings yet
TRF 9 10
8 pages
Beginner's Guide to Caring for Geckos
No ratings yet
Beginner's Guide to Caring for Geckos
1 page
Thin Films & Nanotech Conference
No ratings yet
Thin Films & Nanotech Conference
4 pages
Harnessing The Tourism Potential of Bengal
No ratings yet
Harnessing The Tourism Potential of Bengal
56 pages
Colonel Bogey March: Kenneth J Alford (Frederick Joseph Ricketts) Arr. by M A Seaton
No ratings yet
Colonel Bogey March: Kenneth J Alford (Frederick Joseph Ricketts) Arr. by M A Seaton
18 pages
Integrated Reporting
No ratings yet
Integrated Reporting
16 pages
Lecture 8-Mathematical Expectation
No ratings yet
Lecture 8-Mathematical Expectation
11 pages
Caubet The International Style
100% (1)
Caubet The International Style
12 pages
Republic of The Philippines Tamag, Vigan City, Ilocos Sur 2700
No ratings yet
Republic of The Philippines Tamag, Vigan City, Ilocos Sur 2700
25 pages
What Is A Compilation
100% (1)
What Is A Compilation
8 pages
Wireless & Web Security Tools Guide
No ratings yet
Wireless & Web Security Tools Guide
10 pages
Unit of A Long Term Plan
No ratings yet
Unit of A Long Term Plan
3 pages
Galaxy Roblox Icons - Google Search
No ratings yet
Galaxy Roblox Icons - Google Search
1 page
Court
No ratings yet
Court
8 pages
FGHHH
No ratings yet
FGHHH
12 pages
Cyber Crime and Punishments
No ratings yet
Cyber Crime and Punishments
19 pages
Economic Geography:: Buy Now Buy Now Buy Now
No ratings yet
Economic Geography:: Buy Now Buy Now Buy Now
8 pages
The Mizoram Gazette: Published by Authority
No ratings yet
The Mizoram Gazette: Published by Authority
4 pages
Arai (2011) - Fish Karyotypes
No ratings yet
Arai (2011) - Fish Karyotypes
345 pages
Mathematics 9 - W4 - Module-6 For Printing PDF
50% (4)
Mathematics 9 - W4 - Module-6 For Printing PDF
25 pages

Pivot Tables

Uploaded by

Pivot Tables

Uploaded by

12/6/22, 10:10 AM 03.09-Pivot-Tables.

Motivating Pivot Tables

0 0 3 male 22.0 1 0 7.2500 S Third man

1 1 1 female 38.0 1 0 71.2833 C First woman

2 1 3 female 26.0 0 0 7.9250 S Third woman

3 1 1 female 35.0 1 0 53.1000 S First woman

4 0 3 male 35.0 0 0 8.0500 S Third man

Pivot Tables by Hand

class First Second Third

female 0.968085 0.921053 0.500000

male 0.368852 0.157407 0.135447

Pivot Table Syntax

titanic.pivot_table('survived', index='sex', columns='class')

class First Second Third

female 0.968085 0.921053 0.500000

Multi-level pivot tables

age = pd.cut(titanic['age'], [0, 18, 80])

class First Second Third

female (0, 18] 0.909091 1.000000 0.511628

(18, 80] 0.972973 0.900000 0.423729

male (0, 18] 0.800000 0.600000 0.215686

(18, 80] 0.375000 0.071429 0.133663

fare [0, 14.454] (14.454, 512.329]

class First Second Third First Second Third

female (0, 18] NaN 1.000000 0.714286 0.909091 1.000000 0.318182

(18, 80] NaN 0.880000 0.444444 0.972973 0.914286 0.391304

male (0, 18] NaN 0.000000 0.260870 0.800000 0.818182 0.178571

(18, 80] 0.0 0.098039 0.125000 0.391304 0.030303 0.192308

The result is a four-dimensional aggregation with hierarchical indices (see Hierarchical

Additional pivot table options

# call signature as of Pandas 0.18

class First Second Third First Second Third

female 106.125798 21.970121 16.118810 91.0 70.0 72.0

male 67.226127 19.741782 12.661633 45.0 17.0 47.0

titanic.pivot_table('survived', index='sex', columns='class', margins=True)

class First Second Third All

female 0.968085 0.921053 0.500000 0.742038

male 0.368852 0.157407 0.135447 0.188908

Example: Birthrate Data

# shell command to download the data:

year month day gender births

births['decade'] = 10 * (births['year'] // 10)

1960 1753634 1846572

1970 16263075 17121550

1980 18310351 19243452

1990 19479454 20420553

2000 18229309 19106428

Further data exploration

quartiles = np.percentile(births['births'], [25, 50, 75])

# set 'day' column to integer; it originally was a string due to nulls

# create a datetime index from the year, month, day

Using this we can plot births by weekday for several decades:

import matplotlib.pyplot as plt

plt.gca().set_xticklabels(['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun'])

births_by_date.index = [pd.datetime(2012, month, day)

# Plot the results

< Aggregation and Grouping | Contents | Vectorized String Operations >

You might also like