Unit5-Pandas 100 Ques-Ans

Uploaded by

urvashiishhrii

Exercise 1:

Create a DataFrame from a dictionary of lists.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}

df = pd.DataFrame(data)

print(df)

Output:

X Y
0 1 5
1 2 6
2 3 7
3 4 8
Exercise 2:

Select the first 3 rows of a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}

df = pd.DataFrame(data)

print(df.head(3))

Output:

X Y
0 1 5
1 2 6
2 3 7
Exercise 3:

Select the 'X' column from a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}

df = pd.DataFrame(data)

print(df['X'])

Output:

0 1
1 2
2 3
3 4
Name: X, dtype: int64
Exercise 4:

Filter rows based on a column condition.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}

df = pd.DataFrame(data)

filtered_df = df[df['X'] > 2]

print(filtered_df)

Output:

X Y
2 3 7
3 4 8
Exercise 5:

Add a new column to an existing DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}

df = pd.DataFrame(data)

df['Z'] = df['X'] + df['Y']

print(df)

Output:

X Y Z
0 1 5 6
1 2 6 8
2 3 7 10
3 4 8 12
Exercise 6:

Remove a column from a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8], 'Z': [9, 10, 11, 12]}

df = pd.DataFrame(data)

df.drop(columns=['Z'], inplace=True)

print(df)

Output:
X Y
0 1 5
1 2 6
2 3 7
3 4 8
Exercise 7:

Sort a DataFrame by a column.

Solution:

import pandas as pd

data = {'X': [4, 3, 2, 1], 'Y': [8, 7, 6, 5]}

df = pd.DataFrame(data)

df.sort_values(by='X', inplace=True)

print(df)

Output:

X Y
3 1 5
2 2 6
1 3 7
0 4 8
Exercise 8:

Group a DataFrame by a column and calculate the mean of each group.

Solution:

import pandas as pd

data = {'X': [1, 2, 1, 2], 'Y': [5, 6, 7, 8]}

df = pd.DataFrame(data)

grouped_df = df.groupby('X').mean()
print(grouped_df)

Output:

Y
X
1 6.0
2 7.0
Exercise 9:

Replace missing values in a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]}

df = pd.DataFrame(data)

df.fillna(0, inplace=True)

print(df)

Output:

X Y
0 1.0 5.0
1 2.0 0.0
2 0.0 7.0
3 4.0 8.0
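A single fill value is not always appropriate. As a minimal sketch (the fill values 0 and -1 are arbitrary choices for illustration, not part of the exercise), fillna() also accepts a dictionary of per-column values or a Series such as the column means:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, None, 4], 'Y': [5, None, 7, 8]})

# Different constant per column (illustrative values)
filled_const = df.fillna({'X': 0, 'Y': -1})

# Fill each column with its own mean; df.mean() skips NaN by default
filled_mean = df.fillna(df.mean())

print(filled_const)
print(filled_mean)
```

Both calls return new DataFrames and leave df unchanged, which avoids the need for inplace=True.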
Exercise 10:

Convert a column to datetime.

Solution:

import pandas as pd

data = {'X': ['2020-01-01', '2020-01-02', '2020-01-03']}


df = pd.DataFrame(data)

df['X'] = pd.to_datetime(df['X'])

print(df)

Output:

X
0 2020-01-01
1 2020-01-02
2 2020-01-03
Exercise 11:

Create a DataFrame with specific column names.

Solution:

import pandas as pd

data = {'col1': [1, 2, 3], 'col2': [4, 5, 6]}

df = pd.DataFrame(data)

print(df)

Output:

col1 col2
0 1 4
1 2 5
2 3 6
Exercise 12:

Calculate the sum of values in each column.

Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}


df = pd.DataFrame(data)

print(df.sum())

Output:

X 6
Y 15
dtype: int64
Exercise 13:

Calculate the mean of values in each row.

Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

print(df.mean(axis=1))

Output:

0 2.5
1 3.5
2 4.5
dtype: float64
Exercise 14:

Concatenate two DataFrames.

Solution:

import pandas as pd

data1 = {'X': [1, 2, 3]}

data2 = {'Y': [4, 5, 6]}


df1 = pd.DataFrame(data1)

df2 = pd.DataFrame(data2)

concatenated_df = pd.concat([df1, df2], axis=1)

print(concatenated_df)

Output:

X Y
0 1 4
1 2 5
2 3 6
Exercise 15:

Merge two DataFrames on a key.

Solution:

import pandas as pd

data1 = {'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]}

data2 = {'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]}

df1 = pd.DataFrame(data1)

df2 = pd.DataFrame(data2)

merged_df = pd.merge(df1, df2, on='key')

print(merged_df)

Output:

key value1 value2
0 X 1 4
1 Y 2 5
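The result above keeps only keys 'X' and 'Y' because pd.merge() defaults to an inner join. As a hedged sketch using the same two frames, the how parameter controls which keys survive, and indicator=True adds a _merge column recording where each row came from:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['X', 'Y', 'D'], 'value2': [4, 5, 6]})

inner = pd.merge(df1, df2, on='key')              # default: how='inner'
left = pd.merge(df1, df2, on='key', how='left')   # keep every row of df1
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)

print(left)
print(outer)
```

Rows without a match on the other side get NaN in the missing columns, so value1 and value2 become float columns in the outer result.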
Exercise 16:
Create a pivot table from a DataFrame.

Solution:

import pandas as pd

data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}

df = pd.DataFrame(data)

pivot_table = df.pivot_table(values='Z', index='X', columns='Y')

print(pivot_table)

Output:

Y one two
X
bar 3.0 4.0
foo 1.0 2.0
Exercise 17:

Reshape a DataFrame from long to wide format.

Solution:

import pandas as pd

data = {'X': ['foo', 'foo', 'bar', 'bar'], 'Y': ['one', 'two', 'one', 'two'], 'Z': [1, 2, 3, 4]}

df = pd.DataFrame(data)

wide_df = df.pivot(index='X', columns='Y', values='Z')

print(wide_df)

Output:

Y one two
X
bar 3 4
foo 1 2
Exercise 18:

Calculate the correlation between columns in a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}

df = pd.DataFrame(data)

correlation = df.corr()

print(correlation)

Output:

X Y
X 1.0 -1.0
Y -1.0 1.0
Exercise 19:

Iterate over rows in a DataFrame using iterrows().

Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

for index, row in df.iterrows():
    print(index, row['X'], row['Y'])

Output:

0 1 4
1 2 5
2 3 6
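iterrows() builds a Series for every row, which is slow on large frames. A possible alternative, sketched here on the same data, is itertuples(), which yields lightweight namedtuples; for simple arithmetic a vectorized expression avoids the loop altogether:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})

# itertuples() yields one namedtuple per row; the index is exposed as .Index
rows = [(row.Index, row.X, row.Y) for row in df.itertuples()]
for r in rows:
    print(*r)

# Vectorized alternative: no Python-level loop at all
totals = df['X'] + df['Y']
print(totals)
```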
Exercise 20:

Apply a function to each element in a DataFrame.

Solution:

import pandas as pd # Import the Pandas library

# Create a sample DataFrame

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

# Apply a function to each element using the map method

df = df.apply(lambda col: col.map(lambda x: x * 2))

print(df)

Output:

X Y
0 2 8
1 4 10
2 6 12
Exercise 21:

Create a DataFrame from a list of dictionaries.

Solution:

import pandas as pd

data = [{'X': 1, 'Y': 2}, {'X': 3, 'Y': 4}]

df = pd.DataFrame(data)

print(df)
Output:

X Y
0 1 2
1 3 4
Exercise 22:

Rename columns in a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

df.rename(columns={'X': 'A', 'Y': 'B'}, inplace=True)

print(df)

Output:

A B
0 1 4
1 2 5
2 3 6
Exercise 23:

Filter rows by multiple conditions.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}

df = pd.DataFrame(data)

filtered_df = df[(df['X'] > 2) & (df['Y'] < 7)]


print(filtered_df)

Output:

X Y
2 3 6
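Each condition needs its own parentheses because & and | bind more tightly than the comparison operators. As a small sketch on the same data, DataFrame.query() expresses the same filter as a string, and | combines conditions with "or":

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]})

mask_result = df[(df['X'] > 2) & (df['Y'] < 7)]   # boolean-mask form
query_result = df.query('X > 2 and Y < 7')        # equivalent string form

# "or" keeps rows that satisfy at least one condition
either = df[(df['X'] > 3) | (df['Y'] < 5)]

print(query_result)
print(either)
```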
Exercise 24:

Calculate the cumulative sum of a column.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4]}

df = pd.DataFrame(data)

df['Cumulative_Sum'] = df['X'].cumsum()

print(df)

Output:

X Cumulative_Sum
0 1 1
1 2 3
2 3 6
3 4 10
Exercise 25:

Drop rows with missing values.

Solution:

import pandas as pd

data = {'X': [1, 2, None, 4], 'Y': [4, 5, 6, None]}

df = pd.DataFrame(data)
df.dropna(inplace=True)

print(df)

Output:

X Y
0 1.0 4.0
1 2.0 5.0
Exercise 26:

Replace values in a DataFrame based on a condition.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}

df = pd.DataFrame(data)

df.loc[df['X'] > 2, 'Y'] = 0

print(df)

Output:

X Y
0 1 5
1 2 6
2 3 0
3 4 0
Exercise 27:

Create a DataFrame with a MultiIndex.

Solution:

import pandas as pd

arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]


index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))

data = {'Value': [10, 20, 30, 40]}

df = pd.DataFrame(data, index=index)

print(df)

Output:

              Value
Group Number
X     1          10
      2          20
Y     1          30
      2          40
Exercise 28:

Calculate the rolling mean of a column.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4, 5, 6]}

df = pd.DataFrame(data)

df['Rolling_Mean'] = df['X'].rolling(window=3).mean()

print(df)

Output:

X Rolling_Mean
0 1 NaN
1 2 NaN
2 3 2.0
3 4 3.0
4 5 4.0
5 6 5.0
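The first two results are NaN because a window of 3 is not yet full. If partial windows are acceptable, a sketch with min_periods=1 (a choice made here for illustration) averages whatever values are available so far:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3, 4, 5, 6]})

# min_periods=1 lets the first windows use fewer than 3 observations
df['Rolling_Mean'] = df['X'].rolling(window=3, min_periods=1).mean()

print(df)
```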
Exercise 29:

Create a DataFrame from a list of tuples.

Solution:

import pandas as pd

data = [(1, 2), (3, 4), (5, 6)]

df = pd.DataFrame(data, columns=['X', 'Y'])

print(df)

Output:

X Y
0 1 2
1 3 4
2 5 6
Exercise 30:

Add a row to a DataFrame.

Solution:

import pandas as pd # Import the Pandas library

# Create a sample DataFrame

data = {'X': [1, 2], 'Y': [3, 4]}

df = pd.DataFrame(data)

# Create a new row as a DataFrame

new_row = pd.DataFrame({'X': [5], 'Y': [6]})

# Concatenate the new row to the DataFrame


df = pd.concat([df, new_row], ignore_index=True)

print(df)

Output:

X Y
0 1 3
1 2 4
2 5 6
Exercise 31:

Create a DataFrame with random values.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

print(df)

Output:

X Y Z
0 0.688292 0.950264 0.665916
1 0.497719 0.840536 0.923938
2 0.285218 0.091178 0.722034
3 0.037824 0.248689 0.584696
Exercise 32:

Calculate the rank of values in a DataFrame.

Solution:

import pandas as pd
data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}

df = pd.DataFrame(data)

df['Rank'] = df['X'].rank()

print(df)

Output:

X Y Rank
0 3 2 3.0
1 1 3 1.5
2 4 1 4.0
3 1 4 1.5
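The two tied values share rank 1.5 because rank() defaults to method='average'. The sketch below compares the default with three other tie-breaking methods on the same values:

```python
import pandas as pd

s = pd.Series([3, 1, 4, 1])

ranks = pd.DataFrame({
    'value': s,
    'average': s.rank(),               # ties share the mean of their ranks
    'min': s.rank(method='min'),       # ties share the lowest rank
    'dense': s.rank(method='dense'),   # like 'min', but ranks have no gaps
    'first': s.rank(method='first'),   # ties ranked by order of appearance
})

print(ranks)
```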
Exercise 33:

Change the data type of a column.

Solution:

import pandas as pd

data = {'X': ['1', '2', '3']}

df = pd.DataFrame(data)

df['X'] = df['X'].astype(int)

print(df)

Output:

X
0 1
1 2
2 3
Exercise 34:

Filter rows based on string matching.


Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'baz', 'qux']}

df = pd.DataFrame(data)

filtered_df = df[df['X'].str.contains('ba')]

print(filtered_df)

Output:

X
1 bar
2 baz
Exercise 35:

Create a DataFrame with specified row and column labels.

Solution:

import pandas as pd

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

df = pd.DataFrame(data, index=['row1', 'row2', 'row3'], columns=['col1', 'col2', 'col3'])

print(df)

Output:

col1 col2 col3
row1 1 2 3
row2 4 5 6
row3 7 8 9
Exercise 36:

Transpose a DataFrame.
Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

transposed_df = df.T

print(transposed_df)

Output:

0 1 2
X 1 2 3
Y 4 5 6
Exercise 37:

Set a column as the index of a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

df.set_index('X', inplace=True)

print(df)

Output:

Y
X
1 4
2 5
3 6
Exercise 38:
Reset the index of a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

df.set_index('X', inplace=True)

df.reset_index(inplace=True)

print(df)

Output:

X Y
0 1 4
1 2 5
2 3 6
Exercise 39:

Add a prefix or suffix to column names.

Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

df = df.add_prefix('col_')

print(df)

Output:

col_X col_Y
0 1 4
1 2 5
2 3 6
Exercise 40:

Filter rows based on datetime index.

Solution:

import pandas as pd

date_range = pd.date_range(start='1/1/2020', periods=5, freq='D')

data = {'X': [1, 2, 3, 4, 5]}

df = pd.DataFrame(data, index=date_range)

filtered_df = df['2020-01-03':'2020-01-05']

print(filtered_df)

Output:

X
2020-01-03 3
2020-01-04 4
2020-01-05 5
Exercise 41:

Create a DataFrame with duplicate rows and remove duplicates.

Solution:

import pandas as pd

data = {'X': [1, 2, 2, 3], 'Y': [4, 5, 5, 6]}

df = pd.DataFrame(data)

df.drop_duplicates(inplace=True)

print(df)
Output:

X Y
0 1 4
1 2 5
3 3 6
Exercise 42:

Create a DataFrame with hierarchical index.

Solution:

import pandas as pd

arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]

index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))

data = {'Value': [10, 20, 30, 40]}

df = pd.DataFrame(data, index=index)

print(df)

Output:

              Value
Group Number
X     1          10
      2          20
Y     1          30
      2          40
Exercise 43:

Calculate the difference between consecutive rows in a DataFrame.

Solution:

import pandas as pd
data = {'X': [1, 3, 6, 10]}

df = pd.DataFrame(data)

df['Difference'] = df['X'].diff()

print(df)

Output:

X Difference
0 1 NaN
1 3 2.0
2 6 3.0
3 10 4.0
Exercise 44:

Create a DataFrame with hierarchical columns.

Solution:

import pandas as pd

arrays = [['X', 'X', 'Y', 'Y'], ['C1', 'C2', 'C1', 'C2']]

columns = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Type'))

data = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

df = pd.DataFrame(data, columns=columns)

print(df)

Output:

Group  X      Y
Type  C1  C2 C1  C2
0      1   2  3   4
1      5   6  7   8
2      9  10 11  12
Exercise 45:
Filter rows based on the length of strings in a column.

Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'baz', 'qux']}

df = pd.DataFrame(data)

filtered_df = df[df['X'].str.len() > 3]

print(filtered_df)

Output:

Empty DataFrame
Columns: [X]
Index: []
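The result is empty because every string in the sample has exactly three characters. To see the filter keep rows, the sketch below uses hypothetical data ('bazz' and 'quxes' are made up for illustration) with mixed lengths:

```python
import pandas as pd

# Hypothetical sample: two of the four strings are longer than 3 characters
df = pd.DataFrame({'X': ['foo', 'bar', 'bazz', 'quxes']})

longer = df[df['X'].str.len() > 3]

print(longer)
```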
Exercise 46:

Calculate the percentage change between rows in a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4]}

df = pd.DataFrame(data)

df['Pct_Change'] = df['X'].pct_change()

print(df)

Output:

X Pct_Change
0 1 NaN
1 2 1.000000
2 3 0.500000
3 4 0.333333
Exercise 47:

Create a DataFrame from a dictionary of Series.

Solution:

import pandas as pd

data = {'X': pd.Series([1, 2, 3]), 'Y': pd.Series([4, 5, 6])}

df = pd.DataFrame(data)

print(df)

Output:

X Y
0 1 4
1 2 5
2 3 6
Exercise 48:

Filter rows based on whether a column value is in a list.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [5, 6, 7, 8]}

df = pd.DataFrame(data)

filtered_df = df[df['X'].isin([2, 3])]

print(filtered_df)

Output:

X Y
1 2 6
2 3 7
Exercise 49:

Calculate the z-score of values in a DataFrame.

Solution:

import pandas as pd

import numpy as np

data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}

df = pd.DataFrame(data)

df['zscore_A'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])

print(df)

Output:

X Y zscore_A
0 1 4 -1.341641
1 2 5 -0.447214
2 3 6 0.447214
3 4 7 1.341641
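The values above come from the population standard deviation, since np.std() uses ddof=0 by default, whereas Series.std() uses the sample standard deviation (ddof=1). The sketch below shows that the two conventions give different z-scores for the same column:

```python
import numpy as np
import pandas as pd

x = pd.Series([1, 2, 3, 4])

# Population z-score: divides by the ddof=0 standard deviation (matches np.std)
z_population = (x - x.mean()) / x.std(ddof=0)

# Sample z-score: Series.std() defaults to ddof=1
z_sample = (x - x.mean()) / x.std()

print(pd.DataFrame({'X': x, 'z_population': z_population, 'z_sample': z_sample}))
```

Whichever convention is chosen, it should be applied consistently when comparing z-scores across columns.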
Exercise 50:

Create a DataFrame with random integers and calculate descriptive statistics.

Solution:

import pandas as pd

import numpy as np

data = np.random.randint(1, 100, size=(5, 3))

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

print(df.describe())
Output:

X Y Z
count 5.000000 5.000000 5.000000
mean 60.600000 71.800000 42.600000
std 38.435661 13.971399 12.218838
min 5.000000 53.000000 28.000000
25% 40.000000 64.000000 34.000000
50% 69.000000 72.000000 41.000000
75% 91.000000 82.000000 55.000000
max 98.000000 88.000000 55.000000
Exercise 51:

Calculate the rank of values in each column of a DataFrame.

Solution:

import pandas as pd

data = {'X': [3, 1, 4, 1], 'Y': [2, 3, 1, 4]}

df = pd.DataFrame(data)

df['Rank_A'] = df['X'].rank()

df['Rank_B'] = df['Y'].rank()

print(df)

Output:

X Y Rank_A Rank_B
0 3 2 3.0 2.0
1 1 3 1.5 3.0
2 4 1 4.0 1.0
3 1 4 1.5 4.0
Exercise 52:

Filter rows based on multiple string conditions.


Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'baz', 'qux']}

df = pd.DataFrame(data)

filtered_df = df[df['X'].str.contains('ba|qu')]

print(filtered_df)

Output:

X
1 bar
2 baz
3 qux
Exercise 53:

Create a DataFrame with random values and calculate the skewness.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

print(df.skew())

Output:

(one float64 skewness value per column X, Y and Z; the numbers change on every run because the data is random)
Exercise 54:
Create a DataFrame and calculate the kurtosis.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

print(df.kurt())

Output:

X 2.958407
Y -2.639654
Z 2.704430
dtype: float64
Exercise 55:

Calculate the cumulative product of a column in a DataFrame.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4]}

df = pd.DataFrame(data)

df['Cumulative_Product'] = df['X'].cumprod()

print(df)

Output:

X Cumulative_Product
0 1 1
1 2 2
2 3 6
3 4 24
Exercise 56:

Create a DataFrame and calculate the rolling standard deviation.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4, 5, 6]}

df = pd.DataFrame(data)

df['Rolling_Std'] = df['X'].rolling(window=3).std()

print(df)

Output:

X Rolling_Std
0 1 NaN
1 2 NaN
2 3 1.0
3 4 1.0
4 5 1.0
5 6 1.0
Exercise 57:

Create a DataFrame and calculate the expanding mean.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4, 5, 6]}

df = pd.DataFrame(data)

df['Expanding_Mean'] = df['X'].expanding().mean()

print(df)
Output:

X Expanding_Mean
0 1 1.0
1 2 1.5
2 3 2.0
3 4 2.5
4 5 3.0
5 6 3.5
Exercise 58:

Create a DataFrame with random values and calculate the covariance matrix.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

print(df.cov())

Output:

X Y Z
X 0.054079 0.007398 -0.031403
Y 0.007398 0.053211 -0.020480
Z -0.031403 -0.020480 0.048057
Exercise 59:

Create a DataFrame with random values and calculate the correlation matrix.

Solution:

import pandas as pd

import numpy as np
data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

print(df.corr())

Output:

X Y Z
X 1.000000 -0.258187 0.541044
Y -0.258187 1.000000 -0.432419
Z 0.541044 -0.432419 1.000000
Exercise 60:

Create a DataFrame and calculate the rolling correlation between two columns.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4, 5, 6], 'Y': [6, 5, 4, 3, 2, 1]}

df = pd.DataFrame(data)

df['Rolling_Corr'] = df['X'].rolling(window=3).corr(df['Y'])

print(df)

Output:

X Y Rolling_Corr
0 1 6 NaN
1 2 5 NaN
2 3 4 -1.0
3 4 3 -1.0
4 5 2 -1.0
5 6 1 -1.0
Exercise 61:

Create a DataFrame and calculate the expanding variance.


Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4, 5, 6]}

df = pd.DataFrame(data)

df['Expanding_Var'] = df['X'].expanding().var()

print(df)

Output:

X Expanding_Var
0 1 NaN
1 2 0.500000
2 3 1.000000
3 4 1.666667
4 5 2.500000
5 6 3.500000
Exercise 62:

Create a DataFrame with datetime index and resample by month.

Solution:

import pandas as pd

date_range = pd.date_range(start='1/1/2020', periods=100, freq='D')

data = {'X': range(100)}

df = pd.DataFrame(data, index=date_range)

monthly_df = df.resample('M').sum()  # pandas 2.2+ deprecates 'M' here; use 'ME' (month end) instead

print(monthly_df)

Output:
X
2020-01-31 465
2020-02-29 1305
2020-03-31 2325
2020-04-30 855
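The monthly totals can be cross-checked without resample(). One possible sketch, which does not depend on the resampling frequency alias, groups the rows by the calendar month of the index:

```python
import pandas as pd

idx = pd.date_range(start='2020-01-01', periods=100, freq='D')
df = pd.DataFrame({'X': range(100)}, index=idx)

# to_period('M') maps each timestamp to its calendar month (e.g. 2020-01)
monthly = df.groupby(df.index.to_period('M')).sum()

print(monthly)
```

The sums match the resampled output; the difference is that the result is indexed by monthly periods rather than month-end timestamps.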
Exercise 63:

Create a DataFrame and calculate the exponential moving average.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4, 5, 6]}

df = pd.DataFrame(data)

df['EMA'] = df['X'].ewm(span=3, adjust=False).mean()

print(df)

Output:

X EMA
0 1 1.00000
1 2 1.50000
2 3 2.25000
3 4 3.12500
4 5 4.06250
5 6 5.03125
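With adjust=False, the EMA follows the recursion ema[t] = alpha * x[t] + (1 - alpha) * ema[t-1], where alpha = 2 / (span + 1) and the first value seeds the series. The sketch below reproduces the column by hand to make that formula concrete:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4, 5, 6], dtype=float)
span = 3
alpha = 2 / (span + 1)  # 0.5 for span=3

# Manual recursion: start from the first observation
manual = [x.iloc[0]]
for value in x.iloc[1:]:
    manual.append(alpha * value + (1 - alpha) * manual[-1])

ema = x.ewm(span=span, adjust=False).mean()

print(manual)
print(ema.tolist())
```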
Exercise 64:

Create a DataFrame with random integers and calculate the mode.

Solution:

import pandas as pd

import numpy as np

data = np.random.randint(1, 10, size=(5, 3))


df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

print(df.mode())

Output:

X Y Z
0 2 1.0 2.0
1 3 3.0 7.0
2 5 NaN NaN
3 6 NaN NaN
4 9 NaN NaN
Exercise 65:

Create a DataFrame and calculate the z-score of each column.

Solution:

import pandas as pd

import numpy as np

data = {'X': [1, 2, 3, 4], 'Y': [4, 5, 6, 7]}

df = pd.DataFrame(data)

df['zscore_A'] = (df['X'] - np.mean(df['X'])) / np.std(df['X'])

df['zscore_B'] = (df['Y'] - np.mean(df['Y'])) / np.std(df['Y'])

print(df)

Output:

X Y zscore_A zscore_B
0 1 4 -1.341641 -1.341641
1 2 5 -0.447214 -0.447214
2 3 6 0.447214 0.447214
3 4 7 1.341641 1.341641
Exercise 66:
Create a DataFrame with random values and calculate the median.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

print(df.median())

Output:

X 0.787042
Y 0.477837
Z 0.696911
dtype: float64
Exercise 67:

Create a DataFrame and apply a custom function to each column.

Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

df = df.apply(lambda x: x + 1)

print(df)

Output:

X Y
0 2 5
1 3 6
2 4 7
Exercise 68:

Create a DataFrame with hierarchical index and calculate the mean for each group.

Solution:

import pandas as pd

arrays = [['X', 'X', 'Y', 'Y'], [1, 2, 1, 2]]

index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Number'))

data = {'Value': [10, 20, 30, 40]}

df = pd.DataFrame(data, index=index)

grouped_df = df.groupby('Group').mean()

print(grouped_df)

Output:

Value
Group
X 15.0
Y 35.0
Exercise 69:

Create a DataFrame and calculate the percentage of missing values in each column.

Solution:

import pandas as pd

data = {'X': [1, 2, None, 4], 'Y': [4, None, 6, 8]}

df = pd.DataFrame(data)

missing_percentage = df.isnull().mean() * 100

print(missing_percentage)
Output:

X 25.0
Y 25.0
dtype: float64
Exercise 70:

Create a DataFrame and apply a custom function to each row.

Solution:

import pandas as pd

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

df['Sum'] = df.apply(lambda row: row['X'] + row['Y'], axis=1)

print(df)

Output:

X Y Sum
0 1 4 5
1 2 5 7
2 3 6 9
Exercise 71:

Create a DataFrame with random values and calculate the quantiles.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])


print(df.quantile([0.25, 0.5, 0.75]))

Output:

X Y Z
0.25 0.174265 0.184036 0.520573
0.50 0.468040 0.315593 0.644571
0.75 0.767870 0.436426 0.771297
Exercise 72:

Create a DataFrame and calculate the interquartile range (IQR).

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

Q1 = df.quantile(0.25)

Q3 = df.quantile(0.75)

IQR = Q3 - Q1

print(IQR)

Output:

X 0.354244
Y 0.329573
Z 0.245520
dtype: float64
Exercise 73:

Create a DataFrame with datetime index and calculate the rolling mean.

Solution:
import pandas as pd

date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')

data = {'X': range(10)}

df = pd.DataFrame(data, index=date_range)

df['Rolling_Mean'] = df['X'].rolling(window=3).mean()

print(df)

Output:

X Rolling_Mean
2020-01-01 0 NaN
2020-01-02 1 NaN
2020-01-03 2 1.0
2020-01-04 3 2.0
2020-01-05 4 3.0
2020-01-06 5 4.0
2020-01-07 6 5.0
2020-01-08 7 6.0
2020-01-09 8 7.0
2020-01-10 9 8.0
Exercise 74:

Create a DataFrame and calculate the cumulative maximum.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 2, 1]}

df = pd.DataFrame(data)

df['Cumulative_Max'] = df['X'].cummax()

print(df)

Output:

X Cumulative_Max
0 1 1
1 2 2
2 3 3
3 2 3
4 1 3
Exercise 75:

Create a DataFrame and calculate the cumulative minimum.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 2, 1]}

df = pd.DataFrame(data)

df['Cumulative_Min'] = df['X'].cummin()

print(df)

Output:

X Cumulative_Min
0 1 1
1 2 1
2 3 1
3 2 1
4 1 1
Exercise 76:

Create a DataFrame with random values and calculate the cumulative variance.

Solution:

import pandas as pd

import numpy as np
data = np.random.rand(10, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

df['Cumulative_Var'] = df['X'].expanding().var()

print(df)

Output:

X Y Z Cumulative_Var
0 0.315669 0.900791 0.404858 NaN
1 0.462000 0.463257 0.922495 0.010706
2 0.328968 0.200027 0.967625 0.006548
3 0.630370 0.992849 0.231884 0.021460
4 0.574397 0.968600 0.926893 0.020023
5 0.204077 0.889864 0.589022 0.027130
6 0.386806 0.630882 0.242157 0.022759
7 0.319831 0.935747 0.829739 0.020630
8 0.786435 0.377739 0.879458 0.034407
9 0.523467 0.077937 0.764476 0.031194
Exercise 77:

Create a DataFrame and apply a custom function to each element.

Solution:

import pandas as pd

# Create a DataFrame

data = {'X': [1, 2, 3], 'Y': [4, 5, 6]}

df = pd.DataFrame(data)

# Define the custom function

def custom_function(x):
    return x * 2

# Apply the function to each element using map on each column


df = df.apply(lambda col: col.map(custom_function))

# Print the DataFrame

print(df)

Output:

X Y
0 2 8
1 4 10
2 6 12
Exercise 78:

Create a DataFrame with random values and calculate the z-score for each element.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

df = df.apply(lambda x: (x - x.mean()) / x.std(), axis=0)

print(df)

Output:

X Y Z
0 1.027393 0.656858 1.032853
1 0.674079 -1.277904 -0.220065
2 -0.996641 -0.298841 0.475217
3 -0.704831 0.919887 -1.288005
Exercise 79:

Create a DataFrame and calculate the cumulative sum for each group.
Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}

df = pd.DataFrame(data)

df['Cumulative_Sum'] = df.groupby('X')['Y'].cumsum()

print(df)

Output:

X Y Cumulative_Sum
0 foo 1 1
1 bar 2 2
2 foo 3 4
3 bar 4 6
Exercise 80:

Create a DataFrame with random values and calculate the rank for each element.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

df = df.rank()

print(df)

Output:

X Y Z
0 4.0 3.0 3.0
1 3.0 2.0 2.0
2 1.0 4.0 1.0
3 2.0 1.0 4.0
Exercise 81:

Create a DataFrame and calculate the cumulative product for each group.

Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}

df = pd.DataFrame(data)

df['Cumulative_Product'] = df.groupby('X')['Y'].cumprod()

print(df)

Output:

X Y Cumulative_Product
0 foo 1 1
1 bar 2 2
2 foo 3 3
3 bar 4 8
Exercise 82:

Create a DataFrame with random values and calculate the expanding sum.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

df['Expanding_Sum'] = df['X'].expanding().sum()
print(df)

Output:

X Y Z Expanding_Sum
0 0.815750 0.062819 0.699743 0.815750
1 0.128772 0.843222 0.411903 0.944522
2 0.857516 0.219424 0.234460 1.802038
3 0.011010 0.774375 0.259412 1.813048
Exercise 83:

Create a DataFrame and calculate the expanding minimum for each group.

Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}

df = pd.DataFrame(data)

df['Expanding_Min'] = df.groupby('X')['Y'].expanding().min().reset_index(level=0, drop=True)

print(df)

Output:

X Y Expanding_Min
0 foo 1 1.0
1 bar 2 2.0
2 foo 3 1.0
3 bar 4 2.0
Exercise 84:

Create a DataFrame with random values and calculate the expanding maximum for each group.

Solution:
import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

# X holds continuous random floats, so each group contains a single row
df['Expanding_Max'] = df.groupby('X')['Y'].expanding().max().reset_index(level=0, drop=True)

print(df)

Output:

X Y Z Expanding_Max
0 0.751392 0.015856 0.313990 0.015856
1 0.812436 0.701808 0.069307 0.701808
2 0.148614 0.838726 0.290646 0.838726
3 0.764419 0.586510 0.470466 0.586510
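In the output above, Expanding_Max simply repeats Y, because grouping by a column of distinct random floats puts every row in its own group. A sketch with an explicit label column (the name 'Group', its values, and the seed are assumptions added for illustration) shows a per-group running maximum; groupby().cummax() is used here as an equivalent of the expanding maximum:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so repeated runs agree

df = pd.DataFrame({
    'Group': ['A', 'B', 'A', 'B', 'A', 'B'],  # hypothetical group labels
    'Y': rng.random(6),
})

# Running maximum of Y within each group, aligned to the original row order
df['Expanding_Max'] = df.groupby('Group')['Y'].cummax()

print(df)
```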
Exercise 85:

Create a DataFrame and calculate the expanding variance for each group.

Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'foo', 'bar'], 'Y': [1, 2, 3, 4]}

df = pd.DataFrame(data)

df['Expanding_Var'] = df.groupby('X')['Y'].expanding().var().reset_index(level=0, drop=True)

print(df)

Output:

X Y Expanding_Var
0 foo 1 NaN
1 bar 2 NaN
2 foo 3 2.0
3 bar 4 2.0
Exercise 86:

Create a DataFrame with random values and calculate the expanding standard deviation.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

df['Expanding_Std'] = df['X'].expanding().std()

print(df)

Output:

X Y Z Expanding_Std
0 0.693184 0.088273 0.109510 NaN
1 0.031186 0.163005 0.803467 0.468103
2 0.294881 0.409395 0.278145 0.333272
3 0.918778 0.854961 0.791329 0.397322

Exercise 87:

Create a DataFrame and calculate the expanding covariance.

Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4], 'Y': [4, 3, 2, 1]}

df = pd.DataFrame(data)
df['Expanding_Cov'] = df['X'].expanding().cov(df['Y'])

print(df)

Output:

X Y Expanding_Cov
0 1 4 NaN
1 2 3 -0.500000
2 3 2 -1.000000
3 4 1 -1.666667
Exercise 88:

Create a DataFrame with random values and calculate the expanding correlation.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(4, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

df['Expanding_Corr'] = df['X'].expanding().corr(df['Y'])

print(df)

Output:

X Y Z Expanding_Corr
0 0.094026 0.320246 0.044218 NaN
1 0.422531 0.002172 0.995907 -1.000000
2 0.265459 0.391239 0.589878 -0.751147
3 0.118812 0.061489 0.837821 -0.372750
Exercise 89:

Create a DataFrame and calculate the expanding median.


Solution:

import pandas as pd

data = {'X': [1, 2, 3, 4, 5, 6]}

df = pd.DataFrame(data)

df['Expanding_Median'] = df['X'].expanding().median()

print(df)

Output:

X Expanding_Median
0 1 1.0
1 2 1.5
2 3 2.0
3 4 2.5
4 5 3.0
5 6 3.5
Exercise 90:

Create a DataFrame with datetime index and calculate the expanding mean for each group.

Solution:

import pandas as pd

date_range = pd.date_range(start='1/1/2020', periods=10, freq='D')

data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}

df = pd.DataFrame(data, index=date_range)

df['Expanding_Mean'] = df.groupby('X')['Y'].expanding().mean().reset_index(level=0, drop=True)

print(df)

Output:

X Y Expanding_Mean
2020-01-01 foo 0 0.0
2020-01-02 bar 1 1.0
2020-01-03 foo 2 1.0
2020-01-04 bar 3 2.0
2020-01-05 foo 4 2.0
2020-01-06 bar 5 3.0
2020-01-07 foo 6 3.0
2020-01-08 bar 7 4.0
2020-01-09 foo 8 4.0
2020-01-10 bar 9 5.0
Exercise 91:

Create a DataFrame with random values and calculate the rolling sum for each group.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(10, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

df['Rolling_Sum'] = df.groupby('X')['Y'].rolling(window=3).sum().reset_index(level=0, drop=True)

print(df)

Output:

X Y Z Rolling_Sum
0 0.342706 0.579330 0.902681 NaN
1 0.182432 0.163406 0.156607 NaN
2 0.983085 0.052785 0.588865 NaN
3 0.756982 0.123991 0.704262 NaN
4 0.876875 0.710953 0.923588 NaN
5 0.359818 0.135520 0.277327 NaN
6 0.693156 0.590918 0.985834 NaN
7 0.892253 0.633529 0.169000 NaN
8 0.084238 0.007579 0.076730 NaN
9 0.663869 0.780832 0.644874 NaN
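Every Rolling_Sum above is NaN because each distinct random X value forms a one-row group, which can never fill a window of 3. The sketch below uses a hypothetical 'Group' label column and deterministic Y values (both assumptions, chosen so the result is easy to check) to show a rolling sum that does fill within each group:

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'B'] * 5,   # hypothetical labels: A, B, A, B, ...
    'Y': range(10),
})

# Rolling sum of the last 3 rows within each group, realigned to df's index
df['Rolling_Sum'] = (
    df.groupby('Group')['Y']
      .rolling(window=3)
      .sum()
      .reset_index(level=0, drop=True)
)

print(df)
```

The first two rows of each group stay NaN; from the third row of a group onward the window is full.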
Exercise 92:

Create a DataFrame and calculate the rolling mean for each group.

Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}

df = pd.DataFrame(data)

df['Rolling_Mean'] = df.groupby('X')['Y'].rolling(window=3).mean().reset_index(level=0, drop=True)

print(df)

Output:

X Y Rolling_Mean
0 foo 0 NaN
1 bar 1 NaN
2 foo 2 NaN
3 bar 3 NaN
4 foo 4 2.0
5 bar 5 3.0
6 foo 6 4.0
7 bar 7 5.0
8 foo 8 6.0
9 bar 9 7.0
Exercise 93:

Create a DataFrame with random values and calculate the rolling standard deviation for each group.
Solution:

import pandas as pd

import numpy as np

data = np.random.rand(10, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

df['Rolling_Std'] = df.groupby('X')['Y'].rolling(window=3).std().reset_index(level=0, drop=True)

print(df)

Output:

X Y Z Rolling_Std
0 0.154838 0.162793 0.808882 NaN
1 0.740167 0.920318 0.650240 NaN
2 0.033449 0.007883 0.249656 NaN
3 0.983601 0.261995 0.399816 NaN
4 0.883155 0.051084 0.125735 NaN
5 0.986930 0.470328 0.612276 NaN
6 0.981338 0.016731 0.627210 NaN
7 0.670522 0.247346 0.530971 NaN
8 0.978909 0.752500 0.903401 NaN
9 0.185614 0.362602 0.541459 NaN
Exercise 94:

Create a DataFrame and calculate the rolling variance for each group.

Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}

df = pd.DataFrame(data)

df['Rolling_Var'] = df.groupby('X')['Y'].rolling(window=3).var().reset_index(level=0, drop=True)
print(df)

Output:

X Y Rolling_Var
0 foo 0 NaN
1 bar 1 NaN
2 foo 2 NaN
3 bar 3 NaN
4 foo 4 4.0
5 bar 5 4.0
6 foo 6 4.0
7 bar 7 4.0
8 foo 8 4.0
9 bar 9 4.0
Exercise 95:

Create a DataFrame with random values and calculate the rolling correlation for each group.

Solution:

import pandas as pd

import numpy as np

# Create a DataFrame with random values

np.random.seed(42) # For reproducibility

data = np.random.rand(10, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

# Create a categorical group column to group by

df['Group'] = np.random.choice(['A', 'B'], size=10)

# Calculate the rolling correlation for each group

df['Rolling_Corr'] = df.groupby('Group').apply(
    lambda group: group['Y'].rolling(window=3).corr(group['Z'])
).reset_index(level=0, drop=True)

print(df)

Output:

X Y Z Group Rolling_Corr
0 0.374540 0.950714 0.731994 A NaN
1 0.598658 0.156019 0.155995 A NaN
2 0.058084 0.866176 0.601115 A 0.992633
3 0.708073 0.020584 0.969910 A -0.095420
4 0.832443 0.212339 0.181825 A -0.180021
5 0.183405 0.304242 0.524756 B NaN
6 0.431945 0.291229 0.611853 B NaN
7 0.139494 0.292145 0.366362 A -0.869948
8 0.456070 0.785176 0.199674 B -0.984073
9 0.514234 0.592415 0.046450 B -0.788379
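
The per-group rolling correlation can also be built with an explicit loop over the groups, which avoids groupby.apply (some recent pandas versions are understood to emit a warning when apply operates on the grouping column). The sketch below uses small made-up data, not the exercise's random values, and cross-checks one window against np.corrcoef.

```python
import numpy as np
import pandas as pd

# Illustrative data (hypothetical values, not from the exercise)
df = pd.DataFrame({
    'Group': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Y': [1.0, 2.0, 2.0, 1.0, 4.0, 5.0],
    'Z': [2.0, 9.0, 3.0, 7.0, 9.0, 1.0],
})

# Compute the rolling correlation inside each group, keeping the original index
parts = []
for _, g in df.groupby('Group'):
    parts.append(g['Y'].rolling(window=3).corr(g['Z']))

# Assignment aligns on the index, so the row order of df is preserved
df['Rolling_Corr'] = pd.concat(parts)

# Cross-check: group A has exactly three rows, so its last window is the whole group
a = df[df['Group'] == 'A']
expected = np.corrcoef(a['Y'], a['Z'])[0, 1]

print(df)
print(expected)
```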
Exercise 96:

Create a DataFrame and calculate the rolling covariance for each group.

Solution:

import pandas as pd

# Create a DataFrame with sample data

data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
        'Y': range(10), 'Z': range(10, 20)}

df = pd.DataFrame(data)

# Calculate the rolling covariance for each group

rolling_cov = df.groupby('X').apply(
    lambda group: group['Y'].rolling(window=3).cov(group['Z'])
).reset_index(level=0, drop=True)

# Add the rolling covariance to the original DataFrame

df['Rolling_Cov'] = rolling_cov

print(df)

Output:

X Y Z Rolling_Cov
0 foo 0 10 NaN
1 bar 1 11 NaN
2 foo 2 12 NaN
3 bar 3 13 NaN
4 foo 4 14 4.0
5 bar 5 15 4.0
6 foo 6 16 4.0
7 bar 7 17 4.0
8 foo 8 18 4.0
9 bar 9 19 4.0
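
Because Z is simply Y + 10, the covariance of Y and Z within a window equals the variance of Y, which is why the result is 4.0 in every complete window. A short sketch of the arithmetic for the first full 'foo' window:

```python
import numpy as np

y = np.array([0, 2, 4])     # first complete 'foo' window of Y
z = np.array([10, 12, 14])  # matching window of Z

# Sample covariance: sum of products of deviations, divided by n - 1
cov = ((y - y.mean()) * (z - z.mean())).sum() / (len(y) - 1)

print(cov)
print(np.cov(y, z)[0, 1])  # same value from NumPy's covariance matrix
```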
Exercise 97:

Create a DataFrame with random values and calculate the rolling skewness for each group.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(10, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

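# Note: 'X' holds distinct random floats, so every group has a single row

# and a 3-row window is never filled; the result is NaN throughout.

df['Rolling_Skew'] = df.groupby('X')['Y'].rolling(window=3).skew().reset_index(level=0, drop=True)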

print(df)

Output:

X Y Z Rolling_Skew
0 0.808397 0.304614 0.097672 NaN
1 0.684233 0.440152 0.122038 NaN
2 0.495177 0.034389 0.909320 NaN
3 0.258780 0.662522 0.311711 NaN
4 0.520068 0.546710 0.184854 NaN
5 0.969585 0.775133 0.939499 NaN
6 0.894827 0.597900 0.921874 NaN
7 0.088493 0.195983 0.045227 NaN
8 0.325330 0.388677 0.271349 NaN
9 0.828738 0.356753 0.280935 NaN
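
One way to get meaningful groups out of a continuous column is to bin it first. The sketch below is an illustration under assumptions: the 'X_Bin' column, the 0.5 threshold, and the seed are hypothetical choices, not part of the original exercise.

```python
import numpy as np
import pandas as pd

np.random.seed(1)  # assumed seed, only for repeatable output
df = pd.DataFrame(np.random.rand(10, 3), columns=['X', 'Y', 'Z'])

# Hypothetical binning step: split rows into two groups by the value of X
df['X_Bin'] = np.where(df['X'] < 0.5, 'low', 'high')

df['Rolling_Skew'] = (
    df.groupby('X_Bin')['Y']
    .rolling(window=3)
    .skew()
    .reset_index(level=0, drop=True)
)

print(df)
```

Each bin now contributes a skewness value from its third row onward, so a bin with n rows yields n - 2 non-NaN entries.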
Exercise 98:

Create a DataFrame and calculate the rolling kurtosis for each group.

Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}

df = pd.DataFrame(data)

# Note: kurtosis needs at least 4 observations, so a 3-row window always yields NaN.

df['Rolling_Kurt'] = df.groupby('X')['Y'].rolling(window=3).kurt().reset_index(level=0, drop=True)

print(df)

Output:

X Y Rolling_Kurt
0 foo 0 NaN
1 bar 1 NaN
2 foo 2 NaN
3 bar 3 NaN
4 foo 4 NaN
5 bar 5 NaN
6 foo 6 NaN
7 bar 7 NaN
8 foo 8 NaN
9 bar 9 NaN
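
The output above is NaN in every row because the kurtosis estimate needs at least four observations and the window holds only three. A minimal sketch with window=4 (an assumed change, not the exercise's setting) shows the calculation producing values:

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar'] * 5, 'Y': range(10)})

# window=4 is the smallest window for which kurtosis is defined
df['Rolling_Kurt'] = (
    df.groupby('X')['Y']
    .rolling(window=4)
    .kurt()
    .reset_index(level=0, drop=True)
)

print(df)
```

With five rows per group, only the fourth and fifth rows of each group have a full window, so four entries in total are non-NaN.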
Exercise 99:

Create a DataFrame with random values and calculate the rolling median for each group.

Solution:

import pandas as pd

import numpy as np

data = np.random.rand(10, 3)

df = pd.DataFrame(data, columns=['X', 'Y', 'Z'])

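# Note: 'X' holds distinct random floats, so every group has a single row

# and a 3-row window is never filled; the result is NaN throughout.

df['Rolling_Median'] = df.groupby('X')['Y'].rolling(window=3).median().reset_index(level=0, drop=True)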

print(df)

Output:

X Y Z Rolling_Median
0 0.542696 0.140924 0.802197 NaN
1 0.074551 0.986887 0.772245 NaN
2 0.198716 0.005522 0.815461 NaN
3 0.706857 0.729007 0.771270 NaN
4 0.074045 0.358466 0.115869 NaN
5 0.863103 0.623298 0.330898 NaN
6 0.063558 0.310982 0.325183 NaN
7 0.729606 0.637557 0.887213 NaN
8 0.472215 0.119594 0.713245 NaN
9 0.760785 0.561277 0.770967 NaN
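
If leading NaN values are unwanted, the min_periods argument of rolling lets a window produce a result before it is full. The sketch below assumes a hypothetical 'Group' column and seed, neither of which appears in the original exercise.

```python
import numpy as np
import pandas as pd

np.random.seed(2)  # assumed seed, only for repeatable output
df = pd.DataFrame(np.random.rand(10, 3), columns=['X', 'Y', 'Z'])

# Hypothetical grouping column: first five rows are 'A', last five are 'B'
df['Group'] = ['A'] * 5 + ['B'] * 5

# min_periods=1 computes the median over however many rows are available,
# up to the window size of 3, restarting at the first row of each group
df['Rolling_Median'] = (
    df.groupby('Group')['Y']
    .rolling(window=3, min_periods=1)
    .median()
    .reset_index(level=0, drop=True)
)

print(df)
```

The first row of each group then reports its own 'Y' value, and the window never reaches back into the previous group.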
Exercise 100:

Create a DataFrame and calculate the expanding sum for each group.

Solution:

import pandas as pd

data = {'X': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar'], 'Y': range(10)}
df = pd.DataFrame(data)

df['Expanding_Sum'] = df.groupby('X')['Y'].expanding().sum().reset_index(level=0, drop=True)

print(df)

Output:

X Y Expanding_Sum
0 foo 0 0.0
1 bar 1 1.0
2 foo 2 2.0
3 bar 3 4.0
4 foo 4 6.0
5 bar 5 9.0
6 foo 6 12.0
7 bar 7 16.0
8 foo 8 20.0
9 bar 9 25.0
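
An expanding sum within each group is the same quantity as a grouped cumulative sum. As a quick cross-check, this sketch computes both and compares them; the 'Cumsum' column name is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'X': ['foo', 'bar'] * 5, 'Y': range(10)})

# Expanding sum per group, as in the solution above
df['Expanding_Sum'] = (
    df.groupby('X')['Y'].expanding().sum().reset_index(level=0, drop=True)
)

# Grouped cumulative sum: already aligned to the original index
df['Cumsum'] = df.groupby('X')['Y'].cumsum()

# True when both columns agree row by row
print((df['Expanding_Sum'] == df['Cumsum']).all())
```

The cumsum form keeps the integer dtype of 'Y', while the expanding form returns floats.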
