Pandas

The document discusses creating and manipulating DataFrames in Python Pandas. It covers topics like creating DataFrames from different data structures, selecting data, filtering, adding/modifying columns and rows, importing/exporting CSV files, and descriptive statistics.

Uploaded by

khushhal2024

PYTHON PANDAS

Creating DataFrame:

Syntax: pd.DataFrame(data, index, columns)   (all examples assume: import pandas as pd and import numpy as np)

1. Creating Empty DataFrame

DF = pd.DataFrame()
2. Creating DataFrame from 2D lists (List of Lists)
L = [[10, 20, 30, 40], [12, 12, 13, 14]]    # created a list L
DF = pd.DataFrame(L)                        # passed it to DataFrame()
3. Creating DataFrame from 2D dictionary (dictionary of dictionaries)
D = {1: {'a': 10, 'b': 20, 'd': 120}, 2: {'a': 30, 'b': 60, 'c': 90}}    # created a nested (2D) dictionary
DF = pd.DataFrame(D)                        # passed it to DataFrame()
4. Creating DataFrame from ndarray
a1 = np.array([91, 2, 3, 4, 5])             # created array a1
a2 = np.array([10, 20, 30, 40])             # created array a2
DF = pd.DataFrame([a1, a2])                 # passed the arrays to DataFrame() as a list
5. Creating DataFrame from List of Dictionaries
A = [{'a': 10, 'b': 20, 'd': 120}, {'a': 30, 'b': 60, 'c': 90}]
DF = pd.DataFrame(A)
6. Creating DataFrame from Dictionary of Lists
A = {'Maths': [80, 85, 90], 'Science': [78, 89, 88], 'English': [85, 74, 65]}
DF = pd.DataFrame(A)
7. Creating DataFrame from Dictionary of Series
DF1 = pd.DataFrame({'Maths': pd.Series([80, 85, 90], index=['a', 'b', 'c']),
                    'Science': pd.Series([78, 89, 88], index=['a', 'b', 'c']),
                    'English': pd.Series([85, 74, 65], index=['a', 'b', 'c'])})
8. Creating DataFrame from another DataFrame
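The creation methods listed above can be tried with a short sketch like the following. The variable names are illustrative only, and the last call (passing an existing DataFrame with copy=True) is an assumption about what item 8 intends, since the notes give no code for it.

```python
import pandas as pd

# 2D list: each inner list becomes one row
df_list = pd.DataFrame([[10, 20, 30, 40], [12, 12, 13, 14]])

# Dictionary of lists: keys become column labels
marks = {'Maths': [80, 85, 90], 'Science': [78, 89, 88]}
df_dict = pd.DataFrame(marks, index=['a', 'b', 'c'])

# List of dictionaries: a key missing from one row is filled with NaN
df_rows = pd.DataFrame([{'a': 10, 'b': 20}, {'a': 30, 'c': 90}])

# From another DataFrame (assumed reading of item 8): copy=True gives an independent copy
df_copy = pd.DataFrame(df_dict, copy=True)

print(df_list.shape)
print(df_rows)
```

Changing df_copy afterwards does not change df_dict, because the data was copied.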

Attributes of DataFrame:

1. index      6. shape
2. columns    7. values
3. axes       8. ndim
4. dtypes     9. empty
5. size       10. T
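A quick way to see what these attributes return is to print them for a small DataFrame. The marks below are sample values reused from the creation examples; this is only a sketch.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [80, 85, 90], 'Science': [78, 89, 88]},
                  index=['a', 'b', 'c'])

print(df.index)    # row labels
print(df.columns)  # column labels
print(df.axes)     # list holding the row axis and the column axis
print(df.dtypes)   # data type of each column
print(df.size)     # total number of elements (rows x columns)
print(df.shape)    # (number of rows, number of columns)
print(df.values)   # the data as a NumPy array
print(df.ndim)     # number of axes
print(df.empty)    # True only if the DataFrame has no elements
print(df.T)        # transpose: rows and columns swapped
```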

Selecting/Accessing DataFrame:

1. Accessing a column → DF['col_name'] or DF.col_name

2. Accessing multiple columns → DF[[list of column names]]
3. Accessing a row → DF.loc['row_label']
4. Accessing multiple rows → DF.loc[[list of row_labels]]
5. Accessing a subset of a DataFrame → DF.loc[index, columns]
index = list or slice of row labels (use : for all rows)
columns = list or slice of column labels (use : for all columns)
OR
DF.iloc[row_start_index : row_end_index, col_start_index : col_end_index]

Note: With iloc[ ], the end index is excluded; with loc[ ], the end label is included.

6. Selecting/accessing an individual value → DF.column_name['row_label'], or DF.column_name.iloc[position] for an integer position

Ex. DF.population['Delhi'] or DF.population.iloc[2]
(On a column with string row labels, plain DF.population[2] relies on positional fallback, which recent pandas versions deprecate; .iloc is the safe form.)
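The difference between label-based and position-based selection is easiest to see side by side. In this sketch the city names follow the example above, while the numbers and the hospitals column are made up for illustration.

```python
import pandas as pd

# Hypothetical figures, used only to show the selection rules
df = pd.DataFrame({'population': [100, 200, 300, 400],
                   'hospitals': [10, 20, 30, 40]},
                  index=['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])

by_label = df.loc['Delhi':'Kolkata', 'population']   # end label included: 3 rows
by_position = df.iloc[0:2, 0:1]                      # end position excluded: 2 rows, 1 column
one_value = df.loc['Delhi', 'population']            # single value by labels
same_value = df.iloc[0, 0]                           # the same value by position

print(by_label)
print(by_position)
```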
Filtering DataFrame using operators:

1. Using bracket notation (column label): DataFrameName[DataFrameName['col_name'] operator value]

Example → DF[DF['Marks'] > 90]
DF[(DF['Marks'] > 90) & (DF['Gender'] == "Female")]
2. Using dot notation: DataFrameName[DataFrameName.col_name operator value]
Example → DF[DF.Marks > 90]
DF[(DF.Marks > 90) & (DF.Gender == "Female")]
3. Using .query() function: DataFrameName.query('col_name operator value')
Example → DF.query('Maths > 90 & Science > 85')
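The three filtering styles select the same rows. A minimal sketch, using invented student records (the names and marks are not from the notes):

```python
import pandas as pd

# Hypothetical records for illustration
df = pd.DataFrame({'Name': ['Asha', 'Ravi', 'Meena', 'Kabir'],
                   'Gender': ['Female', 'Male', 'Female', 'Male'],
                   'Marks': [95, 92, 88, 70]})

high = df[df['Marks'] > 90]                                        # bracket notation
high_female = df[(df.Marks > 90) & (df.Gender == 'Female')]        # dot notation, two conditions
same_rows = df.query('Marks > 90 and Gender == "Female"')          # query string

print(high)
print(high_female)
```

Each condition needs its own parentheses when combined with & or |, because these operators bind more tightly than the comparisons.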

Adding/Modifying Columns of a DataFrame:

DF['col_name'] = [list of values] → will add/overwrite a column with the given values

DF['new_col_name'] = single_value → will add/overwrite a column with the same value in every row

DF = DF.assign(column_name=[list of values]) → returns a new DataFrame with the column added/overwritten; the original DF changes only if the result is assigned back

DF.loc[:, 'col_name'] = [list of values] or a single value

Note: Dot (.) notation can modify an existing column, e.g. DF.col_name = [list of values], but it does not add a new column; use DF['col_name'] for that.

• If the column name does not exist in the DataFrame, a new column is added; if a column with the same name already exists, its values are overwritten.
• If you assign a list of values, the number of values must match the number of rows in the DataFrame; otherwise a ValueError is raised.
• The assign() function can add/modify several columns at once (the column names are written as keyword arguments, without quotation marks).
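The rules above can be checked with a small sketch. The column names and marks are sample values; note in particular that assign() leaves the original DataFrame untouched.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [80, 85, 90]}, index=['a', 'b', 'c'])

df['Science'] = [78, 89, 88]         # add a column from a list
df['Grade'] = 'A'                    # single value repeated in every row
df.loc[:, 'Maths'] = [81, 86, 91]    # overwrite an existing column

# assign() returns a new DataFrame; several columns can be given at once
df2 = df.assign(English=[85, 74, 65],
                Total=lambda d: d['Maths'] + d['Science'])

try:
    df['Hindi'] = [1, 2]             # wrong length for a 3-row DataFrame
except ValueError as err:
    print('ValueError:', err)

print(df2)
```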

Adding/Modifying Rows of a DataFrame: use .loc[ ]

DF.loc['row_label'] = [list of values] → will add a row, or overwrite its values, with the given values

DF.loc['row_label'] = single_value → will add/overwrite a row with the same value in every column

DF.at['row_label', 'col_name'] = value → sets a single cell (.at needs both a row label and a column label)

DF.loc['row_label', start : end] = [list of values] → will only modify the given slice of the row

Note: To update a subset of a DataFrame: DF.loc[rows, columns] = value(s)

DF.loc[list of row_labels or slice, list of column_names or slice]
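A short sketch of the row operations, with sample marks. It assumes .at is used in its two-label form (row label and column label), which is how pandas addresses a single cell.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [80, 85], 'Science': [78, 89], 'English': [85, 74]},
                  index=['a', 'b'])

df.loc['c'] = [90, 88, 65]                  # new label: a row is added
df.loc['a'] = 0                             # existing label: every value in the row becomes 0
df.at['b', 'Maths'] = 99                    # one cell, addressed by row and column label
df.loc['c', 'Maths':'Science'] = [91, 89]   # only the given slice of the row changes

print(df)
```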

Deleting Columns/Rows of a DataFrame:

1. del DF['Col_Name'] → deletes a column only

It deletes the column in place (the original DataFrame is updated).
Without a column name, del DF deletes the whole DataFrame variable.
2. DF.drop([list of column names/row labels], axis=0/1) → by default axis=0 (rows)
drop() returns a new DataFrame without the deleted rows/columns, so store the result in a variable. It does not affect
the original DF unless inplace=True is passed.
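The contrast between del (in place) and drop() (returns a new DataFrame) can be sketched as follows, using sample marks:

```python
import pandas as pd

df = pd.DataFrame({'Maths': [80, 85, 90], 'Science': [78, 89, 88],
                   'English': [85, 74, 65]}, index=['a', 'b', 'c'])

del df['English']                            # removes the column from df itself

no_row_a = df.drop(['a'])                    # axis=0 by default: row 'a' is left out of the result
no_science = df.drop(['Science'], axis=1)    # axis=1: column 'Science' is left out of the result

print(df)          # still has rows a, b, c and columns Maths, Science
print(no_row_a)
print(no_science)
```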

Renaming Rows/Columns:

DF.rename(columns={old_name: new_name, old_name: new_name, ...}, inplace=True/False) → for columns

DF.rename(index={old_name: new_name, old_name: new_name, ...}, inplace=True/False) → for rows

Note: To rename both rows and columns: DF.rename(columns={...}, index={...})
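As a small illustration (the new names are invented), rename() returns a renamed copy unless inplace=True is passed:

```python
import pandas as pd

df = pd.DataFrame({'Maths': [80, 85]}, index=['a', 'b'])

# Rename one column and one row label in a single call
renamed = df.rename(columns={'Maths': 'Mathematics'}, index={'a': 'first'})

# inplace=True changes df itself and returns None
df.rename(columns={'Maths': 'Math'}, inplace=True)

print(renamed)
print(df)
```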

Boolean Indexing: First create the DataFrame with a Boolean index, i.e. with True/False as the row labels.

To display rows with False index → DF.loc[False] (DF.loc[0] may also work, depending on the pandas version)

To display rows with True index → DF.loc[True] (DF.loc[1] may also work, depending on the pandas version)
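A minimal sketch of a DataFrame with a Boolean index. The Day and Classes columns are hypothetical; only the True/False row labels matter here.

```python
import pandas as pd

# Hypothetical timetable; the index holds True/False instead of 0, 1, 2, ...
df = pd.DataFrame({'Day': ['Mon', 'Tue', 'Wed', 'Thu'],
                   'Classes': [6, 0, 5, 0]},
                  index=[True, False, True, False])

true_rows = df.loc[True]     # all rows labelled True
false_rows = df.loc[False]   # all rows labelled False

print(true_rows)
print(false_rows)
```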

Importing and Exporting Data between CSV files and DataFrame

Importing Data from CSV files: read_csv( ) function is used

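pd.read_csv("file path", sep, header, names)

To read a CSV named 'resultsheet.csv' and store it in a DataFrame named 'DF':

DF = pd.read_csv("E:/Python programs/resultsheet.csv", sep=",", header=0, names=['RNo', 'Name', 'Eng', 'Hin'])

Here header=0 says that the first line of the file is a header row, and names replaces those labels with the given ones.

Note: The default value of sep is a comma (","), i.e. the data items are expected to be separated by commas. Pass sep explicitly for any other separator, e.g. sep=" " for spaces or sep="\t" for tabs.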

Exporting Data from DataFrame to CSV File: to_csv( ) function is used

DF.to_csv("file path", sep, header, index)

To create a CSV file 'resultsheetnew.csv' from a DataFrame named 'DF':

DF.to_csv("E:/Python programs/resultsheetnew.csv", sep=",", header=True/False, index=True/False)

Note: By default the header and index parameters are both True. If you don't want the column labels or the row index to be
stored in the CSV file, set the corresponding parameter to False.
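The import/export round trip can be tried without touching the disk. This sketch reads from an in-memory string through io.StringIO instead of a file path, and the two data rows are invented; with a real file, the path would go in place of the StringIO object.

```python
import io
import pandas as pd

csv_text = "1,Asha,78,81\n2,Ravi,65,90\n"   # hypothetical data with no header line

# header=None: the data has no header row, so names= supplies the column labels
df = pd.read_csv(io.StringIO(csv_text), sep=",", header=None,
                 names=['RNo', 'Name', 'Eng', 'Hin'])

# With no path given, to_csv returns the CSV text; index=False leaves out the row index
out = df.to_csv(sep=",", header=True, index=False)

print(df)
print(out)
```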

Series vs NumPy Array:

Series | NumPy Array

1. In a Series we can define our own labelled index to access elements; the labels can be numbers or letters. | NumPy array elements are accessed by their integer position only.
2. The elements can also be indexed in descending order. | The indexing starts with zero for the first element and the index is fixed.
3. If two Series are not aligned, NaN (missing) values are generated for the labels that do not match. | Arrays have no labels, so nothing is aligned; operations work position by position, and arrays of incompatible shapes raise an error instead of producing NaN.
4. A Series requires more memory. | A NumPy array occupies less memory.
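The alignment difference can be seen by adding two Series with partly different labels and two arrays of different lengths. The values are arbitrary and chosen only for this sketch.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
total = s1 + s2              # aligned on labels; 'a' and 'd' have no partner, so NaN

a1 = np.array([1, 2, 3])
a2 = np.array([10, 20, 30])
by_position = a1 + a2        # purely positional addition

try:
    a1 + np.array([10, 20])  # shapes (3,) and (2,) cannot be combined
except ValueError as err:
    print('ValueError:', err)

print(total)
print(by_position)
```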
*********************

PANDAS – 2
Descriptive Statistics:

     Name     UT  Maths  Science  S.St  Hindi  Eng

0    Raman    1   22     21       18    20     21
1    Raman    2   21     20       17    22     24
2    Raman    3   14     19       15    24     23
3    Zuhaire  1   20     17       22    24     19
4    Zuhaire  2   23     15       21    25     15
5    Zuhaire  3   22     18       19    23     13
6    Ashravy  1   23     19       20    15     22
7    Ashravy  2   24     22       24    17     21
8    Ashravy  3   12     25       19    21     23
9    Mishti   1   15     22       25    22     22
10   Mishti   2   18     21       25    24     23
11   Mishti   3   17     18       20    25     20

Calculating Maximum Value: df.max(numeric_only=True/False, axis=0/1)

max() returns the maximum value in each column. By default, all these functions compute one value per column, i.e. axis=0.
To compute the statistic for each row instead, pass axis=1.
Example: To print the maximum marks obtained in each subject in Unit Test 2.
>>> DFUT2 = df[df.UT == 2]        # or DFUT2 = df[df['UT'] == 2]
>>> DFUT2.max(numeric_only=True)

Calculating Minimum Value: df.min(numeric_only=True/False, axis=0/1)

Example: To display the minimum marks obtained by Raman in Hindi across all unit tests.
dfRaman = df[df.Name == 'Raman']    # extract the records of Raman only and store them in dfRaman
dfRamanHindi = dfRaman.Hindi        # extract the Hindi column from Raman's records
print(dfRamanHindi.min())           # apply the function to the extracted column

Calculating Sum of Values: df.sum(numeric_only=True/False, axis=0/1)

Example: Write Python statements to print the total marks obtained by Raman in each subject.
>>> dfRaman = df[df.Name == 'Raman']
>>> dfRaman[['Maths', 'Science', 'S.St', 'Hindi', 'Eng']].sum()
Example: Write Python statements to print the total marks obtained by Raman across all subjects in each Unit Test.
>>> dfRaman = df[df.Name == 'Raman']
>>> dfRaman[['Maths', 'Science', 'S.St', 'Hindi', 'Eng']].sum(axis=1)
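The max, min and sum examples above can be run end to end by first building the marks table as a DataFrame. The data is copied from the table in these notes; the helper names subjects, ut2_max and so on are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name':    ['Raman'] * 3 + ['Zuhaire'] * 3 + ['Ashravy'] * 3 + ['Mishti'] * 3,
    'UT':      [1, 2, 3] * 4,
    'Maths':   [22, 21, 14, 20, 23, 22, 23, 24, 12, 15, 18, 17],
    'Science': [21, 20, 19, 17, 15, 18, 19, 22, 25, 22, 21, 18],
    'S.St':    [18, 17, 15, 22, 21, 19, 20, 24, 19, 25, 25, 20],
    'Hindi':   [20, 22, 24, 24, 25, 23, 15, 17, 21, 22, 24, 25],
    'Eng':     [21, 24, 23, 19, 15, 13, 22, 21, 23, 22, 23, 20],
})
subjects = ['Maths', 'Science', 'S.St', 'Hindi', 'Eng']

ut2_max = df[df.UT == 2][subjects].max()      # highest marks per subject in Unit Test 2
raman = df[df.Name == 'Raman']
raman_hindi_min = raman.Hindi.min()           # Raman's lowest Hindi marks
raman_totals = raman[subjects].sum()          # axis=0: one total per subject
raman_per_test = raman[subjects].sum(axis=1)  # axis=1: one total per unit test (row)

print(ut2_max)
print(raman_hindi_min)
print(raman_totals)
print(raman_per_test)
```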

Calculating Number of Values: df.count(numeric_only=True/False, axis=0/1)

Calculating mean (average): df.mean(numeric_only=True/False, axis=0/1)

mean() will display the average of the values of each column of a DataFrame. It is only applicable for numeric values.

Calculating median (Middle Value): df.median(numeric_only=True/False, axis=0/1)

median() will display the middle values of each column of a DataFrame. It is only applicable for numeric values.

Calculating Mode (Most appeared): df.mode(numeric_only=True/False, axis=0/1)

Mode is defined as the value that appears the most number of times in a data set.
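As a sketch of count, mean, median and mode, the Maths column of the marks table can be used on its own. Note that mode() returns a Series, because more than one value can share the highest frequency.

```python
import pandas as pd

# Maths marks from the table above
maths = pd.Series([22, 21, 14, 20, 23, 22, 23, 24, 12, 15, 18, 17])

print(maths.count())    # number of non-missing values
print(maths.mean())     # average
print(maths.median())   # middle value of the sorted data
print(maths.mode())     # every value that appears most often
```

Here 22 and 23 both appear twice, so mode() returns both values.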

Calculating Quartile: df.quantile(q, numeric_only, axis)

The quantile() function is used to get the quartiles, which divide the sorted values of each column (or row) into four
parts: the first quartile is at 25% (parameter q = .25), the second quartile is at 50% (the median) and the third quartile
is at 75% (parameter q = .75). By default q = .5, so it returns the second quartile (median). In recent pandas versions, pass numeric_only=True if the DataFrame also has non-numeric columns.
To locate the quartile position, use position = (n-1) × q + 1 (the data must be sorted first).
Example: >>> Series1 = pd.Series([15,18,10,22,23,42,41,36,80,75])
>>> Series1.quantile(.25)
19.0
n = 10 (total number of values in the data set), q = .25
position = (10-1) × .25 + 1 = 3.25
Arrange the data in ascending order: 10 15 18 22 23 36 41 42 75 80
The 3rd and 4th positions hold the values 18 and 22.
The difference between the 3rd and 4th position values is 4, so the fractional part of the position (.25) is multiplied by 4 and added to the 3rd value: 18 + (4 × .25) = 19
Example: Write the statement to display the first and third quartiles of all subjects.
>>> df[['Maths','Science','S.St','Hindi','Eng']].quantile([.25,.75])
      Maths  Science   S.St  Hindi    Eng
0.25  16.50    18.00  18.75  20.75  19.75
0.75  22.25    21.25  22.50  24.00  23.00
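The position formula can be checked by hand against quantile(). This sketch repeats the worked example for q = .25 step by step; the variable names are illustrative, and the manual steps assume the default linear interpolation.

```python
import pandas as pd

values = [15, 18, 10, 22, 23, 42, 41, 36, 80, 75]
q = 0.25

ordered = sorted(values)                      # 10 15 18 22 23 36 41 42 75 80
position = (len(ordered) - 1) * q + 1         # 3.25, counting positions from 1
lower = int(position)                         # whole part: the 3rd position
fraction = position - lower                   # fractional part: 0.25
low_val = ordered[lower - 1]                  # value at the 3rd position
high_val = ordered[lower]                     # value at the 4th position
manual = low_val + (high_val - low_val) * fraction

print(manual)
print(pd.Series(values).quantile(q))
```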

Calculating Variance: df.var(numeric_only, axis)

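Variance is the average of the squared differences from the mean. By default pandas computes the sample variance, dividing by n-1 (ddof=1); pass ddof=0 to divide by n.

Calculating Standard Deviation: df.std(numeric_only, axis)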

Standard deviation is calculated as the square root of the variance.
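A small sketch of how the ddof setting affects var() and std(). The eight values are made up so that the mean is 5 and the population variance is exactly 4.

```python
import pandas as pd

s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9])   # hypothetical data, mean = 5

sample_var = s.var()              # default ddof=1: sum of squared differences / (n - 1)
population_var = s.var(ddof=0)    # ddof=0: sum of squared differences / n
population_std = s.std(ddof=0)    # square root of the population variance

print(sample_var, population_var, population_std)
```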

Data Aggregations: Aggregation means to transform the dataset and produce a single numeric value from an array.
Aggregation can be applied to one or more columns together. Aggregate functions are max(),min(), sum(), count(), std(),
var().
>>> df.aggregate('max')                               # will calculate max for each column
>>> df.aggregate(['max','count'])                     # will calculate max and number of items for each column
>>> df['Maths'].aggregate(['max','min'],axis=0)       # will calculate max and min value of the Maths column
>>> df[['Maths','Science']].aggregate('sum',axis=1)   # will calculate the sum of Maths and Science in each row
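The aggregate() calls above can be sketched on a cut-down table. Only Raman's Maths and Science marks from the table are used here to keep the output short.

```python
import pandas as pd

# Raman's Maths and Science marks in the three unit tests
df = pd.DataFrame({'Maths': [22, 21, 14], 'Science': [21, 20, 19]})

summary = df.aggregate(['max', 'count'])               # one row per function, one column per subject
maths_range = df['Maths'].aggregate(['max', 'min'])    # Series with one entry per function
row_totals = df.aggregate('sum', axis=1)               # Maths + Science for each row

print(summary)
print(maths_range)
print(row_totals)
```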

Sorting a DataFrame: df.sort_values(by='colname', axis=0/1, ascending=True/False)

>>> df.sort_values(by=['Name'])                # will sort the df in ascending order of Name
>>> dfUT2 = df[df.UT == 2]                     # extract the records of Unit Test 2
>>> print(dfUT2.sort_values(by=['Science']))   # will sort those records in ascending order of Science
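As a sketch of sort_values(), the Unit Test 2 Science marks from the table can be sorted in both directions. sort_values() returns a new DataFrame and leaves the original order unchanged.

```python
import pandas as pd

# Science marks in Unit Test 2, taken from the table above
ut2 = pd.DataFrame({'Name': ['Raman', 'Zuhaire', 'Ashravy', 'Mishti'],
                    'Science': [20, 15, 22, 21]})

by_science = ut2.sort_values(by=['Science'])                 # ascending (default)
by_science_desc = ut2.sort_values(by='Science', ascending=False)
by_name = ut2.sort_values(by='Name')                         # alphabetical order

print(by_science)
print(by_science_desc)
```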
