PANDAS
COMMANDS
Creating dataframe:
data_mobile = {'Mobile': ['iPhone', 'Samsung', 'Redmi'], 'Color': ['Red', 'White',
'Black'], 'Price': [High,
Medium,Low]}
df = pd.DataFrame(data_mobile,columns=['Mobile', 'Color', 'Price'])
Importing Data :
pd.read_csv('filename') - It is used to import the csv file into pandas dataframe.
pd.read_excel('filename') - it is used to import the excel sheet into pandas
dataframe.
pd.read_json('filename') - It is used to import the json file into pandas
dataframe.
Exporting Data:
df.to_csv('filename') - It is used to export the cleaned (or) modified data into
the file.
df.to_excel('filename') - It is used to export the modified data into the file.
df.to_json('filename') - It is used to export the modified data into the file.
Important commands :
* df.head(n) - Displays first n rows of the DataFrame.
* df.tail(n) – Displays last n rows of the DataFrame.
* df.shape - Gives the number of rows and columns.
* df.info() - Gives information of Index, Datatype and Memory.
* df.describe() -Summary statistics for numerical columns like
mean,std,25%,50%,75%,min,max.
df.columns - Gives the column names.
pd.set_option('display.max_columns',None) - displays all the columns.
df.dtypes - Gives the datatypes of each column.
df['column_name'].dtypes - Gives the datatype of the particular column.
df.axes - Gives the index range and column names of the dataframe.
df.ndim - Gives the dimensions of the table.
df.size - Gives the total no of rows.
df.values - Displays each row in form of list.
df.mean() - Gives the mean of each column
df.['column_name'].mean() - Gives mean of particular column
df.max() - gives the maximum values of all columns.include column name to know for
particular column.
df.min() - gives the minimum values of all columns.include column nameto know for
particular column.
df.std() - Gives the standard deviation of all columns.include column name to know
for particular column.
df.median() - Gives the middle values of all columns.include column name to know
for particular column.
df.count() - Gives the no of non-null values in each column.include column name to
know for particular column.
df.cumsum() - returns a DataFrame with the cumulative sum for each row.
df['column_name'].unique() - Gives all the unique values in the column.
df.column_name.replace('previous name','new name') - we can replace old with new
name in column.
df['column_name'].value_counts() - displays how many times a value repeated in
column.
df.sample(n=12) - displays 12 random rows from the data frame.
df.nlargest(n, 'value’) - select and order top n entries
df.nsmallest(n, ;value') - select and order bottom n entries.
LOC:
df.loc[1:3, ['cname1','cname2']] - Displays 1-3 rows of columns 1,2.
df.loc[1:3] - Displays 1,2,3 all rows of columns
df.iat[1, 2] - Access a single value for a row/column pair by index.
df.at[2, 'cname'] - Acess a single value for a row/column pair by column name.
ILOC:
df.iloc[0] - displays first row of data frame
df.iloc[1] - displays second row of data frame
df.iloc[-1] - displays last row of data frame
df.iloc[:,0] - displays the first column of dataframe.
df.iloc[:,1] - Select second column of data frame.
Handling Missing Values:
df.isnull() - The null value present in column is displayed as true,remaining all
are false.
df.notnull() - The null value present in column is displayed as false,remaining all
are true.
df.isnull().sum() - Counts all the null values present in each column and shows at
end of each column.
df.fillna(value) - Replace all NA/null data with value.
df.dropna()/df.dropna(thresh=n) - Drop rows with any column having NA/null
data./eliminates the rows which have the values less than n.
df.drop_duplicates() - removes duplicate rows (considers columns).
df.reset_index() - resets index of the row numbers, moves the actual index to
columns.
df.sort_index() - sorting the index.
df.rename(columns = {'y':'year'}) - rename the columns of the dataframe.
df.sort_values('cname') - Order rows by values of a column (low to high).
df.sort_values('cname',ascending=False) - Order rows by values of a column (high
to low).
df.groupby([column1,column2]) - Returns a groupbyobject values from multiple
columns.
df.groupby(column1)[column2].mean() - Returns the mean of the values in column2,
grouped by the values in column1 .
df.groupby(column1)[column2].median() - Returns the median of the values in
column2, grouped by the values in column1.
Extra commands:
pd.melt(df) - Gather columns into rows
df.pivot(columns='var', values='val') - spreads rows into columns.
pd.concat([df1,df2]) - Append rows of DataFrames
pd.concat([df1,df2], axis=1) - Append columns of DataFrames.