PYTHON PANDAS
Creating DataFrame:
                        Syntax:  DF.DataFrame(data,columns,index)
   1. Creating Empty DataFrame
      DF=pd.DataFrame( )
   2. Creating DataFrame from 2D lists (List of Lists)
      L=[ [ 10, 20, 30, 40], [12, 12, 13,14 ] ]                  Created a list L
      DF=pd.DataFrame( L )                                       Passed it to DataFrame( )
   3. Creating DataFrame from 2D dictionary (dictionary of dictionaries)
      D={ 1: { 'a':10, 'b':20, 'd':120 },2 : { 'a':30, 'b':60, 'c':90} }         Created a nested (2D) dictionary
      DF=pd.DataFrame(D)                                                         Passed it to DataFrame( )
   4. Creating DataFrame from ndarray
      a1=np.array([91,2,3,4,5])                        Created array a1
      a2=np.array([10,20,30,40])                       Created array a2
      >>> DF=pd.DataFrame([a1,a2])                     passed the arrays to DataFrame ( ) (in the form of list)
   5. Creating DataFrame from List of Dictionaries
      A=[{ 'a':10, 'b':20, 'd':120 },{ 'a':30, 'b':60, 'c':90} ]
      >>>>DF=pd.DataFrame(A)
   6. Creating DataFrame from Dictionary of Lists
      A= {'Maths':[80,85,90],'Science':[78,89,88],'English':[85,74,65]}
      >>>>DF=pd.DataFrame(A)
   7. Creating DataFrame from Dictionary of Series
      DF1=pd.DataFrame({'Maths':pd.Series([80,85,90],index=['a','b','c']),'Science':pd.Series([78,89,88],index=['a','b','c'
      ]),'English':pd.Series([85,74,65],index=['a','b','c'])})
   8. Creating DataFrame from another DataFrame
Attributes of DataFrame:
   1.   index                    6. shape
   2.   columns                  7.values
   3.   axes                     8. ndim
   4.   dtype                    9. empty
   5.   size                     10. T
Selecting/Accessing DataFrame:
   1.   Accessing a column                  DF[‘col_name’]           or       DF.col_name
   2.   Accessing Multiple columns          DF[ [ list of column names] ]
   3.   Accessing a row                     DF.loc[ ‘row_label’ ]
   4.   Accessing Multiple rows DF.loc[ [ list of row_labels] ]
   5.   Accessing a subset of Dataframe DF.loc [ index , columns ]
                 index = list of row labels or slice (only labels or all i.e. : ) of row labels
                 columns = list of column labels or slice (only labels or all i.e. : ) of column labels
                                   OR
        DF.iloc[ row_start_index : row_end_index , col_start_index : col_end_index)
        Note: When we use iloc[ ], then end index is excluded but while using loc[ ], the end index is included.
   6. Selecting/Accessing individual value  DF.colmn_name[ ‘row_label’ or index ]
                                             Ex. DF.population[‘Delhi’] or DF.population[2]
Filtering Dataframe using operators:
   1. Using index label: DataFrameName[DataFrameName[ col_name] operator value]]
                       Example  DF[DF[‘Marks’]>90]]
                       DF[ (DF[‘Marks’]>90]) & (DF[‘Gender’] == “Female”]) ]
   2. Using dot notation: DataFrameName[DataFrameName. col_name operator value]
                       Example  DF[DF.Marks>90]
                       DF[(DF.Marks>90) & (DF.Gender == “Female”)]
   3. Using .query( ) function: DataFrameName.query( ‘ col_name operator value’ )
                       Example  DF.query( ‘ Maths > 90 & Science > 85’ )
Adding/Modifying Columns of a DataFrame:
       DF[‘col_name’]= [List of values]                       will add/overite a column with given values
       DF[‘new_col_name’]= Single_Value                       will add/overwrite a column with all values as same
       DF.assign(colum_name=[ list of values])                will add/overite a column with given values
       DF.loc[ : , ‘col_name’] = [ List of Values ] or a Single Value
       Note: you may also use dot( . ) notation of modify/add a column Ex. DF.col_name = [list of values]
      It the column name does not exist in the DataFrame, it will add a new column but if a column name already exist
       with the same name, it will overwrite the values of that particular column.
      If you are assigning a list of values, the values in the column required must match the number of values you have
       assigned otherwise it will raise an error i.e. ValueError.
      assign( ) function can be used to add/modify the values for multiple columns. (the column_name must be
       written without quotation marks)
Adding/Modifying Rows of a DataFrame: use .loc( )
       DF.loc[‘row_label’]= [ List of values ]                will add/overite a row values with given values
       DF.loc[‘row_label’]= Single_Value                      will add/overwrite a row values with all values as same
       DF.at[‘row_label’] = [ List of values ]
       DF.loc [ ‘row_label’ , start : end ] = [ List of values ]     will only modify the given slice of row
       Note: To update a subset of DataFrame: DF.loc[ row , columns ]
                DF.loc[ list of row_labels/Slice, list of column_names or Slice]
Deleting Columns/Row of a DataFrame:
   1. del DF[‘Col_Name’]                        it can only delete column
      It will delete the column and update the values in DF. (the original DataFrame will be updated)
      It can be used to drop the complete dataframe Ex. del DF
   2. DF.drop[[list of columns/indexes],axis= 0/1)              By default the axis=0
      Drop function returns us the datframe after deleted values so you may store it in some variable. It will not affect
      the original DF.
Renaming Rows/Columns:
       DF.rename(columns={old_name:new_name,old_name:new_name,……},inplace=True/False)                         For columns
       DF.rename(index={old_name:new_name,old_name:new_name,……},inplace=True/False)                           For Rows
            Note: To rename both rows and columns DF.rename( {columns={ ……….}, index = { ………….} )
Boolean Indexing: For this first of all you need to create the DF with Boolean indexing.
            To display rows with False index      DF.loc[ 0 ] or DF.loc[ False ]
            To display rows with True index       DF.loc[ True ] or DF.loc[ 1 ]
Importing and Exporting Data between CSV files and DataFrame
Importing Data from CSV files: read_csv( ) function is used
                   read_csv( “ file path”, sep, header, names)
                   To read a CSV named ‘resultsheet.csv’ and store in a dataframe named ‘DF’
            DF=pd.read_csv(“E:/Python programs/resultsheet.csv”, sep= “,”, header=0, names=[‘RNo’, ‘Name’, ‘Eng’, ‘Hin’])
Note: The default value of sep is space (Means the data items will be separated by space character)
Exporting Data from DataFrame to CSV File: to_csv( ) function is used
DF.to_csv( “ file path”, sep, header, index)
To create a CSV file ‘resultsheetnew’ from a dataframe named ‘DF’
DF.to_csv(“E:/Python programs/resultsheetnew.csv”, sep= “,”, header=True/False, index=True/False)
Note: By default the value of hearer and index attributes is True. If you don’t want column labels/ row index to be
stored in CSV file, you can make it False.
Series V/s Numpy Array:
                              Series                                                   Numpy Array
            In series we can define our own labeled index to            NumPy arrays are accessed by their integer
             access elements of an array. These can be                    position using numbers only.
             numbers or letters.
            The elements can be indexed in descending                   The indexing starts with zero for the first
             order         also.                                          element and the index is fixed.
            If two series are not aligned, NaN or missing               There is no concept of NaN values and if there
             values are generated                                         are no matching values in arrays, alignment
                                                                          fails.
            Series require more memory.                                 NumPy occupies lesser memory.
.
*********************
                                                         PANDAS – 2
Descriptive Statistics:
               Name     UT    Maths     Science      S.St        Hindi    Eng
    0         Raman      1       22          21        18           20     21
    1         Raman      2       21          20        17           22     24
    2         Raman      3       14          19        15           24     23
    3       Zuhaire      1       20          17        22           24     19
    4       Zuhaire      2       23          15        21           25     15
    5       Zuhaire      3       22          18        19           23     13
    6       Ashravy      1       23          19        20           15     22
 7      Ashravy        2        24            22       24        17      21
 8      Ashravy        3        12            25       19        21      23
 9       Mishti        1        15            22       25        22      22
 10      Mishti        2        18            21       25        24      23
 11      Mishti        3        17            18       20        25      20
Calculating Maximum Value: df.max(numeric_only=True/False, axis=0/1)
To calculate maximum value in each column. By default, all these functions extract a value for each column i.e. axis=0.
However to find the descriptive statistics for each row, we need to specify axis=1 as its argument.
Example. To print maximum marks obtained in each subject in Unit Test 2.
                >>> DFUT2=df[df.UT==2]            or      DFUT2=df[df[‘UT’]==2]
                >>> DFUT2.max(numeric_only=True)
Calculating Minimum Value: df.min(numeric_only=True/False, axis=0/1)
Example. To display minimum marks obtained by Raman in all unit tests in Hindi.
        dfRaman=df[df.Name=='Raman']                       Extract the records of Raman only and store in
        dfRamandfRamanHndi= dfRaman.Hindi                  Extract the record of Hindi from Raman’s
        print(dfRamanHndi.max())                           Apply function on finally extracted record
Calculating Sum of Values:     df.sum(numeric_only=True/False, axis=0/1)
Example: Write python statement to print the total marks obtained by Raman in each subject.
        >>> dfRaman=df[df.Name=='Raman']
        >>> dfRaman[['Maths','Science','S.St','Hindi','Eng']].sum()
Example: Write python statement to print the total marks obtained by Raman in all subjects in each Unit Test.
        >>> dfRaman=df[df.Name=='Raman']
        >>> dfRaman[['Maths','Science','S.St','Hindi','Eng']].sum(axis=1)
Calculating Number of Values: df.count(numeric_only=True/False, axis=0/1)
Calculating mean (average):     df.mean(numeric_only=True/False, axis=0/1)
mean() will display the average of the values of each column of a DataFrame. It is only applicable for numeric values.
Calculating median (Middle Value):       df.median(numeric_only=True/False, axis=0/1)
median() will display the middle values of each column of a DataFrame. It is only applicable for numeric values.
Calculating Mode (Most appeared):      df.mode(numeric_only=True/False, axis=0/1)
Mode is defined as the value that appears the most number of times in a data set.
Calculating Quartile:                       df.quantile( q , numeric_only, axis )
quantile() function is used to get the quartiles. It will output the quartile of each column or row of the DataFrame in four
parts i.e. the first quartile is 25% (parameter q = .25), the second quartile is 50% (Median), the third quartile is 75%
(parameter q = .75). By default, it will display the second quantile (median) of all numeric values.
         position = (n-1) × q + 1 to locate the quartile position (Sorting of the dataset is required to calculate it)
         Example:           >>> Series1=pd.Series([15,18,10,22,23,42,41,36,80,75])
                            >>> Series1.quantile(.25)
                            19.0
         N=10 (total number of values in data set)            q = .25
         (10-1) × .25 + 1 = 3.25 position
Arrange data in ascending order             10 15 18 22 23 36 41 42 75 80
                                                     3rd 4th positions
              rd      th
Difference 3 and 4 position values is 4, so fractional part will be multiplied by 4 (i.e. 18 + (4 × .25) = 19)
       Example: Write the statement to display the first and third quartiles of all subjects.
              >>> df[['Maths','Science','S.St','Hindi','Eng']].quantile([.25,.75])
                      Maths Science S.St Hindi Eng
              0.25    16.50 18.00 18.75 20.75 19.75
              0.75    22.25 21.25 22.50 24.00 23.00
Calculating Variance:             df.var(numeric_only, axis)
                                  Variance is the average of squared differences from the mean.
Calculating Standard Deviation:                   df.std((numeric_only, axis)
                                  Standard deviation is calculated as the square root of the variance.
Data Aggregations: Aggregation means to transform the dataset and produce a single numeric value from an array.
Aggregation can be applied to one or more columns together. Aggregate functions are max(),min(), sum(), count(), std(),
var().
       >>> df.aggregate('max')                                 will calculate max for each column
       >>> df.aggregate(['max','count'])                       will calculate max and total items for each column
       >>> df['Maths'].aggregate(['max','min'],axis=0)         will calculate max and min value of Maths column
       >>> df[['Maths','Science']].aggregate('sum',axis=1)     will calculate sum of Maths and Science in each row.
Sorting a DataFrame:      df.sort_values( by = ‘colname’, axis = 0/1, ascending = True/False)
        >>> df.sort_values(by=['Name'])             will sort the df in ascending order of
Name
        >>> dfUT2 = df[df.UT == 2]  will sort the
        >>> print(dfUT2.sort_values(by=['Science']))