Python Libraries - Pandas - Pandas Basics
Pandas is a library built on NumPy specifically for data analysis. You will be using Pandas heavily
for data manipulation, visualization, building machine learning models, etc.
There are two main data structures in pandas:
• Series
• DataFrames
Data is stored in DataFrames by default, so manipulating DataFrames quickly is probably the most important skill for data analysis.
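To make the relationship between the two structures concrete, here is a minimal sketch. The column names "item" and "price" and their values are assumptions made up for illustration only.

```python
import pandas as pd

# A Series is a one-dimensional labelled array.
prices = pd.Series([10.5, 20.0, 15.25], name="price")

# A DataFrame is a two-dimensional table; each of its columns is a Series.
# The names "item" and "price" are illustrative, not from any real dataset.
table = pd.DataFrame({"item": ["pen", "book", "bag"], "price": prices})

print(type(table["price"]))  # a single column comes back as a Series
print(table.shape)           # (number of rows, number of columns)
```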
In [1]:
pip install pandas
Requirement already satisfied: pandas in c:\users\student\anaconda3\lib\site-packages (1.4.4)
Requirement already satisfied: pytz>=2020.1 in c:\users\student\anaconda3\lib\site-packages (from pandas) (2022.1)
Requirement already satisfied: numpy>=1.18.5 in c:\users\student\anaconda3\lib\site-packages (from pandas) (1.21.5)
Requirement already satisfied: python-dateutil>=2.8.1 in c:\users\student\anaconda3\lib\site-packages (from pandas) (2.8.2)
Requirement already satisfied: six>=1.5 in c:\users\student\anaconda3\lib\site-packages (from python-dateutil>=2.8.1->pandas) (1.16.0)
Note: you may need to restart the kernel to use updated packages.
In [3]:
import pandas as pd
In [4]:
# The Pandas Series
# creating a numeric pandas Series
s = pd.Series([2,4,5,6,9])
print(s)
print(type(s))
0    2
1    4
2    5
3    6
4    9
dtype: int64
<class 'pandas.core.series.Series'>
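The left-hand column in the output above is the index, which defaults to 0, 1, 2, and so on. As a small sketch, the same values can be given explicit labels; the labels "a" to "e" and the name s_labelled are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Same values as the Series above, but with explicit labels instead of 0..4.
s_labelled = pd.Series([2, 4, 5, 6, 9], index=["a", "b", "c", "d", "e"])

print(s_labelled["c"])     # access by label
print(s_labelled.iloc[2])  # access by position
```

Both lookups refer to the same element; label-based access is usually easier to read once the index carries meaning.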
In [5]:
# creating a range of daily dates (pd.date_range returns a DatetimeIndex, not a Series)
data_series = pd.date_range(start = '11-09-2017', end= '12-12-2017')
data_series
#type (data_series)
Out[5]:
DatetimeIndex(['2017-11-09', '2017-11-10', '2017-11-11', '2017-11-12',
               '2017-11-13', '2017-11-14', '2017-11-15', '2017-11-16',
               '2017-11-17', '2017-11-18', '2017-11-19', '2017-11-20',
               '2017-11-21', '2017-11-22', '2017-11-23', '2017-11-24',
               '2017-11-25', '2017-11-26', '2017-11-27', '2017-11-28',
               '2017-11-29', '2017-11-30', '2017-12-01', '2017-12-02',
               '2017-12-03', '2017-12-04', '2017-12-05', '2017-12-06',
               '2017-12-07', '2017-12-08', '2017-12-09', '2017-12-10',
               '2017-12-11', '2017-12-12'],
              dtype='datetime64[ns]', freq='D')
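Note that the output above is a DatetimeIndex rather than a Series. A minimal sketch of turning it into a Series follows; the variable names dates and date_values are assumptions, and the dates are written in ISO form (year-month-day) so that the month and day cannot be confused.

```python
import pandas as pd

# date_range returns a DatetimeIndex of daily dates by default.
dates = pd.date_range(start="2017-11-09", end="2017-12-12")

# Wrapping the index in pd.Series gives a Series of datetime values.
date_values = pd.Series(dates)

print(type(dates).__name__)
print(date_values.dtype)
print(len(date_values))  # number of days, with both endpoints included
```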
The Dataframe
The DataFrame is the most widely used data structure in data analysis. It is a table with rows and columns: each row is identified by an index label, and each column holds a named field of data.
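The row index and the column names can be inspected directly. The following is only a sketch; the frame and its columns "name" and "score" are made-up placeholders.

```python
import pandas as pd

# A small illustrative table; the names and values are placeholders.
frame = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})

print(frame.index)    # row labels (0, 1, 2 by default)
print(frame.columns)  # column names
print(frame.shape)    # (rows, columns)
```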
Creating DataFrames from dictionaries
EXAMPLE - 1
In [8]:
country = ['United States','Australia','India','Russia','Morrocco']
symbol = ['US','AU','IND','RUS','MOR']
dic_world = {"country":country,"symbol":symbol}
In [9]:
print(dic_world)
{'country': ['United States', 'Australia', 'India', 'Russia', 'Morrocco'], 'symbol': ['US', 'AU', 'IND', 'RUS', 'MOR']}
In [10]:
dic_world["country"]
Out[10]:
['United States', 'Australia', 'India', 'Russia', 'Morrocco']
In [11]:
dic_world["symbol"]
Out[11]:
['US', 'AU', 'IND', 'RUS', 'MOR']
In [12]:
data = pd.DataFrame(dic_world)
In [13]:
print(type(data))
<class 'pandas.core.frame.DataFrame'>
In [14]:
print(data)
         country symbol
0  United States     US
1      Australia     AU
2          India    IND
3         Russia    RUS
4       Morrocco    MOR
In [15]:
print(data["country"])
0    United States
1        Australia
2            India
3           Russia
4         Morrocco
Name: country, dtype: object
In [16]:
print(data["symbol"])
0     US
1     AU
2    IND
3    RUS
4    MOR
Name: symbol, dtype: object
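Selecting with a single column name returns a Series, as seen above. As a sketch of two related selections, the block below rebuilds the Example 1 table so it runs on its own; the names one_column, sub_table and row are assumptions.

```python
import pandas as pd

# Same data as Example 1, repeated so the block is self-contained.
data = pd.DataFrame({
    "country": ["United States", "Australia", "India", "Russia", "Morrocco"],
    "symbol": ["US", "AU", "IND", "RUS", "MOR"],
})

one_column = data["country"]    # a single name returns a Series
sub_table = data[["country"]]   # a list of names returns a DataFrame
row = data.loc[2]               # the row with index label 2, as a Series

print(type(one_column).__name__)
print(type(sub_table).__name__)
print(row["symbol"])
```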
EXAMPLE - 2
In [18]:
# defining the lists that will become the dictionary values
cars_per_cap = [809,731,588,18,200,70,45]
country = ['United states','Australia','Japan','India','Russia','Morroco','Egypt']
drives_right = [False,True,True,True,False,False,False]
In [19]:
# creating the dictionary, mapping each column name (key) to its list of values
cars_dict = {"cars_per_cap":cars_per_cap,"country":country,"drives_right":drives_right}
In [20]:
print(cars_dict)
{'cars_per_cap': [809, 731, 588, 18, 200, 70, 45], 'country': ['United states', 'Australia', 'Japan', 'India', 'Russia', 'Morroco', 'Egypt'], 'drives_right': [False, True, True, True, False, False, False]}
In [21]:
print(cars_dict['cars_per_cap'])
[809, 731, 588, 18, 200, 70, 45]
In [22]:
cars = pd.DataFrame(cars_dict)
AGGREGATION FUNCTION
In [24]:
cars
Out[24]:
   cars_per_cap        country  drives_right
0           809  United states         False
1           731      Australia          True
2           588          Japan          True
3            18          India          True
4           200         Russia         False
5            70        Morroco         False
6            45          Egypt         False
In [25]:
cars.cars_per_cap
Out[25]:
0    809
1    731
2    588
3     18
4    200
5     70
6     45
Name: cars_per_cap, dtype: int64
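The dot notation used above is a shortcut for bracket access. The sketch below checks that both give the same column and shows one case where only brackets work; the frame named clash and its column "count" are hypothetical, chosen because DataFrames already have a count method.

```python
import pandas as pd

cars = pd.DataFrame({
    "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
    "country": ["United states", "Australia", "Japan", "India",
                "Russia", "Morroco", "Egypt"],
})

# Attribute access and bracket access return the same column.
same = cars.cars_per_cap.equals(cars["cars_per_cap"])
print(same)

# A column whose name matches an existing method is only reachable with brackets.
clash = pd.DataFrame({"count": [1, 2]})
print(callable(clash.count))          # the DataFrame.count method, not the column
print(type(clash["count"]).__name__)  # the column itself
```

Bracket access is the safer default when column names contain spaces or match method names.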
In [26]:
print(cars.cars_per_cap.max())
809
In [27]:
print(cars.cars_per_cap.min())
18
In [28]:
print(cars.cars_per_cap.mean())
351.57142857142856
In [29]:
print(cars.cars_per_cap.std())
345.59555222005633
In [30]:
print(cars.cars_per_cap.count())
7
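The five aggregations above can also be requested in a single call. This is a sketch using the same values; the variable name summary is an assumption. It also shows that std() defaults to the sample standard deviation (ddof=1), which is why the value printed above is larger than the population figure.

```python
import pandas as pd

cars_per_cap = pd.Series([809, 731, 588, 18, 200, 70, 45], name="cars_per_cap")

# Several aggregations in one call; the result is a Series indexed by function name.
summary = cars_per_cap.agg(["max", "min", "mean", "std", "count"])
print(summary)

# std() divides by n - 1 by default; pass ddof=0 for the population version.
print(cars_per_cap.std(ddof=0))

# describe() reports count, mean, std, min, quartiles and max together.
print(cars_per_cap.describe())
```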
In [39]:
country = ['United states','Australia','Japan','India','Russia','Morroco','Egypt']
cars_per_cap = [809,731,588,18,200,70,45]
In [41]:
lst = [['tom','reacher',25],['krish','pete',30],['nick','wilson',26],['julie', 'jonny', 28]]
# Passing dtype=float to the constructor would try to cast the name columns too,
# so build the frame first and convert only the Age column.
df = pd.DataFrame(lst, columns=['FName','LName','Age'])
df['Age'] = df['Age'].astype(float)
df
Out[41]:
   FName    LName   Age
0    tom  reacher  25.0
1  krish     pete  30.0
2   nick   wilson  26.0
3  julie    jonny  28.0
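Before aggregating a table like this one, it can help to check which columns are numeric. The following is a sketch only; the names people and numeric_only are assumptions, and the data is repeated so the block runs on its own.

```python
import pandas as pd

lst = [["tom", "reacher", 25], ["krish", "pete", 30],
       ["nick", "wilson", 26], ["julie", "jonny", 28]]

people = pd.DataFrame(lst, columns=["FName", "LName", "Age"])
people["Age"] = people["Age"].astype(float)

# dtypes lists the type of every column.
print(people.dtypes)

# select_dtypes keeps only the columns of the requested kind.
numeric_only = people.select_dtypes(include="number")
print(list(numeric_only.columns))
```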
In [42]:
df.Age.max()
Out[42]:
30.0
In [43]:
df.Age.min()
Out[43]:
25.0
In [44]:
df.Age.mean()
Out[44]:
27.25
In [45]:
df.Age.std()
Out[45]:
2.217355782608345
In [46]:
df.Age.count()
Out[46]:
4
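max() and min() return only the value. To find the row that holds it, one option is idxmax() and idxmin(), which return the index label of the extreme value. This is a sketch; the names people, oldest and youngest are assumptions, and the data is repeated so the block runs on its own.

```python
import pandas as pd

people = pd.DataFrame(
    [["tom", "reacher", 25], ["krish", "pete", 30],
     ["nick", "wilson", 26], ["julie", "jonny", 28]],
    columns=["FName", "LName", "Age"],
)

# idxmax/idxmin give the index label; loc then fetches the whole row.
oldest = people.loc[people["Age"].idxmax()]
youngest = people.loc[people["Age"].idxmin()]

print(oldest["FName"], oldest["Age"])
print(youngest["FName"], youngest["Age"])
```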