Section 7

This document provides an overview of NumPy and Pandas, two popular Python libraries for working with data. NumPy is introduced as a library for working with arrays and numerical data that aims to provide fast array operations. Key features of NumPy like ndarrays, array access, operations, and random number generation are demonstrated. Pandas is then introduced as a library built on NumPy for working with structured and labeled data. The basics of pandas Series and DataFrame objects are covered, including creation, accessing data, and common operations.

Uploaded by

emadelkhashab1234

Big Data

Section 7

By : Yosra Attaher
Agenda

● NumPy
● Pandas
● Talk about the project
What is NumPy?
● NumPy is a Python library used for working with arrays.
● It also has functions for working in the domains of linear algebra, Fourier transforms, and matrices.
● NumPy was created in 2005 by Travis Oliphant. It is an open-source project and you can use it freely.
● NumPy stands for Numerical Python.
Why Use NumPy?
● In Python, lists serve the purpose of arrays, but they are slow to process.
● NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.
● The array object in NumPy is called ndarray. It provides many supporting functions that make working with ndarray very easy.
● Arrays are used very frequently in data science, where speed and resources are very important.
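The speed difference can be checked informally. The following is a minimal sketch, assuming an input of 200,000 integers; the size and the helper names `square_with_list` and `square_with_numpy` are illustrative and not from the slides. The measured ratio depends on the machine and on the operation, so no specific speedup is claimed here.

```python
import timeit

import numpy as np

N = 200_000  # illustrative size, not from the slides


def square_with_list(values):
    # Pure-Python loop: every element is handled by the interpreter
    return [v * v for v in values]


def square_with_numpy(arr):
    # Vectorized: the loop runs inside NumPy's compiled code
    return arr * arr


py_list = list(range(N))
np_arr = np.arange(N, dtype=np.int64)

# Time three repetitions of each approach
list_time = timeit.timeit(lambda: square_with_list(py_list), number=3)
numpy_time = timeit.timeit(lambda: square_with_numpy(np_arr), number=3)
print(f"list: {list_time:.4f}s  numpy: {numpy_time:.4f}s")
```

Both functions return the same values; only the place where the loop runs differs.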
NumPy, Arrays
import numpy as np
# Create NumPy ndarray objects with the array() function.
a = np.array(45)                                  # 0-D array (a single value)
b = np.array([1, 2, 3])                           # 1-D array from a list
c = np.array((1, 2, 3))                           # 1-D array from a tuple
d = np.array([[1, 2, 3], [3, 4, 5], [2, 3, 4]])   # 2-D array
e = np.array((1, 2, 3), ndmin=5)                  # force at least 5 dimensions
print(a.ndim, a) # 0 45
print(a) # 45
Access Arrays
print(b.ndim, b) # 1 [1 2 3]
print(b[0], b[2], sep=', ') # 1, 3
print(c.ndim, c) # 1 [1 2 3]
print(c[0], c[1], c[-1], sep=', ') # 1, 2, 3
print(d.ndim, d) # 2 [[1 2 3] [3 4 5] [2 3 4]] (printed one row per line)
print(d[0, 0], d[1, 2], sep=', ') # 1, 5
print(e.ndim, e) # 5 [[[[[1 2 3]]]]]
print(e[0, 0, 0, 0, 2]) # 3
Access Arrays
x = np.array([1,2,3,4,5,6,7,8,9,10])
print(x[1:5]) # [2 3 4 5]
print(x[4:]) # [ 5 6 7 8 9 10]
print(x[:4]) # [1 2 3 4]
print(x[-3:-1]) # [8 9]
print(x[1:5:2]) # [2 4]
print(x[::2]) # [1 3 5 7 9]
Slice
x = np.array([[1, 2, 3, 5], [3, 4, 5, 9], [2, 3, 4, 12], [4, 2, 3, 1]])
print(x[1, 1:4])
print("--------------------------")
print(x[0:2, 2:4])
print("--------------------------")
print(x[1:3, 1:3])

RUN:
[4 5 9]
--------------------------
[[3 5]
 [5 9]]
--------------------------
[[4 5]
 [3 4]]
Conversion
x = np.array([1.2, 2.4, 3.5], 'f8')
print(x.dtype); print(x)
y = x.astype('i4')
print(y.dtype); print(y)
x = np.array([1.2, 2.4, 3.5], 'i4')
print(x.dtype); print(x)
x = np.array([1.2, 2.4, 3.5], 'i1')
print(x.dtype); print(x)
x = np.array(["1.2", "2.4", "3.5"], 'f8')
print(x.dtype); print(x)

RUN:
float64
[1.2 2.4 3.5]
int32
[1 2 3]
int32
[1 2 3]
int8
[1 2 3]
float64
[1.2 2.4 3.5]
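Converting floats to an integer dtype drops the fractional part instead of rounding, which is why 3.5 becomes 3 in the run above. A minimal sketch of the difference, using `np.rint` to round first; the extra value -1.7 is added here only to show the behaviour for negatives and is not from the slides.

```python
import numpy as np

x = np.array([1.2, 2.4, 3.5, -1.7])

# astype() discards the fractional part (truncation toward zero)
truncated = x.astype('i4')

# np.rint() rounds to the nearest integer first, then we convert
rounded = np.rint(x).astype('i4')

print(truncated.tolist())  # [1, 2, 3, -1]
print(rounded.tolist())    # [1, 2, 4, -2]
```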
Copy and View
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y = x.copy()   # y owns its own data
x[1] = 12
print(y)       # the copy is unaffected
y = x.view()   # y shares x's data
x[1] = 12
print(y)
y[1] = 15      # writing through the view changes x too
print("---------------")
print(x)
print(x.base, y.base)

RUN:
[ 1  2  3  4  5  6  7  8  9 10]
[ 1 12  3  4  5  6  7  8  9 10]
---------------
[ 1 15  3  4  5  6  7  8  9 10]
None [ 1 15  3  4  5  6  7  8  9 10]
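Besides inspecting `.base`, `np.shares_memory` reports directly whether two arrays overlap in memory. A short sketch, assuming a small five-element array chosen here for illustration; it also shows that a basic slice behaves like a view.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
c = x.copy()   # owns its own data
v = x.view()   # shares x's data
s = x[1:4]     # basic slices are views as well

print(np.shares_memory(x, c))  # False
print(np.shares_memory(x, v))  # True
print(np.shares_memory(x, s))  # True

s[0] = 99      # writing through the slice changes x ...
print(x[1])    # 99
print(c[1])    # 2  (... but not the copy)
```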
Reshape
x = np.array([[1, 2, 3, 5], [3, 4, 5, 9], [2, 3, 4, 12], [4, 2, 3, 1]])
print("--------------------")
print(x.shape)
y = x.reshape(2, 8)
print(y)
print("--------------------")
y = x.reshape(-1)   # -1 flattens to one dimension
print(y)
print("--------------------")
print(y.base)       # the reshaped array is a view of x

RUN:
--------------------
(4, 4)
[[ 1  2  3  5  3  4  5  9]
 [ 2  3  4 12  4  2  3  1]]
--------------------
[ 1  2  3  5  3  4  5  9  2  3  4 12  4  2  3  1]
--------------------
[[ 1  2  3  5]
 [ 3  4  5  9]
 [ 2  3  4 12]
 [ 4  2  3  1]]
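The `-1` argument can also stand in for a single unknown dimension, which NumPy infers from the element count. A minimal sketch, assuming a 16-element array created here with `np.arange`; it also shows the error raised when the requested shape cannot hold all elements.

```python
import numpy as np

x = np.arange(16)            # 16 elements: 0..15

a = x.reshape(2, -1)         # -1 is inferred as 8
b = x.reshape(-1, 4)         # -1 is inferred as 4
print(a.shape, b.shape)      # (2, 8) (4, 4)

try:
    x.reshape(3, -1)         # 16 is not divisible by 3
except ValueError as err:
    print("reshape failed:", err)
```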
Join
x = np.array([[1, 1, 1], [2, 2, 2]])
y = np.array([[3, 3, 3], [4, 4, 4]])
z = np.concatenate((x, y))           # axis=0 is the default
print(z.shape, z)
print("--------------")
z = np.concatenate((x, y), axis=0)   # stack rows
print(z.shape, z)
print("--------------")
z = np.concatenate((x, y), axis=1)   # join columns
print(z.shape, z)

RUN:
(4, 3) [[1 1 1]
 [2 2 2]
 [3 3 3]
 [4 4 4]]
--------------
(4, 3) [[1 1 1]
 [2 2 2]
 [3 3 3]
 [4 4 4]]
--------------
(2, 6) [[1 1 1 3 3 3]
 [2 2 2 4 4 4]]
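NumPy also offers `vstack`, `hstack`, and `stack` for joining arrays. The sketch below, which reuses the same two arrays, compares them with `concatenate`; these functions are not covered in the slides and are shown only as related options.

```python
import numpy as np

x = np.array([[1, 1, 1], [2, 2, 2]])
y = np.array([[3, 3, 3], [4, 4, 4]])

v = np.vstack((x, y))   # same result as concatenate(..., axis=0)
h = np.hstack((x, y))   # same result as concatenate(..., axis=1) for 2-D input
s = np.stack((x, y))    # joins along a NEW leading axis

print(v.shape, h.shape, s.shape)  # (4, 3) (2, 6) (2, 2, 3)
```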
Search, Sort, Filter
#search
x = np.array([11, 31, 87, 19, 23, 43])
y = np.where(x == 19); print(y)
#sort
x = np.array([11, 31, 87, 19, 23, 43])
y = np.sort(x); print(y)
#filter
x = np.array([11, 31, 87, 19, 23, 43])
s = [True, False, False, False, True, True]
y = x[s]; print(y)

RUN:
(array([3]),)
[11 19 23 31 43 87]
[11 23 43]
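In practice the boolean filter is usually built from a condition instead of being typed by hand. A minimal sketch on the same array; the threshold 20 is an arbitrary choice for illustration, and `np.argsort` is included as a related way to obtain the sorting order.

```python
import numpy as np

x = np.array([11, 31, 87, 19, 23, 43])

mask = x > 20            # element-wise comparison gives a boolean array
print(mask.tolist())     # [False, True, True, False, True, True]
print(x[mask].tolist())  # [31, 87, 23, 43]

order = np.argsort(x)    # indices that would sort x
print(order.tolist())    # [0, 3, 4, 1, 5, 2]
print(x[order].tolist()) # [11, 19, 23, 31, 43, 87]
```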
NUMPY, RANDOM
import numpy as np
from numpy import random
#basics
x = random.rand(); print(x)                     # one float in [0, 1)
x = random.randint(100); print(x)               # one int in [0, 100)
x = random.randint(100, size=5); print(x)       # 5 ints
x = random.randint(100, size=(5, 3)); print(x)  # 5x3 ints
x = random.rand(5); print(x)                    # 5 floats
x = random.rand(5, 3); print(x)                 # 5x3 floats

RUN (values differ on every run):
0.7537900893332695
86
[30 79 10 14 94]
[[ 7 91 46]
 [ 0 65 56]
 [62 64 28]
 [91 72 18]
 [16 37 24]]
[0.89308242 0.11235977 0.57879863 0.63562923 0.68296079]
[[0.28630843 0.87333319 0.07027453]
 [0.82643457 0.81043574 0.47318528]
 [0.38990336 0.267552   0.23475348]
 [0.28870442 0.82799002 0.85453119]
 [0.55594484 0.29363382 0.97318952]]
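Because the values change on every run, results are hard to reproduce. One option is a seeded generator created with `np.random.default_rng`. This is a sketch using NumPy's newer Generator interface, which the slides do not use; the seed value 42 is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # the seed value is arbitrary

f = rng.random()                       # float in [0, 1), similar to random.rand()
i = rng.integers(100)                  # int in [0, 100), similar to random.randint(100)
m = rng.integers(100, size=(5, 3))     # 5x3 array of ints
print(f, i, m.shape)

# A second generator with the same seed reproduces the same first draw
rng2 = np.random.default_rng(seed=42)
print(rng2.random() == f)              # True
```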

Random Choice
x = random.choice([5, 3, 7, 8]); print(x)
x = random.choice([5, 3, 7, 8], size=(10)); print(x)
x = random.choice([5, 3, 7, 8], size=(2, 3)); print(x)
# p gives the probability of each value; 8 has probability 0.0 and never appears
x = random.choice([5, 3, 7, 8], p=[0.1, 0.3, 0.6, 0.0], size=(10))
print(x)

RUN (values differ on every run):
5
[7 3 3 5 5 7 5 5 8 5]
[[8 8 3]
 [7 8 3]]
[7 7 7 7 3 3 7 7 7 7]
Shuffle
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
o = x.copy()
random.shuffle(x)            # shuffles x in place
print('\n', o, '\n', x)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = random.permutation(x)    # returns a shuffled copy; x is unchanged
print('\n', x, '\n', y)

RUN (order differs on every run):
[1 2 3 4 5 6 7 8]
[7 6 3 5 1 4 8 2]
[1 2 3 4 5 6 7 8]
[2 4 3 6 5 1 7 8]
Random Distribution
import numpy as np
from numpy import random
import matplotlib.pyplot as plt

# We can plot the Normal, Binomial, Poisson, Uniform, Logarithmic,
# Multinomial, Exponential, and Chi-Square distributions.
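As a rough sketch of how samples from some of these distributions can be drawn, the example below uses a seeded generator from `np.random.default_rng` (an assumption; the slides import `numpy.random` directly). The parameters and the sample size of 10,000 are arbitrary. Plotting is left out so the block runs without matplotlib; passing any of the samples to `plt.hist` would show its shape.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
size = 10_000                    # arbitrary sample size

samples = {
    "normal": rng.normal(loc=0.0, scale=1.0, size=size),
    "binomial": rng.binomial(n=10, p=0.5, size=size),
    "poisson": rng.poisson(lam=3.0, size=size),
    "uniform": rng.uniform(low=0.0, high=1.0, size=size),
}

# Sample means should land close to the theoretical means 0, 5, 3 and 0.5
for name, sample in samples.items():
    print(f"{name:8s} mean={sample.mean():.2f} std={sample.std():.2f}")
```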
What is Pandas?
● Pandas is a Python library used for working with data sets.
● It has functions for analyzing, cleaning, exploring, and manipulating data.
● The name "Pandas" refers to both "Panel Data" and "Python Data Analysis". The library was created by Wes McKinney in 2008.
Why Use Pandas?
● Pandas allows us to analyze big data and draw conclusions based on statistical theories.
● Pandas can clean messy data sets and make them readable and relevant.
● Relevant data is very important in data science.
Series, Creation
import pandas as pd
import numpy as np
#creating series
s = pd.Series([22, 32, 31, 42, 51]); print(s)
data = np.array(['a', 'b', 'c', 'd'])
s = pd.Series(data); print(s)                               # default index 0..3
s = pd.Series(data, index=[100, 101, 102, 103]); print(s)   # custom index
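Series, Creation
data = {'a': 100, 'b': 120, 'c': 99}
s = pd.Series(data); print(s)   # dict keys become the index
data = {'c': 99, 'a': 100, 'b': 120}
# values are matched to the index by key; 'd' has no value, so it becomes NaN
s = pd.Series(data, index=['a', 'b', 'c', 'd'])
print(s)
s = pd.Series(5, index=['a', 'b', 'c', 'd'])   # a scalar is repeated for every label
print(s)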
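When the requested index contains a label that the dictionary lacks, pandas fills that position with NaN, which also changes the dtype to float. A minimal sketch that checks this on the same data:

```python
import pandas as pd

data = {'c': 99, 'a': 100, 'b': 120}
s = pd.Series(data, index=['a', 'b', 'c', 'd'])

print(s.isna().tolist())   # [False, False, False, True]
print(s.dtype)             # float64, because NaN is a float

# Dropping the missing entry allows converting back to integers
clean = s.dropna().astype(int)
print(clean.tolist())      # [100, 120, 99]
```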
Series, Accessing
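s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])
# position-based access: use .iloc, because plain s[0] on a labeled Series
# is deprecated in recent pandas versions
print(s.iloc[0])
print(s.iloc[1:3])
print(s.iloc[:3])
print(s.iloc[1:])
print(s[:])
print(s.iloc[-1])
print(s.iloc[-3:-1])
# label-based access
print(s['a'])
print(s[['a', 'c', 'e']])
print(s.iloc[[2, 4]])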
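Pandas separates position-based access (`.iloc`) from label-based access (`.loc`). The sketch below, on the same Series, highlights one difference that is easy to miss: a position slice excludes its end, while a label slice includes it.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s.iloc[0])                 # 1  (first position)
print(s.loc['a'])                # 1  (label 'a')

print(s.iloc[1:3].tolist())      # [2, 3]     end position excluded
print(s.loc['b':'d'].tolist())   # [2, 3, 4]  end label included
```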
Series, Basic Functions
calories = {'day1': 200, 'day2': 380,
'day3': 480, 'day4': 290}
s = pd.Series(calories); print(s.axes)
print(s.empty)
print(s.ndim)
print(s.size)
print(s.values)
print(s.head(2))
print(s.tail(2))
DataFrame, Creation
data = [12, 12, 13, 14, 15]
df = pd.DataFrame(data); print(df)
df = pd.DataFrame(data, columns=['Temperature'])
print(df)
df = pd.DataFrame(data, columns=['Temperature'], dtype=float); print(df)
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age']); print(df)
DataFrame, Creation
data = {
"calories": [200, 380, 480, 290],
"duration": [50, 40, 45, 30]
}
df = pd.DataFrame(data); print(df)
df = pd.DataFrame(data, index=['sat', 'sun', 'mon', 'tue']); print(df)
# keys missing from one dictionary become NaN in that row
df = pd.DataFrame([{'math': 88, 'physics': 90}, {'history': 75, 'math': 94}]); print(df)
DataFrame, Creation
data = {
'calories': pd.Series([200, 380, 480, 290], index=['sat', 'sun', 'mon', 'tue']),
'duration': pd.Series([50, 40, 45, 30], index=['sat', 'sun', 'mon', 'tue'])
}
df = pd.DataFrame(data); print(df)
df = pd.DataFrame([{'math': 88, 'physics': 90}, {'art': 65, 'math': 94}],
index=['midterm', 'final'],
columns=['physics', 'math', 'art']); print(df)
DataFrame, Basic Functions
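# The Series have different lengths; the shorter 'Name' column is padded with NaN
data = {
'Name': pd.Series(['Tom', 'James', 'Steve', 'Smith', 'Jack']),
'Age': pd.Series([25, 26, 25, 23, 30, 29, 23]),
'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8])}
df = pd.DataFrame(data)
print(df)
print(df.T)        # transpose
print(df.axes)     # row and column labels
print(df.dtypes)
print(df.empty)
print(df.ndim)
print(df.shape)
print(df.size)     # number of cells (rows x columns)
print(df.values)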
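When a DataFrame is built from Series of different lengths, pandas aligns them on the index and pads the shorter ones with NaN. A small sketch with shortened, illustrative data (three names, five ages) that makes the effect easy to count:

```python
import pandas as pd

data = {
    'Name': pd.Series(['Tom', 'James', 'Steve']),   # 3 values
    'Age': pd.Series([25, 26, 25, 23, 30]),         # 5 values
}
df = pd.DataFrame(data)

print(df.shape)                 # (5, 2)
print(df.size)                  # 10
print(df['Name'].isna().sum())  # 2  (rows 3 and 4 have no name)
```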
DataFrame, Files
df = pd.read_csv('data/data.csv')
print(df)
df = pd.read_json('data/data.json')
print(df)
print(df.head())     # first 5 rows by default
print(df.head(10))
print(df.tail())     # last 5 rows by default
print(df.tail(6))
df.info()            # info() prints its summary itself and returns None
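The files under `data/` are not included with the slides. To try `read_csv` without them, the text can be read from an in-memory buffer. In the sketch below the column names and values are invented for illustration only.

```python
import io

import pandas as pd

# Hypothetical CSV content; the last row has an empty Calories field
csv_text = """Duration,Pulse,Calories
60,110,409.1
45,117,282.4
30,103,
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)      # (3, 3)
print(df.head(2))    # first two rows
df.info()            # column dtypes and non-null counts
```

The empty field is read as NaN, which is the kind of gap the cleaning slides deal with.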
DataFrame, Cleaning
df = pd.read_csv('data/wdata.csv')
print(df)
print(df.loc[[22, 26, 7, 11, 12, 18, 28]])   # inspect selected rows by index label
df.info()
DataFrame, Cleaning
dfcopy = df.dropna()               # returns a new DataFrame without rows that have missing values
dfcopy.info()
df.dropna(inplace=True)            # modifies df itself
df.info()
df = pd.read_csv('data/wdata.csv')
print(df.loc[[22, 26, 7, 11, 12, 18, 28]])
df.fillna(130, inplace=True)       # replace every missing value with 130
df.info()
print(df.loc[[22, 26, 7, 11, 12, 18, 28]])
DataFrame, Cleaning
df = pd.read_csv('data/wdata.csv')
df.dropna(subset=['Date'], inplace=True)   # drop only rows whose Date is missing
df.info()
df = pd.read_csv('data/wdata.csv')
print(df.duplicated())                     # True for each row that repeats an earlier row
df.drop_duplicates(inplace=True)
print(df.duplicated())
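The cleaning calls can be tried without `data/wdata.csv` on a small inline DataFrame. The sketch below uses made-up rows; the `Pulse` and `Calories` column names are assumptions, and only `Date` appears in the slides. It uses the non-inplace forms so each result can be compared with the original.

```python
import numpy as np
import pandas as pd

# Made-up data: row 1 lacks Calories, row 2 lacks Date, row 3 repeats row 1
df = pd.DataFrame({
    'Date': ['2020/12/01', '2020/12/02', None, '2020/12/02'],
    'Pulse': [110, 117, 103, 117],
    'Calories': [409.1, np.nan, 300.0, np.nan],
})

no_missing = df.dropna()                 # drop rows with any missing value
dated = df.dropna(subset=['Date'])       # drop only rows missing a Date
filled = df.fillna({'Calories': 130})    # fill one column, leave Date alone
deduped = df.drop_duplicates()           # remove the repeated row

print(len(no_missing), len(dated), len(deduped))  # 1 3 3
print(filled['Calories'].tolist())                # [409.1, 130.0, 300.0, 130.0]
```

Filling per column avoids writing 130 into unrelated columns such as `Date`, which a bare `fillna(130)` would do.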
Thanks
