Welcome to the course!
Introduction to Importing Data in Python
Hugo Bowne-Anderson
Data Scientist at DataCamp
Import data
Flat files, e.g. .txts, .csvs
Files from other software
Relational databases
Plain text files
Table data
titanic.csv
Name                          Sex     Cabin  Survived
Braund, Mr. Owen Harris       male    NaN    0         <-- row
Cumings, Mrs. John Bradley    female  C85    1
Heikkinen, Miss. Laina        female  NaN    1
Futrelle, Mrs. Jacques Heath  female  C123   1
Allen, Mr. William Henry      male    NaN    0
                              ^ column
Flat file
Source: Kaggle
Reading a text file
filename = 'huck_finn.txt'
file = open(filename, mode='r') # 'r' is to read
text = file.read()
file.close()
Printing a text file
print(text)
YOU don't know about me without you have read a book by
the name of The Adventures of Tom Sawyer; but that
ain't no matter. That book was made by Mr. Mark Twain,
and he told the truth, mainly. There was things which
he stretched, but mainly he told the truth. That is
nothing. I never seen anybody but lied one time or
another, without it was Aunt Polly, or the widow, or
maybe Mary. Aunt Polly--Tom's Aunt Polly, she is--and
Mary, and the Widow Douglas is all told about in that
book, which is mostly a true book, with some
stretchers, as I said before.
Writing to a file
filename = 'huck_finn.txt'
file = open(filename, mode='w') # 'w' is to write
file.close()
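A word of caution on the snippet above: opening an existing file with mode='w' empties it. The sketch below illustrates this on a throwaway file, so nothing real is overwritten. The file name demo_notes.txt and the temporary directory are assumptions made for illustration, not part of the course material.

```python
import os
import tempfile

# Hypothetical throwaway file, so that no real data is overwritten.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_notes.txt')

# mode='w' creates the file, or truncates it if it already exists.
file = open(demo_path, mode='w')
file.write('first line\n')
file.write('second line\n')
file.close()

# Reopening with mode='w' discards the previous contents.
file = open(demo_path, mode='w')
file.write('only line\n')
file.close()

# Read the file back to see what survived.
file = open(demo_path, mode='r')
text = file.read()
file.close()
print(text)
```

Only the text written after the second open() remains, which is why 'w' should be used with care on files you want to keep.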
Context manager with
with open('huck_finn.txt', 'r') as file:
    print(file.read())
YOU don't know about me without you have read a book by
the name of The Adventures of Tom Sawyer; but that
ain't no matter. That book was made by Mr. Mark Twain,
and he told the truth, mainly. There was things which
he stretched, but mainly he told the truth. That is
nothing. I never seen anybody but lied one time or
another, without it was Aunt Polly, or the widow, or
maybe Mary. Aunt Polly--Tom's Aunt Polly, she is--and
Mary, and the Widow Douglas is all told about in that
book, which is mostly a true book, with some
stretchers, as I said before.
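To make the context manager's behavior concrete, here is a minimal sketch that reads individual lines and then checks that the file was closed automatically. It uses a small made-up file (sample.txt in a temporary directory, both assumptions) in place of huck_finn.txt.

```python
import os
import tempfile

# Hypothetical sample file standing in for huck_finn.txt.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(sample_path, 'w') as file:
    file.write('line one\nline two\nline three\n')

# Inside the with block the file is open; each readline() call returns the next line.
with open(sample_path, 'r') as file:
    first = file.readline()
    second = file.readline()
    print(first, end='')
    print(second, end='')

# Leaving the with block closes the file, even if an error was raised inside it.
print(file.closed)
```

Because the with statement closes the file for you, there is no file.close() call to forget.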
In the exercises, you’ll:
Print files to the console
Print specific lines
Discuss flat files
Let's practice!
The importance of flat files in data science
Flat files
titanic.csv
PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
Name                          Sex     Cabin  Survived
Braund, Mr. Owen Harris       male    NaN    0
Cumings, Mrs. John Bradley    female  C85    1
Heikkinen, Miss. Laina        female  NaN    1
Futrelle, Mrs. Jacques Heath  female  C123   1
Allen, Mr. William Henry      male    NaN    0
Flat files
Text files containing records
That is, table data
Record: row of fields or attributes
Column: feature or attribute
titanic.csv
PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
                            ^ column (the Name field of every record)
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S   <-- row
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
Header
titanic.csv
________________________________________________________________________
PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
________________________________________________________________________
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
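The terms header, record, and column can be illustrated with Python's built-in csv module. This is a minimal sketch, not the approach the course goes on to use; it holds the titanic.csv lines shown above in a string rather than reading a file, and the variable names are chosen for illustration.

```python
import csv
import io

# The first lines of titanic.csv as shown above, held in memory for the demo.
raw = '''PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2.3101282,7.925,,S
'''

reader = csv.reader(io.StringIO(raw))
header = next(reader)    # the first row names the columns
records = list(reader)   # every remaining row is one record

# A column is the same field taken from every record.
name_index = header.index('Name')
names = [record[name_index] for record in records]

print(header)
print(records[0])
print(names)
```

Note that the quoted names contain commas yet stay in one field, and that every value, including the numbers, comes back as a string.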
File extension
.csv - Comma-separated values
.txt - Text file
commas, tabs - Delimiters
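As a rough illustration of why the delimiter matters more than the file extension, the sketch below parses the same small table twice, once comma-separated and once tab-separated. The sample values and the helper name parse are assumptions made for this example.

```python
import csv
import io

# The same made-up table, written with two different delimiters.
comma_text = 'pixel149,pixel150,pixel151\n0,0,0\n86,250,254\n'
tab_text = 'pixel149\tpixel150\tpixel151\n0\t0\t0\n86\t250\t254\n'

def parse(text, delimiter):
    """Split each line of text into fields on the given delimiter."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

comma_rows = parse(comma_text, ',')
tab_rows = parse(tab_text, '\t')

# The same table comes out either way; only the delimiter differs.
print(comma_rows == tab_rows)

# With the wrong delimiter, each line stays a single unsplit field.
print(parse(tab_text, ',')[0])
```

A .txt file can hold comma-separated data and a .csv file can use another separator, so it is worth checking the raw text before importing.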
Tab-delimited file
MNIST.txt
pixel149 pixel150 pixel151 pixel152 pixel153
0 0 0 0 0
86 250 254 254 254
0 0 0 9 254
0 0 0 0 0
103 253 253 253 253
0 0 0 0 0
0 0 0 0 0
0 0 0 0 41
253 253 253 253 253
[Figure: MNIST image]
How do you import flat files?
Two main packages: NumPy, pandas
Here, you’ll learn to import:
Flat files with numerical data (MNIST)
Flat files with numerical data and strings (titanic.csv)
Let's practice!
Importing flat files using NumPy
Why NumPy?
NumPy arrays: standard for storing numerical data
Essential for other packages: e.g. scikit-learn
loadtxt()
genfromtxt()
Importing flat files using NumPy
import numpy as np
filename = 'MNIST.txt'
data = np.loadtxt(filename, delimiter=',')  # use delimiter='\t' if the file is tab-delimited
data
[[ 0. 0. 0. 0. 0.]
[ 86. 250. 254. 254. 254.]
[ 0. 0. 0. 9. 254.]
...,
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
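For readers without MNIST.txt at hand, the following sketch shows the same call on a few values held in memory. np.loadtxt accepts a file-like object as well as a file name; the sample text is a made-up stand-in and is assumed to be comma-separated.

```python
import io
import numpy as np

# A few comma-separated pixel values (made-up stand-in for MNIST.txt).
text = '0,0,0,0,0\n86,250,254,254,254\n0,0,0,9,254\n'

# loadtxt accepts a file name or a file-like object such as StringIO.
data = np.loadtxt(io.StringIO(text), delimiter=',')

print(type(data))
print(data.shape)
print(data.dtype)
```

The result is a two-dimensional NumPy array of floats, one row per line of text.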
Customizing your NumPy import
import numpy as np
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1)
print(data)
[[ 0. 0. 0. 0. 0.]
[ 86. 250. 254. 254. 254.]
[ 0. 0. 0. 9. 254.]
...,
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]
[ 0. 0. 0. 0. 0.]]
Customizing your NumPy import
import numpy as np
filename = 'MNIST_header.txt'
data = np.loadtxt(filename, delimiter=',', skiprows=1, usecols=[0, 2])
print(data)
[[ 0. 0.]
[ 86. 254.]
[ 0. 0.]
...,
[ 0. 0.]
[ 0. 0.]
[ 0. 0.]]
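The effect of skiprows and usecols can be checked on a small in-memory sample. This is a sketch under the assumption that the header is a single line of column names; the sample text is made up, and the flag header_failed exists only to record what happens without skiprows.

```python
import io
import numpy as np

# Made-up sample: one header line followed by three rows of numbers.
text = ('pixel149,pixel150,pixel151,pixel152,pixel153\n'
        '0,0,0,0,0\n'
        '86,250,254,254,254\n'
        '0,0,0,9,254\n')

# skiprows=1 skips the header; usecols=[0, 2] keeps the first and third columns.
subset = np.loadtxt(io.StringIO(text), delimiter=',', skiprows=1, usecols=[0, 2])
print(subset)

# Without skiprows, the header strings cannot be converted to floats.
header_failed = False
try:
    np.loadtxt(io.StringIO(text), delimiter=',')
except ValueError as error:
    header_failed = True
    print('ValueError:', error)
```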
Customizing your NumPy import
data = np.loadtxt(filename, delimiter=',', dtype=str)
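A short sketch of what dtype=str does, again on made-up in-memory data: every entry is kept as a string, so a header row no longer causes a conversion error, and numeric rows can be converted afterwards.

```python
import io
import numpy as np

# Made-up sample with a header line.
text = 'pixel149,pixel150\n0,0\n86,250\n'

# dtype=str keeps every entry as a string, so the header row loads too.
data = np.loadtxt(io.StringIO(text), delimiter=',', dtype=str)
print(data)

# The numeric rows can be converted once the header row is sliced off.
values = data[1:].astype(float)
print(values)
```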
Mixed datatypes
titanic.csv
Name                          Sex     Cabin  Fare
Braund, Mr. Owen Harris       male    NaN    7.3
Cumings, Mrs. John Bradley    female  C85    71.3
Heikkinen, Miss. Laina        female  NaN    8.0
Futrelle, Mrs. Jacques Heath  female  C123   53.1
Allen, Mr. William Henry      male    NaN    8.05
                              ^              ^
                              strings        floats
Source: Kaggle
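One possible way to load columns of different types with NumPy is np.genfromtxt, which was listed earlier alongside loadtxt. The sketch below is an illustration rather than the course's own solution: it uses three columns taken from the titanic.csv lines shown earlier, held in memory, and asks genfromtxt to infer a type per column.

```python
import io
import numpy as np

# Three columns from the titanic.csv lines shown earlier: string, integer, float.
text = 'Sex,Pclass,Fare\nmale,3,7.25\nfemale,1,71.2833\nfemale,3,7.925\n'

# names=True reads column names from the first line;
# dtype=None lets genfromtxt infer a type for each column.
data = np.genfromtxt(io.StringIO(text), delimiter=',', names=True,
                     dtype=None, encoding='utf-8')

print(data.dtype.names)
print(data['Sex'])
print(data['Fare'])
```

The result is a one-dimensional structured array with one entry per row, and columns are accessed by name. For tables like this, a pandas DataFrame is usually the more convenient container.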
Let's practice!
Importing flat files using pandas
What a data scientist needs
Two-dimensional labeled data structure(s)
Columns of potentially different types
Manipulate, slice, reshape, groupby, join, merge
Perform statistics
Work with time series data
Pandas and the DataFrame
DataFrame = pythonic analog of R’s data frame
Manipulating pandas DataFrames
Exploratory data analysis
Data wrangling
Data preprocessing
Building models
Visualization
Using pandas for these tasks is standard and best practice
Importing using pandas
import pandas as pd
filename = 'winequality-red.csv'
data = pd.read_csv(filename)
data.head()
   volatile acidity  citric acid  residual sugar
0              0.70         0.00             1.9
1              0.88         0.00             2.6
2              0.76         0.04             2.3
3              0.28         0.56             1.9
4              0.70         0.00             1.9
data_array = data.values
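The same steps can be tried without the wine quality file. In this sketch the values are copied from the head() output above and held in a comma-separated string, which is an assumption about the layout of the real file. It also uses DataFrame.to_numpy(), which returns the same array as .values here.

```python
import io
import pandas as pd

# Values copied from the head() output above, assumed comma-separated.
text = ('volatile acidity,citric acid,residual sugar\n'
        '0.70,0.00,1.9\n'
        '0.88,0.00,2.6\n'
        '0.76,0.04,2.3\n')

# read_csv accepts a file name or a file-like object such as StringIO.
data = pd.read_csv(io.StringIO(text))
print(data.head())
print(data.dtypes)

# Convert the DataFrame to a NumPy array.
data_array = data.to_numpy()
print(data_array.shape)
```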
You’ll experience:
Importing flat files in a straightforward manner
Importing flat files with issues such as comments and missing values
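As a preview of what handling comments and missing values can look like, here is a small sketch using the comment and na_values parameters of pd.read_csv. The sample text, the '#' comment marker, and the placeholder string 'Nothing' are all assumptions invented for this example.

```python
import io
import pandas as pd

# Made-up file contents: a comment line, an empty field, and a placeholder value.
text = '''# hypothetical comment line at the top of the file
Name,Cabin,Fare
Braund,,7.25
Cumings,C85,71.2833
Heikkinen,Nothing,7.925
'''

# comment='#' ignores everything after a '#' on a line;
# na_values lists extra strings to treat as missing.
data = pd.read_csv(io.StringIO(text), comment='#', na_values=['Nothing'])

print(data)
print(data['Cabin'].isna().tolist())
```

Empty fields are read as missing by default, so both the blank Cabin entry and the 'Nothing' placeholder end up as NaN.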
Let's practice!
Final thoughts on data import
Next chapters:
Import other file types:
Excel, SAS, Stata
Feather
Interact with relational databases
Next course:
Scrape data from the web
Interact with APIs
Let's practice!