Reading Files
Unit – 5B
Python for Everybody: Chapter 7
What Next?
It is time to go find some data to mess with!
[Figure: the parts of a computer (Software, Input and Output Devices, Central Processing Unit, Main Memory, Secondary Memory). A program fragment (if x < 3: print) sits in main memory, and files ("Files R Us") live in secondary memory, illustrated with the opening lines of a sample mail message.]
File Processing
A text file can be thought of as a sequence of lines
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
Subject: [sakai] svn commit: r39772 - content/branches/
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772
Check file shared: mbox-short.txt
Opening a File
• Before we can read the contents of the file, we must tell Python
which file we are going to work with and what we will be doing
with the file
• This is done with the open() function
• open() returns a “file handle” - a variable used to perform
operations on the file
• Similar to “File -> Open” in a Word Processor
Using open()
fhand = open('mbox.txt', 'r')
• handle = open(filename, mode)
• returns a handle used to manipulate the file
• filename is a string
• mode is optional and should be 'r' if we are planning to
read the file and 'w' if we are going to write to the file
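As a minimal sketch of the two modes described above, the snippet below writes a small file with 'w' and reads it back with 'r'. The file name demo.txt and the temporary directory are assumptions made for illustration, not part of the course data.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# mode 'w': create the file (or empty an existing one) for writing
fout = open(path, 'w')
fout.write('first line\n')
fout.close()

# mode 'r': open for reading; this is also the default when mode is omitted
fhand = open(path, 'r')
text = fhand.read()
fhand.close()

print(text)
```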
What is a Handle?
>>> fhand = open('mbox.txt')
>>> print(fhand)
<_io.TextIOWrapper name='mbox.txt' mode='r' encoding='UTF-8'>
When Files are Missing
>>> fhand = open('stuff.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'stuff.txt'
The newline Character
• We use a special character called the “newline” to indicate when a line ends
• We represent it as \n in strings
• Newline is still one character - not two
>>> stuff = 'Hello\nWorld!'
>>> stuff
'Hello\nWorld!'
>>> print(stuff)
Hello
World!
>>> stuff = 'X\nY'
>>> print(stuff)
X
Y
>>> len(stuff)
3
File Processing
A text file has newlines at the end of each line
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n
Return-Path: <postmaster@collab.sakaiproject.org>\n
Date: Sat, 5 Jan 2008 09:12:18 -0500\n
To: source@collab.sakaiproject.org\n
From: stephen.marquard@uct.ac.za\n
Subject: [sakai] svn commit: r39772 - content/branches/\n
\n
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772\n
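One way to see the newlines for yourself is to print each line with repr(), which shows \n as text instead of acting on it. The sketch below is only an illustration: it assumes a hypothetical scratch file, sample.txt, holding two header lines from the sample above followed by a blank line.

```python
import os
import tempfile

# Hypothetical scratch file with two header lines and one blank line.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
fout = open(path, 'w')
fout.write('To: source@collab.sakaiproject.org\n')
fout.write('From: stephen.marquard@uct.ac.za\n')
fout.write('\n')
fout.close()

raw_lines = []
fhand = open(path)
for line in fhand:
    raw_lines.append(line)
    print(repr(line))   # repr() shows the \n instead of acting on it
fhand.close()
```

Even the blank line is a string of length one: it holds just the newline character.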
Reading Files in Python
File Handle as a Sequence
• A file handle open for read can be treated as a sequence of strings where each line in the file is a string in the sequence
• We can use the for statement to iterate through a sequence
• Remember - a sequence is an ordered set
xfile = open('mbox.txt')
for line in xfile:
    print(line)
Counting Lines in a File
• Open a file read-only
• Use a for loop to read each line
• Count the lines and print out the number of lines
fhand = open('mbox.txt')
count = 0
for line in fhand:
    count = count + 1
print('Line Count:', count)

$ python open.py
Line Count: 132045
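The same counting loop can be wrapped in a function so it is easy to reuse. This is a sketch under assumptions: the function name count_lines and the three-line scratch file are hypothetical, and the with statement is used here so the file is closed automatically.

```python
import os
import tempfile

def count_lines(filename):
    """Return the number of lines in a text file."""
    count = 0
    with open(filename) as fhand:   # the file is closed when the block ends
        for line in fhand:
            count = count + 1
    return count

# Hypothetical three-line file so the sketch runs without mbox.txt.
path = os.path.join(tempfile.mkdtemp(), 'three.txt')
with open(path, 'w') as fout:
    fout.write('one\ntwo\nthree\n')

line_count = count_lines(path)
print('Line Count:', line_count)
```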
Reading the *Whole* File
We can read the whole file (newlines and all) into a single string
>>> fhand = open('mbox-short.txt')
>>> inp = fhand.read()
>>> print(len(inp))
94626
>>> print(inp[:20])
From stephen.marquar
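To make "newlines and all" concrete, the sketch below reads a tiny two-line file in one call and then inspects the resulting string. The scratch file tiny.txt is an assumption for illustration. splitlines() is a standard string method that is not covered in the slides.

```python
import os
import tempfile

# Hypothetical two-line scratch file.
path = os.path.join(tempfile.mkdtemp(), 'tiny.txt')
fout = open(path, 'w')
fout.write('X\nY\n')
fout.close()

fhand = open(path)
inp = fhand.read()          # the whole file as one string
fhand.close()

print(len(inp))             # each newline counts as one character
newline_count = inp.count('\n')
lines = inp.splitlines()    # split on line boundaries, dropping the newlines
print(newline_count, lines)
```

Reading everything at once is convenient for small files. For large files, the for loop over the handle avoids holding the whole file in memory.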
Searching Through a File
We can put an if statement in our for loop to only print lines that meet some criteria
fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:') :
        print(line)
OOPS!
What are all these blank lines doing here?
From: stephen.marquard@uct.ac.za

From: louis@media.berkeley.edu

From: zqian@umich.edu

From: rjlowe@iupui.edu

...
OOPS!
What are all these blank lines doing here?
• Each line from the file has a newline at the end
• The print() function adds a newline to each line
From: stephen.marquard@uct.ac.za\n
\n
From: louis@media.berkeley.edu\n
\n
From: zqian@umich.edu\n
\n
From: rjlowe@iupui.edu\n
\n
...
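The doubled newline can be shown directly by capturing what print() writes. In this sketch a short list of strings stands in for the file handle (an assumption, so that it runs without mbox-short.txt). It also shows one possible alternative, print(line, end=''), which tells print() not to add its own newline.

```python
import io
from contextlib import redirect_stdout

# Stand-in for lines read from a file: each one ends with '\n'.
lines = ['From: zqian@umich.edu\n', 'From: rjlowe@iupui.edu\n']

doubled = io.StringIO()
with redirect_stdout(doubled):
    for line in lines:
        print(line)            # the line's own '\n' plus the one print() adds

single = io.StringIO()
with redirect_stdout(single):
    for line in lines:
        print(line, end='')    # suppress the newline that print() adds

print(repr(doubled.getvalue()))
print(repr(single.getvalue()))
```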
Searching Through a File (fixed)
• We can strip the whitespace from the right-hand side of the string using the rstrip() string method
• The newline is considered “white space” and is stripped
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:') :
        print(line)

From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
....
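A small sketch of what rstrip() removes, using a made-up line with leading spaces and mixed trailing whitespace. With no argument it strips every trailing whitespace character. With the argument '\n' it strips only trailing newlines. Whitespace on the left is never touched.

```python
# Made-up line: two leading spaces, then trailing space, tab, and newline.
line = '  From: zqian@umich.edu \t\n'

both = line.rstrip()               # strips trailing space, tab, and newline
only_newline = line.rstrip('\n')   # strips trailing newline characters only

print(repr(both))
print(repr(only_newline))
```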
Skipping with continue
We can conveniently skip a line by using the continue statement
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From:') :
        continue
    print(line)
Using in to Select Lines
We can look for a string anywhere in a line as our selection criteria
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if '@uct.ac.za' not in line :
        continue
    print(line)

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
X-Authentication-Warning: set sender to stephen.marquard@uct.ac.za using -f
From: stephen.marquard@uct.ac.za
Author: stephen.marquard@uct.ac.za
From david.horwitz@uct.ac.za Fri Jan 4 07:02:32 2008
X-Authentication-Warning: set sender to david.horwitz@uct.ac.za using -f
...
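The selection loop can also be written as a small function that returns the matching lines instead of printing them. This is a sketch only: the name select_lines is hypothetical, and a short list of strings stands in for the file handle.

```python
def select_lines(lines, text):
    """Return the lines that contain text, with trailing whitespace removed."""
    selected = []
    for line in lines:
        line = line.rstrip()
        if text not in line:
            continue            # skip lines that do not contain the text
        selected.append(line)
    return selected

# Stand-in for lines read from a file.
sample = [
    'From: stephen.marquard@uct.ac.za\n',
    'From: louis@media.berkeley.edu\n',
    'Author: stephen.marquard@uct.ac.za\n',
]

matches = select_lines(sample, '@uct.ac.za')
for line in matches:
    print(line)
```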
Prompt for File Name
fname = input('Enter the file name: ')
fhand = open(fname)
count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print('There were', count, 'subject lines in', fname)

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt
Enter the file name: mbox-short.txt
There were 27 subject lines in mbox-short.txt
Bad File Names
fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    quit()
count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print('There were', count, 'subject lines in', fname)

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt
Enter the file name: na na boo boo
File cannot be opened: na na boo boo
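A possible variation, sketched under assumptions, catches only OSError (the parent class of FileNotFoundError and PermissionError) instead of using a bare except, and returns None instead of calling quit(). The function name count_subject_lines and the scratch files are hypothetical.

```python
import os
import tempfile

def count_subject_lines(fname):
    """Count lines starting with 'Subject:'; return None if fname cannot be opened."""
    try:
        fhand = open(fname)
    except OSError:   # FileNotFoundError, PermissionError, and similar
        print('File cannot be opened:', fname)
        return None
    count = 0
    for line in fhand:
        if line.startswith('Subject:'):
            count = count + 1
    fhand.close()
    return count

# Hypothetical scratch directory with one real file and one missing name.
workdir = tempfile.mkdtemp()
good = os.path.join(workdir, 'mail.txt')
with open(good, 'w') as fout:
    fout.write('Subject: one\nFrom: a@example.com\nSubject: two\n')

good_count = count_subject_lines(good)
bad_count = count_subject_lines(os.path.join(workdir, 'na na boo boo'))
print(good_count, bad_count)
```

Catching a specific exception class means unrelated mistakes, such as a misspelled variable name, are still reported instead of being hidden behind the "cannot be opened" message.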
Summary
• Secondary storage
• Opening a file - file handle
• File structure - newline character
• Reading a file line by line with a for loop
• Searching for lines
• Reading file names
• Dealing with bad files
File Handling
What do we mean by a file?
• Use of files: store information so that we can experiment with, extract, and analyze the data
• If we don't have any file, data is held only in a volatile fashion
• File processing uses a file handler:
  1. Open the file
  2. Read, write, or append information to the file
  3. Close the file
Syntax
file_object = open(file_name, access_mode)
Example (test.txt)
f = open('test.txt', 'r')    Open the file to read
f = open('test.txt', 'w')    Open the file to write
f = open('test.txt', 'a')    Open the file to append
f.close()                    Close the file
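Safe method of opening and closing the file
f = open('test.txt', 'r')
try:
    data = f.read()    # perform file operations
finally:
    f.close()          # runs even if an operation above fails
The best way of doing file operations is using the with statement. It ensures that the opened file is indeed closed.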
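A minimal sketch of the with statement, assuming a hypothetical scratch file. It checks the handle's closed attribute after each block, including a block that is interrupted by a simulated error.

```python
import os
import tempfile

# Hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

with open(path, 'w') as f:
    f.write('hello\n')

with open(path) as f:
    data = f.read()
closed_after = f.closed          # True: closed as soon as the block ended

try:
    with open(path) as g:
        raise ValueError('simulated failure while processing')
except ValueError:
    pass
closed_after_error = g.closed    # True: closed even though the block failed

print(data, closed_after, closed_after_error)
```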
Reading a File in Python
• The file exists in the working directory
• File name: test.txt
• Contents: Bangalore is IT Capital of India
f.read(4)    read the first 4 characters
f.read(6)    read the next 6 characters
f.read()     read till the end of the file
• Alternatively, readline() is also used to read individual lines
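The read calls from the notes can be traced with a short sketch. It assumes a scratch copy of test.txt that holds the single sentence from the notes. Each read() continues from where the previous one stopped.

```python
import os
import tempfile

# Scratch copy of test.txt holding the sentence used in the notes.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'w') as fout:
    fout.write('Bangalore is IT Capital of India')

f = open(path)
first = f.read(4)     # the first 4 characters
second = f.read(6)    # the next 6 characters
rest = f.read()       # everything up to the end of the file
f.close()

print(repr(first), repr(second), repr(rest))
```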
Writing to a File in Python
with open('test.txt', 'w') as c:
    c.write('My name is Devaraj\n')
    c.write('I am teaching python programming\n')
    c.write('It is one of the best programming languages\n')
This program will create a new text file test.txt in the current directory:
My name is Devaraj
I am teaching python programming
It is one of the best programming languages
Append in Python
with open('test.txt', 'a') as d:
    d.write('Python will stay as the best programming language for another decade\n')
File access modes
r      opens the file in read-only mode (default); the file should already exist
rb     opens the file in read-only binary mode
r+     opens the file in read-write mode; the file should already exist
rb+    opens the file in read-write binary mode
w      allows write-level access; creates the file if needed and erases existing contents
wb     write binary
w+     write and read; rarely used
a      opens the file in append mode
ab     append binary
a+     append and read
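The difference between the modes is easiest to see side by side. The sketch below assumes a hypothetical scratch directory. It appends with 'a', overwrites with 'w', and shows that 'r' fails when the file is missing.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()                 # hypothetical scratch directory
path = os.path.join(workdir, 'modes.txt')

with open(path, 'w') as f:                   # 'w' creates the file
    f.write('one\n')
with open(path, 'a') as f:                   # 'a' adds to the end
    f.write('two\n')
with open(path) as f:
    after_append = f.read()

with open(path, 'w') as f:                   # 'w' again erases the old contents
    f.write('three\n')
with open(path) as f:
    after_write = f.read()

try:
    open(os.path.join(workdir, 'missing.txt'), 'r')   # 'r' needs an existing file
    read_missing_failed = False
except FileNotFoundError:
    read_missing_failed = True

print(repr(after_append), repr(after_write), read_missing_failed)
```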
• Open the file
• Do file operations
• Close the file handler
When a file is closed, the system frees up all the resources allocated for processing it.
Example: counting the lines in a mail file
fhand = open('mail.txt', 'r')    read-only is the default, so open('mail.txt') is equivalent
count = 0
for line in fhand:               read each line
    count = count + 1
print(count)                     print the number of lines
Example: selecting lines that start with From
stuff = open('data2.txt')
for line in stuff:
    if line.startswith('From'):
        print(line)
stuff.close()
Specifying the mode and encoding
fhand = open(file_name, mode='r', encoding='utf-8')
fhand = open('app.log', encoding='utf-8')
fhand.close()
with open('app.log', 'w') as fd:
    fd.write('write into a file\n')
fhand = open('app.log', 'r')
fhand.read(15)    reads the first 15 characters
fhand.read(4)     reads the next 4 characters
fhand.read()      reads till the end of the file
Renaming and removing files
• rename app.log
• remove app.log