CONTENTS IN DETAIL
TITLE PAGE
COPYRIGHT
ABOUT THE AUTHOR
INTRODUCTION
Using Python for Data Science
Who Should Read This Book?
What’s in the Book?
CHAPTER 1: THE BASICS OF DATA
Categories of Data
Unstructured Data
Structured Data
Semistructured Data
Time Series Data
Sources of Data
APIs
Web Pages
Databases
Files
The Data Processing Pipeline
Acquisition
Cleansing
Transformation
Analysis
Storage
The Pythonic Way
Summary
CHAPTER 2: PYTHON DATA STRUCTURES
Lists
Creating a List
Using Common List Object Methods
Using Slice Notation
Using a List as a Queue
Using a List as a Stack
Using Lists and Stacks for Natural Language Processing
Making Improvements with List Comprehensions
Tuples
A List of Tuples
Immutability
Dictionaries
A List of Dictionaries
Adding to a Dictionary with setdefault()
Loading JSON into a Dictionary
Sets
Removing Duplicates from Sequences
Performing Common Set Operations
Exercise #1: Improved Photo Tag Analysis
Summary
CHAPTER 3: PYTHON DATA SCIENCE LIBRARIES
NumPy
Installing NumPy
Creating a NumPy Array
Performing Element-Wise Operations
Using NumPy Statistical Functions
Exercise #2: Using NumPy Statistical Functions
pandas
pandas Installation
pandas Series
Exercise #3: Combining Three Series
pandas DataFrames
Exercise #4: Using Different Joins
scikit-learn
Installing scikit-learn
Obtaining a Sample Dataset
Loading the Sample Dataset into a pandas DataFrame
Splitting the Sample Dataset into a Training Set and a Test Set
Transforming Text into Numerical Feature Vectors
Training and Evaluating the Model
Making Predictions on New Data
Summary
CHAPTER 4: ACCESSING DATA FROM FILES AND APIS
Importing Data Using Python’s open() Function
Text Files
Tabular Data Files
Exercise #5: Opening JSON Files
Binary Files
Exporting Data to Files
Accessing Remote Files and APIs
How HTTP Requests Work
The urllib3 Library
The Requests Library
Exercise #6: Accessing an API with Requests
Moving Data to and from a DataFrame
Importing Nested JSON Structures
Converting a DataFrame to JSON
Exercise #7: Manipulating Complex JSON Structures
Loading Online Data into a DataFrame with pandas-datareader
Summary
CHAPTER 5: WORKING WITH DATABASES
Relational Databases
Understanding SQL Statements
Getting Started with MySQL
Defining the Database Structure
Inserting Data into the Database
Querying Database Data
Exercise #8: Performing a One-to-Many Join
Using Database Analytics Tools
NoSQL Databases
Key-Value Stores
Document-Oriented Databases
Exercise #9: Inserting and Querying Multiple Documents
Summary
CHAPTER 6: AGGREGATING DATA
Data to Aggregate
Combining DataFrames
Grouping and Aggregating the Data
Viewing Specific Aggregations by MultiIndex
Slicing a Range of Aggregated Values
Slicing Within Aggregation Levels
Adding a Grand Total
Adding Subtotals
Exercise #10: Excluding Total Rows from the DataFrame
Selecting All Rows in a Group
Summary
CHAPTER 7: COMBINING DATASETS
Combining Built-in Data Structures
Combining Lists and Tuples with +
Combining Dictionaries with **
Combining Corresponding Rows from Two Structures
Implementing Different Types of Joins for Lists
Concatenating NumPy Arrays
Exercise #11: Adding New Rows/Columns to a NumPy Array
Combining pandas Data Structures
Concatenating DataFrames
Joining Two DataFrames
Summary
CHAPTER 8: CREATING VISUALIZATIONS
Common Visualizations
Line Graphs
Bar Graphs
Pie Charts
Histograms
Plotting with Matplotlib
Installing Matplotlib
Using matplotlib.pyplot
Working with Figure and Axes Objects
Exercise #12: Combining Bins into an “Other” Slice
Using Other Libraries with Matplotlib
Plotting pandas Data
Plotting Geospatial Data with Cartopy
Exercise #13: Drawing a Map with Cartopy and Matplotlib
Summary
CHAPTER 9: ANALYZING LOCATION DATA
Obtaining Location Data
Turning a Human-Readable Address into Geo Coordinates
Getting the Geo Coordinates of a Moving Object
Spatial Data Analysis with geopy and Shapely
Finding the Closest Object
Finding Objects in a Certain Area
Exercise #14: Defining Two or More Polygons
Combining Both Approaches
Exercise #15: Further Improving the Pick-Up Algorithm
Combining Spatial and Nonspatial Data
Deriving Nonspatial Attributes
Exercise #16: Filtering Data with a List Comprehension
Joining Spatial and Nonspatial Datasets
Summary
CHAPTER 10: ANALYZING TIME SERIES DATA
Regular vs. Irregular Time Series
Common Time Series Analysis Techniques
Calculating Percentage Changes
Rolling Window Calculations
Calculating the Percentage Change of a Rolling Average
Multivariate Time Series
Processing Multivariate Time Series
Analyzing Dependencies Between Variables
Exercise #17: Adding More Metrics to Analyze Dependencies
Summary
CHAPTER 11: GAINING INSIGHTS FROM DATA
Association Rules
Support
Confidence
Lift
The Apriori Algorithm
Creating a Transaction Dataset
Identifying Frequent Itemsets
Generating Association Rules
Visualizing Association Rules
Gaining Actionable Insights from Association Rules
Generating Recommendations
Planning Discounts Based on Association Rules
Exercise #18: Mining Real Transaction Data
Summary
CHAPTER 12: MACHINE LEARNING FOR DATA ANALYSIS
Why Machine Learning?
Types of Machine Learning
Supervised Learning
Unsupervised Learning
How Machine Learning Works
Data to Learn From
A Statistical Model
Previously Unseen Data
A Sentiment Analysis Example: Classifying Product Reviews
Obtaining Product Reviews
Cleansing the Data
Splitting and Transforming the Data
Training the Model
Evaluating the Model
Exercise #19: Expanding the Example Set
Predicting Stock Trends
Getting Data
Deriving Features from Continuous Data
Generating the Output Variable
Training and Evaluating the Model
Exercise #20: Experimenting with Different Stocks and New Metrics
Summary
INDEX
PYTHON FOR DATA SCIENCE
A Hands-On Introduction
by Yuli Vasiliev
Python for Data Science. Copyright © 2022 by Yuli Vasiliev.
Printed in the United States of America
First printing
26 25 24 23 22 1 2 3 4 5
ISBN-13: 978-1-7185-0220-8 (print)
ISBN-13: 978-1-7185-0221-5 (ebook)
Publisher: William Pollock
Managing Editor: Jill Franklin
Production Manager: Rachel Monaghan
Production Editor: Jennifer Kepler
Developmental Editor: Nathan Heidelberger
Cover Illustrator: Gina Redman
Interior Design: Octopod Studios
Technical Reviewer: Daniel Zingaro
Copyeditor: Rachel Head
Compositor: Jeff Lytle, Happenstance Type-O-Rama
Proofreader: Jamie Lauer
For information on distribution, bulk sales, corporate sales, or translations, please contact No Starch
Press, Inc. directly at info@nostarch.com or:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900
www.nostarch.com
Library of Congress Cataloging-in-Publication Data
Names: Vasiliev, Yuli, author.
Title: Python for data science : a hands-on introduction / Yuli Vasiliev.
Description: San Francisco : No Starch Press, [2022] | Includes index.
Identifiers: LCCN 2022002116 (print) | LCCN 2022002117 (ebook) | ISBN
9781718502208 (print) | ISBN 9781718502215 (ebook)
Subjects: LCSH: Python (Computer program language) | Electronic data
processing. | Data mining.
Classification: LCC QA76.73.P98 V37 2022 (print) | LCC QA76.73.P98
(ebook) | DDC 005.13/3--dc23/eng/20220325
LC record available at https://lccn.loc.gov/2022002116
LC ebook record available at https://lccn.loc.gov/2022002117
No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc.
Other product and company names mentioned herein may be the trademarks of their respective
owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are
using the names only in an editorial fashion and to the benefit of the trademark owner, with no
intention of infringement of the trademark.
The information in this book is distributed on an “As Is” basis, without warranty. While every
precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc.
shall have any liability to any person or entity with respect to any loss or damage caused or alleged to
be caused directly or indirectly by the information contained in it.
About the Author
Yuli Vasiliev is a programmer, writer, and consultant specializing in open
source development, building data structures and models, and implementing
database backends. He is the author of Natural Language Processing with
Python and spaCy (No Starch Press, 2020).
About the Technical Reviewer
Dr. Daniel Zingaro is an associate teaching professor of computer science
and award-winning teacher at the University of Toronto. His research
focuses on understanding and enhancing student learning of computer
science. He is the author of two recent No Starch Press books: Algorithmic
Thinking (2020), a no-nonsense, no-math guide to algorithms and data
structures; and Learn to Code by Solving Problems (2021), a primer for
learning Python and computational thinking.
INTRODUCTION
We live in a world of information technology (IT), where computer systems
collect enormous quantities of data, process it, and extract useful
information from it. This data-driven reality affects not only the way
modern businesses operate but our daily lives too. Without the numerous
devices and systems that employ data-focused technologies, it would be hard
for many of us to maintain contact with society. Mobile maps and navigation,
online shopping, and smart home devices are some common examples of
data-focused technology for everyday life.
In the business world, companies often use IT systems to make decisions
by extracting actionable information from large volumes of data. The data
may arrive from various sources, in different formats, and may require
transformation before it’s ready for analysis. For example, many companies
that do business online use data analytics to drive customer acquisition and
retention, collecting and measuring everything they can to model and
understand their users’ behavior. They often combine and analyze both
quantitative and qualitative user data from many different sources, such as
user profiles, social media, and company websites. And in many cases, they
accomplish all these tasks using the Python programming language.
This book will introduce you to the Pythonic world of working with data,
without the taint of academic jargon or excessive complexity. You’ll learn
to use Python for data-oriented applications, writing code to power a ride-
sharing service, generate product recommendations, predict stock market
trends, and more. Through real-world examples such as these, you’ll gain
practical, hands-on experience with the key Python data science libraries.
Using Python for Data Science
The easy-on-the-brain Python programming language is an ideal choice for
accessing, manipulating, and gaining insight from data of any kind. It has
both a rich set of built-in data structures for basic operations and a robust
ecosystem of open source libraries for data analysis and manipulation of
any level of complexity. We’ll explore many such libraries in this book,
including NumPy, pandas, scikit-learn, Matplotlib, and more.
With Python, you can write concise and intuitive code with minimal time
and effort, expressing most concepts in just a few lines of code. In fact,
Python’s agile syntax allows you to implement several data operations with
a single line of code. For example, you can write a one-liner that filters,
transforms, and aggregates data all at once.
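As an illustration of that kind of one-liner, here is a minimal sketch using only built-in Python. The order data and the prices (stored in cents) are hypothetical and not taken from the book. A single generator expression passed to sum() filters out empty orders, transforms each remaining order into a line total, and aggregates the totals.

```python
# Hypothetical sample data: (product, quantity, unit price in cents) tuples
orders = [
    ("apple", 3, 50),
    ("banana", 0, 25),
    ("cherry", 10, 20),
    ("apple", 2, 50),
]

# One line that filters (qty > 0), transforms (qty * price), and aggregates (sum)
total = sum(qty * price for _, qty, price in orders if qty > 0)
print(total)  # 450
```

Keeping prices in integer cents avoids floating-point rounding in the total. The same filter, transform, and aggregate pattern can be chained on a pandas DataFrame once the data outgrows built-in structures.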
As a general-purpose language, Python is suitable for a wide variety of
tasks. When you work with Python, you can seamlessly integrate data
science with other tasks to create fully functional, well-rounded
applications. For example, you could build a bot application that makes
stock market predictions in response to natural language requests from
users. To create such an application, you’d need a bot API, a machine
learning model to make predictions, and a natural language processing
(NLP) tool to interact with users. There are powerful Python libraries for all
of these.
Who Should Read This Book?
This book is for developers looking to gain a better understanding of
Python’s data processing and analysis capabilities. Perhaps you work for a
company that wants to use data to improve business processes, make better
decisions, and target more customers. Or maybe you want to develop your
own data-driven applications, or simply expand your knowledge of Python
into the realm of data science.
The book assumes you have some basic experience with Python and that
you’re comfortable following instructions to perform tasks such as
installing a database or obtaining an API key. However, the book covers
Python data science concepts from the bottom up, through hands-on
examples that are all thoroughly explained. You’ll learn by doing, with no
prior data experience necessary.
What’s in the Book?
The book begins with a conceptual introduction to data processing and
analysis, explaining a typical data processing pipeline. Then we’ll cover
Python’s built-in data structures and some of the third-party Python libraries
that are widely used for data science applications. Next, we’ll explore
increasingly sophisticated techniques for obtaining, combining,
aggregating, grouping, analyzing, and visualizing datasets of different sizes
and data types. As the book goes on, we’ll apply Python data science
techniques to real use cases from the world of business management,
marketing, and finance. Along the way, each chapter contains “Exercise”
sections so you can practice and reinforce what you’ve just learned.
Here’s an overview of what you’ll find in each chapter:
Chapter 1: The Basics of Data
Provides the necessary background for
understanding the essentials of working with data. You’ll learn that
there are different categories of data, including structured, unstructured,
and semistructured data. Then you’ll walk through the steps involved in
a typical data analysis process.
Chapter 2: Python Data Structures
Introduces four data structures
that are built into Python: lists, dictionaries, tuples, and sets. You’ll see
how to use each structure and how to combine them into more complex
structures that can represent real-world objects.
Chapter 3: Python Data Science Libraries
Discusses Python’s robust
ecosystem of third-party libraries for data analysis and manipulation.
You’ll meet the pandas library and its primary data structures, the Series
and DataFrame, which have become the de facto standard for data-
oriented Python applications. You’ll also learn about NumPy and scikit-
learn, two other libraries often used for data science.
Chapter 4: Accessing Data from Files and APIs
Dives into the details
of obtaining data and loading it into your scripts. You’ll learn to load
data from different sources, such as files and APIs, into data structures
in your Python scripts for further processing.
Chapter 5: Working with Databases
Continues the discussion of
importing data into Python, covering how to work with database data.
You’ll look at examples of accessing and manipulating data stored in
databases of different types, including relational databases like MySQL
and NoSQL databases like MongoDB.
Chapter 6: Aggregating Data
Approaches the problem of
summarizing data by sorting it into groups and performing aggregate
calculations. You’ll learn to use pandas to group data and produce
subtotals, totals, and other aggregations.
Chapter 7: Combining Datasets
Covers how to combine data from
different sources into a single dataset. You’ll learn techniques that SQL
developers use to join database tables and apply them to built-in Python
data structures, NumPy arrays, and pandas DataFrames.
Chapter 8: Creating Visualizations
Discusses visualizations as the
most natural way to bring to light hidden patterns in data. You’ll learn
about different types of visualizations, such as line graphs, bar graphs,
and histograms, and you’ll see how to create them with Matplotlib, the
leading Python library for plotting. You’ll also use the Cartopy library
to generate maps.
Chapter 9: Analyzing Location Data
Explains how to work with
location data using the geopy and Shapely libraries. You’ll learn ways to
get and use GPS coordinates for both stationary and moving objects,
and you’ll explore the real-world example of how a ride-sharing service
can identify the best car for a given pick-up.
Chapter 10: Analyzing Time Series Data
Presents some analysis
techniques that you can apply to time series data to extract meaningful
statistics from it. In particular, the examples in this chapter illustrate
how time series data analysis can be applied to stock market data.
Chapter 11: Gaining Insights from Data
Explores strategies for
gaining insight from data in order to make informed decisions. As an
example, you’ll learn how to discover associations between products
sold at a supermarket so you can determine what groups of items are
frequently bought together in a single transaction (useful for
recommendations and promotions).
Chapter 12: Machine Learning for Data Analysis
Covers the use of
scikit-learn for advanced data analysis tasks. You’ll train machine
learning models to classify product reviews according to their star
ratings and to predict trends in a stock’s price.
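The Chapter 11 overview describes finding groups of items that are frequently bought together, and the table of contents names the standard measures for such associations: support, confidence, and lift. The following is a minimal standard-library sketch of those three measures for a single rule, using hypothetical basket data and a hypothetical helper named rule_metrics(). It is not the book's code and does not implement the Apriori algorithm the chapter covers; it only counts item pairs and scores one rule.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each transaction is the set of items in one basket
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "eggs"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
]

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    # Count baskets containing the antecedent, the consequent, and both
    count_a = sum(1 for t in transactions if antecedent in t)
    count_b = sum(1 for t in transactions if consequent in t)
    count_ab = sum(1 for t in transactions if antecedent in t and consequent in t)
    if count_a == 0 or count_b == 0:
        raise ValueError("both items must appear in at least one transaction")
    support = count_ab / n              # share of all baskets holding both items
    confidence = count_ab / count_a     # share of antecedent baskets that also hold the consequent
    lift = confidence / (count_b / n)   # confidence relative to the consequent's baseline rate
    return support, confidence, lift

# Count every co-occurring item pair to see which items are bought together most often
pair_counts = Counter(
    pair for t in transactions for pair in combinations(sorted(t), 2)
)
print(pair_counts.most_common(2))
print(rule_metrics(transactions, "bread", "milk"))
```

With this data, the rule bread -> milk has support 3/5, confidence 3/4, and lift 1.25. A lift above 1 suggests the two items appear together more often than they would if purchases were independent.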