School of Data Science
Data Analyst
Syllabus
udacity.com
BEFORE YOU START
Overview:
Learn how to analyze data using in-demand Python libraries
like NumPy and pandas. You will start by reviewing the
basics of the data analysis process, then dive into advanced
data wrangling skills to work with messy, complex real-world
datasets. Finally, you will create highly customized
visualizations using the Matplotlib Python library.
Educational Objectives
This program prepares you for a career as a data analyst
by helping you learn to organize data, uncover patterns
and insights, draw meaningful conclusions, and clearly
communicate critical findings. You'll develop proficiency
in Python and its data analysis libraries (NumPy, pandas,
Matplotlib) as you build a portfolio of projects to
showcase in your job search.
Prerequisites
A well-prepared learner has experience with:
Basic Python
Descriptive Statistics
Machine Learning Fluency
Length of Program*: 2 months
Skill level: Intermediate
School: School of Data Science
Software/Hardware and version requirements:
For this Nanodegree program, you will need access to the Internet.
Additional software such as Python and its common data analysis libraries (e.g., pandas and Matplotlib) will be
required, but the program includes Udacity Workspaces with all of the relevant packages installed, so students
will not need to download any additional software.
*The length of this program is an estimation of total hours the average student may take to complete all required coursework,
including lecture and project time. If you spend about 5-10 hours per week working through the program, you should finish within
the time provided. Actual hours may vary.
Course #1:
Introduction to Data Analysis
with Pandas and NumPy
PROJECT #1
Investigate a Dataset
In this project, you will analyze a dataset and then communicate
your findings about it. This includes asking questions, exploring
the dataset, performing basic data wrangling, drawing
conclusions, and presenting your findings with numbers and
visualizations. Your analysis will be performed in a Jupyter
Notebook using the NumPy and pandas Python libraries.
Supporting Lesson Content

The Data Analysis Process
Describe the types of problems that data analysts can solve
Describe the five steps in the data analysis process: Question, Wrangle, Explore, Draw Conclusions, and Communicate
Describe three important Python packages for data analysis: NumPy, pandas, and Matplotlib

Jupyter Notebooks
Explain that Jupyter Notebooks can combine explanatory text, math equations, code, and visualizations
Create a new Jupyter Notebook
Use code and Markdown cells in a Jupyter Notebook
Use keyboard shortcuts in a Jupyter Notebook
Use magic keywords in a Jupyter Notebook
Convert notebooks to other formats

Exploring and Inspecting Data
Form and ask questions about data
Define data wrangling and EDA
Gather data
Read CSV files with pandas
Use pandas to inspect and assess data

Manipulating Data Using Pandas and NumPy
Use pandas to perform simple data cleaning tasks
Use the pandas query function to filter data
Fix column data types using pandas
Use pandas concatenate and merge to combine data
Use pandas explode to expand data

Communicating Results
Use pandas to summarize a dataset
Use pandas plotting to create simple visualizations
Draw conclusions from data using descriptive statistics and visualizations
Use visuals to communicate results
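As a rough illustration of the pandas skills listed for this course (reading a CSV, inspecting it, fixing a column type, filtering with query, merging, exploding, and summarizing), here is a minimal sketch. The dataset, column names, and values are made up for the example and are not taken from the course materials.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
CSV_TEXT = """order_id,customer,amount,tags
1,ana,10.5,a;b
2,ben,20.0,b
3,ana,not_available,c;d
"""

# Read the CSV; pd.read_csv accepts a path or any file-like object.
orders = pd.read_csv(io.StringIO(CSV_TEXT))

# Inspect and assess: size and column data types.
print(orders.shape)
print(orders.dtypes)

# Fix a column data type: entries that cannot be parsed become NaN.
orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")

# Filter rows with the query function.
large_orders = orders.query("amount > 15")

# Combine data: merge a second (hypothetical) table on a shared key.
customers = pd.DataFrame({"customer": ["ana", "ben"], "city": ["Lima", "Oslo"]})
merged = orders.merge(customers, on="customer", how="left")

# Expand list-like values into one row per element.
exploded = merged.assign(tags=merged["tags"].str.split(";")).explode("tags")

# Summarize: total amount per customer (NaN values are skipped).
totals = merged.groupby("customer")["amount"].sum()
print(totals)
```

In a Jupyter Notebook, each commented step would typically live in its own code cell, with Markdown cells in between to explain the findings.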
Course #2:
Advanced Data Wrangling
PROJECT #2
Wrangle and Analyze Data
Real-world data rarely comes clean. Using Python and its
libraries, you will gather data from a variety of sources and in a
variety of formats, assess its quality and structure, then clean it.
This is called data wrangling. You will document your wrangling
efforts in a Jupyter Notebook, plus showcase them through
analyses and visualizations using Python (and its libraries).
Supporting Lesson Content

Introduction to Data Wrangling
Identify each step of the data wrangling process (gathering, assessing, and cleaning)
Explain why data wrangling is important

Gathering Data
Describe the gathering phase
Unzip file archives using Python
Extract gathered tabular data from flat files using pandas
Gather data by programmatically downloading files
Extract data from text files using Python
Gather data by accessing APIs
Extract gathered data from JSON files
Gather and extract data from HTML files using BeautifulSoup
Extract data from a SQL database
Identify additional file formats that data analysts might encounter

Assessing Data
Describe the assessing phase
Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
Identify data quality issues and categorize them
Assess data quality visually
Assess data quality programmatically using pandas
Strategize about data structuring needed for analytical datasets
Assess data structure visually
Assess data structure using pandas

Cleaning Data
Describe the cleaning phase
Identify each step of the data cleaning process (defining, coding, and testing)
Define data cleaning tasks based on assessment findings
Clean data using Python
Test cleaning code visually
Test cleaning programmatically using Python
Store cleaned data using flat files
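The gather, assess, and clean phases named above can be sketched end to end in a few lines. This is only an illustrative sketch under stated assumptions: the JSON records, the SQLite table, and all column names are hypothetical, and an in-memory database and text buffer stand in for real files.

```python
import io
import json
import sqlite3

import pandas as pd

# --- Gather: hypothetical JSON records, shaped like an API response ---
raw_json = (
    '[{"id": 1, "name": " Ana ", "height_cm": "170"},'
    ' {"id": 2, "name": "Ben", "height_cm": null},'
    ' {"id": 2, "name": "Ben", "height_cm": null}]'
)
people = pd.DataFrame(json.loads(raw_json))

# --- Gather: a table from a SQL database (in-memory SQLite for this sketch) ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (person_id INTEGER, visits INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", [(1, 3), (2, 5)])
conn.commit()
visits = pd.read_sql_query("SELECT person_id, visits FROM visits", conn)
conn.close()

# --- Assess programmatically: count duplicate rows and missing heights ---
n_duplicates = int(people.duplicated().sum())
n_missing = int(people["height_cm"].isna().sum())

# --- Clean: define each task, code it, then test it ---
clean = people.copy()
# Define: remove fully duplicated rows.
clean = clean.drop_duplicates().reset_index(drop=True)
# Define: strip stray whitespace from names.
clean["name"] = clean["name"].str.strip()
# Define: store height as a number rather than a string.
clean["height_cm"] = pd.to_numeric(clean["height_cm"])
# Test: confirm the cleaning steps did what was defined.
assert not clean.duplicated().any()
assert clean["name"].tolist() == ["Ana", "Ben"]

# --- Store: write the combined, cleaned data as a flat (CSV) file ---
buffer = io.StringIO()  # stands in for a file path such as a .csv on disk
combined = clean.merge(visits, left_on="id", right_on="person_id")
combined.to_csv(buffer, index=False)
```

Missing values are left in place here; whether to drop or fill them is a decision that belongs to the "define" step and depends on the analysis.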
Course #3:
Data Visualization
with Matplotlib and Seaborn
In this course you will learn how to:
Implement a broad variety of visualizations to communicate key metrics and features of a dataset using exploratory analysis.
Apply appropriate plots, limits, transformations, and aesthetics for exploratory analysis of a dataset, to understand variable distributions and features.
Utilize encodings and design principles to effectively respond to business questions using explanatory analysis.
PROJECT #3
Communicate Data Findings
In Part I, Exploratory data visualization, you will use Python
visualization libraries to systematically explore your selected
dataset, starting with plots of single variables and building up
to plots of multiple variables.
In Part II, Explanatory data visualization, you will produce a
short presentation that illustrates interesting properties,
trends, and relationships that you discovered in your selected
dataset. The primary method of conveying your findings will
be through transforming your exploratory visualizations from
the first part into polished, explanatory visualizations.
Supporting Lesson Content

Data Visualization in Data Analysis
Understand why visualization is important in the practice of data analysis.
Know what distinguishes exploratory analysis from explanatory analysis, and the role of data visualization in each.

Design of Visualizations
Interpret features in terms of the level of measurement.
Know different encodings that can be used to depict data in visualizations.
Understand various pitfalls that can affect the effectiveness and truthfulness of visualizations.

Univariate Exploration of Data
Use bar charts to depict distributions of categorical variables.
Use histograms to depict distributions of numeric variables.
Use axis limits and different scales to change how your data is interpreted.

Bivariate Exploration of Data
Use scatterplots to depict relationships between numeric variables.
Use violin and box charts to depict relationships between categorical and numeric variables.
Use clustered bar charts to depict relationships between categorical variables.
Use faceting to create plots across different subsets of the data.

Multivariate Exploration of Data
Use encodings like size, shape, and color to encode values of the third variable in a visualization.
Explore multiple relationships between multiple variables at the same time.
Use feature engineering to capture relationships between variables.

Explanatory Visualizations
Understand what it means to tell a compelling story with data.
Choose the best plot type, encodings, and annotations to polish your plots.
Create high-quality image files using a Jupyter Notebook to convey your findings.

Visualization Case Study
Apply your knowledge of data visualization to a dataset involving the characteristics of diamonds and their prices.
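To make a few of the plotting topics above concrete (a histogram for one numeric variable, a scatterplot with color as a third encoding, a log axis scale, and exporting an image file), here is a small Matplotlib sketch. The data is randomly generated for the example; it is not the diamonds dataset used in the case study, and the variable names are assumptions.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, made-up data: a size measure, a price loosely tied to it,
# and an integer quality grade from 1 to 5.
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 2.0, size=200)
price = 4000 * carat**2 * rng.lognormal(0, 0.2, size=200)
quality = rng.integers(1, 6, size=200)

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate: histogram of a numeric variable with explicit bin edges.
ax_hist.hist(carat, bins=np.linspace(0.2, 2.0, 10))
ax_hist.set_xlabel("carat (synthetic)")
ax_hist.set_ylabel("count")

# Bivariate plus a third variable: position for two numeric variables,
# color for the quality grade.
points = ax_scatter.scatter(carat, price, c=quality, cmap="viridis", alpha=0.7)
ax_scatter.set_yscale("log")  # a different scale changes how the trend reads
ax_scatter.set_xlabel("carat (synthetic)")
ax_scatter.set_ylabel("price (synthetic, log scale)")
fig.colorbar(points, ax=ax_scatter, label="quality (synthetic)")

fig.tight_layout()

# Export a high-resolution PNG; an in-memory buffer stands in for a file path.
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
png_bytes = buffer.getvalue()
plt.close(fig)
```

In a notebook, passing a file name to `fig.savefig` instead of a buffer writes the image to disk for use in a presentation.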
Course #1 Instructor
Matt Maybeno
Principal Software Engineer
Matt is a Principal Software Engineer at SOCi. With a master's in bioinformatics
from SDSU, he uses his cross-domain expertise to build solutions in NLP and
predictive analytics.
Course #2 Instructor
Ria Cheruvu
Intel NEX AI Ethics Lead Architect
Ria is the Intel NEX AI Ethics Lead Architect, leading trustworthy AI. She is an emerging
industry speaker and has a master’s in data science from Harvard University. Ria
previously served as a Teaching Fellow for Harvard's 2021 Data Science graduate
curriculum and Lead Instructor for Eduonix's ML Deployment course.
Course #2 Instructor
Josh Magee
Senior Data Scientist
Josh is a Senior Data Scientist at Local Logic, where he models commercial real
estate trends, acquisitions, and sustainable cities. He was formerly Assistant
Professor of Data Analytics at Stonehill College, and was a postdoctoral researcher
in nuclear physics at Lawrence Livermore National Laboratory.
Learn More at
www.udacity.com