Data Analyst

The Data Analyst program at Udacity teaches students to analyze data using Python libraries like NumPy and pandas, focusing on data wrangling and visualization skills. The curriculum includes hands-on projects that cover the data analysis process, data cleaning, and effective communication of findings through visualizations. Prerequisites include basic Python knowledge and descriptive statistics, with the program designed for intermediate learners over an estimated duration of 2 months.

Uploaded by

Pawandeep Singh

School of Data Science

Data Analyst
Syllabus

udacity.com
BEFORE YOU START

Overview:
Learn how to analyze data using in-demand Python libraries
like NumPy and pandas. Students will start by going over the
basics of the data analysis process, then dive into advanced
data wrangling skills to work with messy, complex real-world
datasets. Finally, you will create highly customized
visualizations using the Matplotlib Python library.

Educational Objectives

This program prepares you for a career as a data analyst by helping you learn to organize data, uncover patterns and insights, draw meaningful conclusions, and clearly communicate critical findings. You'll develop proficiency in Python and its data analysis libraries (NumPy, pandas, Matplotlib) as you build a portfolio of projects to showcase in your job search.

Prerequisites

A well-prepared learner has experience with:
Basic Python
Descriptive Statistics
Machine Learning Fluency

Length of Program*: 2 months
Skill level: Intermediate
School: School of Data Science

Software/Hardware and version requirements:

For this Nanodegree program, you will need access to the Internet.

Additional software such as Python and its common data analysis libraries (e.g., pandas and Matplotlib) will be required, but the program includes Udacity Workspaces with all of the relevant packages installed, so students will not need to download any additional software.

*The length of this program is an estimation of total hours the average student may take to complete all required coursework,
including lecture and project time. If you spend about 5-10 hours per week working through the program, you should finish within
the time provided. Actual hours may vary.

Course #1: Introduction to Data Analysis with Pandas and NumPy

PROJECT #1

Investigate a Dataset

In this project, you will analyze a dataset and then communicate
your findings about it. This includes asking questions, exploring
the dataset, performing basic data wrangling, drawing
conclusions, and presenting your findings with numbers and
visualizations. Your analysis will be performed in a Jupyter
Notebook using the NumPy and pandas Python libraries.
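
As a rough illustration of the workflow this project describes (ask a question, inspect, wrangle, draw a conclusion), here is a minimal sketch. The dataset, its column names, and the question "which city is warmer on average?" are made up for this example and are not part of the syllabus.

```python
import io

import pandas as pd

# Hypothetical stand-in for a real CSV file; pd.read_csv also accepts a file path.
CSV_TEXT = """city,year,temperature_c
Oslo,2020,6.9
Oslo,2021,6.1
Lima,2020,19.4
Lima,2021,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Inspect: size, column types, and missing values per column.
print(df.shape)
print(df.dtypes)
missing_per_column = df.isna().sum()
print(missing_per_column)

# Basic wrangling: drop rows that have no temperature reading.
clean = df.dropna(subset=["temperature_c"])

# Explore and conclude: mean temperature per city.
mean_by_city = clean.groupby("city")["temperature_c"].mean()
print(mean_by_city)
```

In a Jupyter Notebook, each step would typically sit in its own code cell, with Markdown cells in between explaining the question and the conclusion.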

Supporting Lesson Content

The Data Analysis Process
Describe the types of problems that Data Analysts can solve
Describe the five steps in the data analysis process: Question, Wrangle, Explore, Draw Conclusions, and Communicate
Describe three important Python packages for data analysis: NumPy, pandas, and Matplotlib

Jupyter Notebooks
Explain that Jupyter Notebooks can combine explanatory text, math equations, code, and visualizations
Create a new Jupyter Notebook
Use code and Markdown cells in a Jupyter Notebook
Use keyboard shortcuts in a Jupyter Notebook
Use magic keywords in a Jupyter Notebook
Convert notebooks to other formats

Exploring and Inspecting Data
Form and ask questions about data
Define data wrangling and EDA
Gather data
Read CSV files with pandas
Use pandas to inspect and assess data

Manipulating Data Using Pandas and NumPy
Use pandas to perform simple data cleaning tasks
Use the pandas query function to filter data
Fix column data types using pandas
Use pandas concatenate and merge to combine data
Use pandas explode to expand data

Communicating Results
Use pandas to summarize a dataset
Use pandas plotting to create simple visualizations
Draw conclusions from data using descriptive statistics and visualizations
Use visuals to communicate results
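
The pandas operations named in the lesson objectives for manipulating data (fixing data types, `query`, `merge`, `concat`, and `explode`) can be sketched as follows. The `orders` and `customers` tables and all of their columns are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical example tables.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [10, 11, 10],
    "amount": ["25.0", "40.5", "12.0"],   # numbers stored as strings
    "items": ["pen,ink", "book", "pen"],  # several values packed into one cell
})
customers = pd.DataFrame({"customer_id": [10, 11], "name": ["Ana", "Bo"]})

# Fix a column data type: string -> float.
orders["amount"] = orders["amount"].astype(float)

# Filter rows with query().
large = orders.query("amount > 20")

# Combine data: merge joins on a key column, concat stacks rows.
merged = orders.merge(customers, on="customer_id", how="left")
more_orders = pd.DataFrame(
    {"order_id": [4], "customer_id": [11], "amount": [5.0], "items": ["ink"]}
)
stacked = pd.concat([orders, more_orders], ignore_index=True)

# Expand data: split the packed cell into a list, then one row per item.
exploded = orders.assign(items=orders["items"].str.split(",")).explode("items")

print(large)
print(merged)
print(exploded)
```

Note that `explode` only works on list-like cells, which is why the comma-separated strings are split first.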

Course #2: Advanced Data Wrangling

PROJECT #2

Wrangle and Analyze Data


Real-world data rarely comes clean. Using Python and its
libraries, you will gather data from a variety of sources and in a
variety of formats, assess its quality and structure, then clean it.
This is called data wrangling. You will document your wrangling
efforts in a Jupyter Notebook, plus showcase them through
analyses and visualizations using Python (and its libraries).
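
To make "a variety of sources and formats" concrete, the sketch below gathers small tables from a JSON payload, a SQL database, and a zipped CSV file, then runs two programmatic quality checks. Everything here is an assumption made for illustration: the payload, the `users` table, and the `visits.csv` archive are built in memory so the example runs without any external files or network access.

```python
import io
import json
import sqlite3
import zipfile

import pandas as pd

# Gather from JSON (hypothetical API-style payload).
payload = '[{"id": 1, "score": "7"}, {"id": 1, "score": "7"}, {"id": 2, "score": null}]'
from_json = pd.DataFrame(json.loads(payload))

# Gather from a SQL database (in-memory SQLite stands in for a real one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ana"), (2, "Bo")])
from_sql = pd.read_sql_query("SELECT id, name FROM users", conn)
conn.close()

# Gather from a zip archive that contains a CSV flat file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("visits.csv", "id,visits\n1,3\n2,5\n")
with zipfile.ZipFile(buffer) as archive:
    with archive.open("visits.csv") as handle:
        from_zip = pd.read_csv(handle)

# Assess quality programmatically: missing values and duplicated rows.
n_missing = int(from_json["score"].isna().sum())
n_duplicates = int(from_json.duplicated().sum())
print(n_missing, n_duplicates)
```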

Supporting Lesson Content

Introduction to Data Wrangling
Identify each step of the data wrangling process (gathering, assessing, and cleaning)
Explain why data wrangling is important

Gathering Data
Describe the gathering phase
Unzip file archives using Python
Extract gathered tabular data from flat files using pandas
Gather data by programmatically downloading files
Extract data from text files using Python
Gather data by accessing APIs
Extract gathered data from JSON files
Gather and extract data from HTML files using BeautifulSoup
Extract data from a SQL database
Identify additional file formats that data analysts might encounter

Assessing Data
Describe the assessing phase
Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
Identify data quality issues and categorize them
Assess data quality visually
Assess data quality programmatically using pandas
Strategize about data structuring needed for analytical datasets
Assess data structure visually
Assess data structure using pandas

Cleaning Data
Describe the cleaning phase
Identify each step of the data cleaning process (defining, coding, and testing)
Define data cleaning tasks based on assessment findings
Clean data using Python
Test cleaning code visually
Test cleaning programmatically using Python
Store cleaned data using flat files
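
The define, code, test cycle for cleaning can be sketched on a tiny table that has one tidiness issue and one quality issue. The `raw` table, its column names, and the choice of `melt` for restructuring are assumptions made for this example; the course itself may use different data and techniques.

```python
import pandas as pd

# Hypothetical messy and dirty input.
raw = pd.DataFrame({
    "patient": ["a", "b"],
    "weight_2020": ["70kg", "82kg"],
    "weight_2021": ["72kg", "80kg"],
})

# Define:
#   1. Tidiness: the year is stored in column headers; reshape to one row
#      per patient and year.
#   2. Quality: weights are strings with a "kg" suffix; convert to numbers.

# Code:
tidy = raw.melt(id_vars="patient", var_name="year", value_name="weight_kg")
tidy["year"] = tidy["year"].str.replace("weight_", "", regex=False).astype(int)
tidy["weight_kg"] = tidy["weight_kg"].str.replace("kg", "", regex=False).astype(float)

# Test (programmatically):
assert tidy.shape == (4, 3)
assert tidy["weight_kg"].dtype == float
assert tidy["year"].isin([2020, 2021]).all()

# Store the cleaned data as a flat file (kept in memory here; pass a path
# to to_csv to write it to disk instead).
csv_text = tidy.to_csv(index=False)
print(csv_text)
```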

Course #3: Data Visualization with Matplotlib and Seaborn

In this course you will learn how to:
Implement a broad variety of visualizations to communicate key metrics and features of a dataset using exploratory analysis.
Apply appropriate plots, limits, transformations, and aesthetics for exploratory analysis of a dataset, to understand variable distributions and features.
Utilize encodings and design principles to effectively respond to business questions using explanatory analysis.

PROJECT #3

Communicate Data Findings

In Part I, Exploratory data visualization, you will use Python
visualization libraries to systematically explore your selected
dataset, starting with plots of single variables and building up
to plots of multiple variables.

In Part II, Explanatory data visualization, you will produce a
short presentation that illustrates interesting properties,
trends, and relationships that you discovered in your selected
dataset. The primary method of conveying your findings will
be through transforming your exploratory visualizations from
the first part into polished, explanatory visualizations.

Supporting Lesson Content

Data Visualization in Data Analysis
Understand why visualization is important in the practice of data analysis.
Know what distinguishes exploratory analysis from explanatory analysis, and the role of data visualization in each.

Design of Visualizations
Interpret features in terms of the level of measurement.
Know different encodings that can be used to depict data in visualizations.
Understand various pitfalls that can affect the effectiveness and truthfulness of visualizations.

Univariate Exploration of Data
Use bar charts to depict distributions of categorical variables.
Use histograms to depict distributions of numeric variables.
Use axis limits and different scales to change how your data is interpreted.

Bivariate Exploration of Data
Use scatterplots to depict relationships between numeric variables.
Use violin and box charts to depict relationships between categorical and numeric variables.
Use clustered bar charts to depict relationships between categorical variables.
Use faceting to create plots across different subsets of the data.

Multivariate Exploration of Data
Use encodings like size, shape, and color to encode values of the third variable in a visualization.
Explore multiple relationships between multiple variables at the same time.
Use feature engineering to capture relationships between variables.

Explanatory Visualizations
Understand what it means to tell a compelling story with data.
Choose the best plot type, encodings, and annotations to polish your plots.
Create high-quality image files using a Jupyter Notebook to convey your findings.

Visualization Case Study
Apply your knowledge of data visualization to a dataset involving the characteristics of diamonds and their prices.
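
A few of the plotting ideas listed for this course (a histogram for one numeric variable, a scatterplot for two, color as an encoding for a third, a log scale, and saving a high-resolution image) can be sketched with Matplotlib. The data below is randomly generated and only loosely styled after a diamonds dataset; the variable names and the formula linking carat to price are assumptions for illustration, not real data.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, made-up data (not the real diamonds dataset).
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 2.0, size=200)
price = 4000 * carat ** 1.5 * rng.lognormal(0.0, 0.2, size=200)
cut_quality = rng.integers(0, 3, size=200)  # hypothetical ordinal code 0-2

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Univariate: histogram of a numeric variable.
ax_hist.hist(price, bins=20)
ax_hist.set_xlabel("price")
ax_hist.set_ylabel("count")

# Bivariate plus a third variable encoded as color; log scale on the y axis.
points = ax_scatter.scatter(carat, price, c=cut_quality, cmap="viridis", s=15)
ax_scatter.set_yscale("log")
ax_scatter.set_xlabel("carat")
ax_scatter.set_ylabel("price (log scale)")
fig.colorbar(points, ax=ax_scatter, label="cut quality (0-2)")

# Save a high-resolution image (to memory here; a file path works as well).
fig.tight_layout()
image = io.BytesIO()
fig.savefig(image, format="png", dpi=150)
```

In a Jupyter Notebook the `matplotlib.use("Agg")` line is usually unnecessary, since figures are displayed inline.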

Course #1 Instructor
Matt Maybeno
Principal Software Engineer

Matt is a Principal Software Engineer at SOCi. With a master's in Bioinformatics
from SDSU, he utilizes his cross-domain expertise to build solutions in NLP and
predictive analytics.

Course #2 Instructor
Ria Cheruvu
Intel NEX AI Ethics Lead Architect
Ria is the Intel NEX AI Ethics Lead Architect, leading trustworthy AI. She is an emerging
industry speaker and has a master’s in data science from Harvard University. Ria
previously served as a Teaching Fellow for Harvard's 2021 Data Science graduate
curriculum and Lead Instructor for Eduonix's ML Deployment course.

Course #2 Instructor
Josh Magee
Senior Data Scientist

Josh is a Senior Data Scientist at Local Logic, where he models commercial real
estate trends, acquisitions, and sustainable cities. He was formerly Assistant
Professor of Data Analytics at Stonehill College, and was a postdoctoral researcher
in nuclear physics at Lawrence Livermore National Laboratory.

Learn More at
www.udacity.com
