Udacity 1

tdtd

Uploaded by

ananiyagossaye8
Copyright
© © All Rights Reserved

We saw that we could calculate the variance as:

$$\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

You will also see:

$$\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
The reason for this is beyond the scope of what we have covered thus far, but you can find an
explanation here.

You can commonly find answers to your questions with a quick Google search. Now is a great
time to get started with this practice! This answer should make more sense at the completion
of this lesson.
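
To make the two formulas concrete, here is a minimal Python sketch that computes both versions by hand and checks them against the standard library. The data values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math
import statistics

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n
squared_deviations = [(x - mean) ** 2 for x in data]

# First formula: divide the sum of squared deviations by n.
variance_n = sum(squared_deviations) / n

# Second formula: divide the same sum by n - 1.
variance_n_minus_1 = sum(squared_deviations) / (n - 1)

# The standard library provides both versions.
assert math.isclose(variance_n, statistics.pvariance(data))
assert math.isclose(variance_n_minus_1, statistics.variance(data))

print(variance_n)                    # 4.0
print(round(variance_n_minus_1, 3))  # 4.571
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, and the gap shrinks as n grows.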

Standard Deviation vs. Variance

The standard deviation is the square root of the variance. In practice, you usually use the
standard deviation rather than the variance. This is because the standard deviation shares the
same units as our original data, while the variance has squared units.
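
The unit difference can be seen in a short sketch. The heights below are hypothetical values in centimeters; the variance comes out in square centimeters, and taking the square root brings it back to centimeters.

```python
import math
import statistics

# Hypothetical heights in centimeters, for illustration only.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

variance_cm2 = statistics.pvariance(heights_cm)  # units: cm squared
std_dev_cm = math.sqrt(variance_cm2)             # units: cm

# The library's standard deviation matches the square root of the variance.
assert math.isclose(std_dev_cm, statistics.pstdev(heights_cm))

print(variance_cm2)          # 200.0
print(round(std_dev_cm, 2))  # 14.14
```

A spread of about 14 cm is directly comparable to the heights themselves, which is why the standard deviation is usually the value reported.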

What Next?

In the next sections, we will be looking at the last two aspects of quantitative
variables: shape and outliers. What we know about measures of center and measures of spread
will assist in your understanding of these final two aspects.

Supporting Materials

• Calculating Variance

When working with data, a quick plot lets you see the shape of your data at a
glance.

Distribution Shape   Types of Data
Bell Shaped          Heights, Weight, Scores
Left Skewed          GPA, Age of Death, Price
Right Skewed         Distribution of Wealth, Athletic Abilities

Of course, Ananiya! Let's dive into the concepts of outliers and the shape of
data.

Outliers

Outliers are data points that fall significantly far from the other values in a
dataset. They can skew your analysis and affect summary statistics like the
mean and standard deviation. For example, if you're analyzing the salaries of
a group of entrepreneurs and one of them is a CEO earning millions, that
value could be an outlier. It would inflate the mean salary, making it appear
much higher than what most entrepreneurs actually earn.

Identifying Outliers
There are several methods to identify outliers:

1. Visual Inspection: A simple way is to plot your data using a histogram or box
plot. If you see a point that stands out from the rest, it might be an outlier.
2. Statistical Methods: You can use techniques like the Z-score or the Interquartile
Range (IQR) method. For instance, in the IQR method, you calculate the first
(Q1) and third quartiles (Q3) and then find the IQR (Q3 - Q1). Any point that
lies below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR is considered an outlier.
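
The IQR method described above can be sketched in a few lines of Python. The salary figures are hypothetical, and the helper name `iqr_outliers` is made up for this example. Quartile conventions differ between tools; this sketch assumes the default method of `statistics.quantiles` (Python 3.8 or later), so other software may give slightly different fences.

```python
import statistics

def iqr_outliers(values):
    """Return the values that fall outside the 1.5 * IQR fences."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical salaries in thousands; the last one plays the role of the CEO.
salaries = [40, 45, 50, 52, 55, 58, 60, 65, 70, 2000]

print(iqr_outliers(salaries))  # [2000]
```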

Shape of Data

The shape of your data distribution can provide insights into its characteristics.
Common shapes include:

• Normal Distribution: Symmetrical, bell-shaped curve where most data points
cluster around the mean.
• Skewed Distribution: If the tail on one side is longer than the other, the data is
skewed. Right skew (positive skew) means the tail is on the right, while left
skew (negative skew) means the tail is on the left.
• Uniform Distribution: All values occur with roughly the same frequency.

Understanding the shape of your data helps in choosing the right statistical
methods and interpreting results correctly.
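
One quick way to look at shape without a plotting library is a text histogram. The sketch below is only an illustration: the helper name `text_histogram`, the bin width, and the scores are all assumptions made for this example, and it handles integer values only.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one text line per bin, with one '#' per value in that bin."""
    # Map each integer value to the start of its bin, then count per bin.
    counts = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        end = start + bin_width - 1
        lines.append(f"{start:>3}-{end:<3} {'#' * counts[start]}")
    return lines

# Hypothetical scores with a long tail on the right (right skew).
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14]

for line in text_histogram(scores, bin_width=3):
    print(line)
```

Most of the marks pile up in the first two bins while a few values trail off to the right, which is the pattern the right-skew description refers to.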

Example

Imagine you have the following dataset representing the ages of a group of
people: [22, 23, 24, 25, 26, 27, 28, 29, 30, 100]. The age "100" is an outlier
because it is significantly higher than the other values. If you calculate the
mean, it would be skewed upwards due to this outlier, while the median would
give a better central tendency of the data.
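
The effect of the outlier in this example can be checked with Python's standard library. The dataset is the one given above; the variable names are chosen only for this sketch.

```python
import statistics

ages = [22, 23, 24, 25, 26, 27, 28, 29, 30, 100]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)

print(mean_age)    # 33.4
print(median_age)  # 26.5

# Dropping the outlier barely moves the median but pulls the mean down.
ages_without_outlier = [a for a in ages if a != 100]
mean_without = statistics.mean(ages_without_outlier)
median_without = statistics.median(ages_without_outlier)
```

Nine of the ten people are 30 or younger, yet the mean is above 33; the median of 26.5 sits in the middle of the bulk of the data.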

Do you have any specific questions about outliers or the shape of data? Or is
there a particular aspect you'd like to explore further?

Descriptive vs. Inferential Statistics

In this section, we learned how inferential statistics differs from descriptive
statistics.

Descriptive Statistics

Descriptive statistics is about describing our collected data using the
measures discussed throughout this lesson: measures of center, measures of
spread, the shape of our distribution, and outliers. We can also use plots of
our data to gain a better understanding.

Inferential Statistics

Inferential statistics is about using our collected data to draw conclusions
about a larger population. Performing inferential statistics well requires that
we take a sample that accurately represents our population of interest.

A common way to collect data is via a survey. However, surveys may be
extremely biased depending on the types of questions that are asked, and the
way the questions are asked. This is a topic you should think about when
tackling the first project.

We looked at specific examples that allowed us to identify the following:

1. Population - our entire group of interest
2. Parameter - numeric summary about a population
3. Sample - a subset of the population
4. Statistic - numeric summary about a sample
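
The four terms can be tied together in a small simulation. Everything here is hypothetical: the population is generated at random (simulated heights), the sample size of 100 is arbitrary, and the seed is fixed only so the sketch is repeatable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Population: our entire group of interest (simulated heights in cm).
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Parameter: a numeric summary of the population.
parameter = statistics.mean(population)

# Sample: a randomly chosen subset of the population.
sample = rng.sample(population, 100)

# Statistic: the same numeric summary, computed on the sample.
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.2f}")
print(f"statistic (sample mean):     {statistic:.2f}")
```

With a random sample, the statistic lands close to the parameter without matching it exactly. In real work we usually see only the sample, which is why how it is collected matters so much.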

Recap:

• Cost Per Acquisition (CPA) = (marketing and sales cost) / (number of new
leads)
• CPA includes marketing and sales costs (overhead, salaries) in the
numerator and only leads (non-paying customers) in the denominator.
• Here “acquisition” refers to a non-paying customer.
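
As a small worked example of the CPA formula, the sketch below uses made-up figures; the function name `cost_per_acquisition` and both input values are assumptions for illustration.

```python
def cost_per_acquisition(marketing_and_sales_cost, new_leads):
    """CPA = marketing and sales cost divided by the number of new leads."""
    if new_leads <= 0:
        raise ValueError("new_leads must be positive")
    return marketing_and_sales_cost / new_leads

# Hypothetical figures: 5,000 spent on marketing and sales, 250 new leads.
cpa = cost_per_acquisition(5000, 250)
print(cpa)  # 20.0
```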
