Lesson 4

BA

Uploaded by

maheshrj83
Copyright: © All Rights Reserved
Business Analytics

Mrs Hashani Kumarasinghe


Software Engineer - Data Science
Zone 24X7
hashanik@zone24X7.com
Today's Plan
1. Measures of central tendency
2. Measures of dispersion
3. Chebyshev’s theorem
4. Measures of shape
5. Descriptive statistics for grouped data
6. Descriptive statistics for categorical data
7. Measures of association
Population and Samples
Population

A population consists of all items of interest for a particular decision or investigation.

Sample

A sample is a subset of a population.
Measures of central tendency
A measure of central tendency is a summary statistic that represents the center point or typical value of a
dataset. These measures indicate where most values in a distribution fall and are also referred to as the central
location of a distribution.

The three most common measures of central tendency are the mean, median, and mode.

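Mean = sum of the observations divided by the number of observations

Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, where N is the number of items in the population

Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the number of observations in the sample

Property of the mean

The sum of the deviations of each observation from the mean is zero: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$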
Measures of central tendency
Median

The measure of location that specifies the middle value when the data are arranged from least to greatest
is the median.

Mode

The mode is the observation that occurs most frequently.
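
As a rough illustration of the three measures, the sketch below uses Python's standard `statistics` module. The `costs` list is hypothetical data made up for this example. The last line checks the property that deviations from the mean sum to zero.

```python
from statistics import mean, median, multimode

# Hypothetical order costs, chosen only for illustration.
costs = [120, 150, 150, 180, 400]

avg = mean(costs)          # sum of the observations / number of observations
mid = median(costs)        # middle value once the data are sorted
modes = multimode(costs)   # most frequent value(s), returned as a list

# Property of the mean: deviations from the mean sum to zero
# (up to floating-point rounding).
deviation_sum = sum(x - avg for x in costs)

print(avg, mid, modes, deviation_sum)
```

Note how the single large value (400) pulls the mean above the median, while the median and mode stay at 150.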

How to calculate central tendency measures using R?


Measure of dispersion
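Dispersion refers to the degree of variation in the data. The most common measures of dispersion are the range, the variance, and the standard deviation.

Range = difference between the maximum value and the minimum value in the data set.

Variance (population): $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Variance (sample): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Here $\mu$ is the population mean and $\bar{x}$ is the sample mean.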

The larger the variance, the more the data are spread out from the mean and the more variability one can
expect in the observations.
Measure of dispersion
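Standard deviation: The standard deviation is the square root of the variance.

Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$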
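
The dispersion measures can be sketched with Python's standard `statistics` module, where `pvariance` and `pstdev` divide by N (population) and `variance` and `stdev` divide by n - 1 (sample). The `data` list is hypothetical.

```python
from statistics import pvariance, variance, pstdev, stdev

# Hypothetical observations, chosen only for illustration.
data = [4, 8, 6, 5, 7]

data_range = max(data) - min(data)   # maximum minus minimum
pop_var = pvariance(data)            # squared deviations divided by N
samp_var = variance(data)            # squared deviations divided by n - 1
pop_sd = pstdev(data)                # square root of the population variance
samp_sd = stdev(data)                # square root of the sample variance

print(data_range, pop_var, samp_var, pop_sd, samp_sd)
```

For the same data the sample variance is always larger than the population variance, because the same sum of squared deviations is divided by n - 1 instead of N.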


Chebyshev’s Theorem
For any set of data, the proportion of values that lie within k standard deviations (k > 1) of the mean is at least $1 - 1/k^2$.

k = 2 → 1 - 1/4 = 3/4 = 75%

k = 3 → 1 - 1/9 = 8/9 ≈ 89%

Example: Cost per order data in the Purchase Orders database.

The two-standard-deviation interval around the mean is [-$33,390.34, $85,980.98]. It contains 89 of the 94 observations (94.68%), well above the 75% guaranteed by the theorem.

The three-standard-deviation interval is [-$63,233.17, $115,823.81]. It contains 92 of the 94 observations (97.9%), above the guaranteed 89%.


Example: Mean sales per month for 2020 = 200,000 LKR

Standard deviation = 10,000 LKR

Two-standard-deviation range (mean ± 2σ) = 180,000 to 220,000 LKR

1 → 200,000 (inside the range)

2 → 210,000 (inside the range)

3 → 170,000 (outside the range)
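
The sales example above can be checked with a small Python sketch. The helper name `chebyshev_lower_bound` is made up for this illustration. The mean, standard deviation and three sales values are the ones listed in the example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k**2

mean_sales = 200_000   # LKR, from the example above
sd_sales = 10_000      # LKR, from the example above

k = 2
low, high = mean_sales - k * sd_sales, mean_sales + k * sd_sales

sales = [200_000, 210_000, 170_000]           # the three values listed above
inside = [low <= x <= high for x in sales]    # True when a value is in range

print(chebyshev_lower_bound(k), (low, high), inside)
```

The third value, 170,000 LKR, lies three standard deviations below the mean, so it falls outside the two-standard-deviation range.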
Empirical rules
1. Approximately 68% of the observations will fall within one standard deviation of the mean, or between
$\bar{x} - s$ and $\bar{x} + s$.

2. Approximately 95% of the observations will fall within two standard deviations of the mean, or within
$\bar{x} - 2s$ and $\bar{x} + 2s$.

3. Approximately 99.7% of the observations will fall within three standard deviations of the mean, or
within $\bar{x} - 3s$ and $\bar{x} + 3s$.
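
These percentages match the coverage of a normal (bell-shaped) distribution. The sketch below assumes the data are exactly normal, which real data only approximate. It computes the coverage with `math.erf` from Python's standard library; the function name `normal_coverage` is chosen for this illustration.

```python
import math

def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints roughly 68.27, 95.45 and 99.73 percent.
    print(k, round(100 * normal_coverage(k), 2))
```

For k = 2 and k = 3 these values are well above the 75% and 89% that Chebyshev's theorem guarantees for any data set.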
Process Capability index
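To measure how well a manufacturing process can achieve the specifications, we usually take a sample of
output, measure the dimension of interest, and estimate the total variation using the third empirical rule, that is, as six standard deviations (from $\bar{x} - 3s$ to $\bar{x} + 3s$). Comparing the specification range with this total variation gives the process capability index:

$C_p = \frac{\text{upper specification} - \text{lower specification}}{6s}$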
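
A minimal Python sketch of this calculation follows. It assumes the common definition in which the specification range is divided by a total variation of six sample standard deviations. The measurements and specification limits are hypothetical, and `process_capability` is a name chosen for this illustration.

```python
from statistics import stdev

def process_capability(sample, lower_spec, upper_spec):
    """Specification range divided by the total variation (six standard deviations)."""
    total_variation = 6 * stdev(sample)
    return (upper_spec - lower_spec) / total_variation

# Hypothetical measurements of a part dimension and its specification limits.
measurements = [9.8, 10.0, 10.2, 10.1, 9.9]
cp = process_capability(measurements, lower_spec=9.5, upper_spec=10.5)

print(round(cp, 3))
```

A value above 1 suggests that the natural variation of the process fits inside the specification range; a value below 1 suggests that it does not.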
Standardized Values
Standardized values (also called standard scores or normal deviates) are the same thing as z-scores. A
standardized value is obtained by subtracting the mean from a data point and dividing the result by the standard deviation: $z_i = \frac{x_i - \bar{x}}{s}$. It tells us
how far the data point is from the mean, in units of standard deviations.
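
As a sketch, z-scores can be computed in Python by subtracting the mean and dividing by the standard deviation. The sample standard deviation is assumed here, and the `data` list is hypothetical.

```python
from statistics import mean, stdev

def z_scores(data):
    """Standardize each value: (value - mean) / sample standard deviation."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

# Hypothetical observations, chosen only for illustration.
data = [4, 8, 6, 5, 7]
z = z_scores(data)

print([round(v, 3) for v in z])
```

The standardized values always have a mean of zero and a standard deviation of one, whatever the original units were.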
Coefficient of Variation
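The coefficient of variation (CV) provides a relative measure of the dispersion in data relative to the mean
and is defined as

CV = standard deviation / mean

Because it divides risk (the standard deviation) by return (the mean), the coefficient of variation provides a relative measure of risk to return.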
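
The sketch below illustrates why a relative measure is useful. It assumes the usual definition, standard deviation divided by mean, and uses two hypothetical price series that have the same standard deviation but very different means.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return stdev(data) / mean(data)

# Hypothetical prices: same spread in absolute terms, different levels.
stock_a = [10, 12, 11, 9, 13]
stock_b = [100, 102, 101, 99, 103]

cv_a = coefficient_of_variation(stock_a)
cv_b = coefficient_of_variation(stock_b)

print(round(cv_a, 4), round(cv_b, 4))
```

Both series have the same standard deviation, yet the first varies far more relative to its mean, which is what the CV captures.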


Measures of Shape
Consider the following data set. The histograms of the Cost per order and A/P terms variables have different shapes.

Cost per order is positively skewed.

A/P terms is symmetric.


Measures of Shape
The coefficient of skewness (CS) measures the degree of asymmetry of observations around the mean.
For a population, the coefficient of skewness is computed as

$CS = \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^3}{\sigma^3}$
Characteristics of a Skewed Distribution
If the distribution is perfectly symmetrical and unimodal → mean = median = mode

If the distribution is negatively skewed → mean < median < mode

If the distribution is positively skewed → mode < median < mean
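
A small Python sketch of the coefficient of skewness follows. It assumes the population form, the average cubed deviation from the mean divided by the cube of the population standard deviation. Both data sets are hypothetical.

```python
from statistics import mean, median

def coefficient_of_skewness(data):
    """Population skewness: mean cubed deviation divided by sigma cubed."""
    n = len(data)
    mu = sum(data) / n
    sigma = (sum((x - mu) ** 2 for x in data) / n) ** 0.5
    return (sum((x - mu) ** 3 for x in data) / n) / sigma ** 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 12]    # hypothetical data with a long right tail

cs_sym = coefficient_of_skewness(symmetric)
cs_right = coefficient_of_skewness(right_skewed)

print(cs_sym, cs_right, mean(right_skewed), median(right_skewed))
```

For the right-skewed data the coefficient is positive and the mean (4) exceeds the median (2), in line with the ordering given for positively skewed distributions.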


Coefficient of kurtosis (CK)
Kurtosis refers to the peakedness (i.e., high, narrow) or flatness (i.e., short, flat-topped) of a histogram. The
coefficient of kurtosis (CK) measures the degree of kurtosis of a population and is computed as

$CK = \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^4}{\sigma^4}$

Distributions with values of CK less than 3 are flatter, with a wide degree of dispersion; those with values
of CK greater than 3 are more peaked, with less dispersion.
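
The comparison against 3 can be illustrated with a short Python sketch. It assumes the population form, the average fourth-power deviation from the mean divided by the fourth power of the population standard deviation. Both data sets are hypothetical.

```python
def coefficient_of_kurtosis(data):
    """Population kurtosis: mean fourth-power deviation divided by sigma^4."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n
    return (sum((x - mu) ** 4 for x in data) / n) / var ** 2

flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [3, 3, 3, 3, 3, 3, 3, 3, 0, 6]      # most values at the center

ck_flat = coefficient_of_kurtosis(flat)
ck_peaked = coefficient_of_kurtosis(peaked)

print(round(ck_flat, 2), round(ck_peaked, 2))
```

The evenly spread data give a CK below 3, while the data concentrated at the center give a CK above 3.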
Descriptive statistics for grouped data
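In the formulas below, $x_i$ is the value (or the midpoint) of group i, $f_i$ is its frequency, and k is the number of groups.

Mean of the population: $\mu = \frac{\sum_{i=1}^{k} f_i x_i}{N}$

Mean of the sample: $\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$

Population variance: $\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N}$

Sample variance: $s^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}$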
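
For grouped data, each value (or group midpoint) is weighted by its frequency. The Python sketch below assumes that standard frequency-weighted form. The `values` and `freqs` lists are hypothetical, and `grouped_stats` is a name chosen for this illustration.

```python
def grouped_stats(values, freqs):
    """Return (mean, population variance, sample variance) from a frequency table."""
    n = sum(freqs)                                   # total number of observations
    m = sum(f * x for x, f in zip(values, freqs)) / n
    ss = sum(f * (x - m) ** 2 for x, f in zip(values, freqs))
    return m, ss / n, ss / (n - 1)

# Hypothetical frequency table: value (or group midpoint) and its frequency.
values = [1, 2, 3, 4, 5]
freqs = [2, 3, 5, 3, 2]

g_mean, g_pop_var, g_samp_var = grouped_stats(values, freqs)

print(g_mean, round(g_pop_var, 4), round(g_samp_var, 4))
```

When groups are intervals and midpoints stand in for the individual values, the results are approximations of the statistics of the raw data.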
Descriptive statistics for categorical data
Proportion: Proportions are key descriptive statistics for categorical data, such as defects or errors in
quality control applications or consumer preferences in market research. A proportion is the fraction of the data
that has a certain characteristic.
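
As a minimal sketch, a proportion is the count of items with the characteristic divided by the total number of items. The inspection results below are hypothetical.

```python
from collections import Counter

# Hypothetical inspection results for ten items.
results = ["defect", "ok", "ok", "ok", "defect", "ok", "ok", "ok", "ok", "ok"]

counts = Counter(results)                                   # frequency of each category
proportions = {cat: c / len(results) for cat, c in counts.items()}

print(proportions)
```

Here 2 of the 10 items are defective, so the proportion of defects is 0.2.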

Ex:
Measure of association
How would you answer questions such as the following?

Does a higher percentage of students in the top 10% of their high school class suggest a higher
graduation rate? Is acceptance rate related to the amount spent per student? Do schools with
lower acceptance rates tend to accept students with higher SAT scores?
Measure of association
Covariance = a measure of the linear association between two variables, X and Y.

Sample covariance: $\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$


Measure of association
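Correlation: Correlation is a measure of the linear relationship between two variables, X and Y, which does not
depend on the units of measurement.

Correlation is measured by the correlation coefficient, which always lies between -1 and 1.

Correlation coefficient for a population: $\rho_{xy} = \frac{\text{cov}(X, Y)}{\sigma_x \sigma_y}$

Correlation coefficient for a sample: $r_{xy} = \frac{\text{cov}(X, Y)}{s_x s_y}$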
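
The sketch below computes the sample covariance and the sample correlation coefficient in Python. It assumes the usual sample forms: cross-deviations divided by n - 1, and covariance divided by the product of the two sample standard deviations. The `x` and `y` lists are hypothetical.

```python
from statistics import mean, stdev

def sample_covariance(x, y):
    """Sum of cross-deviations from the means, divided by n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def sample_correlation(x, y):
    """Sample covariance divided by the product of the sample standard deviations."""
    return sample_covariance(x, y) / (stdev(x) * stdev(y))

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_scaled = [100 * v for v in y]   # same data expressed in a different unit

cov_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)
r_scaled = sample_correlation(x, y_scaled)

print(cov_xy, round(r_xy, 4), round(r_scaled, 4))
```

Changing the unit of y multiplies the covariance by 100 but leaves the correlation coefficient unchanged, which is why correlation is preferred for comparing the strength of relationships.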


Measure of association

[Diagram not reproduced. Source: web]
