FDS Unit 3

This document provides an overview of statistics essentials for data science, including populations and samples, descriptive statistics, measures of central tendency, and distributions. Key points: - Populations are complete collections of data, while samples are subsets used to generalize about populations. Samples should represent the overall population. - Descriptive statistics summarize and organize data, while inferential statistics generalize from samples to populations. - Measures of central tendency describe typical values and include the mode (most frequent value), median (middle value), and mean (average). These help summarize key attributes of data sets.

Uploaded by

shaliniboddu521

4-1 B. Tech CIVIL Regulation: R20 FDS: UNIT-3

UNIT-3
Statistics Essentials for Data Science
Syllabus:
Statistics Essentials for Data Science: Sample or Population Data? The
Fundamentals of Descriptive Statistics, Measures of Central Tendency,
Asymmetry, and Variability, Practical Example: Descriptive Statistics,
Distributions, Estimators and Estimates, Normal distributions – z scores – normal
curve problems
Populations and Samples
 In statistics, a population refers to any complete collection of observations or
potential observations, whereas a sample refers to any smaller collection of
actual observations drawn from a population.
 In everyday life, populations often are viewed as collections of real objects (e.g.,
people, whales, automobiles), whereas in statistics, populations may be viewed
more abstractly as collections of properties or measurements (e.g., the ethnic
backgrounds of people, life spans of whales, gas mileage of automobiles).
 Depending on our perspective, a given set of observations can be either a
population or a sample
 Ordinarily, populations are quite large and exist only as potential observations
(e.g., the potential scores of all U.S. college students on a test that measures
anxiety). On the other hand, samples are relatively small and exist as actual
observations (the actual scores of 100 college students on the test for anxiety).
 When using a sample (100 actual scores) to generalize to a population (millions
of potential scores), it is important that the sample represent the population;
otherwise, any generalization might be erroneous.
 Population vs Sample:
Population | Sample
Any complete collection of observations or potential observations | Any smaller collection of actual observations from a population
Populations are quite large | Samples are relatively small
Exist only as potential observations | Exist as actual observations
The population is the complete set | The sample is a subset of the population
Example: all the students in the class are the population | Example: the top 10 students in the class are the sample
The Fundamentals of Descriptive Statistics
 Statistics exists because of the prevalence of variability in the real world.
 It consists of two main subdivisions: descriptive statistics, which is concerned
with organizing and summarizing information for sets of actual observations,
and inferential statistics, which is concerned with generalizing beyond sets of
actual observations—that is, generalizing from a sample to a population.

Prepared By: MD SHAKEEL AHMED, Associate Professor, Dept. Of IT, VVIT, Guntur Page 1

 Descriptive statistics will provide tools, such as tables, graphs, and averages
that help us describe and organize the inevitable variability among observations.
 Examples are:
 A tabular listing, ranked from most to least, of the total number of
romantic affairs during college reported anonymously by each member of
our stat class
 A graph showing the annual change in global temperature during the last
30 years
 A report that describes the average difference in grade point average
(GPA) between college students who regularly drink alcoholic beverages
and those who don’t.
Descriptive Statistics vs Inferential Statistics
Descriptive Statistics | Inferential Statistics
Describe and summarize data | Make inferences and draw conclusions about a population based on sample data
Analyzes and interprets the characteristics of a dataset | Uses sample data to make generalizations or predictions about a larger population
Focuses on the entire population or dataset | Focuses on a subset of the population (sample) to draw conclusions about the entire population
Provides measures of central tendency and dispersion | Estimates parameters, tests hypotheses, and determines the level of confidence or significance in the results
Mean, median, mode, standard deviation, range, frequency tables | Hypothesis testing, confidence intervals, regression analysis, ANOVA (analysis of variance), chi-square tests, t-tests, etc.
Summarize, organize, and present data | Generalize findings to a larger population, make predictions, test hypotheses, evaluate relationships, and support decision-making
Population parameters are not typically estimated | Population parameters are estimated using sample statistics (e.g., sample mean as an estimate of population mean)
Sample representation is not required | Sample representation is crucial; the sample should be representative of the population to ensure accurate inferences

Measures of Central Tendency


(Describing Data with Averages)
 Averages consist of numbers (or words) about which the data are, in some
sense, centred. They are often referred to as measures of central tendency
 A measure of center is a single number used to describe a set of numeric data.
It describes a typical value from the data set.
 Several types of average yield numbers or words that attempt to describe, most
generally, the middle or typical value for a distribution.
 Three different measures of central tendency are:
 Mode
 Median
 Mean.
 Each of these has its special uses, but the mean is the most important average in
both descriptive and inferential statistics.

Mode: Most Frequent
Median: Middlemost
Mean: Average (Balance Point)
Mode
 A mode is defined as the value that has the highest frequency in a given set of
values. It is the value that appears the greatest number of times.
 Example: In the given set of data: 2, 4, 5, 5, 6, 7, the mode of the data set is 5
since it appears twice, more often than any other value.
 The mode equals the value of the most frequently occurring or typical score.
 It is easy to assign a value to the mode if the data are organized. However, if
the data are not organized, some counting may be required.
 The mode is readily understood as the most prevalent or typical value.
 Distributions can have more than one mode (or no mode at all).
 Distributions with two obvious peaks, even though they are not exactly the same
height, are referred to as bimodal.
 Distributions with more than two peaks are referred to as multimodal.
 The presence of more than one mode might reflect important differences among
subsets of data. For instance, the distribution of weights for both male and
female statistics students would most likely be bimodal, reflecting the
combination of two separate weight distributions—a heavier one for males and
a lighter one for females.
 Example1: Determine the mode for the following retirement ages: 60, 63, 45,
63, 65, 70, 55, 63, 60, 65, 63.
Answer: mode = 63
 Example2: The owner of a new car conducts six gas mileage tests and obtains
the following results, expressed in miles per gallon: 26.3, 28.7, 27.4, 26.6, 27.4,
26.9. Find the mode for these data.
Answer: mode = 27.4
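
The two mode examples above can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative and not part of the original notes.

```python
from statistics import mode, multimode

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

# mode() returns the single most frequent value
age_mode = mode(retirement_ages)
mileage_mode = mode(gas_mileage)

# multimode() returns every value tied for the highest frequency,
# which is how bimodal or multimodal data shows up
tied_modes = multimode([1, 1, 2, 3, 3])

print(age_mode, mileage_mode, tied_modes)
```

`multimode` is the safer choice when a data set may have more than one mode, since it reports all of the tied values.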


Median
 The median reflects the middle value when observations are ordered from least
to most.
 The median splits a set of ordered observations into two equal parts, the upper
and lower halves.
 In other words, the median has a percentile rank of 50, since observations with
equal or smaller values constitute 50 percent of the entire distribution.
 To find the median, scores always must be ordered from least to most (or vice
versa). This task is straightforward with small sets of data but becomes
increasingly cumbersome with larger sets of data that must be ordered manually.
 When the total number of scores is odd, there is a single middle-ranked score, and
the value of the median equals the value of this score. When the total number of
scores is even, the value of the median equals a value midway between the values
of the two middlemost scores.

 In either case, the value of the median always reflects the value of the middle-ranked
scores, not the position of these scores among the set of ordered scores.
 Example 1: Find the median for the following retirement ages: 60, 63, 45, 63,65,
70, 55, 63, 60, 65, 63.
Solution: Median = 63
 Example2: Find the median for the following gas mileage tests: 26.3, 28.7, 27.4,
26.6, 27.4, 26.9.
Solution: Median = 27.15 (halfway between 26.9 and 27.4)
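
As a quick check of the two median examples, here is a minimal sketch using the standard `statistics` module (variable names are illustrative). The function sorts the data itself, so the scores do not need to be ordered by hand.

```python
from statistics import median

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

# Odd number of scores (11): the single middle-ranked score
age_median = median(retirement_ages)

# Even number of scores (6): midway between the two middlemost scores
mileage_median = median(gas_mileage)

print(age_median, round(mileage_median, 2))
```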
Mean
 The mean is the most common average.
 The mean is found by adding all scores and then dividing by the number of
scores.
 That is, Mean = (sum of all scores) / (number of scores), or X̄ = ΣX / n.
 There is no requirement that scores be ranked before calculating the
mean.


 Even when large sets of unorganized data are involved, the calculation of the
mean is usually straightforward, particularly with the aid of a calculator or
computer.
 The mean serves as the balance point for its frequency distribution.
 Mean cannot be used with qualitative data.
 Example 1: Find the mean for the following retirement ages: 60, 63, 45, 63, 65,
70, 55, 63, 60, 65, 63.
Solution: Mean = 672/11 = 61.09

 Example 2: Find the mean for the following gas mileage tests: 26.3, 28.7, 27.4,
26.6, 27.4, 26.9.
Solution: Mean = 163.3/6 = 27.22
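
The two mean examples can be worked through in a few lines. This is a sketch using the standard `statistics` module, with illustrative variable names; it simply adds the scores and divides by their count.

```python
from statistics import mean

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

# Sum of scores divided by number of scores; no ordering is needed
age_total = sum(retirement_ages)
age_mean = mean(retirement_ages)
mileage_mean = mean(gas_mileage)

print(age_total, len(retirement_ages), round(age_mean, 2), round(mileage_mean, 2))
```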

 Mean = Sum of observations / Number of observations
 Median = {(n+1)/2}th term when n is odd
& Median = [(n/2)th term + {(n/2)+1}th term]/2 when n is even
 Mode = Value repeated the maximum number of times

Which Average?
 When a distribution of scores is not too skewed, the values of the mode, median,
and mean are similar, and any of them can be used to describe the central
tendency of the distribution.
 When extreme scores cause a distribution to be skewed, the values of the three
averages can differ appreciably.
 Unlike the mode and median, the mean is very sensitive to extreme scores, or
outliers.
 In the long run, however, the mean is the single most preferred average for
quantitative data.
 Ideally, when a distribution is skewed, report both the mean and the median.
Appreciable differences between the values of the mean and median signal the
presence of a skewed distribution.
 If the mean exceeds the median, the underlying distribution is positively skewed
because of one or more scores with relatively large values.
 On the other hand, if the median exceeds the mean, the underlying distribution
is negatively skewed because of one or more scores with relatively small values.
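 To summarize the relationship between the averages and the two types of skewed
distributions: in a positively skewed distribution the order is typically
mode, median, mean (smallest to largest); in a negatively skewed distribution
the order is typically mean, median, mode.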
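
The sensitivity of the mean to outliers, and the mean-versus-median rule for spotting skew, can be illustrated with a small sketch. The scores below are made up for illustration, and `skew_direction` is a hypothetical helper, not a standard function.

```python
from statistics import mean, median

scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]   # roughly symmetric, made-up data
with_outlier = scores + [40]           # same data plus one extreme high score

def skew_direction(data):
    """Crude rule from the text: compare the mean with the median."""
    m, md = mean(data), median(data)
    if m > md:
        return "positive"
    if m < md:
        return "negative"
    return "none detected"

# The outlier pulls the mean upward but leaves the median unchanged
print(mean(scores), median(scores), skew_direction(scores))
print(mean(with_outlier), median(with_outlier), skew_direction(with_outlier))
```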


Averages for Qualitative and Ranked Data


Mode Always Appropriate for Qualitative Data
 For quantitative data, in principle, all three averages can be used.
 The mode always can be used with qualitative data.
Median Sometimes Appropriate for Qualitative Data
 The median can be used whenever it is possible to order qualitative data from
least to most because the level of measurement is ordinal.
 It’s easiest to determine the median class for ordered qualitative data by using
relative frequencies
Mean cannot be used with qualitative data.
Averages for Ranked Data
 When the data consist of a series of ranks, with its ordinal level of measurement,
the median rank always can be obtained. It’s simply the middlemost or average
of the two middlemost ranks.
 The mean and modal ranks tend not to be very informative and will not be
discussed.
Asymmetry (Skewness)
 Asymmetry is the lack of balance or symmetry: a skew or tilt to one side more
than the other.
 In statistics, skewness is a measure of the degree of asymmetry of a
probability distribution.


 If the left tail is longer than the right tail, the distribution is said to have
negative skewness. If the right tail is longer, it has a positive skew. If the
two tails are equal, it has zero skewness.
 An asymmetrical distribution is one in which the values of a variable occur
at irregular frequencies and the mean, median, and mode occur at different
points. An asymmetric distribution exhibits skewness.
 Data in most real applications are not symmetric. They may instead be either
positively skewed, where the mode occurs at a value that is smaller than the
median, or negatively skewed, where the mode occurs at a value greater than the
median.
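
The notes do not give a formula for skewness. One common choice, assumed here, is the moment coefficient of skewness, m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below uses made-up data to show that its sign matches the direction of the longer tail.

```python
from statistics import mean

def skewness(data):
    """Moment coefficient of skewness (population form): m3 / m2**1.5."""
    n = len(data)
    center = mean(data)
    m2 = sum((x - center) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - center) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]      # long right tail
left_tailed = [-x for x in right_tailed]      # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(right_tailed) > 0, skewness(left_tailed) < 0, skewness(symmetric))
```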

Estimators and Estimates


 An estimator is a function of the sample, i.e., it is a rule that tells us how to
calculate an estimate of a parameter from a sample.
 An estimate is a value of an estimator calculated from a sample.
 An estimator is a statistic that estimates some fact about the population. We can
also think of an estimator as the rule that creates an estimate. For example,
the sample mean (x̄) is an estimator for the population mean, μ.
 The quantity that is being estimated (i.e. the one we want to know) is called
the estimand.
 For example, let’s say we wanted to know the average height of children in a
certain school with a population of 1000 students. We take a sample of 30
children, measure them and find that the mean height is 56 inches. The sample
mean is our estimator, and the value it produces, 56 inches, is our estimate.
We use it to estimate that the population mean (our estimand) is about 56 inches.
 Estimators can be a range of values (like a confidence interval) or a single
value (like the standard deviation). When an estimator is a range of values, it’s
called an interval estimate. For the height example above, we might add on a
confidence interval of a couple of inches either way, say 54 to 58 inches. When
it is a single value — like 56 inches — it’s called a point estimate.


Types of Estimators
 Estimators can be described in several ways:
 Biased: a statistic that, on average, either overestimates or
underestimates the parameter.
 Efficient: a statistic with small variance (the one with the smallest
possible variance is also called the “best”). Inefficient estimators can give
good results as well, but they usually require much larger samples.
 Invariant: statistics that are not easily changed by transformations, like
simple data shifts.
 Shrinkage: a raw estimate that’s improved by combining it with other
information.
 Sufficient: a sufficient statistic captures all the information in the
sample that is relevant to the parameter.
 Unbiased: a statistic that, on average, neither underestimates nor
overestimates the parameter.
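
The difference between an estimator (a rule) and an estimate (a value), and between biased and unbiased estimators, can be made concrete with a small sketch. The four-value population and the function names are assumptions chosen for illustration. The code enumerates every equally likely sample of size 2 (drawn with replacement) and averages two variance estimators over all of them.

```python
from itertools import product
from statistics import mean

population = [1, 2, 3, 4]                             # made-up population
mu = mean(population)
sigma2 = mean((x - mu) ** 2 for x in population)      # true population variance

def var_divide_by_n(sample):
    """Estimator that divides by n: underestimates on average (biased)."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

def var_divide_by_n_minus_1(sample):
    """Estimator that divides by n - 1: correct on average (unbiased)."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

# Each function is an estimator; each value it returns for a sample is an estimate.
samples = list(product(population, repeat=2))         # all 16 samples of size 2
avg_biased = mean(var_divide_by_n(s) for s in samples)
avg_unbiased = mean(var_divide_by_n_minus_1(s) for s in samples)

print(sigma2, avg_biased, avg_unbiased)
```

Averaged over all possible samples, the n - 1 version matches the population variance, while the n version falls short of it.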

Distributions
 Over many years, eminent statisticians noticed that data from samples and
populations often formed very similar patterns. For example, a lot of data were
grouped around the ‘middle’ values, with fewer observations at the outside
edges of the distribution (very high or very low values). These patterns are
known as ‘distributions’, because they describe how the data are ‘distributed’
across the range of possible values.
 A statistical distribution, or probability distribution, describes how values are
distributed for a field. In other words, the statistical distribution shows which
values are common and uncommon.
 There are many kinds of statistical distributions, including the bell-shaped
normal distribution. We use a statistical distribution to determine how likely a
particular value is.


Types of Distributions
 Bernoulli Distribution
 Uniform Distribution
 Binomial Distribution
 Poisson Distribution
 Normal or Gaussian Distribution
 Exponential Distribution
Bernoulli distribution
 The Bernoulli distribution is one of the easiest distributions to understand.
 It can be used as a starting point to derive more complex distributions.
 Any event with a single trial and only two outcomes follows a Bernoulli
distribution.
 Flipping a coin or choosing between True and False in a quiz are examples of a
Bernoulli distribution.
Uniform distribution
 In statistics, uniform distribution refers to a statistical distribution in which all
outcomes are equally likely.
 Consider rolling a six-sided die. Each of the six numbers 1, 2, 3, 4, 5, and 6
is equally likely to come up on the next roll, with a probability of 1/6 each.
This is an example of a discrete uniform distribution.
Binomial Distribution
 The Binomial Distribution can be thought of as the sum of outcomes of
repeated, independent trials that each follow a Bernoulli distribution.
Therefore, the Binomial Distribution is used for binary-outcome events in which
the probability of success is the same in every trial.
 An example of a binomial event would be flipping a coin multiple times and
counting the number of heads.
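
The coin-flipping example can be sketched with the standard binomial probability formula, P(k) = C(n, k) p^k (1 - p)^(n - k). The formula is not stated in the notes, and the function name is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 flips of a fair coin
p_two_heads = binomial_pmf(2, 4, 0.5)

# Probabilities over all possible head counts add up to 1
total = sum(binomial_pmf(k, 4, 0.5) for k in range(5))

# With a single trial (n = 1) the binomial reduces to a Bernoulli distribution
bernoulli_heads = binomial_pmf(1, 1, 0.5)

print(p_two_heads, total, bernoulli_heads)
```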
Poisson distribution
 Poisson distribution deals with the frequency with which an event occurs
within a specific interval. Instead of the probability of an event, Poisson
distribution requires knowing how often it happens in a particular period or
distance.
 The Poisson distribution describes the probability of k events happening in a
unit of time.
 For example, a cricket chirps two times in 7 seconds on average. We can use the
Poisson distribution to determine the likelihood of it chirping five times in 15
seconds.
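
The cricket example can be worked out with the standard Poisson formula, P(k) = e^(-λ) λ^k / k!, which the notes do not state explicitly. The sketch assumes the chirp rate stays constant, so the expected count in 15 seconds is (2/7) × 15.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(exactly k events) when lam events are expected in the interval."""
    return exp(-lam) * lam ** k / factorial(k)

rate_per_second = 2 / 7          # two chirps per 7 seconds on average
lam = rate_per_second * 15       # expected chirps in a 15-second interval
p_five_chirps = poisson_pmf(5, lam)

print(round(lam, 3), round(p_five_chirps, 3))
```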
Normal distribution
 Normal distribution is the most used distribution in data science. In a normal
distribution graph, data is symmetrically distributed with no skew. When
plotted, the data follows a bell shape, with most values clustering around a
central region and tapering off as they go further away from the center.
 The normal distribution frequently appears in nature and life in various forms.
For example, the scores of a quiz often approximately follow a normal
distribution: many students might score between 60 and 80, near the center,
while fewer students have scores that fall outside this range.
Exponential distribution
 Exponential distribution is one of the widely used continuous distributions.
 It is used to model the time taken between different events.
 For example, in physics, it is often used to measure radioactive decay; in
engineering, to measure the time associated with receiving a defective part on
an assembly line; and in finance, to measure the likelihood of the next default
for a portfolio of financial assets.
 Another common application of exponential distributions is survival analysis
(e.g., expected life of a device/machine).
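
For the survival-analysis use mentioned above, a minimal sketch of the exponential distribution follows. It uses the standard formula P(T ≤ t) = 1 - e^(-rate × t); the device and its failure rate are hypothetical values chosen for illustration.

```python
from math import exp

def exponential_cdf(t, rate):
    """P(waiting time <= t) when events occur at a constant average rate."""
    return 1 - exp(-rate * t)

# Hypothetical device that fails on average once every 5 years
rate = 1 / 5

p_fail_within_2_years = exponential_cdf(2, rate)
p_survive_5_years = 1 - exponential_cdf(5, rate)   # equals e**-1

print(round(p_fail_within_2_years, 3), round(p_survive_5_years, 3))
```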
Bernoulli Distribution: Single trial with two possible outcomes
Uniform Distribution: All outcomes are equally likely
Binomial Distribution: A sequence of independent Bernoulli trials
Poisson Distribution: Number of times an event occurs within a specific interval
Normal Distribution: Symmetric distribution of values around the mean
Exponential Distribution: Elapsed time between two events
The Normal Distributions
 A Normal distribution (or Gaussian distribution) is a continuous probability
distribution that is symmetrical on both sides of the mean, so that right side of
the center is mirror image of the left side.
 The normal distribution is important because it closely describes the
distribution of values for many natural phenomena.
 Many observed frequency distributions approximate the well-documented
normal curve, an important theoretical curve noted for its symmetrical bell-
shaped form.
 Characteristics that are the sum of many independent processes frequently
follow normal distributions. For example, heights, blood pressure, measurement
error, and IQ scores approximately follow the normal distribution.
 The normal curve is defined in terms of standard deviation and mean.
 The normal curve can be used to obtain answers to a wide variety of questions.


Properties of the Normal Curve:


Important properties of the normal curve are:
 The normal curve is a theoretical curve defined for a continuous variable.
 The normal curve is symmetrical; its lower half is the mirror image of its upper
half.
 It is in bell-shaped form
 The normal curve peaks above a point midway along the horizontal spread and
then tapers off gradually in either direction from the peak.
 The curve approaches the x-axis as it extends farther away from the mean, but
it never touches it.
 The values of the mean, median and mode, located at a point midway along the
horizontal spread, are the same for the normal curve.
 The total area under the curve equals 1.
 The normal distribution curve must have only one peak. (i.e., unimodal)
 The normal curve is asymptotic to the X-axis: i.e., the curve continues to
decrease in height on both ends away from the middle point (mean); but it never
touches the horizontal axis.
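
Several of these properties can be checked numerically. The sketch below assumes Python's `statistics.NormalDist` and a hypothetical curve with mean 100 and standard deviation 15; the area is approximated with the trapezoid rule over the mean ± 8 standard deviations.

```python
from statistics import NormalDist

curve = NormalDist(mu=100, sigma=15)   # hypothetical mean and standard deviation

# Symmetry: equal heights at equal distances on either side of the mean
left_height = curve.pdf(curve.mean - 20)
right_height = curve.pdf(curve.mean + 20)

# Single peak located at the mean
peak_is_at_mean = curve.pdf(curve.mean) > max(curve.pdf(99), curve.pdf(101))

# Asymptotic: far from the mean the height is tiny but still above zero
far_height = curve.pdf(curve.mean + 8 * curve.stdev)

# Total area under the curve (trapezoid rule) should be very close to 1
n_steps = 4000
lo = curve.mean - 8 * curve.stdev
width = 16 * curve.stdev / n_steps
heights = [curve.pdf(lo + i * width) for i in range(n_steps + 1)]
area = width * (sum(heights) - (heights[0] + heights[-1]) / 2)

print(left_height == right_height, peak_is_at_mean, far_height > 0, round(area, 6))
```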
Different Normal Curves
 When using the normal curve, two bits of information are indispensable: values
for the mean and the standard deviation.
 Various types of normal curves are produced by an arbitrary change in the value
of either the mean (μ) or the standard deviation (σ).
 Every normal curve can be interpreted in exactly the same way once any
distance from the mean is expressed in standard deviation units.


Z Scores
 A unit-free, standardized score that indicates how many standard deviations a
score is above or below the mean of its distribution is called a z score.
 To obtain a z score, express any original score, whether measured in inches,
milliseconds, dollars, IQ points, etc., as a deviation from its mean (by
subtracting its mean) and then split this deviation into standard deviation units
(by dividing by its standard deviation), that is,
z = (X − μ) / σ
where X is the original score and μ and σ are the mean and the standard
deviation, respectively, for the normal distribution of the original scores.
 A z score consists of two parts:
 a positive or negative sign indicating whether it’s above or below the
mean; and
 a number indicating the size of its deviation from the mean in standard
deviation units.
 Example:
 A z score of 2.00 always signifies that the original score is exactly two
standard deviations above its mean.
 Similarly, a z score of –1.27 signifies that the original score is exactly
1.27 standard deviations below its mean.
 A z score of 0 signifies that the original score coincides with the mean.
 Problem: Express each of the following scores as a z score:
(a) Margaret’s IQ of 135, given a mean of 100 and a standard deviation
of 15
(b) a score of 470 on the SAT math test, given a mean of 500 and a standard
deviation of 100
(c) a daily production of 2100 loaves of bread by a bakery, given a mean of
2180 and a standard deviation of 50
(d) Sam’s height of 69 inches, given a mean of 69 and a standard deviation of 3
(e) a thermometer-reading error of –3 degrees, given a mean of 0 degrees and a
standard deviation of 2 degrees

Answers:
(a) z = (135-100)/15= 2.33
(b) z = (470-500)/100= -0.30
(c) z = (2100-2180)/50= -1.60
(d) z = (69-69)/3= 0.00
(e) z = (-3-0)/2= -1.50
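
The five answers above can be reproduced with a short sketch of the z-score conversion. The function and dictionary names are illustrative.

```python
def z_score(x, mean, sd):
    """Deviation from the mean, expressed in standard deviation units."""
    return (x - mean) / sd

# (original score, mean, standard deviation) for problems (a) to (e)
problems = {
    "a": (135, 100, 15),    # Margaret's IQ
    "b": (470, 500, 100),   # SAT math score
    "c": (2100, 2180, 50),  # loaves of bread
    "d": (69, 69, 3),       # Sam's height in inches
    "e": (-3, 0, 2),        # thermometer-reading error
}

z_scores = {label: round(z_score(*values), 2) for label, values in problems.items()}
print(z_scores)
```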


STANDARD NORMAL CURVE


 If the original distribution approximates a normal curve, then the shift to
standard or z scores will always produce a new distribution that approximates
the standard normal curve.
 This is the one normal curve for which a table is actually available.
 The standard normal curve always has a mean of 0 and a standard deviation of
1.
 Although there is an infinite number of different normal curves, each with its own
mean and standard deviation, there is only one standard normal curve, with a
mean of 0 and a standard deviation of 1.
 Converting all original observations into z scores leaves the normal shape intact
but not the units of measurement.

Standard Normal Table (Z Table)


 The standard normal table consists of columns of z scores coordinated with
columns of proportions.
 In a typical problem, access to the table is gained through a z score, such as
–1.00, and the answer is read as a proportion.
Using the Top Legend of the Table
 Table columns are arranged in sets of three, designated as A, B, and C in the
legend at the top of the table. When using the top legend, all entries refer to the
upper half of the standard normal curve.
 The entries in column A are z scores, beginning with 0.00 and ending with 4.00.
Given a z score of zero or more, column B indicates the proportion of area
between the mean and the z score, and column C indicates the proportion of area
beyond the z score, in the upper tail of the standard normal curve.


Using the Bottom Legend of the Table


 Because of the symmetry of the normal curve, the entries in table also can refer
to the lower half of the normal curve. Now the columns are designated as A′, B′,
and C′ in the legend at the bottom of the table. When using the bottom legend,
all entries refer to the lower half of the standard normal curve.
 The nonzero entries in column A′ are negative z scores, beginning with -0.01
and ending with -4.00.
 Column B′ indicates the proportion of area between the mean and the negative z
score, and column C′ indicates the proportion of area beyond the negative z
score, in the lower tail of the standard normal curve.


 When using the standard normal table, it is important to remember that:
 For any z score, the corresponding proportions in columns B and C (or
columns B′ and C′) always sum to .5000.
 Similarly, the total area under the normal curve always equals 1.0000, the
sum of the proportions in the lower and upper halves, that is, .5000 +
.5000.
 Finally, although a z score can be either positive or negative, the
proportions of area under the curve are always positive or zero but never
negative.
 Problem: Using standard normal Table , find the proportion of the total area
identified with the following statements:
(a) above a z score of 1.80
(b) between the mean and a z score of –0.43
(c) below a z score of –3.00
(d) between the mean and a z score of 1.65
(e) between z scores of 0 and –1.96
Answers: (a) .0359 (b) .1664 (c) .0013 (d) .4505 (e) .4750
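
When a printed table is not at hand, the same proportions can be approximated in code. This sketch assumes Python's `statistics.NormalDist`; the helper names are hypothetical, with `between_mean_and` playing the role of column B (or B′) and `beyond` the role of column C (or C′).

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def between_mean_and(z):
    """Area between the mean and z (column B or B')."""
    return abs(standard_normal.cdf(z) - 0.5)

def beyond(z):
    """Area in the tail beyond z (column C or C')."""
    return 1 - standard_normal.cdf(abs(z))

answers = {
    "a": beyond(1.80),
    "b": between_mean_and(-0.43),
    "c": beyond(-3.00),
    "d": between_mean_and(1.65),
    "e": between_mean_and(-1.96),
}
rounded = {label: round(p, 4) for label, p in answers.items()}
print(rounded)
```

For any z, the two helpers sum to .5000, matching the reminder given earlier about columns B and C.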
Normal Curve Problems
 There are two general types of normal curve problems:
(1) Finding proportions: these problems require finding the unknown
proportion (of area) associated with some score or pair of scores and
(2) Finding scores: these problems require finding the unknown score or
scores associated with some area.
 Answers to the first type of problem usually require converting original scores
into z scores and answers to the second type of problem usually require
translating a z score back into an original score.


 Rough graphs of normal curves can be used as an aid to visualizing the solution.
Do any calculations and consult the normal table only after thinking through to
a solution.
Finding Proportions
 In these normal curve problems, the standard normal table (table A) must be
consulted to find the unknown proportion (of area) associated with some known
score or pair of known scores.
Finding Proportions for One Score
 Step-by-step procedure:
1. Sketch a normal curve and shade in the target area
2. Plan solution according to the normal table.
3. Convert X to z using the formula z = (X − μ) / σ
4. Find the target area by consulting standard normal table
 Example: Find the proportion of all applicants who are shorter than exactly
66 inches, given that the distribution of heights approximates a normal curve
with a mean of 69 inches and a standard deviation of 3 inches.
z = (X − μ) / σ = (66 − 69)/3 = −3/3 = −1.00
Look up 1.00 in column A′ (representing a z score of –1.00), and note the
corresponding proportion of .1587 in column C′. This is the answer.
It can be concluded that only .1587 (or .16, or 16%) of all of the applicants will
be shorter than 66 inches.
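
The height example can be cross-checked in code. This is a sketch assuming Python's `statistics.NormalDist`, whose `cdf` method returns the proportion of area below a score, so no table lookup is needed.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=3)   # mean and standard deviation from the example

z = (66 - heights.mean) / heights.stdev   # step 3: convert X to z
p_shorter = heights.cdf(66)               # step 4: area below 66 inches

print(z, round(p_shorter, 4))
```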
Finding Proportions between Two Scores
 Step-by-step procedure:
1. Sketch a normal curve and shade in the target area
2. Plan solution according to the normal table.
3. Convert X to z using the formula z = (X − μ) / σ
4. Find the target area.
 Example: Assume that, when not interrupted artificially, the gestation periods
for human foetuses approximate a normal curve with a mean of 270 days (9
months) and a standard deviation of 15 days. What proportion of gestation
periods will be between 245 and 255 days?
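
One way to work this problem is to subtract the area below 245 days from the area below 255 days. The sketch below assumes Python's `statistics.NormalDist` and uses unrounded z scores, so its result (about .111) differs slightly in the last digits from a table lookup with z rounded to –1.67 and –1.00, which gives .1587 − .0475 = .1112.

```python
from statistics import NormalDist

gestation = NormalDist(mu=270, sigma=15)   # days, from the example

z_low = (245 - gestation.mean) / gestation.stdev    # about -1.67
z_high = (255 - gestation.mean) / gestation.stdev   # exactly -1.00

# Area between the two scores = area below 255 minus area below 245
p_between = gestation.cdf(255) - gestation.cdf(245)

print(round(z_low, 2), z_high, round(p_between, 3))
```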

Finding Proportions beyond Two Scores


 Step-by-step procedure:
1. Sketch a normal curve and shade in the two target areas
2. Plan solution according to the normal table.
3. Convert X to z using the formula z = (X − μ) / σ
4. Find the target area.
 Problem: Assume that high school students’ IQ scores approximate a normal
distribution with a mean of 105 and a standard deviation of 15. What proportion
of IQs are more than 30 points either above or below the mean?
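Answer:
Expressing IQ scores of 135 and 75 as z scores gives
z = (135 − 105)/15 = 2.00 and z = (75 − 105)/15 = −2.00.
The proportion beyond each z score (column C or C′) is .0228, so the answer is
.0228 + .0228 = .0456.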
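
The IQ problem adds the areas in the two tails, below 75 and above 135. A minimal sketch, assuming Python's `statistics.NormalDist`, is shown here; because it does not round the tail areas, it gives about .0455, whereas adding two table entries of .0228 gives .0456.

```python
from statistics import NormalDist

iq = NormalDist(mu=105, sigma=15)   # from the problem statement

z_upper = (135 - iq.mean) / iq.stdev   # 30 points above the mean
z_lower = (75 - iq.mean) / iq.stdev    # 30 points below the mean

# Sum of the two tail areas
p_beyond = iq.cdf(75) + (1 - iq.cdf(135))

print(z_lower, z_upper, round(p_beyond, 4))
```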


Finding Scores
 In this type of normal curve problem, the standard normal table (Table A) must
be consulted to find the unknown score or scores associated with some known
proportion.
 Essentially, this type of problem requires using Table A in reverse: entering
proportions in columns B, C, B′, or C′ and finding the z scores listed in columns A
or A′.
Finding One Score
 Step-by-step procedure:
1. Sketch a normal curve and, on the correct side of the mean, draw a line
representing the target score
2. Plan the solution according to the normal table.
3. Find z by consulting the standard normal table
4. Convert z to the target score using the formula $X = \mu + (z)(\sigma)$
 Problem: Exam scores for a large psychology class approximate a normal curve
with a mean of 230 and a standard deviation of 50. Furthermore, students are
graded “on a curve,” with only the upper 20 percent being awarded grades of A.
What is the lowest score on the exam that receives an A?
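The procedure for this problem can be sketched as follows, assuming Python 3.8 or later. `NormalDist().inv_cdf` plays the role of reading the table in reverse; the variable names are illustrative. The table gives z to two decimals, so a table-based answer will be a slightly rounder number than the one computed here.

```python
from statistics import NormalDist

mean, sd = 230, 50        # exam scores
upper_share = 0.20        # top 20 percent receive an A

# Step 3: find the z score that has 80 percent of the area below it
# (equivalently, 20 percent above it).
z = NormalDist().inv_cdf(1 - upper_share)

# Step 4: convert z back to a raw score, X = mean + (z)(sd).
cutoff = mean + z * sd

print(f"z = {z:.2f}, lowest score for an A = {cutoff:.1f}")
```

The cutoff comes out at about 272, a little less than one standard deviation above the mean.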

Finding Two Scores


 Step-by-step procedure:
1. Sketch a normal curve. On either side of the mean, draw two lines
representing the two target scores
2. Plan the solution according to the normal table.
3. Find z by consulting the standard normal table
4. Convert z to the target score using the formula $X = \mu + (z)(\sigma)$
 Problem: Assume that the annual rainfall in the San Francisco area
approximates a normal curve with a mean of 22 inches and a standard deviation
of 4 inches. What are the rainfalls for the more atypical years, defined as the
driest 2.5 percent of all years and the wettest 2.5 percent of all years?
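A sketch of the rainfall problem, assuming Python 3.8 or later and the standard-library `NormalDist` in place of the table. The variable names are illustrative, not from the source.

```python
from statistics import NormalDist

mean, sd = 22, 4     # annual rainfall, in inches
tail = 0.025         # 2.5 percent in each tail

std_normal = NormalDist()

# Step 3: z scores that cut off the lower and upper 2.5 percent.
z_dry = std_normal.inv_cdf(tail)        # about -1.96
z_wet = std_normal.inv_cdf(1 - tail)    # about  1.96

# Step 4: convert each z to a rainfall, X = mean + (z)(sd).
driest_cutoff = mean + z_dry * sd
wettest_cutoff = mean + z_wet * sd

print(f"driest 2.5%: below {driest_cutoff:.2f} inches")
print(f"wettest 2.5%: above {wettest_cutoff:.2f} inches")
```

The two cutoffs are symmetrical about the mean, at about 14.16 and 29.84 inches.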


More About Z Scores


Z Scores for Non-normal Distributions
 z scores are not limited to normal distributions.
 Non-normal distributions also can be transformed into sets of unit-free,
standardized z scores.
 In this case, the standard normal table cannot be consulted, since the shape of
the distribution of z scores is the same as that for the original non-normal
distribution.
 Regardless of the shape of the distribution, the shift to z scores always produces
a distribution of standard scores with a mean of 0 and a standard deviation of 1.
 Z scores can provide efficient descriptions of relative performance on one or
more tests.
 The use of z scores can help to identify a person’s relative strengths and
weaknesses on several different tests.

 For example, the table above shows Sharon’s scores on college achievement tests
in three different subjects. The evaluation of her test performance is greatly
facilitated by converting her raw scores into the z scores listed in the final
column of the table. A glance at the z scores suggests that although she did
relatively well on the math test, her performance on the English test was only
slightly above average, as indicated by a z score of 0.50, and her performance
on the psychology test was slightly below average, as indicated by a z score of
–0.67.
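The claim that z scores always have a mean of 0 and a standard deviation of 1, whatever the shape of the distribution, can be illustrated with a small skewed data set. The scores below are hypothetical (they do not come from the source), and the sketch assumes the population standard deviation is the one used to form z scores.

```python
from statistics import mean, pstdev

# Hypothetical, positively skewed scores (not from the source).
scores = [1, 1, 2, 2, 3, 4, 10, 25]

mu = mean(scores)
sigma = pstdev(scores)   # population standard deviation (an assumption here)

# Standardize every score.
z_scores = [(x - mu) / sigma for x in scores]

# Mean and standard deviation of the z scores (0 and 1, up to rounding error).
print(round(mean(z_scores), 10), round(pstdev(z_scores), 10))

# The ordering of the scores is unchanged, so the distribution keeps its
# skewed shape and the standard normal table does not apply.
print(sorted(z_scores) == z_scores)
```

Only the scale changes: each score keeps its relative standing, which is why the z scores of a skewed distribution remain skewed.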
Standard Score
 Any unit-free score expressed relative to a known mean and a known standard
deviation is called a standard score.
 Although z scores qualify as standard scores because they are unit-free and
expressed relative to a known mean of 0 and a known standard deviation of 1,
other scores also qualify as standard scores.
Transformed Standard Scores
 z scores can be changed to transformed standard scores, other types of
unit-free standard scores that lack negative signs and decimal points.
 These transformations change neither the shape of the original distribution nor
the relative standing of any test score within the distribution.


 For example, a test score located one standard deviation below the mean might
be reported not as a z score of –1.00 but as a T score of 40 in a distribution of T
scores with a mean of 50 and a standard deviation of 10.
 The following figure shows the values of some of the more common types of
transformed standard scores relative to the various portions of the area under the
normal curve.

Converting to Transformed Standard Scores


 The following formula can be used to convert any original standard score, z, into a
transformed standard score, z′, having a distribution with any desired mean and
standard deviation.
z′ = desired mean + (z)(desired standard deviation)
where z′ (called z prime) is the transformed standard score and z is the original
standard score.
 Problem: Assume that each of the raw scores listed originates from a
distribution with the specified mean and standard deviation. After converting
each raw score into a z score, transform each z score into a series of new
standard scores with means and standard deviations of 50 and 10, 100 and 15,
and 500 and 100, respectively.
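The conversion formula can be sketched in a few lines of Python. The function name `to_transformed` is introduced here for illustration. Since the raw scores for this problem are not reproduced in the text, the example uses the z score of –1.00 from the T score illustration earlier in this section.

```python
def to_transformed(z, desired_mean, desired_sd):
    """Return z' = desired mean + (z)(desired standard deviation)."""
    return desired_mean + z * desired_sd


z = -1.0  # one standard deviation below the mean

# The three (mean, standard deviation) pairs named in the problem.
scales = [(50, 10), (100, 15), (500, 100)]

transformed = [to_transformed(z, m, s) for m, s in scales]
print(transformed)
```

For this z score the three transformed scores are 40, 85, and 400. The first matches the T score of 40 described above.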

Answers:


Tutorial Questions:

1. What is a measure of central tendency? Explain three different measures of
central tendency with their uses in describing data.

2. What is a normal curve? List the properties of the normal curve.

3. Explain z scores in detail.

4. Outline the standard normal curve and the standard normal table.

5. Explain in detail how to find proportions and how to find scores.
(or) What are the two types of normal curve problems? How are these problems
answered?

6. Write short notes on
i) Estimators and Estimate
ii) Asymmetry

7. What is a statistical distribution? Outline different types of distributions with
examples.

8. Explain z scores for non-normal distributions in detail.

Assignment Questions:

1. During their first swim through a water maze, 15 laboratory rats made the
following number of errors (blind alleyway entrances): 2, 17, 5, 3, 28, 7, 5,
8, 5, 6, 2, 12, 10, 4, 3. Find the mode, median, and mean for these data.

2. To the question “During your lifetime, how often have you changed your
permanent residence?” a group of 18 college students replied as follows:
1, 3, 4, 1, 0, 2, 5, 8, 0, 2, 3, 4, 7, 11, 0, 2, 3, 3. Find the mode, median,
and mean.

3. Express each of the following scores as a z score:
i) Kumar’s IQ of 135, given a mean of 100 and a standard deviation of 15
ii) a score of 470 on the SAT math test, given a mean of 500 and a
standard deviation of 100
iii) a daily production of 2100 loaves of bread by a bakery, given a mean of
2180 and a standard deviation of 50
iv) Sam’s height of 69 inches, given a mean of 69 and a standard deviation of 3
v) a thermometer-reading error of –3 degrees, given a mean of 0 degrees
and a standard deviation of 2 degrees


4. Find the proportion of the total area identified with the following statements:
(a) above a z score of 1.80
(b) between the mean and a z score of –0.43
(c) below a z score of –3.00
(d) between the mean and a z score of 1.65
(e) between z scores of 0 and –1.96

5. Assume that GRE scores approximate a normal curve with a mean of 500 and
a standard deviation of 100. Find the proportions that correspond to the target
area described by each of the following statements:
(a) less than 400
(b) more than 650
(c) less than 700

6. Assume that SAT math scores approximate a normal curve with a mean of 500
and a standard deviation of 100. Find the target area(s) described by each of the
following statements:
(a) more than 570
(b) less than 515
(c) between 520 and 540
(d) between 470 and 520
(e) more than 50 points above the mean
(f) more than 100 points either above or below the mean

7. For the normal distribution of burning times of electric light bulbs, with a
mean equal to 1200 hours and a standard deviation equal to 120 hours, what
burning time is identified with the
(a) upper 50 percent?
(b) lower 75 percent?
(c) lower 1 percent?
(d) middle 90 percent?

8. Assume that each of the raw scores listed originates from a distribution with
the specified mean and standard deviation. After converting each raw score into
a z score, transform each z score into a series of new standard scores with
means and standard deviations of 50 and 10, 100 and 15, and 500 and 100,
respectively
