0% found this document useful (0 votes)
109 views30 pages

Course Code: Caec 3A Course Title: College: Authors: Title of The Learning Resource

This document provides an outline and objectives for a statistical analysis course module on variables and values. The module will introduce Excel, cover variables and values, scales of measurement, numeric variables in Excel, and frequency distributions. It will help students understand key statistical concepts like different types of variables, scales of measurement, and how to represent numeric data in Excel and analyze frequency distributions. References and online video links are also provided for further reading.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
109 views30 pages

Course Code: Caec 3A Course Title: College: Authors: Title of The Learning Resource

This document provides an outline and objectives for a statistical analysis course module on variables and values. The module will introduce Excel, cover variables and values, scales of measurement, numeric variables in Excel, and frequency distributions. It will help students understand key statistical concepts like different types of variables, scales of measurement, and how to represent numeric data in Excel and analyze frequency distributions. References and online video links are also provided for further reading.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 30

Course Code: CAEC 3A

Course Title: STATISTICAL ANALYSIS WITH SIFTWARE APPLICATIONS


College: COLLEGE OF BUSINESS MANAGEMENT AND ACCOUNTANCY
Authors: NOEL C. RAMIREZ, MBA
Title of the learning
resource:
MODULE 1: BASIC CONCEPTS OF STATISTICS

OUTLINE

* Development of Statistics
* Importance of Statistics
* Kinds of Statistics
* Types of statistics
* Method of Selecting Samples

OBJECTIVES

At the end of this module, you will be able to:


● Gain knowledge on the development of Statistics
● Understand some important terms commonly used in Statistics
● Identify Statistics and its important
● Explain the two major kinds of Statistics, descriptive and inferential
● Distinguish between qualitative and quantitative variables, discrete and continuous
variables
● Understand the four levels of measurement scale; and
● Learn the methods of selecting samples

OVERVIEW

Statistics being the branch is often associated with anxiety and unease
particularly among students who fear mathematics for one reason. Well if you are one of those
students then sit back and relax and read this module as it will take you through a journey
in which you will discover a world of fun, excitement, vision and creativity. With
minimum mathematical details, introduce key concepts and universal terms that are
used among statisticians and briefly discuss common statistical tools, their underlying
principles and their practical merits.

LESSON PROPER:

Assessment: In your own words, answer briefly the following


thoughts/questions:
1. What do you know about the word Statistics?
_______________________________________________________________________
_______________________________________________________________________
_______________________________________________________________________
___________________________________
2. What comes to your mind when you hear the word Statistics?
_______________________________________________________________________
_______________________________________________________________________
____________________________________________

3. Can Statistics be used in your daily routine?


_______________________________________________________________________
_______________________________________________________________________
____________________________________________

We all apply Statistics in our everyday living, We encounter them in


newspapers, magazines, televisions and radios. We meet them when we discuss the cost
of living, unemployment, medical breakthroughs, weather predictions, sports, politics
and lottery. Although we are not aware of it, each of us is an informal statistician. We
are constantly gathering, organizing and analyzing data and use this to make decisions
that will dictate our actions.

Development of Statistics
The word “statistics” was derived from the latin word “status” meaning state.
Historically, statistics developed gradually as society has become interested in
collecting data for a variety of applications.
Statistics we define as the branch of mathematics concerned with collection,
classification, analysis and interpretation of numerical data with a defined purpose in
any field of the study.

Importance of Statistics

Statistics is in important on the following areas:


1. Politics
2. Market research
3. Medicine
4. Engineering
5. Economy
6. Education

Kinds of Statistics

1. Descriptive Statistics- as methods for organizing, summarizing and


presenting data in an instructive way.
2. Inferential Statistics are those methods that use a sample of the population
techniques for analysis of data and testing the reliability of the estimates

Types of Data

1. Primary Data are information collected from an original source of


data, which is first hand in nature.

2. Secondary Data are information collected from published and unpublished


sources and the like.

Types of variable

1. Qualitative are those that are considered non numeric by nature

2. Quantitative are variable that can expressed numerically

Types of Qualitative Variables

1. Discrete Variable are variable whose value is obtained by counting

2. Continuous Variable are variable whose value is obtained by measuring

Four Levels of Measurement Scales

1. Nominal Scale
- at this level, numbers are assigned to identify and classify individuals. The
only measurement to be used is counting.
2. Ordinal scale
- in this scale, the individuals or objects are arranged in rank and order
3. Interval Scale3
- it is used to obtain a more precise measurement by finding the difference
between values such as addition and subtraction.
4. Ratio Scale
- This is the most powerful level of measurement. The data are compared by
multiplication and numbers. Zero point is very important as well as the ratio
between two numbers.

Evaluation
I. Classify each of the following statements as descriptive statistics or inferential
statistics. Write DS for descriptive statistics and IS for inferential statistics.
__ 1. Eight out of ten applicants for a managerial position
_ are women.
__ 2. The average monthly income of a person aged 40-45
_ is P35,469.
__ 3. There is a relationship between high salary and
_ getting incentives.
__ 4. Based on the present sales trend, it is expected that
_ after two years, this year’s sales will be doubled.
__ 5. All four provinces of ARMM are among the 10
_ poorest provinces in Region 1.
__ 6. As a result of a recent poll, most Filipinos are in
_ favor of finding employment abroad.
__ 7. Bonuses can motivate employees.
_
__ 8. The average salary of casual employees in NCR
_ Company is P9500.
__ 9. Drinking beer can raise cholesterol level by 25%
_
__ 10 The average grade of CBMA students in Analytical
_ . Statistics class is 82.5%
II.Determine which of the following are qualitative and quantitative, discrete and continuous. Also
indicate the level of measurements.

Qualitative or Discrete or Level of


Quantitative Continuous Measurements

1. IQ of a person

2. Average income of a
family

3. Varieties of cars

4. Distance (in km) of


school from place to residence.

5. Population growth rate


in the Philippines.

6. Life span (in hours) of


10 flashlight batteries

7. Ranking of basketball
players

8. Rating of ten local


movies (excellent, good, fair,
poor)

9. Number of hotdogs sold


per day by a food chain

10..outcome of tossing a coin

CASE STUDY: Analyze the CURRENT statistical tally report from PDRRMC (Pangasinan)
regarding COVID cases, using the descriptive and inferential statistics AND quantitative and
qualitative, compare the rate of:

1. Recovered against death of the respective cases.

2. Active, PUM and PUI cases.


SUMMARY

In general, statistics is a study of data: describing properties of the data, which is called


descriptive statistics, and drawing conclusions about a population of interest from information
extracted from a sample, which is called inferential statistics.

Statistical knowledge helps you use the proper methods to collect the data, employ the
correct analyses, and effectively present the results. Statistics is a crucial process behind how we
make discoveries in science, make decisions based on data, and make predictions

Descriptive statistics involves summarizing and organizing the data so they can be easily
understood. Descriptive statistics, unlike inferential statistics, seeks to describe the data, but
does not attempt to make inferences from the sample to the whole population. Here, we typically
describe the data in a sample.

At the highest level, two kinds of data exist: quantitative and qualitative. There are two
types of quantitative data, which is also referred to as numeric data: continuous and discrete.

There are different kinds of measurement scales, and the type of data being collected
determines the kind of measurement scale to be used for statistical measurement. These
measurement scales are four in number, namely; nominal scale, ordinal scale, interval scale,
and ratio scale.
READINGS AND REFERENCES

READINGS/VIDEOS
Lesson
1 https://www.youtube.com/watch?v=zlfwdsEDC4Q

Lesson https://www.youtube.com/watch?v=EfZr1HR2R4E
2
REFERENCES
Lesson Sumera-Icutan et al, Statistics with Probability: A Comprehensive Approach 2 nd
1 Edition
Lesson
2
MODULE 2: VARIABLES and VALUES

OUTLINE

* Introduction to Excel
* Variables and Values
* Scales of Measurements
* Numeric Variables in Excel
* Frequency Distributions

OBJECTIVES

At the end of this module, you will be able to:


● Learn to use excel for statistical analysis
● Learn to record data in lists and make use of lists
● Categorize scales of measurements
● Chart numeric variables in Excel
● Organize and display data into a frequency distribution
● Build simulated frequency distributions

OVERVIEW

Microsoft Excel is a Spreadsheet Software / Application that can be


used to organize and manipulate numerical or text data. Excel is a product in its own
right but also can be found as part of the Microsoft Office  product family which
contains other products such as Word, PowerPoint, Access and Outlook.
It must seem odd to start a module about statistical analysis using Excel
with a discussion of ordinary, everyday notions such as variables and values. But variables
and values, along with scales of measurement (discussed in the next lesson), are at the heart
of how you represent data in Excel. And how you choose to represent data in Excel
has implications for how you run the numbers. With your data laid out properly, you can
easily and efficiently combine records into groups, pull groups of records apart to examine
them more closely, and create charts that give you insight into what the raw numbers are
really doing. When you put the statistics into tables and charts, you begin to
understand what the numbers have to say.
LESSON PROPER:

Variables and Values

A variable is an attribute or property that describes a person or a thing. Age is a


variable that describes you. It describes all humans, all living organisms, all objects are
anything that exists for some period of time. Surname is a variable, and so are Weight in
Pounds and Brand of Car. Database jargon often refers to variables as fields, and some
Excel tools use that terminology, but in statistics you generally use the term variable.
Variables have values. The number 21 is a value of the variable Age, the name Ramirez
is a value of the variable Surname, 120
is a value of the variable Weight in
Pounds, and Mazda is a value of the
variable Brand of Car. Values vary from
person to person and from object to
object hence the term variable.

Recording Data in Lists


When you run a statistical analysis,
your purpose is generally to summarize a group of numeric values that belong to the
same variable. For example, you might have obtained and recorded the weight in
pounds for 20 people, as shown in Figure 1.1.

Figure 1.1 This layout is ideal for analyzing data in Excel

Excel calls a list—a variable that occupies a column, records that each occupy a
different row, and values in the cells where the records’ rows intersect the variable’s
column. (The record is the individual being, object, location—whatever—that the list
brings together with other, similar records. If the list in Figure 1.1 is made up of
students in a classroom, each student constitutes a record.) A list always has a header,
usually the name of the variable, at the top of the column. In Figure 1.1, the header is
the label Weight in Pounds in cell A1.

There are some interesting questions that you can answer with a single-column list such
as the one in Figure 1.1. You could select all the values, or just some of them, and look
at the status bar at the bottom of the Excel window to see summary information such as
the average, the sum, and the count of the selected values. Those are just the quickest
and simplest statistical analyses you might run with this basic single-column list.

Numeric Scales

Although there is only one type of category scale, there are three types of numeric
scales: ordinal, interval, and ratio. You can use the value axis of any Excel chart to
represent any type of numeric scale, and you often find yourself analyzing one numeric
variable, regardless of type, in terms of another variable. Briefly, the numeric scale
types are as follows:
 Ordinall scales are often rankings, and tell you who finished first,
second, third, and so on. These rankings tell you who came out ahead,
but not how far ahead, and often you don’t care about that.
 Interval scales indicate differences Interval scales indicate differences.
 Ratio scales are similar to interval scales, but they have a true zero point,
one at which there is a complete absence of some quantity.

Charting Numeric Variables in Excel

Several chart types in Excel lend themselves beautifully to the visual


representation of numeric variables. This book relies heavily on charts of that type
because most of us find statistical concepts that are difficult to grasp in the abstract are
much clearer when they’re illustrated in charts.

Charting Two Variables

Earlier in this module briefly discuss two chart types that use a category variable
on one axis and a numeric variable on the other: Column charts and Bar charts. There
are other, similar types of charts, such as Line charts, that are useful for analyzing a
numeric variable in terms of different categories.
Understanding Frequency Distributions

In addition to charts that show two variables—such as numbers broken down by


categories in a Column chart, or the relationship between two numeric variables in an
XY chart—there is another sort of Excel chart that deals with only one variable. It’s the
visual content of a frequency distribution, an idea that’s absolutely fundamental to
intermediate and advanced statistical methods. A frequency distribution is intended to
display how many instances there are of each value of a variable.

Those two reasons help define the two general branches of statistics: descriptive
statistics and inferential statistics. Along with descriptive statistics such as averages,
ranges of values, and percentages or counts, the chart of a frequency distribution puts
you in a stronger position to understand a set of people or things because it helps you
visualize how a variable behaves across its range of possible values.

Visualizing the Distribution:

Descriptive Statistics It’s usually much easier to understand a variable, how it


behaves in different groups, how it may change over time, and even just what it looks
like, when you see it in a chart. For example, here’s the formula that defines the normal
distribution:

u = 1 / (σ ((2π)^0.5)) e ^ (−0.5 ((X − μ) / σ) ^ 2) —in particular, its


functions and its charts—to infer a population’s
Visualizing the Population

Inferential Statistics The other general rationale for examining frequency


distributions has to do with making an inference about a population, using the
information you get from a sample as a basis. This is the field of inferential statistics. In
later lesson of this modules, you will see how to use Excel’s tools in particular, its
functions and its charts to infer a population’s characteristics from a sample’s frequency
distribution.

Building a Frequency Distribution from a Sample Conceptually, it’s easy to


build a frequency distribution. Take a sample of people or things and measure each
member of the sample on the variable that interests you. Your next step depends on how
much sophistication you want to bring to the project.

1. Create a histogram using the 2D chart tool. Format the chart as shown below.

PAUSE: View the Inferential Statistics PowerPoint presentation. Then complete the
following.

2.Calculate the sample standard deviation using a function (STDEV.S) and display the
output near the population standard deviation calculation.

3. Use the Data Analysis tool to calculate the Descriptive Statistics (Summary
Statistics) and place the output data next to your calculated statistics.

4. Print your worksheet. Include all of the elements shown on the worksheet below.

5. Based on your histogram, does the data appear to be normally distributed? Explain
your answer.

Conclusion
1. Describe the difference between population standard deviation and sample standard
deviation? Describe a scenario in which you would use each.
2. Which type of standard deviation (population or sample) is displayed when the Data
Analysis Tool is used within Excel?
3. How do the statistics that you calculated using Excel compare to the statistics that
you calculated by hand for the water filtration cap?

SUMMARY

Variables are units of data that can change between different cases.


Variables can be analysed on their own (univariate analysis), with one
other variable (bivariate analysis) or with a number of others (multivariate analysis).
Distinctive Features. Variable analysis is a key characteristic of quantitative research.

Descriptive statistics describes data (for example, a chart or graph)


and inferential statistics allows you to make predictions (“inferences”) from that data.
With inferential statistics, you take data from samples and make generalizations about a
population.

READINGS AND REFERENCES


READINGS/VIDEOS

REFERENCES
MODULE 3: HOW VALUE CLUSTER TOGETHER

OUTLINE

* Mean
* Median
* Mode
* Central Tendency

OBJECTIVES

At the end of this module, you will be able to:


● Calculate Mean, Median and Mode
● Understand Functions, Arguments and Results
● Learn to minimize spreadsheets
● Get the Mode of Categories with a formula

OVERVIEW

A measure of central tendency is a single value that attempts to describe a set


of data by identifying the central position within that set of data. As such, measures of central
tendency are sometimes called measures of central location. They are also classed as
summary statistics. The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others, such as the median and
the mode.

The mean, median and mode are all valid measures of central tendency, but under
different conditions, some measures of central tendency become more appropriate to use
than others. In the following sections, we will look at the mean, mode and median, and
learn how to calculate them and under what conditions they are most appropriate to be
used.
This module, concerns with the measures of Central Tendency using Microsoft
Excel. It is also details the rationale for formulas and techniques involved in designing
them.

LESSON PROPER:

Calculating the Mean


In the context of statistics, it’s very convenient, and more precise,
to use the word mean this way. It avoids the vagueness of the word average, which as just
discussed can refer to the mean, to the median, or to the mode. So it shows that Excel uses the
function name AVERAGE( ) instead of MEAN( ). Nevertheless, Table 2.1 an example of how
you get a mean using Excel.

Table 2.1

Minimizing the Spread


The mean has a special characteristic that makes it more useful for certain
intermediate and advanced statistical analyses than the median and the mode. That
characteristic has to do with the distance of each individual observation from the mean
of those observations.

Calculating the Median


The median of a group of observations is usually, and somewhat casually,
thought of as the middle observation when they are in sorted order. And that’s usually a
good way to think of it, even if it’s a little imprecise.
Table 2.2
Choosing to Use the Median
The median is sometimes a more descriptive measure of central tendency than
the mean. For example,Table 2.2 shows what’s called a skewed distribution that is, the
distribution isn’t symmetric. Most of the values bunch up on the left side, and a few are
located off to the right (of course, a distribution can skew either direction—this one
happens to skew right). This sort of distribution is typical of home prices and it’s the
reason that the real estate industry reports medians instead of means.

Calculating the Mode


The mean gives you a measure of central tendency by taking all the actual
values in a group into account. The median measures central tendency differently, by
giving you the midpoint of a ranked group of values. The mode takes yet another tack:
It tells you which one of several values occurs most frequently.

Table 2.3 Excel’s MODE( ) function works only with numeric values.

In this activity you will collect data of the height of students and use Microsoft Excel to
perform statistical analyses and create a statistical chart to display your data.

Equipment
 Notebook (Journal)
 Completed activity 3.4 Applied Statistics
 Measuring tape

Procedure
IMPORTANT: This activity requires the use of the Analysis ToolPak, which is an Add-in to
Microsoft Excel that is not available by default in the program. To load the Analysis ToolPak
follow the instructions of your instructor or read Analysis ToolPak Loading Instructions.

Part 1. Perform a statistical analysis in Excel of height measurements of students in your


class.
1. Open an Excel workbook. On worksheet 1 type
“Activity 3.5 Statistical Analysis with Excel” in cell A1
and your name in cell A2.
2. In your notebook, record the height of each student
in your class in feet and inches to the nearest quarter of
an inch.
3. Input the raw data into an Excel worksheet using a
separate column (A) for feet and a separate column (B)
for inches. Include appropriate data (column) headers.

4. Using a formula, convert each height to decimal feet


and place the results in column C and include an
appropriate column heading.
5. Format the height measurements in decimal feet to
show two decimal places.
6. In the cell just below the column of heights in
decimal feet, calculate the sum of the height
measurements using the SUM function. Format the cell
containing the sum to display a box around the number
and add the text “Sum =” in the cell to the left of the
sum cell. Note that the sum should display two decimal
places.
7. Calculate the statistics indicated in the image to the
right. Create the text labels in the appropriate cells. Be
sure to calculate the population standard deviation
(STDEV.P) as well as the appropriate Mode function
(single or multimodal).

OPTIONAL: Use formulas to calculate the standard


deviation of your height data.

If your data has more than one mode, use the


MODE.MULT function. This function will create an
answer in the form of an array. Therefore before typing
the function into a cell, highlight multiple cells
(vertically), type in the function text and select the
range of values. Then depress Cntl/Shift/Enter keys
simultaneously to indicate an array will be returned.

You can create a simple formula to calculate the range.

8. Create class intervals (value ranges) for a histogram


at 0.250 feet intervals that will include your minimum
and maximum recorded height.
9. Create a frequency table using the Histogram tool
choice in the Data Analysis Tool (Data tab, Analysis
panel).
SUMMARY

This module has examined the three principal measures of central tendency in a
set of values. Central tendency is a critically important attribute in any sample or
population, but so is variability. If the mean informs you where the values tend to cluster,
the standard deviation and related statistics tell you how the values tend to disperse.

Mean is the most frequently used measure of central tendency and generally
considered the best measure of it. However, there are some situations where
either median or mode are preferred. Median is the preferred measure of central
tendency when: There are a few extreme scores in the distribution of the data.

Central tendency is defined as “the statistical measure that identifies a single


value as representative of an entire distribution. It aims to provide an accurate description
of the entire data. It is the single value that is most typical/representative of the collected
data.

READINGS AND REFERENCES

READINGS/VIDEOS
Lesson
1 https://www.youtube.com/watch?v=zlfwdsEDC4Q

Lesson https://www.youtube.com/watch?v=EfZr1HR2R4E
2
REFERENCES
Lesson Sumera-Icutan et al, Statistics with Probability: A Comprehensive Approach 2 nd
1 Edition
Lesson
2

MODULE 4: VARIABILITY ; HOW VALUE


DISPERSE

OUTLINE

* Range
* Standard Deviation
* Variance
* Estimate and Degrees of Freedom
* Variability Functions

OBJECTIVES

At the end of this module, you will be able to:


● Measure Variability with the Range
● Learn the concept of Standard Deviations
● Calculate standard deviation and variance

OVERVIEW

LESSON PROPER:

Measuring Variability with the Range


Just as there are three primary ways to measure the central tendency in a
frequency distribution, there’s more than one way to measure variability. Two of these
methods, the standard deviation and the variance, are closely related and take up most
of the discussion in this module.
Sample Size and the Range
The size of the range is entirely dependent on the values of the largest and the
smallest values. The range does not change until and unless there’s a change in one or
both of those values, the maximum and the minimum. All the other values in the
frequency distribution could change and the range would remain the same. The other
values could be distributed more homogeneously, or they could bunch up near one or
two modes, and the range would still not change.
Variations on the Range
 The Interquartile Range (IQR)
The interquartile range is simply the distance
between the 25th and the 75th percentiles in a data
 The Semi-Interquartile Range
This statistic is sometimes termed Q. It is exactly half
the size of the IQR. Along with the IQR, it is much
less likely than the range to be influenced by the
sample size, and it is therefore considered a much better
indicator of the degree of variability in a data set than is
the range.

The Concept of a Standard Deviation


Suppose someone told you that you stand 19 units tall. What do you conclude
from that information? Does that mean you’re tall? short? of average height? What
percent of the population is taller than you are?

Calculating the Standard Deviation and Variance


Excel provides you with no fewer than six functions to calculate the standard
deviation of a set of values, and it’s pretty easy to get the standard deviation on a
worksheet. If the values you’re concerned with are in cells A2:A21, you might enter this
formula to get the standard deviation:

=STDEV.P(A2:A21)

The square of a standard deviation is called the variance. It’s another important
measure of the variability in a set of values. Also, several functions in Excel return the
variance of a set of values. One is VAR.P( ). Again, other versions are discussed later in
“Excel’s Variability Functions.” You enter a formula that uses the VAR.P( ) function
just as you enter one that uses a standard deviation function:

=VAR.P(A2:A21)

Here’s what’s often called the definitional formula of the variance:

s2=∑i=1N(Xi–X¯)2/N

Population Parameters and Sample Statistics


You normally use the word parameter for a number that describes a population
and statistic for a number that describes a sample. So the mean of a population is a
parameter, and the mean of a sample is a statistic.

The variance as a parameter is defined in this way:

σ2=∑i=1N(Xi−μ)2/N
Therefore, any time you estimate the variance (or the standard deviation) of a
population using the variance (or standard deviation) of a sample, your sample statistic
is virtually certain to underestimate the size of the population parameter.

∑i=1N(Xi–X¯)2/(N−1)

The quantity (N − 1) in this formula is called the degrees of freedom.

Bias in the Estimate and Degrees of Freedom


When you use N, instead of the N − 1 degrees of freedom, in the calculation of
the variance, you are biasing the statistic as an estimator of the population. It is then
biased negatively: It’s an underestimate of the population variance.

Excel’s Variability Functions


The 2010 version of Excel reorganized and renamed several statistical functions,
and Excel 2013 and 2016 retain those changes. The aim is to name the functions
according to a more consistent pattern than was used in earlier versions, and to make a
function’s purpose more apparent from its name.

Standard Deviation Functions


For example, since 1995 Excel has offered two functions that return the standard
deviation:
STDEV()—This function assumes that its argument list is a sample from
a population, and therefore uses N − 1 in the denominator.
STDEVP()—This function assumes that its argument list is the
population, and therefore uses N in the denominator. In its 2003 version, Excel
added two more functions that return the standard deviation:
STDEVA()—This function works like STDEV() except that it accepts
alphabetic, text values in its argument list and also Boolean (TRUE or FALSE)
values. Text values and FALSE values are treated as zeros, and TRUE values
are treated as ones. STDEVPA()—This function accepts text and Boolean
values, just as does STDEVA(), but it assumes that the argument list constitutes
a population.

Microsoft decided that using P, for population, at the end of the function
name STDEVP() was inconsistent because there was no STDEVS(). That
would never do, and to remedy the situation, Excel versions since 2010 include
two new standard deviation functions that append a letter to the function name
in order to tell you whether it’s intended for use with a sample or on a
population:
STDEV.S()—This function works just like STDEV—it ignores Boolean
values and text.
STDEV.P()—This function works just like STDEVP—it also ignores
Boolean values and text. STDEV.S() and STDEV.P() are termed consistency
functions because they introduce a new, more consistent naming convention
than the earlier versions. Microsoft also states that their computation algorithms
bring about more accurate results than is the case with STDEV() and
STDEVP().

Variance Functions
Similar considerations apply to the worksheet functions that return the
variance. The function’s name is used to indicate whether it is intended for a
population or to infer a population value from a sample, and whether it can deal
with non numeric values in its arguments.

SUMMARY

READINGS AND REFERENCES

READINGS/VIDEOS
Lesson
1 https://www.youtube.com/watch?v=zlfwdsEDC4Q

Lesson https://www.youtube.com/watch?v=EfZr1HR2R4E
2
REFERENCES
Lesson Sumera-Icutan et al, Statistics with Probability: A Comprehensive Approach 2 nd
1 Edition
Lesson
2
MODULE 5: CORRELATION

OUTLINE

* Correlation
* Correl Functions

OBJECTIVES

At the end of this module, you will be able to:


● Understand and calculate Correlation
● Use the Correl Functions
● Use the Analysis Tools and Correlation Tools
● Understand Best Combination and Shared Variance

OVERVIEW

This module begins a look at how two or more variables covary: that is, how
higher values on one variable are associated with higher values on another, and how
lower values on the two variables are also associated. The reverse situation also occurs
frequently, when higher values on one variable are associated with lower values on
another variable.
LESSON PROPER:

Understanding Correlation
The degree to which two variables behave in this way—that is, the way
they covary—is called correlation. A familiar example is height and weight. They have
what’s called a positive correlation: High values on one variable are associated with
high values on the other variable (see Table 5.1).

Table 5.1

A positive correlation appears in a chart as a general lower-left to upper-right


trend. The chart in Table 5.1 has a marker for each of the 12 people whose height and
weight appear in cells A2:B13. Generally, the lower the person’s height (according to
the horizontal axis), the lower the person’s weight (according to the vertical axis), and
the greater the height, the greater the weight.

The reverse situation appears in Table 5.2, which charts the number of points
scored in a game against the order of each player’s finish. The higher the number of
points, the lower (that is, the better) the finish. That’s an example of a negative
correlation: Higher values on one variable are associated with lower values on the other
variable.
Table 5.2
Table 5.3 Two uncorrelated variables tend to display a relationship such as this one: a
random spray of markers on the chart.
Table 5.3

The Correlation, Calculated Notice the formula in the formula bar shown in Table 5.3:

=CORREL(A2:A13,B2:B13)

Using the CORREL( ) Function

Table 5.7 shows how you might use the CORREL() function to look into
the relationship between two variables that interest you. Suppose that you’re a loan
officer at a company that provides home loans, and you want to examine the
relationship between purchase prices and buyers’ annual income for loans that your
office has made during the past month. Table 5.7 It’s always a good idea to validate
the correlation with a chart.
Table 5.7
You gather the necessary data and enter it into an Excel worksheet as shown in
columns A through C of Table 5.7. Notice in Table 5.7 that a value here, the buyer’s
name in column A uniquely identifies each pair of values. Although an identifier like
that isn’t at all necessary for calculating a correlation coefficient, it can be a big help in
verifying that a particular record’s values on the two variables actually belong together.
For example, without the buyer’s name in column A, it would be more difficult to check
that the Neils’ house cost $195,000 and their annual income is $110,877. If you don’t
have the values on one variable paired with the proper values on the other variable, the
correlation coefficient will be calculated correctly only by accident. Therefore, it’s good
to have a way of making sure that, for example, the Neils’ income of $110,877 matches
up with the cost of $195,000.

Using Correlation

To this point, we have talked mostly about the concept of a correlation


coefficient how it is defined and how it can illuminate the nature of the relationship
between two variables. That’s useful information by itself, but things go much further
than that. For example, it’s probably occurred to you that if you know the value of one
variable, you can predict the value of another variable that’s correlated with the first.
That sort of prediction is the focus of the remainder of this chapter. The basics discussed
here turn out to be the foundation of several analyses discussed in later chapters. Used in
this way, the technique goes by the name regression, which is the basis for the
designation of the correlation coefficient, r.

SUMMARY
READINGS AND REFERENCES

READINGS/VIDEOS
Lesson
1 https://www.youtube.com/watch?v=zlfwdsEDC4Q

Lesson https://www.youtube.com/watch?v=EfZr1HR2R4E
2
REFERENCES
Lesson Sumera-Icutan et al, Statistics with Probability: A Comprehensive Approach 2 nd
1 Edition
Lesson
2

You might also like