Stats Lec01

Uploaded by

zhaoyixue116
Copyright
© All Rights Reserved
HUDM 4122 - Statistics

Lecture 01 – Background Information

HUDM4122: Probability and Statistical Inference


The purpose of this course is to give you some theoretical background and to deepen your
understanding of more applied (i.e., directly useful) material.

This course assumes that you have already taken a statistics course as an undergraduate.
Specifically, you need a reasonable knowledge of algebra, of calculating means (averages), and of
reading and interpreting graphs. What follows is a brief introductory lecture on the material that we presume
you already know.

Notation

Scientific Notation: Because your calculator has limited space to display numbers, it may
return a value that looks something like this: 2.35E4 or this: 2.35E-4. This is scientific
notation, which tells you to move the decimal point a certain number of places to the right (2.35E4 is
really $2.35 \times 10^{4}$, or 23,500) or to the left (2.35E-4 is really $2.35 \times 10^{-4}$, or 0.000235).
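As a quick check of the conversions above, Python reads and writes the same E notation a calculator displays. This is a minimal sketch; the variable names are illustrative only.

```python
# Python accepts the same "E" notation that calculators display.
big = float("2.35E4")     # 2.35 x 10^4: move the decimal point 4 places right
small = float("2.35E-4")  # 2.35 x 10^-4: move the decimal point 4 places left

print(big)    # 23500.0
print(small)  # 0.000235

# Going the other way: format a number in E notation with two decimal places.
print(f"{23500:.2E}")     # 2.35E+04
```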

Factorial Notation: n! is called “n factorial” and requires you to multiply n by every smaller
positive integer down to 1. For example, 5! = 5 × 4 × 3 × 2 × 1 = 120, and 4! = 4 × 3 × 2 × 1 = 24.
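The definition above translates directly into a loop. This is a sketch: `factorial` is a hypothetical helper written for illustration, compared against Python's built-in `math.factorial`.

```python
import math

def factorial(n):
    """n!: multiply n by every smaller positive integer down to 1."""
    result = 1
    for k in range(n, 0, -1):  # k = n, n - 1, ..., 1
        result *= k
    return result

print(factorial(5))       # 120  (5 * 4 * 3 * 2 * 1)
print(factorial(4))       # 24   (4 * 3 * 2 * 1)
print(math.factorial(5))  # 120, the standard-library equivalent
```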

Subscripts: $X_i$ = the i-th entry (observation) of a dataset X.

The “i” is an “index” of the position of a point in a data set.
If X is a dataset with the three values 6, 3, and 7, they are denoted as:
$X_1 = 6$
$X_2 = 3$
$X_3 = 7$
6 is the first (i = 1) value in the list, 3 is the second (i = 2), and 7 is the third (i = 3). Subscript notation is most
useful when combined with:
Summation Notation: $\sum_{i=1}^{n} X_i$

The Greek letter Σ tells us to add up a set of numbers. Sometimes all we have to do is add up
the numbers themselves. Other times, we have to do something to each individual value in the set before
adding it up. For example, here we are expected to square each value of X before adding up the squares:

$\sum_{i=1}^{n} X_i^2$
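Remember "Please Excuse My Dear Aunt Sally," the order of operations: Parentheses, Exponents,
Multiplication, Division, Addition, Subtraction? Summation expressions follow the same order of operations.
In $\sum_{i=1}^{n} X_i^2$ the exponent comes first, so each value is squared and only then are the squares added.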
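Here is a small sketch of the subscript, summation, and order-of-operations rules above, using the three-value dataset X = {6, 3, 7}. Keep two assumptions in mind: `x_sub` is a hypothetical helper, and Python lists are 0-indexed, so the notes' $X_i$ is `X[i - 1]` in code. The last two results show why the order of operations matters: squaring each value and then adding does not give the same number as adding first and then squaring.

```python
# The dataset X from the subscript example: X_1 = 6, X_2 = 3, X_3 = 7.
X = [6, 3, 7]
n = len(X)

def x_sub(i):
    """Hypothetical helper: return X_i using the notes' 1-based subscripts.

    Python lists start at index 0, so X_i is stored at X[i - 1].
    """
    return X[i - 1]

# Summation notation: add X_i for i = 1, ..., n.
total = sum(x_sub(i) for i in range(1, n + 1))                 # 6 + 3 + 7 = 16

# Sum of squares: square each X_i first (exponent), then add.
sum_of_squares = sum(x_sub(i) ** 2 for i in range(1, n + 1))  # 36 + 9 + 49 = 94

# Square of the sum: the parentheses (the sum) come first, then the exponent.
square_of_sum = sum(X) ** 2                                    # 16 ** 2 = 256

print(total, sum_of_squares, square_of_sum)  # 16 94 256
```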

Example:

i (index)   X_i   X_i^2
1           6     36
2           3     9
3           7     49
4           0     0
5           6     36
6           9     81
7           0     0
8           1     1

Totals: $\sum_{i=1}^{n} X_i = 32$, $\sum_{i=1}^{n} X_i^2 = 212$
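The table can be reproduced in a few lines of Python. This sketch assumes only the eight values listed in the X_i column.

```python
# The eight values from the X_i column of the table
X = [6, 3, 7, 0, 6, 9, 0, 1]

rows = [(i, x, x ** 2) for i, x in enumerate(X, start=1)]  # (i, X_i, X_i^2)
for i, x, x_squared in rows:
    print(f"{i:>2} {x:>3} {x_squared:>4}")

sum_x = sum(X)                          # 6 + 3 + 7 + 0 + 6 + 9 + 0 + 1
sum_x_squared = sum(x ** 2 for x in X)  # 36 + 9 + 49 + 0 + 36 + 81 + 0 + 1
print(sum_x, sum_x_squared)             # 32 212
```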

Descriptive Statistics - Concerned with the presentation, summarization and description of data.

Ex: In the spring semester of 2008, 138 adult males at an urban college in the Northeast were asked to
report their height in inches. The responses are summarized below:
1. Graphical Summary - height bar graph

2. Tabular (Table) Summary:

Height (inches)   Frequency
60 - 63.5         20
64 - 67.5         46
68 - 71.5         45
72 - 75.5         22
76 - 79.5         5
Total             138

3. Numerical Summary: Measure of Center

a) Mean: the arithmetic average of a set of measurements, i.e., the sum of the
measurements divided by the total number of measurements (the mean is not
necessarily the middle value of the data):

$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$

(Refer to the summation notation section: you add up a set of numbers and
then divide by n, the total number of data points.)

b) Median: the median of a set of n measurements is the middle value when
the measurements are arranged in ascending order.

Order the data. If n is odd, the median is the unique middle value. If n is
even, the median is the average of the two middle values.
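The median rule above (sort, then take the middle value or the average of the two middle values) can be sketched as follows. `median` here is a hypothetical hand-written helper; Python's `statistics.median` follows the same rule and is used as a cross-check on the data from Examples 1 and 2.

```python
import statistics

def median(values):
    """Middle value of the sorted data (average of the two middles if n is even)."""
    data = sorted(values)          # step 1: order the data
    n = len(data)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                 # n odd: the unique middle value
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2  # n even: average the two middle values

ages = [20, 19, 65, 20, 21, 18, 17, 20, 19]
ratings = [6, 3, 7, 0, 6, 9, 0, 1]
print(median(ages), statistics.median(ages))        # 20 20
print(median(ratings), statistics.median(ratings))  # 4.5 4.5
```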

c) Mode: the value in the data set that occurs with the greatest frequency.
Note: If every value occurs with the same frequency, no single value is called
the mode. If two values tie for the highest frequency, the data set is called
bimodal. If more than two values tie for the highest frequency, the data set
is called multimodal.
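A sketch of the mode rules above. It assumes, as is common, that a data set in which every value occurs equally often has no mode; `modes` and `mode_type` are hypothetical helper names.

```python
from collections import Counter

def modes(values):
    """Return the mode(s) as a sorted list, or [] when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    # Assumption: if every distinct value occurs equally often, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

def mode_type(values):
    """Label a data set by how many values share the highest frequency."""
    k = len(modes(values))
    if k == 0:
        return "no mode"
    if k == 1:
        return "one mode"
    if k == 2:
        return "bimodal"
    return "multimodal"

print(modes([20, 19, 65, 20, 21, 18, 17, 20, 19]))  # [20]
print(modes([6, 3, 7, 0, 6, 9, 0, 1]))              # [0, 6]
print(mode_type([6, 3, 7, 0, 6, 9, 0, 1]))          # bimodal
print(mode_type([1, 2, 3]))                         # no mode
```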

4. Numerical Summary: Measure of Dispersion (Spread)

a) Range - the difference between the largest and smallest values in a dataset.
It depends on only two values (the extremes), so it is a poor measure of spread:
the extremes are not typical of the overall variability in your dataset.

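b) Variance - a measure of how much the scores in a dataset vary around their mean

Sample Variance - sometimes called the sample estimate of the variance:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$

Follow the summation order of operations: subtract the mean from each value, square each
difference, add up the squares, and then divide by n - 1.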

c) Standard deviation – the positive square root of the variance:

$s = \sqrt{s^2}$
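The dispersion formulas above can be written out as a sketch. The helper names (`data_range`, `sample_variance`, `sample_sd`) are hypothetical. Python's `statistics.variance` and `statistics.stdev` also use the n - 1 denominator, so they serve as a cross-check. The ages from Example 1 are used as sample input.

```python
import math
import statistics

def data_range(values):
    """Range: the largest value minus the smallest value."""
    return max(values) - min(values)

def sample_variance(values):
    """Sample variance s^2: squared deviations from the mean, summed, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    # Order of operations: subtract the mean, square each difference, then add.
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

def sample_sd(values):
    """Sample standard deviation s: the positive square root of s^2."""
    return math.sqrt(sample_variance(values))

ages = [20, 19, 65, 20, 21, 18, 17, 20, 19]  # data from Example 1
print(data_range(ages))                        # 48
print(sample_variance(ages), statistics.variance(ages))
print(sample_sd(ages), statistics.stdev(ages))
```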

Example 1: The ages of a sample of students in an American literature class are as follows:
{20, 19, 65, 20, 21, 18, 17, 20, 19}

Q1: Compute the Mean, Median, and Mode of these ages:

Mean = $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{219}{9} \approx 24.33$ years

Median = order the data first: {17, 18, 19, 19, 20, 20, 20, 21, 65}. Then find the middle value. With
n = 9, the middle is the 5th position, so the median = 20 years

Mode = most common age, 20 years
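Example 1 can be checked with Python's standard `statistics` module. This is a sketch; `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

ages = [20, 19, 65, 20, 21, 18, 17, 20, 19]

mean = sum(ages) / len(ages)
print(sum(ages), round(mean, 2))     # 219 24.33
print(sorted(ages))                  # [17, 18, 19, 19, 20, 20, 20, 21, 65]
print(statistics.median(ages))       # 20
print(statistics.multimode(ages))    # [20]
```

The single age of 65 pulls the mean (24.33) above every other age in the sample, while the median (20) is unaffected. This is why the mean is not necessarily the middle of the data.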

Example 2: Diners were asked to rate a meal in a restaurant on a scale from 0 to 10.
Here are the results:
{6, 3, 7, 0, 6, 9, 0, 1} Order the data: {0, 0, 1, 3, 6, 6, 7, 9}

Q1: Find the Mean, Median, and Mode for the sample: Mean = 32/8 = 4, Median = (3 + 6)/2 = 4.5, Modes = 0 and 6
(this distribution is bimodal because two values tie for the highest frequency)
Q2: Find the sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1} = \frac{84}{7} = 12$

i (index)   X_i   X_i - X̄   (X_i - X̄)^2
1           0     -4         16
2           0     -4         16
3           1     -3         9
4           3     -1         1
5           6     2          4
6           6     2          4
7           7     3          9
8           9     5          25

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{32}{8} = 4$, $\sum_{i=1}^{n} (X_i - \bar{X})^2 = 84$

Q3: Find the sample estimate of the standard deviation: $s = \sqrt{12} \approx 3.464$
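The calculations for Example 2 can be reproduced step by step, in the same order as the table's columns: deviation, squared deviation, sum, then division by n - 1. This sketch uses only the standard library.

```python
import math
import statistics

ratings = [6, 3, 7, 0, 6, 9, 0, 1]
n = len(ratings)

mean = sum(ratings) / n                        # 32 / 8 = 4.0
deviations = [x - mean for x in ratings]       # X_i - mean
squared = [d ** 2 for d in deviations]         # (X_i - mean)^2
ss = sum(squared)                              # 84.0
s2 = ss / (n - 1)                              # 84 / 7 = 12.0
s = math.sqrt(s2)                              # sqrt(12)

print(mean, statistics.median(ratings))        # 4.0 4.5
print(statistics.multimode(ratings))           # [6, 0]
print(ss, s2, round(s, 3))                     # 84.0 12.0 3.464
print(statistics.variance(ratings), statistics.stdev(ratings))
```

`statistics.multimode` lists tied modes in the order they first appear in the data, so it reports 6 before 0 here.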

Inferential Statistics - Concerned with using sample data to make an inference about a population of
data
A. Population - complete collection of all elements of interest in a particular study.
B. Sample - part of the population that is assumed to be representative of the population

Types of Statistical Studies

A. Experimental Study - one or more factors in the study are controlled for so data can be
obtained on how the factors influence the variables.

B. Non-experimental Study (Observational Study) - Does not attempt to control the
influences of factors on the variables of interest; events are observed as they occur in "nature."

Some vocabulary

A. Data - values that are collected, analyzed, and summarized
B. Data set - collection of data for a particular study
C. Subjects/Participants - entities on which data are collected (e.g., students)
D. Variable - characteristic of interest (a column in the data set)
E. Observation - set of measures collected on a particular subject or participant (a row in the data set)

Types of Data

A. Quantitative Data - Observations measured on a numerical scale. These types of data
indicate how much or how many of something

B. Qualitative Data - Non-numerical data that can be classified into groups or categories.
These types of data are labels or names used to identify an attribute for each subject.

Scales of Measurement - Assignment of numerals to objects or events

A. Nominal Scale: Distinguish one object or event from another on the basis of a name. The
observations for the variable are labels that identify an attribute (gender, occupation, major,
etc.)

B. Ordinal Scale: Based on the relative amounts of some characteristic. These data can be
rank-ordered (taste preferences: bad, good, excellent).

C. Interval Scale: When objects or events can be distinguished from one another and ranked,
and when the differences between measurements have meaning. There is a fixed unit of
measurement (temperature)

D. Ratio Scale: When measurements have the properties of the previous three scales and the
additional property that their ratios are meaningful. The zero point here is inherently defined
and must mean "nothing" (weight, price, area).

Methods of Describing Qualitative Data

Tabular Methods (Table)
Frequency - the frequency for a category is the number of observations that fall in that
particular category

Frequency distribution - a tabular summary showing the frequency of items in each of
the non-overlapping classes or categories

Relative Frequency - the relative frequency for a category is the frequency divided by
the total number of observations, n.

Relative Frequency distribution - tabular summary showing the relative frequencies in
each of the non-overlapping categories.

Example: A survey in an intro to psych class at University College asked students for their
major. Here is a tabular summary of the results:

Major                 Frequency   Relative Frequency
1 = Psychology        18          18/50 = 0.36
2 = Biology           5           0.10
3 = Economics         9           0.18
4 = Communications    15          0.30
5 = Chemistry         3           0.06
Total                 n = 50      1.00
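The relative frequency column is each frequency divided by n. This minimal sketch uses the counts from the table above; the dictionary layout is an illustrative choice.

```python
# Frequencies from the majors table
majors = {
    "Psychology": 18,
    "Biology": 5,
    "Economics": 9,
    "Communications": 15,
    "Chemistry": 3,
}
n = sum(majors.values())  # 50

# Relative frequency = frequency / n
relative = {major: freq / n for major, freq in majors.items()}
for major, freq in majors.items():
    print(f"{major:<15} {freq:>3} {relative[major]:.2f}")

print(max(majors, key=majors.get))      # Psychology (Q1)
print(f"{relative['Chemistry']:.0%}")   # 6% (Q2)
```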

Q1: What is the most common major for the students in this class? (Psychology)
Q2: What percentage of students are chemistry majors? (6%)

Graphical Methods (Graph)

Bar Graph - The categories or classes are on the horizontal axis and the frequency for each class is on the
vertical axis. The bars should be of equal width, and the spaces between them should also be of equal
width. The height of each bar is the frequency of its class. The vertical axis should always start
at 0.

Pie Chart - Circular chart used to represent the relative frequency distribution. The relative
frequency distribution is used to subdivide the circle into sectors whose size corresponds to the
relative frequency of each category.

[Pie chart of majors: Psychology 36%, Communications 30%, Economics 18%, Biology 10%, Chemistry 6%]
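The pie chart's sectors follow from the relative frequencies: each category's central angle is its relative frequency times 360 degrees. This sketch uses the relative frequencies from the majors table.

```python
# Relative frequencies from the majors table
relative = {
    "Psychology": 0.36,
    "Biology": 0.10,
    "Economics": 0.18,
    "Communications": 0.30,
    "Chemistry": 0.06,
}

# Each sector's central angle is its share of the full 360-degree circle.
angles = {major: rel * 360 for major, rel in relative.items()}
for major, angle in angles.items():
    print(f"{major:<15} {angle:6.1f} degrees")

print(round(sum(angles.values()), 6))  # 360.0
```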
Methods of Describing Quantitative Data

Tabular Methods (Table)

Frequency – the frequency for a class (range of data) is the number of observations that
fall in that particular range

Relative Frequency – the relative frequency for a class is the frequency divided by the
total number of observations, n.

Cumulative Frequency - the cumulative frequency for a class is the total number of
values less than or equal to the upper class limit of that class.

Cumulative Relative Frequency - the cumulative frequency for a class divided by the
total number of observations, n.

Example: A frequency distribution table for the scores on a midterm exam from the students in an intro
psych class.

Test Score   Frequency   Relative Freq.   Cumulative Freq.   Cumulative Rel. Freq.
10 - 29      11          0.22             11                 0.22
30 - 49      2           0.04             13                 0.26
50 - 69      13          0.26             26                 0.52
70 - 89      15          0.30             41                 0.82
90 - 109     6           0.12             47                 0.94
110 - 130    3           0.06             50                 1.00
Total        50          1.00

Q1: How many students scored less than 70? 26
Q2: How many students scored at least 70? 24
Q3: What proportion scored at most 49? 26%
Q4: What proportion scored at least 90? 18%
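The cumulative columns are running totals, so they can be generated rather than added by hand. Doing so is also a quick way to catch arithmetic slips in a table. This sketch uses `itertools.accumulate` from the standard library and reads the answers to Q1-Q4 off the cumulative columns.

```python
from itertools import accumulate

classes = ["10-29", "30-49", "50-69", "70-89", "90-109", "110-130"]
freq = [11, 2, 13, 15, 6, 3]
n = sum(freq)                          # 50

rel = [f / n for f in freq]            # relative frequency
cum = list(accumulate(freq))           # cumulative frequency (running total)
cum_rel = [c / n for c in cum]         # cumulative relative frequency

for row in zip(classes, freq, rel, cum, cum_rel):
    print("{:<8} {:>3} {:>5.2f} {:>3} {:>5.2f}".format(*row))

# Q1-Q4 read off the cumulative columns
below_70 = cum[2]                      # through the 50-69 class: 26
at_least_70 = n - below_70             # 24
at_most_49 = cum_rel[1]                # 13 / 50 = 0.26
at_least_90 = (n - cum[3]) / n         # (50 - 41) / 50 = 0.18
print(cum)                             # [11, 13, 26, 41, 47, 50]
```

The generated cumulative frequency for the 70 - 89 class is 41. This agrees with its cumulative relative frequency of 0.82 (41/50).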

Graphical Methods

Histogram - bar graph of the frequency distribution. The vertical axis identifies the
frequencies for each class, and the horizontal axis is used for identifying the lower class
limits. The bars must touch.

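[Histogram of test scores: frequency on the vertical axis (0 to 16); classes 10 - 29, 30 - 49, 50 - 69, 70 - 89, 90 - 109, and 110 - 130 on the horizontal axis, labeled "Test Scores"]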
