STATS

Statistics
Latin: ratio status – study of practical politics (TRAPO)
Statista – the person in political affairs
German: Statistik – political science of several countries
Anglo-Saxon: Statistics – state and facts
Singular – science and art
Plural – processed data, set of quantitative data
Data – facts and figures

2 Categories of statistics
1. Descriptive statistics – art, deals with methods
2. Inferential statistics – science, deals with analysis

Role of statistics in decision making
1. Opinion poll survey
2. Design/model

Universe – set of all individuals or entities under consideration
2 Types of Universe
1. Finite – limit exists
2. Infinite – large values/numbers
Population – the entire group from which data is being gathered
Sample – subset or part of a population
Variable – characteristic of interest
Qualitative – descriptive, being observed
Quantitative – variable that takes numerical values, amount/quantity
1. Discrete – assumes a finite or countably infinite number of values
2. Continuous – can take infinitely many values, deals with decimals

Methods of collecting data
1. Objective method – measurement, counting, observation
2. Subjective method – provided by respondent
3. Use of existing records – published statistics

General classification of data
1. Census or complete enumeration – process of gathering information from every unit of the population
2. Survey sampling – process of obtaining information from a sample of the population

Two Types of Data:
1. Primary data – acquired directly from the source
2. Secondary data – non-primary data

Level of measurement – determines which statistical calculations are meaningful
1. Nominal – categorized using names, labels, or qualities; qualitative only
2. Ordinal – arranged in order, but differences between data entries are not meaningful; qualitative or quantitative
3. Interval – a zero entry simply represents a position on a scale; the entry is not an inherent zero
4. Ratio – a zero entry is meaningful (an inherent zero)

Bar chart or bar graph
- is a chart with rectangular bars with lengths proportional to the values that they represent.
- Bar charts are used for comparing two or more values that were taken over time or under different conditions, usually on small data sets.
- horizontally oriented (also called bar chart)
- vertically oriented (also called column chart)

Pie chart (or circle graph)
- is a circular chart divided into sectors, illustrating relative magnitudes or frequencies.
- In a pie chart, the arc length of each sector (and consequently its central angle and area) is proportional to the quantity it represents.

Frequency histogram - a bar chart which uses the frequency distribution table (FDT).
Frequency polygon - uses only the frequency and the midpoint of each class.
Ogive - used to graph cumulative frequency.
"Less than" ogive - shows how many items in the distribution have a value less than the upper limit of each class.
"Greater than" ogive - shows how many items in the distribution have a value greater than or equal to the lower limit of each class.
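
The pie chart rule (each sector's central angle is proportional to the quantity it represents) can be sketched in Python as below. This is only an illustration: the function name pie_sector_angles and the sample frequencies are assumptions, not part of the notes.

```python
def pie_sector_angles(counts):
    """Return the central angle (in degrees) of each pie-chart sector.

    `counts` maps a category label to its frequency. Each angle is
    360 * (category frequency / total frequency).
    """
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total frequency must be positive")
    return {label: 360.0 * value / total for label, value in counts.items()}


# Hypothetical frequencies, for illustration only.
example = {"A": 10, "B": 30, "C": 20}
angles = pie_sector_angles(example)
for label, angle in angles.items():
    print(label, angle)
```

The angles always add up to 360 degrees, so a category holding half of the observations takes half of the circle.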
Textual presentation
- is an expository form describing a set of information. This is a useful manner of presenting limited amounts of information.
Tabular
- for large data sets
- organized in appropriate rows and columns
Graphical
- attractive/presentable; relationships can easily be seen
- bar, pie, line, pictogram, dot plot, stem-and-leaf plot

STEM-AND-LEAF PLOT
- hybrid of a table and a graph
- arrangement of a set of data by its stems and leaves
- presents a histogram-like picture of the data, while allowing the experimenter to retain the actual observed values of each data point
- a quick way to obtain an informative visual representation of a data set
leaf – the last digit
stem – the first digit

FREQUENCY DISTRIBUTION TABLE (FDT)
- a statistical table showing the frequency (no. of observations) contained in each defined class
- summarizes quantitative and qualitative data
- organizes large amounts of data by grouping the observations into non-overlapping classes or categories

• Quantitative FDT
Steps in Constructing a Quantitative FDT
1. Determine the range, R
   R = max – min
2. Compute the number of classes (class intervals), K
   K = √N where N = number of observations
   • square root rule based on N
   • dependent on convention
3. Determine the class interval size, C
   C = R/K
   • or class width
   • distance from one upper limit to the next upper limit
4. Compute the actual classes
   • lowest value (min) is in the lowest class
   • highest value (max) is in the highest class
   • min is used as the lower limit of the first class
   • no overlapping

True Class Boundaries
LTCB = LL – ½ (unit of measurement)
UTCB = UL + ½ (unit of measurement)
Class Mark – the midpoint
CM = ½ (LL + UL)
Relative Frequency (RF)
$RF = \frac{F}{\text{Total } F}$
Cumulative Frequency (<CF, >CF)
<CF – no. of observations less than or equal to the upper limit of a given class interval
>CF – no. of observations greater than or equal to the lower limit of a given class

Measure of Central Tendency
- a value within the range of a set of data which describes its location or position relative to the entire set of data
- where clusters of data are located
Median (Md)
• A value which divides the array of data into two equal parts.
• Denoted by Md.
• It is the middle value of an array.
Array – raw data arranged in increasing or decreasing order of magnitude.
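
The steps for constructing a quantitative FDT (range, number of classes by the square root rule, class size, class limits, true class boundaries, class marks, relative and cumulative frequencies) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a definitive procedure: the name build_fdt, the rounding of K to the nearest whole number, the rounding of C up to the next multiple of the unit of measurement, and the sample scores are all illustrative choices, since the notes say the rule is dependent on convention.

```python
import math


def build_fdt(data, unit=1):
    """Build FDT rows for non-empty numeric data recorded to the nearest `unit`."""
    n = len(data)
    low, high = min(data), max(data)
    data_range = high - low                          # Step 1: R = max - min
    k = max(1, round(math.sqrt(n)))                  # Step 2: K = sqrt(N), rounded (assumption)
    # Step 3: C = R/K, rounded up to a multiple of the unit (assumption)
    width = max(unit, math.ceil(data_range / k / unit) * unit)

    rows = []
    lower = low                                      # Step 4: min is the first lower limit
    while lower <= high:                             # keep adding classes until max is covered
        upper = lower + width - unit                 # classes do not overlap
        freq = sum(1 for x in data if lower <= x <= upper)
        rows.append({
            "LL": lower, "UL": upper,
            "LTCB": lower - unit / 2, "UTCB": upper + unit / 2,
            "CM": (lower + upper) / 2,
            "F": freq, "RF": freq / n,
        })
        lower += width

    running = 0
    for row in rows:                                 # <CF: accumulate from the first class
        running += row["F"]
        row["<CF"] = running
    running = 0
    for row in reversed(rows):                       # >CF: accumulate from the last class
        running += row["F"]
        row[">CF"] = running
    return rows


# Hypothetical scores, for illustration only.
scores = [12, 15, 17, 18, 20, 21, 21, 23, 24, 25, 27, 28, 30, 31, 33, 35]
for row in build_fdt(scores):
    print(row)
```

With these 16 hypothetical scores, R = 23, K = 4 and C = 6, which gives the classes 12-17, 18-23, 24-29 and 30-35.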
Mode (Mo)
- value(s) which occur most frequently in a given data set
- denoted by Mo
- can be determined for both quantitative and qualitative data

QUANTILES
Percentiles
• numbers which divide the array of data into 100 equal parts.
• the jth percentile, denoted by Pj, is the number that separates the bottom j% of the data from the top (100 – j)%.
• Best applied to large (≥ 100) data sets.
Deciles
• numbers which divide the array of data into 10 equal parts.
1st decile (D1) = 10th percentile (P10)
2nd decile (D2) = 20th percentile (P20) …
10th decile (D10) = 100th percentile (P100)
Quartiles
• numbers which divide the array of data into 4 equal parts.
1st quartile (Q1) = 25th percentile (P25)
Q2 = P50
Q3 = P75
Q4 = P100
In general: position of a quantile in the array = n · p
where n = no. of observations
      p = proportion

Measure of Dispersion/Variability
A single value which measures the spread or variability of the observations in a given data set.

Mean Absolute Deviation (MAD)
- average of the absolute values of all deviations from the mean
$MAD = \frac{\sum_{i=1}^{N} |x_i - \mu|}{N}$
where: N = total no. of observations
       x_i = ith observed value
       μ = mean from ungrouped data

Inter-Quartile Range (IR or IQR)
IQR = Q3 – Q1
- range of the middle 50% of the observations

Semi-Interquartile Range (SIR)
• half of the IQR
• tells the average deviation of observations from the median
SIR = ½ (Q3 – Q1)

4. Variance
- arithmetic mean of the squared deviations of observations from the mean
The population variance is denoted by σ² and is given by the formula
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Standard Deviation, σ
- most common measure of variation
- a measure of spread about the mean
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Coefficient of Variability, CV
- ratio of the SD over the mean of a given population
$CV = \frac{\sigma}{\mu} \times 100\%$

Measure of skewness (tail)
- measures departure from symmetry
Measure of Kurtosis, K
- measures the "heaviness" of the tails
- degree of peakedness/flatness of a distribution

Boxplot
- A boxplot provides a graphical representation of the data based on the five-number summary.
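
The measures of dispersion in these notes (MAD, IQR, SIR, population variance, standard deviation and CV) can be sketched in Python as below. This is a hedged illustration only: the function name dispersion_summary and the sample data are assumptions, and the quartiles come from the standard library's statistics.quantiles with method="inclusive", which is one of several quartile conventions and may give slightly different values from the n · p position rule.

```python
import math
import statistics


def dispersion_summary(data):
    """Population measures of dispersion for non-empty ungrouped data."""
    n = len(data)
    mean = sum(data) / n
    # MAD: average of the absolute deviations from the mean
    mad = sum(abs(x - mean) for x in data) / n
    # Variance: arithmetic mean of the squared deviations from the mean
    variance = sum((x - mean) ** 2 for x in data) / n
    sd = math.sqrt(variance)
    # CV: SD over the mean, as a percentage (undefined when the mean is 0)
    cv = sd / mean * 100
    # Quartiles by linear interpolation (assumed convention, see lead-in)
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "mean": mean, "MAD": mad, "variance": variance, "SD": sd, "CV": cv,
        "Q1": q1, "Q3": q3, "IQR": iqr, "SIR": iqr / 2,
    }


# Hypothetical observations, for illustration only.
observations = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in dispersion_summary(observations).items():
    print(name, value)
```

For these eight hypothetical observations the mean is 5, the variance is 4 and the SD is 2, so the CV is 40%.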