Statistics and Data Analysis Guide

The document defines key statistical concepts such as population, sample, quantitative vs qualitative data, discrete vs continuous data, and different sampling methods. It then discusses measures of central tendency like median, quartiles, and interquartile range. Additional concepts covered include standard deviation, variance, different types of distributions, skewness, bivariate distributions, percentiles, deciles, quartiles, and time series analysis methods.

Uploaded by

waheed.abdulr

 Population – Everyone or everything in a study

o Population mean is a parameter


 Sample – small subset/portion of the population
o Sample mean is a statistic
o For example, choosing 100 individuals at random out of 100,000 people
 Quantitative data involves numbers, while qualitative data is descriptive data in the form of words
 Discrete data
o Data that can only take on certain values
o Associated with counting
 Continuous Data
o Data that can take on any value
o Associated with measurement
 A stratified random sample aims to represent each group or stratum in the population fairly
o The size of each group in the sample should be proportional to the size of each group in
the population
o Sample members are selected at random from every group in the population
 Quota sampling
o Similar to a stratified random sample; the only difference is that the interviewer
chooses which people to take from each group, rather than selecting them at random
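
The proportional allocation described for a stratified random sample can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `stratified_sample`, the `stratum_of` callback, the fixed seed and the example population are all assumptions made for the sketch.

```python
import random

def stratified_sample(population, stratum_of, sample_size, seed=0):
    """Draw a proportional stratified random sample.

    population  : list of items
    stratum_of  : function mapping an item to its group (stratum) label
    sample_size : total number of items wanted
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable

    # Split the population into its groups.
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)

    sample = []
    for members in strata.values():
        # Group size in the sample is proportional to group size in the population.
        k = round(sample_size * len(members) / len(population))
        # Members are chosen at random within every group.
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 members of group "A" and 400 of group "B".
population = [("A", i) for i in range(600)] + [("B", i) for i in range(400)]
sample = stratified_sample(population, lambda item: item[0], 100)
counts = {g: sum(1 for s in sample if s[0] == g) for g in ("A", "B")}
```

Because each group size is rounded separately, the total can differ slightly from `sample_size` when the proportions do not divide evenly.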

 Median of a dataset is Q2
 Q1 is the median of the data from the start up to Q2, and Q3 is the median of the data from Q2 to the end
 Q1 and Q3 fall at the 25 percent and 75 percent marks of the cumulative frequency table
 Interquartile range (IQR) = Q3 - Q1
 A point is anomalous (an outlier) if it lies outside the range [Q1 - 1.5 IQR, Q3 + 1.5 IQR]
 Mean of a grouped data set = ∑(midpoint × frequency) / ∑frequency
 Weighted mean = ∑wx / ∑w, where w is the weight of each value
 Standard deviation = sqrt( ∑(element – mean)² / n )
 Variance
o Measure of the spread of data: tells you how far your data is spread from the mean
o V = ∑(element – mean)² / n
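
The quartile, outlier, mean and spread formulas above can be checked with a short Python sketch. The function names and data are illustrative assumptions. The quartile function assumes the middle value is left out of both halves when the number of data points is odd; the notes do not say which convention is meant, and other conventions give slightly different quartiles. Variance and standard deviation divide by n, as in the formulas above.

```python
import math

def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole set, upper half."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # values before the median position
    upper = s[(n + 1) // 2:]     # values after the median position
    return median(lower), median(s), median(upper)

def outliers(values):
    """Values outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR]."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

def weighted_mean(values, weights):
    """Sum of w*x over sum of w; with midpoints and frequencies this is the grouped mean."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

def variance(values):
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def std_dev(values):
    return math.sqrt(variance(values))

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]             # hypothetical data set
q1, q2, q3 = quartiles(data)                     # 4.0, 6, 8.5
anomalies = outliers(data)                       # [30]
grouped = weighted_mean([5, 15, 25], [2, 5, 3])  # class midpoints, frequencies
spread = std_dev([2, 4, 4, 4, 5, 5, 7, 9])       # mean 5, variance 4
```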

 Relative frequency distribution
o Relative frequency = f / ∑f
 Cumulative relative frequency
o Same as cumulative frequency, but accumulates relative frequency values
 Frequency density = Frequency/Class Width
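
As a small sketch of the three frequency formulas above, the following Python computes relative frequency, cumulative relative frequency and frequency density for a frequency table. The class boundaries and frequencies are hypothetical values chosen for the example.

```python
from itertools import accumulate

def relative_frequencies(freqs):
    """f / sum(f) for each class."""
    total = sum(freqs)
    return [f / total for f in freqs]

def cumulative_relative_frequencies(freqs):
    """Running total of the relative frequencies."""
    return list(accumulate(relative_frequencies(freqs)))

def frequency_densities(freqs, class_widths):
    """Frequency divided by class width, class by class."""
    return [f / w for f, w in zip(freqs, class_widths)]

# Hypothetical classes 0-10, 10-20 and 20-40 with frequencies 5, 10 and 5.
freqs = [5, 10, 5]
widths = [10, 10, 20]
rel = relative_frequencies(freqs)
cum = cumulative_relative_frequencies(freqs)
dens = frequency_densities(freqs, widths)
```

The last cumulative relative frequency is always 1, which is a quick check that the table was built correctly.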

 Dot plot
o Data set values on horizontal line and associated frequency as number of dots above
value
o Value with most dots is mode
o Values are in ascending order and have equal
divisions
 Box and Whisker plot
 Stem and Leaf plot
o Always helpful to include a key, since the way a stem and leaf convert back to a
value can differ between plots
o Can represent integers and decimals
o Stems run in order: 1, 2, 3, 4, all the way up to n
o The stem corresponds to the digit in the highest place value of each data value
 Creating Histogram for ungrouped data:
o Make classes depending on range of data and then make a frequency distribution table
o Plot histogram
 Frequency polygon:
o Need classes, their associated frequency as well as the midpoints of each class
o When plotting, add one extra point below the lowest midpoint and one above the
greatest midpoint (each with frequency 0) so that the polygon closes on the axis
o Coordinates of points will be (midpoint, associated frequency)
o Plot the points and connect them to form a frequency polygon
 Skewness
o If a frequency distribution is symmetric and unimodal, then median, mode and mean
are equal to one another
o If data is skewed to the right, then:
 mean is greater than median
 Mode is less than median
 Q3-Q2 > Q2-Q1 or max-Q3 > Q1-min
o Vice Versa for skewed to left
 Bivariate Distribution
o Explanatory variable on the x axis and response variable on the y axis
o Graph is usually a scatterplot
o Can be described by Direction (Positive/Negative), Strength (Strong/Weak) and Form
(Linear/Non-Linear)
 If the graph has an upward trend, the direction is positive; a downward trend is negative
 If the points lie close together with very little scatter, the correlation is
strong – this only applies when the form is linear
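
Direction and strength can also be judged numerically. The sketch below uses the Pearson correlation coefficient r, which the notes above do not mention; it is brought in here as one common way to quantify a linear relationship. The cut-off of 0.8 for "strong" is an arbitrary assumption for the example, and the data is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired data (assumes neither list is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(xs, ys, strong=0.8):
    """Return (direction, strength, r); only meaningful when the form is linear."""
    r = pearson_r(xs, ys)
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    strength = "strong" if abs(r) >= strong else "weak"  # 0.8 is an assumed cut-off
    return direction, strength, r

# Hypothetical data: explanatory variable xs, response variables ys_up and ys_down.
xs = [1, 2, 3, 4, 5]
ys_up = [2.1, 3.9, 6.2, 7.8, 10.1]
ys_down = [10, 8, 6, 4, 2]
up = describe(xs, ys_up)
down = describe(xs, ys_down)
```

A value of r near 0 does not rule out a relationship; it only says there is no linear one, which matches the caveat above about form.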

 Percentile
o Dividing Data in 100 equal portions
o 70th percentile for example means the data point where 70 percent of data is less than
the value of the data point and 30 percent is greater than it
o Lk = (k/100)(n+1), where Lk indicates the position of the element, counted from the
beginning of the sorted dataset, for the kth percentile
o When Lk is a decimal (for example 4.5), look at the elements in the positions just
below and just above it and take their average (in this instance, take the average of
the 4th and 5th elements) to find the kth percentile
o To find the percentile of an integer in the dataset:
 use the formula (x + 0.5y) * 100 / n = P
 x = number of data points less than the chosen value
 y= frequency of chosen value
 n= number of data points
o Round off the answer to the nearest integer to find the percentile of the number
 Decile and Quartile
o Dividing Data in 10 equal Portions = Decile
o Dividing Data in 4 portions = Quartile
o Q1=P25
o Dn = the value at which the Cumulative Relative Frequency (CRF) equals 0.1 * n
 Where Dn is the nth Decile
o If CRF = 0.01 * n falls exactly between two distinct data points, then Pn (the nth
percentile) will be the average of those data points
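
The two percentile formulas above, the position Lk = (k/100)(n+1) and the rank (x + 0.5y) * 100 / n, can be sketched as follows. The function names are illustrative. Clamping the position to the ends of the dataset is an assumption added for very small or very large k, which the notes do not cover.

```python
def percentile_value(values, k):
    """Value at the kth percentile, using position Lk = (k/100)(n+1)."""
    s = sorted(values)
    n = len(s)
    pos = k * (n + 1) / 100              # 1-based position Lk
    below = int(pos)                     # position just below Lk
    above = below if pos == below else below + 1
    # Assumption: positions outside 1..n are clamped to the nearest end.
    below = min(max(below, 1), n)
    above = min(max(above, 1), n)
    if below == above:
        return s[below - 1]
    # Lk is a decimal: average the elements on either side.
    return (s[below - 1] + s[above - 1]) / 2

def percentile_rank(values, x):
    """Percentile of the value x: (less + 0.5 * frequency) * 100 / n, rounded."""
    less = sum(1 for v in values if v < x)
    freq = sum(1 for v in values if v == x)
    # Note: Python's round() sends exact .5 cases to the nearest even integer.
    return round((less + 0.5 * freq) * 100 / len(values))

data = [10, 20, 30, 40, 50, 60, 70, 80, 90]   # hypothetical data, n = 9
p50 = percentile_value(data, 50)   # L50 = 5.0, the 5th element
p25 = percentile_value(data, 25)   # L25 = 2.5, average of 2nd and 3rd elements
rank = percentile_rank(data, 50)   # 4 values below, frequency 1
```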

Time Series

 Method of semi averages
o Divide the data set into 2 (or any number of) portions of equal size, then take the
average of all data points in each portion to reduce the data to 2 single data points
o Draw a straight line connecting both points and extrapolate it to find a future value
o If the data set contains an odd number of elements, ignore the middle element
 Moving averages/trend
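
The method of semi averages described above can be sketched in Python. This is a minimal illustration under stated assumptions: the data is split into two halves, observations are equally spaced at times 1, 2, ..., n, and each half-average is plotted at the middle time point of its half. The function names and sales figures are hypothetical.

```python
def semi_average_trend(values):
    """Return (slope, intercept) of the line through the two half-averages.

    Assumes equally spaced time points 1..n and at least 2 observations.
    """
    n = len(values)
    half = n // 2
    first = values[:half]
    second = values[n - half:]       # leaves out the middle element when n is odd
    y1 = sum(first) / half
    y2 = sum(second) / half
    # Each average sits at the middle time point of its half.
    t1 = (1 + half) / 2
    t2 = ((n - half + 1) + n) / 2
    slope = (y2 - y1) / (t2 - t1)
    intercept = y1 - slope * t1
    return slope, intercept

def forecast(values, t):
    """Extrapolate the semi-average trend line to time t."""
    slope, intercept = semi_average_trend(values)
    return slope * t + intercept

sales = [10, 12, 14, 16, 18, 20, 22]   # hypothetical series, n = 7 (odd)
trend = semi_average_trend(sales)      # half-averages 12 at t=2 and 20 at t=6
next_value = forecast(sales, 8)
```

With the middle observation (16) ignored, the two half-averages are 12 and 20, so the line rises by 2 per period and gives 24 for period 8.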
