Statistics
       Descriptive Statistics
           Shaheena Bashir
             FALL, 2019
Outline
    Introduction
    Data
    Graphic Presentation
       Categorical Data
       Numeric Data
    Numeric Presentation
 Introduction
Why Study Statistics
    - The field of statistics uses numerical information obtained from
      samples to draw inferences about populations.
    - Statistics is a general intellectual method that applies wherever
      data, variation, and chance appear, and all three are omnipresent
      in modern life.
    - Being able to provide sound evidence-based arguments and to
      critically evaluate data-based claims are important skills that
      all citizens should have.
    - The study of statistics provides students with tools, ideas, and
      dispositions to react intelligently to information in the world
      around them. Reflecting this need to improve students' ability to
      think statistically, statistical literacy and reasoning are
      becoming part of the mainstream school and university curricula
      in many countries.
    As a consequence, statistics education is becoming a thriving field.
 Data
    Data, data, everywhere, and we are forced to look at it.
     - 63% of the people polled support the president's decision
       to ...
     - Scientists at a major research university report that treatment
       of Parkinson's disease with a combination of drugs A and B
       has the potential to extend remission of the disease by an
       average of 2 years ...
     - The nation's trade deficit narrowed last month, for the first
       time in ...
     - The Dow Jones Industrial Average rose again today to a new
       record high, marking the seventh consecutive day of record
       highs, but the broader market ...
     - Despite the claims of a robust economy, the percentage of
       families living below the poverty line has not dropped
       substantially over the past six months ...
Data Presentation
    Once you have collected data,
        - What do the numbers indicate?
        - What will you do with them?
    For example, suppose you are interested in buying a house in a
    particular area. You may have no clue about the house prices, so
    you might ask your real estate agent to give you a sample data set
    of prices. Looking at all the prices in the sample is often
    overwhelming. A better way might be to look at the median price
    and the variation of prices. The median and variation are just two
    ways that you will learn to describe data. Your agent might also
    provide you with a graph of the data.
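
    As a minimal sketch of this idea, the Python snippet below summarizes a
    small sample of house prices by its median and two simple measures of
    variation. The prices are hypothetical values invented for illustration
    (arbitrary currency units); they are not taken from any real listing.

```python
import statistics

# Hypothetical sample of house prices (arbitrary currency units).
prices = [8.5, 9.2, 10.0, 11.5, 12.0, 14.8, 35.0]

median_price = statistics.median(prices)   # middle value of the sorted sample
price_range = max(prices) - min(prices)    # simplest measure of variation
price_stdev = statistics.stdev(prices)     # sample standard deviation

print(f"median = {median_price}")
print(f"range  = {price_range}")
print(f"stdev  = {price_stdev:.2f}")
```

    The single expensive house (35.0) barely moves the median, which is one
    reason the median is a convenient summary of prices.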
Data Types: Quantitative
        - A quantitative variable is one whose observations are
          numerical, so that the usual arithmetic operations make
          sense. Quantitative data are the measured numerical values
          of a quantitative variable.
             1. Discrete: number of day-scholars, number of successful
                candidates in CSS, etc.
             2. Continuous: housing prices, rainfall, heights, etc.
Data Types: Categorical/Qualitative
        - A categorical variable is one whose observations simply record
          physical characteristics or traits of the subjects or objects
          being studied. For example, eye color is a categorical
          variable, with categories brown, blue, green, etc. Categorical
          data then correspond to the observed sample counts in each of
          the possible categories.
             1. Nominal (unordered categories): having the disease vs.
                not having it
             2. Ordinal (ordered categories): pain on a Likert scale
                (0-10; no pain to excruciating pain), etc.
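
    The sketch below illustrates how categorical data reduce to counts per
    category. Both samples (eye colors and pain levels) and the four pain
    labels are hypothetical, chosen only to show the difference between
    nominal and ordinal categories.

```python
from collections import Counter

# Hypothetical nominal observations: eye color of 8 subjects.
eye_colors = ["brown", "blue", "brown", "green",
              "brown", "blue", "brown", "green"]
eye_counts = Counter(eye_colors)   # categorical data = counts per category

# Hypothetical ordinal observations: the categories have a natural order,
# so the counts can be reported in that order.
pain_levels = ["mild", "severe", "none", "mild", "moderate", "mild"]
pain_order = ["none", "mild", "moderate", "severe"]
pain_counts = Counter(pain_levels)
ordered_pain_counts = [(level, pain_counts[level]) for level in pain_order]

print(dict(eye_counts))
print(ordered_pain_counts)
```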
Data Presentation
    The goal of statistics is to help researchers organize and interpret
    the data. Data can be described and presented in many different
    formats.
        - Graphic Presentation
        - Numeric Presentation
Graphic Presentation
  Categorical Data
        - The main purpose of some studies is to see how a set of data
          is distributed across a small set of categories or classes.
        - If each observation falls into exactly one of the classes, we say
          that the classes partition the data collection.
             - For example, the classes urban, suburban, and rural
               partition new housing construction.
             - The categories cats, dogs, and 'other' partition domestic
               animals.
        - The classes or categories in a partition are exhaustive and
          exclusive: they include every possible observation, and they
          do not overlap.
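
    A small sketch of what a partition means in practice: every observation
    is counted in exactly one class, so the class counts add up to the
    sample size. The observations and the helper name partition_counts are
    hypothetical, introduced only for illustration.

```python
# Hypothetical classification of new housing units by location type.
classes = ["urban", "suburban", "rural"]
observations = ["urban", "rural", "urban", "suburban", "urban", "rural"]

def partition_counts(observations, classes):
    """Count observations per class; fail if the classes are not exhaustive."""
    counts = {c: 0 for c in classes}
    for obs in observations:
        if obs not in counts:
            raise ValueError(f"{obs!r} falls in no class")
        counts[obs] += 1   # each observation is counted in one class only
    return counts

counts = partition_counts(observations, classes)
print(counts)
print(sum(counts.values()) == len(observations))
```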
Contingency Tables
     A contingency table is a tabular arrangement for presenting
     categorical data.
     Degrees earned in foreign languages in 1992:
                                BS      MS     PhD   Total
                        M       3990    971    378   5339
                        F       9913    1955   472   12340
                        Total   13903   2926   850   17679
         - What percentage of the degrees earned by women were doctorates?
         - What percentage of MS degree holders were men?
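
     One way to work these two questions out, sketched in Python with the
     counts from the table above. The first question is read here as a row
     percentage (within the F row) and the second as a column percentage
     (within the MS column); the variable names are illustrative.

```python
# Degrees earned in foreign languages in 1992 (counts from the table).
degrees = {
    "M": {"BS": 3990, "MS": 971, "PhD": 378},
    "F": {"BS": 9913, "MS": 1955, "PhD": 472},
}

# Row percentage: of all degrees earned by women, what share were PhDs?
female_total = sum(degrees["F"].values())
pct_female_phd = 100 * degrees["F"]["PhD"] / female_total

# Column percentage: of all MS degrees, what share were earned by men?
ms_total = degrees["M"]["MS"] + degrees["F"]["MS"]
pct_ms_male = 100 * degrees["M"]["MS"] / ms_total

print(f"{pct_female_phd:.1f}% of degrees earned by women were PhDs")
print(f"{pct_ms_male:.1f}% of MS degrees were earned by men")
```

     The denominators differ: 12340 (row total for F) in the first case and
     2926 (column total for MS) in the second.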
     Gender × Treatment Type
                                Controls   Treated     Total
                        M       42(75%)    41(73.2%)
                        F       14(25%)    15(26.8%)
                        Total
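
     The percentages in parentheses appear to be column percentages (each
     count divided by its column total). The sketch below checks this
     reading and derives the totals left blank in the table; the helper
     name column_percentages is hypothetical.

```python
# Counts from the Gender x Treatment Type table.
controls = {"M": 42, "F": 14}
treated = {"M": 41, "F": 15}

def column_percentages(column):
    """Express each count as a percentage of its column total."""
    total = sum(column.values())
    return {k: round(100 * v / total, 1) for k, v in column.items()}

column_totals = {"Controls": sum(controls.values()),
                 "Treated": sum(treated.values())}
row_totals = {g: controls[g] + treated[g] for g in controls}
grand_total = sum(row_totals.values())

print(column_percentages(controls))
print(column_percentages(treated))
print(column_totals, row_totals, grand_total)
```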
    Bar graphs and pie charts can be used to summarize categorical
    data visually.
        - Bar graphs can display either the actual frequency counts or
          the relative frequencies (sample percentages) in a sample.
        - Pie charts, on the other hand, are designed solely to visualize
          relative frequencies (sample percentages). They are most
          commonly used when we wish to display the sample percentages,
          rather than the raw frequencies, for the various classes of
          the partition. They are particularly effective for displaying
          differences between two populations with respect to the same
          categories.
Cars Data Set
                             mpg     cyl    disp     hp       drat   wt     qsec
         Mazda RX4           21.00   6.00   160.00   110.00   3.90   2.62   16.4
         Mazda RX4 Wag       21.00   6.00   160.00   110.00   3.90   2.88   17.0
         Datsun 710          22.80   4.00   108.00   93.00    3.85   2.32   18.6
         Hornet 4 Drive      21.40   6.00   258.00   110.00   3.08   3.21   19.4
         Hornet Sportabout   18.70   8.00   360.00   175.00   3.15   3.44   17.0
         ...                 ...
Bar Chart
     [Figure: bar chart "Car Distribution". Horizontal axis: Number of
     Gears (3, 4, 5); vertical axis: counts, with ticks from 0 to 14.]
Stacked Bar Chart
     [Figure: stacked bar chart "Car Distribution by Gears and Engine
     Shape". Horizontal axis: Number of Gears (3, 4, 5); vertical axis:
     Counts, with ticks from 0 to 14; legend: s, v.]
Pie Chart
     [Figure: pie chart "Car Distribution". 3 gears 47%, 4 gears 38%,
     5 gears 16%.]
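
     The relation between the bar chart (frequencies) and the pie chart
     (relative frequencies) can be sketched as follows. The gear counts
     15, 12 and 5 are an assumption: they are not stated on the slides,
     but they are consistent with the percentages in the pie chart. The
     text bars are only a stand-in for a real plot.

```python
# Assumed counts of cars by number of gears (not stated on the slides).
gear_counts = {3: 15, 4: 12, 5: 5}
total = sum(gear_counts.values())

# Relative frequency (sample percentage) of each category.
relative = {gears: 100 * count / total for gears, count in gear_counts.items()}

for gears, count in gear_counts.items():
    bar = "#" * count   # bar length is proportional to the frequency
    print(f"{gears} gears | {bar:<15} {count:>2} ({relative[gears]:.1f}%)")
```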
  Numeric Data
Dot Plot
     A dotplot of a set of quantitative data is a technique for grouping
     observations that are equal. The horizontal axis is the scale of the
     variable being measured and a dot is placed above the value of
     each observation. Stacking the dots vertically above the outcome
     represents repeated values. This form of graphical display is only
     useful if there are a limited number of distinct outcomes among
     the sample data.
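
     A rough text rendering of a dot plot, assuming small non-negative
     integer data. The function name dot_plot and the sample values are
     hypothetical; a plotting library would normally be used instead.

```python
from collections import Counter

def dot_plot(values):
    """Build a text dot plot for integer data: one column per value."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)   # scale of the variable
    width = len(str(max(values))) + 1            # column width in characters
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for level in range(max(counts.values()), 0, -1):
        cells = ("*" if counts[v] >= level else " " for v in axis)
        rows.append("".join(c.rjust(width) for c in cells).rstrip())
    rows.append("".join(str(v).rjust(width) for v in axis))
    return "\n".join(rows)

data = [4, 6, 6, 7, 8, 8, 8, 10, 12]   # hypothetical observations
print(dot_plot(data))
```

     Repeated values (6 and 8 here) show up as taller stacks above the
     axis, which is the grouping the text describes.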
Dot Chart
     [Figure: dot charts of Collection A and Collection B, each drawn on a
     horizontal axis running from 4 to 12.]
Stem and Leaf Plot
     Another way to display quantitative data for which the number of
     observations is not too large is known as a stem plot (or a stem
     and leaf display).
         - The stem usually corresponds to the first digit (or digits) of
           a number.
         - The leaf then represents the final digit.
         - To produce the stem plot for a given choice of stem and leaf,
           the stems are listed in a column from the smallest (at the
           top) to the largest (at the bottom).
         - Then the leaf for each observation is recorded to the right in
           the row of the display containing the observation’s stem.
         - For ease of interpretation, the leaves are also usually sorted
           from smallest to largest within a given stem.
Example
     Present the following scores of 10 students in a stem-and-leaf
     display.
                        3, 4, 5, 6, 7, 9, 10, 11, 12, 13
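
     One possible solution, sketched in Python under the assumption that
     the stem is the tens digit and the leaf is the units digit (so the
     single-digit scores get stem 0). The function name stem_and_leaf is
     illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):           # sorting keeps leaves in order
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

scores = [3, 4, 5, 6, 7, 9, 10, 11, 12, 13]
plot = stem_and_leaf(scores)

# Stems are listed from smallest (top) to largest (bottom).
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```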
Histogram
     A histogram is a graph of numerical data in which the observations
     are grouped into interval classes and the height of each bar shows
     the frequency (or relative frequency) of its class.
          Step 1: Divide the range of the observed data values into a
                    reasonable number of interval classes of equal width.
          Step 2: Record the number of observations in each class,
                    either as a straight count or as a percentage of the
                    total number of observations in the data collection.
                    This step creates either a frequency or a relative
                    frequency table for the data collection and our
                    particular choice of interval classes.
          Step 3: Graphically display the histogram. The horizontal
                    axis for this display corresponds to the units of
                    measurement for our observations, divided into the
                    interval classes specified in Step 1. Either frequency
                    or relative frequency is plotted on the vertical axis.
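
     The three steps can be sketched as follows. The weights are
     hypothetical, and the convention that each class includes its lower
     bound but not its upper bound is an assumption of this sketch, not
     something the slides specify. Text bars stand in for the drawn
     histogram of Step 3.

```python
def frequency_table(values, start, width, n_classes):
    """Count values in n_classes equal-width classes [lower, lower + width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - start) // width)   # Step 1: locate the class
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} lies outside the chosen classes")
        counts[index] += 1                  # Step 2: record the count
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Hypothetical weights in kg.
weights = [48.2, 51.0, 53.7, 55.0, 56.1, 58.9, 59.5, 61.3, 62.0, 67.4]
table = frequency_table(weights, start=45, width=5, n_classes=5)

# Step 3: display frequency and relative frequency for each class.
n = len(weights)
for lower, upper, count in table:
    print(f"[{lower}, {upper})  {'#' * count:<5} {count}  ({100 * count / n:.0f}%)")
```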
     [Figure: histogram "Distribution of Female Weights". Horizontal axis:
     Weight in Kg (40 to 75, in classes of width 5); vertical axis:
     Frequency, with ticks from 0 to 60.]
Numeric Presentation
Frequency Distribution
         - A frequency distribution is a tabular arrangement of data that
           shows the number of events, individuals, or objects in each
           of the separate categories.
         - A cumulative frequency distribution shows the running total of
           the number of events, individuals, or objects across the
           successive categories. The final cumulative value equals the
           total number of observations, or 100% when expressed as a
           relative frequency.
Example
    Sam’s team has scored the following numbers of goals in recent
    games
                        2, 3, 1, 2, 1, 3, 2, 3, 4, 5, 4, 2, 2, 3
    Put the numbers in order:
                        1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5
                                Score     Frequency
                                  1           2
                                  2           5
                                  3           4
                                  4           2
                                  5           1
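
    The same table can be produced programmatically. This is a minimal
    sketch using the goals listed above; sorting is not needed for the
    counting itself, only for printing the scores in order.

```python
from collections import Counter

# Goals scored by Sam's team in recent games (from the example above).
goals = [2, 3, 1, 2, 1, 3, 2, 3, 4, 5, 4, 2, 2, 3]

frequency = Counter(goals)   # maps each score to how often it occurs

print("Score  Frequency")
for score in sorted(frequency):
    print(f"{score:>5}  {frequency[score]:>9}")
```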
    Source: https://www.mathsisfun.com/data/frequency-distribution.html
Example: Grouped Frequency Distribution
    These are the numbers of newspapers sold at a local shop over the
    last 10 days:
                        22, 20, 18, 23, 20, 25, 22, 20, 18, 20
                             Papers Sold    Frequency
                                18              2
                                19              0
                                20              4
                                21              0
                                22              2
                                23              1
                                24              0
                                25              1
    
    Source: https://www.mathsisfun.com/data/frequency-distribution-grouped.html
                       Papers Sold  Frequency
                         15-19          2
                         20-24          7
                         25-29          1
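
    A sketch of how the grouped table follows from the raw sales figures,
    assuming classes of width 5 starting at 15, as in the table above.

```python
# Newspapers sold over the last 10 days (from the example above).
sales = [22, 20, 18, 23, 20, 25, 22, 20, 18, 20]

width = 5
grouped = {}
for lower in range(15, 30, width):
    upper = lower + width - 1            # classes 15-19, 20-24, 25-29
    grouped[f"{lower}-{upper}"] = sum(lower <= s <= upper for s in sales)

for label, count in grouped.items():
    print(f"{label}  {count}")
```

    Grouping shortens the table from eight rows to three, at the cost of
    no longer showing which individual values occurred.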
Frequency Distribution of Weights
                        Weight   Frequency
                        40-45            4
                        45-50           24
                        50-55           77
                        55-60           61
                        60-65           26
                        65-70            7
                        70-75            1
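
    Relative and cumulative frequencies for this table can be derived as
    in the sketch below, which uses only the class counts shown above.
    The table does not say which class a boundary value such as 45
    belongs to, and the sketch does not need to assume either way.

```python
from itertools import accumulate

classes = ["40-45", "45-50", "50-55", "55-60", "60-65", "65-70", "70-75"]
frequencies = [4, 24, 77, 61, 26, 7, 1]

total = sum(frequencies)                         # number of observations
relative = [100 * f / total for f in frequencies]
cumulative = list(accumulate(frequencies))       # running totals
cumulative_pct = [100 * c / total for c in cumulative]

print("Weight  Freq  Rel%   Cum  Cum%")
for row in zip(classes, frequencies, relative, cumulative, cumulative_pct):
    label, f, r, c, cp = row
    print(f"{label}  {f:>4}  {r:>4.1f}  {c:>4}  {cp:>5.1f}")
```

    The last cumulative entry equals the total number of observations,
    which corresponds to 100% in relative terms.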