(IN) Measures
(IN) Measures
  Population (Tổng thể) - all items of interest for a particular decision or investigation
                                                                                 - all married drivers over 25 years old
                                                                                 - all subscribers to Netflix
                                                                                Sample (Mẫu) - a subset of the population
                                                                                      The median specifies the middle value when the data are
Purchase Orders database
                                                                                       arranged from least to greatest.
Using formula:
                                                                                       ◦ Half the data are below the median, and half the data are above
                                                                                         it.
                                                                                       ◦ For an odd number of observations, the median is the middle of
                                                                                         the sorted numbers.
=SUM(B2:B95)/COUNT(B2:B95)
                                                                                       ◦ For an even number of observations, the median is the mean of
                                                                                         the two middle numbers.
Mean = $2,471,760/94
       = $26,295.32                                                                   We could use the Sort option in Excel to rank-order the data
                                                                                       and then determine the median. The Excel function
Using Excel AVERAGE Function                                                           =MEDIAN(data range) could also be used.
=AVERAGE(B2:B95)                                                                      The median is meaningful for ratio, interval, and ordinal data.
                                                                                      Not affected by outliers.
                                                                                                                                                                             1
                                                                                                                 12/7/2024
    Sort the data from smallest to largest. Since we      The mode is the observation that occurs most
     have 90 observations, the median is the average        frequently.
     of the 47th and 48th observation.                     The mode is most useful for data sets that contain
                                                            a relatively small number of unique values.
Median =
($15,562.50 + $15,750.00)/2
                                                           You can easily identify the mode from a frequency
= $15,656.25                                                distribution by identifying the value or group
                                                            having the largest frequency or from a histogram
=MEDIAN(B2:B94)
                                                            by identifying the highest bar.
                                                           Excel function: =MODE.SNGL(data range).
                                                           For multiple modes: =MODE.MULT(data range)
   Purchase Orders
    database: A/P Terms                                    The midrange is the average of the greatest and
                                                            least values in the data set.
     Mode = 30 months
                                                           Caution must be exercised when using the
                                                            midrange because extreme values easily distort
                                                            the result. This is because the midrange uses
                                                            only two pieces of data, whereas the mean uses
   Cost per order
                                                            all the data; thus, it is usually a much rougher
     Mode is the group                                     estimate than the mean and is often used for only
      between $0 and                                        small sample sizes.
      $13,000
   Purchase Orders data                                The Excel file Computer Repair Times includes 250
   Use the Excel MIN and MAX functions or sort the     repair times for customers.
    data and find them easily.                             What repair time would be
   Cost per order midrange:                                reasonable to quote to a
    = ($68.78 + $127,500)/2                                 new customer?
    = $63,784.89                                           Median repair time is 2
                                                            weeks; mean and mode are
                                                            about 15 days.
                                                           Examine the histogram.
                                                                                                                        2
                                                                                                                        12/7/2024
   The range is the simplest and is the difference                  Purchase Orders data
    between the maximum value and the minimum                        For the cost per order data:
    value in the data set.                                            ◦ Maximum = $127,500
   In Excel, compute as =MAX(data range) -                           ◦ Minimum = $68.78
    MIN(data range).                                                 Range = $127,500 - $68.78 = $127,431.22
   The range is affected by outliers, and is often used
    only for very small data sets.
                                                                                                                               3
                                                                                                                                            12/7/2024
   The standard deviation is the square root of the                    Purchase Orders Cost per order data
    variance.
    ◦ Note that the dimension of the variance is the square of the      Using the results of Example 4.8, take the square
      dimension of the observations, whereas the dimension of the
      standard deviation is the same as the data. This makes the         root of the variance:
      standard deviation more practical to use in applications.
   For a population:
    ◦ In Excel: =STDEV.P(data range)                                    Alternatively, use the STDEV.S function for the
   For a sample:                                                        data range.
Excel file: Closing Stock                                                   For any data set, the proportion of values that lie
   Prices
                                                                             within k (k > 1) standard deviations of the mean
Intel (INTC):
 Mean = $18.81                                                               is at least 1 – 1/k2
 Standard deviation = $0.50                                                 Examples:
General Electric (GE):                                                   ◦    For k = 2: at least ¾ or 75% of the data lie within two
 Mean = $16.19                                                                standard deviations of the mean
 Standard deviation = $0.35                                              ◦    For k = 3: at least 8/9 or 89% of the data lie within three
                                                                              standard deviations of the mean
INTC is a higher risk
investment than GE.
                                                                                                                                                   4
                                                                                                                                                          12/7/2024
Empirical rules
   A standardized value, commonly called a z-score,                      The numerator represents the distance that xi is from the
    provides a relative measure of the distance an                         sample mean; a negative value indicates that xi lies to
    observation is from the mean, which is independent                     the left of the mean, and a positive value indicates that it
    of the units of measurement.                                           lies to the right of the mean. By dividing by the standard
                                                                           deviation, s, we scale the distance from the mean to
   The z-score for the ith observation in a data set is
                                                                           express it in units of standard deviations. Thus,
    calculated as follows:
                                                                           ◦ a z-score of 1.0 means that the observation is one standard
                                                                             deviation to the right of the mean;
                                                                           ◦ a z-score of 2 1.5 means that the observation is 1.5 standard
                                                                             deviations to the left of the mean.
    ◦ Excel function: =STANDARDIZE(x, mean, standard_dev).
                                                                                                                                                                 5
                                                                                                                                             12/7/2024
   Purchase Orders Cost per order data                                  The coefficient of variation (CV) provides a relative
                                                                          measure of dispersion in data relative to the mean:
                                      =(B2 - $B$97)/$B$98, or
                                      =STANDARDIZE(B2,$B$97,$B$98).
                                                                                                                                                    6
                                                                                                                                                  12/7/2024
   Kurtosis (độ nhọn) refers to the peakedness (i.e., high,                 Comparing measures of location can sometimes reveal
    narrow) or flatness (i.e., short, flat-topped) of a                       information about the shape of the distribution of
    histogram.                                                                observations.
   The coefficient of kurtosis (CK) measures the degree of                   ◦ For example, if the distribution were perfectly symmetrical and
    kurtosis of a population                                                    unimodal, the mean, median, and mode would all be the same.
                                                                              ◦ If it were negatively skewed, we would generally find that
                                                                                mean < median < mode
                                                                              ◦ Positive skewness would suggest that mode < median < mean
 Sample mean:
 Population variance:
 Sample variance:
                                                                                                                                                         7
                                                                                                                           12/7/2024
   The proportion, denoted by p, is the fraction of           Proportion of orders placed by Spacetime Technologies
    data that have a certain characteristic.                    =COUNTIF(A4:A97, “Spacetime Technologies”)/94
                                                                = 12/94 = 0.128
   Proportions are key descriptive statistics for
    categorical data, such as defects or errors in
    quality control applications or consumer
    preferences in market research.
                                                                                                                                  8
                                                                                                                                                                    12/7/2024
                                                                                                                                                                           9
                                                                                                                                           12/7/2024
   Colleges and Universities data                                    When using the CORREL function, it does not
                                                                       matter if the data represent samples or
                                                                       populations. In other words,
                                                                   CORREL(array1,array2) =
                                                                   COVARIANCE.P(array1,array2) / STDEV.P(array1)*STDEV.P(array2)
and
                                                                   CORREL(array1,array2) =
                                                                   COVARIANCE.S(array1,array2) / STDEV.S(array1)*STDEV.S(array2)
                                                                                                                                                 10
                                                                                                                                                  12/7/2024
11