STATISTICS                                           STATISTICIAN
- Is          a      branch      of   applied        - Statistics practitioner is a person
       mathematics that involves the                        who uses statistical techniques
       collection, description, analysis,                   properly.
       and inference of conclusions from                -   An individual who works with the
       quantitative data.                                   mathematics of statistics. Work
   -   Is the study and manipulation of                     involves research that develops
       data; gather, review, analyze, and                   techniques and concepts that may
       draw conclusions from data.                          help the statistics practitioner
   -   Art and science of collecting,
       analyzing, and interpreting data.                KEY STATISTICAL CONCEPTS
          o Business Statistics                         •   Population – is the group of all
             o       Descriptive    Statistics –            items of interest to a statistics
                     deals with     methods of              practitioner.      (frequently        very
                     organizing,  summarizing,              large/infinitely large no.)
                     and presenting data in a                   ➢   Parameter – a descriptive
                     convenient            and                      measure of a population is
                     informative way; Graphical                     called a parameter.
                     Techniques/Numerical                       ➢   Census – the process of
                     Techniques                                     collecting data from the
                                                                    population
DATA – is raw, unorganized facts that need              •   Sample – a set of data drawn from
to be processed. Data can be something                      the studied population
simple and seemingly random and useless                         ➢   Statistic – a descriptive
until it is organized.                                              measure of a sample
             o       Unorganised and unrefined                  ➢   Survey – the process of
                     facts                                          collecting data from the
             o       Individual  unit that                           sample
                     contains raw materials             •   Statistical Inference      –     is   the
                     which do not carry any                 process of making an estimate,
                     specific meaning                       prediction, or decision about a
             o       Doesn’t     depend         on          population based on sample data.
                     information                               ➢ it is far easier and cheaper
             o       Raw data -> insufficient                       to take a sample from the
                                                                    population of interest and
INFORMATION – when data is processed,                               draw conclusions or make
organized, structured or presented in a                             estimates    about   the
given context.                                                      population on the basis of
             o       Comprises processed,                           information provided by
                     organized data presented                       the sample
                     in a meaningful context                    ➢   Estimates         >           Test
             o       Group of data that carries                   Hypotheses > Conclusion
                     a logical meaning                  •   Measures of Reliability
             o       Depends on data                            ➢   Confidence level – is the
             o       Sufficient    for   decision                   proportion of times that an
                     making                                         estimating procedure will
                                                                    be correct
           ➢   Significance level   –           HIERARCHY OF DATA – The data types can
               measures how frequently          be placed in order of the permissible
               the   conclusion    will    be   calculations.
               wrong                                •   INTERVAL
                                                          ➢   Values are real numbers
TYPES OF DATA AND INFORMATION                             ➢   All calculations are valid
   •   Variable – is some characteristic                  ➢   Data may be treated as
       of a population or sample.                             ordinal or nominal
   •   Values   –    are    the   possible         •   ORDINAL
       observations of the variable                       ➢ Values must represent the
   •   Data – are the observed values of                      ranked order of the data
       a variable                                         ➢   Calculations based on an
           ➢ Datum – is the singular                          ordering process are valid
               form of data. Data is plural               ➢   Data may be treated as
   •   Interval – are real numbers, such                     nominal but not as interval
       as heights, weights, incomes, and           •   NOMINAL
       distances                                          ➢   Values are the arbitrary
           ➢   Quantitative/Numerical                         numbers that represent
           ➢   Ratio                                          categories
   •   Nominal – categories                               ➢   Only calculations based on
          ➢ Qualitative/Categorical                           the    frequencies     or
   •   Ordinal – appears to be nominal,                       percentages of occurrence
       but the difference is that the order                   are valid
       of their values has meaning                        ➢   Data may not be treated
                                                              as ordinal or interval
   SAMPLING METHODS – A representative
   sampling method guarantees a good            DATA BASED ON RESEARCH DESIGN
   statistical inference about    the              •   Qualitative data
   population.                                            - Include labels or names
   •  Probability (random) and Non-                           used     to    identify an
       probability (convenience)                              attribute of each element
   •   Population (all) vs Sample (part)                  -   Qualitative data use either
                                                              the nominal or ordinal
   SCALE OF MEASUREMENT
                                                              scale of measurement and
      •   Nominal Scale – consists of
                                                              may be nonnumeric or
           labels or names used to
                                                              numerical
           identify an attribute of the
                                                          -   Summarize:      count       the
           element
                                                              observation, compute the
       •   Ordinal Scale - appears to be
                                                              proportion by category
           nominal, but the difference is
                                                   •   Quantitative data
            that the order of their values has meaning
                                                          - Require numeric            values
       •   Interval Scale – with values
                                                              that indicate how much or
           (ordinal) and interval between
                                                              how many
           values is fixed (real numbers)
                                                          -   Quantitative  data    are
       •   Ratio Scale – all properties and
                                                              obtained using either the
           absence of the variable at zero
                                                              interval or ratio scale of
           point
                                                              measurement
           -   Summarize:      Arithmetic                                 log (n); n is the
               operation is meaningful                                    sample size
           -   Cross-sectional data –                                -    class width
               are data collected at the
               same or approximately the          SHAPES OF HISTOGRAM
               same point in time                    •   symmetry – a histogram is said to
           -   Time series data – are                    be symmetric if, when we draw a
               data collected over several               vertical line down the center of the
               time periods                              histogram, the two sides are
                                                         identical   in   shape   and   size.
RELATIONSHIP BETWEEN VARIABLES
   •   Univariate – techniques applied to
       a single set of data
   •   Bivariate – there are many
       situations where we wish to depict
       the relationship between variables            •   skewness – a skewed histogram is
   •   Cross-classification table (cross-                one with a long tail extending to
       tabulation table) – is used to                    either the right or the left. The
       describe the relationship between                 former is called positively skewed,
       two nominal variables                             and the latter is called negatively
                                                         skewed.
DESCRIBING DATA
   •   Descriptive statistical methods are
       used to summarize data sets so
       that we can extract the relevant
       information
   •   Bar charts,      pie    charts,      and
       frequency        distributions       are
       employed    to    summarize       single
       sets of nominal data.
   •   Interval Data
           ➢   Histogram – created by
               drawing rectangles whose              •   Number of Modal Classes – a
               bases are the intervals and               mode is the observation that
               whose heights are the                     occurs with the greatest frequency.
               frequencies.                              A modal class is the class with the
           ➢   Classes    –        create    a           largest number of observations
               frequency distribution for            •   Unimodal histogram – is one with
               interval data by counting                 a single peak. A special type of
               the       number         of               symmetric unimodal histogram is
               observations that fall into               one  that    is  bell   shaped.
                each of a series of intervals.
           ➢   Class Interval
                    -     Sturges' formula:
                         Number     of class
                         intervals = 1 + 3.3         •   Bimodal histogram – is one with
                                                         two peaks not necessarily equal in
         height
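A minimal Python sketch of Sturges' formula from the Class Interval notes (number of class intervals = 1 + 3.3 log(n), where n is the sample size). The function names, the sample data, the choice to round up, and the width rule (range divided by the number of classes) are assumptions made for illustration; the logarithm is taken as base 10, which is what the 3.3 factor corresponds to.

```python
import math


def sturges_class_count(n: int) -> int:
    """Suggested number of class intervals for a sample of size n."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    # 1 + 3.3 * log10(n); rounding up is an assumption, not stated in the notes.
    return math.ceil(1 + 3.3 * math.log10(n))


def class_width(values, classes: int) -> float:
    """Approximate class width: (largest - smallest) / number of classes."""
    return (max(values) - min(values)) / classes


# Hypothetical observations, used only to exercise the functions.
observations = [12, 15, 21, 22, 25, 28, 30, 33, 38, 41]
k = sturges_class_count(len(observations))
width = class_width(observations, k)
print(k, width)
```

In practice the width is usually rounded to a convenient number so that the class limits are easy to read.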
   •     Cross-sectional        Data            –
         Observations at the same point in
         time
                                                    Scatter Diagram
   •     Time-series Data – represent
                                                       - To know the relationship of two
         measurements   at successive
                                                           Interval Data
         points in time
                                                       -   The      two    most     important
             ➢ Line Chart        –     plot    of
                                                           characteristics are the strength
                  variable over time
                                                           and direction of the linear
                                                           relationship.
MEASURING INFLATION
                                                       -   To determine the strength of the
   •     Inflation – is the increase in the
                                                           linear relationship: draw a straight
         prices for goods and services.
                                                           line through the points in such a
   •     Consumer Price Index (CPI) –
                                                           way that the line represents the
         works with basket of some 300
                                                           relationship. If most points fall
         goods and services in the United
                                                           close to the line, there is a linear
         States (also in other countries),
                                                           relationship.
         including such diverse items as
         food,      housing,      clothing,
         transportation,     health,          and
         recreation.
   •     Basket – is defined for the “typical”
         or   “average”      middle-income
         family, and the set of items and
         their    weights   are    revised
         periodically (10 years-         United
         States; 7 years – Canada)                     -   There    are     other  types    of
   1.    Compute the inflation adjusted                    relationships, such as quadratic or
          values                                            exponential ones
            ➢ Use the CPI
   2.    Convert the CPI from months to
         year
            ➢     One year as base for the
               index
   3.    Compute the inflation adjusted
         values. Use 2012 as the base year
             ➢ Compute the 2012 base
             ➢    Compute the 2012 CPI
             ➢    Compute    the    inflation
                  adjusted values
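The steps above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, monthly CPI readings are collapsed into a yearly figure with a simple average, and the CPI and wage figures are made up rather than real index values.

```python
from statistics import mean


def annual_cpi(monthly_cpi):
    """Collapse monthly CPI readings into one yearly figure (simple average, an assumption)."""
    return mean(monthly_cpi)


def rebase(cpi_by_year, base_year):
    """Re-express the index so that the base year equals 100."""
    base = cpi_by_year[base_year]
    return {year: 100 * cpi / base for year, cpi in cpi_by_year.items()}


def inflation_adjusted(nominal_by_year, rebased_cpi):
    """Convert nominal values into base-year dollars."""
    return {
        year: value * 100 / rebased_cpi[year]
        for year, value in nominal_by_year.items()
    }


# Hypothetical figures, not real CPI data.
cpi = {2010: 200.0, 2012: annual_cpi([250.0] * 12)}
wages = {2010: 40.0, 2012: 50.0}

cpi_2012_base = rebase(cpi, base_year=2012)
real_wages = inflation_adjusted(wages, cpi_2012_base)
print(cpi_2012_base, real_wages)
```

With 2012 as the base year, a value from 2012 is unchanged, while values from years with a lower CPI are scaled up to 2012 dollars.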
OTHERS
        Direction                                 -   Figure without scale. No y-axis
            ➢ Positive        –    dependent          scale.
                variable increases when
                 independent variable increases
            ➢   Negative – dependent
                variable decreases when
                 independent variable increases
   -    In interpreting the results of a
        scatter diagram, it is important to
        understand that if two variables          -   Graphs with different caption. For
        are linearly related it does not              the same graph, interpretation
        mean that one is causing the                  might be different due to the
        other. We can express this more               caption.
        eloquently as Correlation is not
        causation
   GRAPHICAL EXCELLENCE
   1.   The graph represents large data
        sets concisely and coherently.
        Graphical techniques -> large data        -   Showing a big drop in your graph.
        sets; Small Data sets -> table; One           For this, percentage form in the y-
        or two numbers -> sentence.                   axis           is        preferred.
   2.   The ideas and concepts             the
        statistics practitioner wants to
        deliver are clearly understood by
        the viewer. Chart is designed to
        describe what would otherwise be
        described in words.
   3.   The graph encourages the viewer
        to compare two or more variables.
        Graphs are often best used to             -   The first chart shows almost no
        depict relationships between two              difference in scale. But when
        or more variables or to explain how           adjusted, an increase in sales can
        and why        the   observed   results       now be observed. Expanding the
        occurred.                                     scale is usually truncated (zigzag)
   4.   The display induces the viewer to             to show the vertical axis begins not
        address the substance of the data
        and not the form of the graph.
   5.   There is no distortion of what the
        data reveal.
GRAPHICAL DECEPTION
       at            zero              value.      ➢    Ogive:       Relative       Frequency
                                                        distribution
                                                        Time Series (measuring inflation)
                                                   ➢    Line chart
                                                   ➢    Scatter diagram
                                                OGIVE
                                                   •    Ogive     is       for      graphical
                                                        representation
                                                   •    Frequency distribution lists the
                                                        number of observations that fall
                                                        into each class interval.
   -   The first chart shows volatility. The
       second chart shows stability.
                                                   •    relative   frequency      distribution
                                                        highlights the proportion of the
                                                        observations that fall into each
                                                        class
                                                   •    cumulative     relative     frequency
                                                        distribution     highlight     the
                                                        proportion of observations that lie
   -   Bar chart width should be the                    below each of the class limits.
       same
                                                   APPLICATION       OF    HISTOGRAM       IN
                                                   FINANCE
                                                       • stock and bond valuation – a
                                                            basic understanding of how
                                                            financial assets, such as stocks
                                                            and bonds, are valued is
   -                                                        critical to good financial
   -   When using pictogram, the width                      management.       Both    are
       should be the same                                   considered long-term financial
                                                            assets. Valuation is necessary
                                                            for capital budgeting and
                                                            capital structure decision.
                                                        •   Return on Investment – is
                                                            calculated by dividing the gain
   -                                                        (or loss) by the value of the
                                                            investment.
                                                            e.g. $100 investment that is
GRAPHICAL TECHNIQUES                                        worth $106 after 1 year has a
   - Cross-sectional        data:   Nominal,                6% rate of return. A $100
       Ordinal and Interval                                 investment that loses $20 has
       Interval Data (refer to activity)
   ➢   Histogram
   ➢   Stem-and-leaf display
            a   -20%     rate   of   return.
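The return on investment calculation described above (gain or loss divided by the value of the investment) can be sketched in Python. The function name is an assumption; the two checks reuse the $100 examples from the notes.

```python
def rate_of_return(initial_value: float, final_value: float) -> float:
    """Gain (or loss) divided by the initial value of the investment."""
    if initial_value == 0:
        raise ValueError("initial value must be non-zero")
    return (final_value - initial_value) / initial_value


gain_case = rate_of_return(100, 106)  # $100 that is worth $106 after one year
loss_case = rate_of_return(100, 80)   # $100 that loses $20
print(f"{gain_case:.0%}", f"{loss_case:.0%}")
```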
                                               DESCRIPTIVE TECHNIQUES
                                                  •   Measures of Central Location
                                                         ➢ Mean – arithmetic mean or
                                                              simply the average.
COMPARING TWO INVESTMENTS                                     Code: =AVERAGE([input
Finance                                                       range])
    1. Maximize the rate of return on
         investment
             •  Histogram – the center of
                the histogram gives us                    ➢   Median – calculated by
                 information  about  the                       placing all the observations
                 return one might expect                       in order; it is the observation
                 from the investment                           that falls in the middle. The
    2.   Reduce Risk                                          sample and population
            •   Histogram – the spread or                     medians are       computed
                variation of the histogram                    the same way.
                 provides    us    guidance                    Code: =MEDIAN([input
                 about the risk. Narrow                        range])
                spread means confidence                   ➢   Mode – the observation
                in prediction. Wide spread                    that occurs with   the
                means uncertainty.                            greatest frequency. Both
                                                              statistic and parameter
LINE CHART – Is a plot of the variable over
                                                              are computed the same
time. It is created by plotting the value of
                                                              way.
the variable on the vertical axis and the
time periods on the horizontal axis.                     What measure to use?
                                                  -   Mean – usually the first; interval
                                                  -   Median – not sensitive to extreme
                                                      values; ordinal or interval
                                                  -   Mode – nominal, ordinal, interval
                                                  -   Geometric mean – growth rates;
                                                      interval
                                                          ➢ Geometric Mean – let Ri
                                                              denote the rate of return
                                                              (in decimal form) in period
SCATTER      DIAGRAM     (Scatterplot)    –                   i (i = 1, 2, …, n). The geometric
describes the relationship between two                        mean Rg of the returns R1,
variables.                                                    R2, …, Rn is defined such
                                                              that (1 + Rg)^n =
                                                              (1 + R1)(1 + R2)…(1 + Rn)
                                                          ➢   Code: =GEOMEAN([input
                                                              range])
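A short Python sketch of the geometric mean of returns, using the standard definition (1 + Rg)^n = (1 + R1)(1 + R2)…(1 + Rn) with returns in decimal form. The function name and the two-period example are assumptions for illustration; the arithmetic mean is computed alongside for comparison.

```python
import math
from statistics import mean


def geometric_mean_return(returns):
    """Geometric mean Rg of per-period returns given in decimal form."""
    # Multiply the growth factors (1 + Ri), take the n-th root, then subtract 1.
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Hypothetical two-period investment: +100% in period 1, then -50% in period 2.
returns = [1.0, -0.5]
rg = geometric_mean_return(returns)
ra = mean(returns)
print(rg, ra)
```

The investment ends where it started, which the geometric mean reflects, while the arithmetic mean of the same two returns is positive. In a spreadsheet, GEOMEAN only accepts positive inputs, so it would typically be applied to the growth factors 1 + Ri, with 1 subtracted from the result.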
                                                          what measure to use?
-   Mean – usually the first                               distance between each
-   Median – not sensitive to extreme                      data point and the mean.
    values
-   Geometric Mean – use to find the
    average of growth rate or rate of
    change, in a variable over time.
-   Arithmetic mean of n returns                       ➢
                                                       ➢   Interpretation: unit of the
    (growth rates) – is the appropriate
                                                           standard deviation is the
    mean to calculate if you wish to
                                                           same as the unit of the
    estimate the mean rate of return
                                                           original data.
    for any single period in the future.
         ➢ Factors that identify what      NORMAL DISTRIBUTION
           measure to use?                    •   Mean = Median = Mode
-   Mean – interval                           •   Symmetry about the center
-   Median – ordinal or interval (with        •   For a bell shape histogram, we
    extreme observation)                          apply the empirical rule:
-   Mode – Nominal, ordinal, interval             1.   Approximately 68% of all
•   Measures   of     Variability     -                observations fall within one
    Measures the spread or variability                 standard    deviation   of   the
    of the data                                        mean.
        ➢   Range – calculated using              2.   Approximately 95% of all
            two measures: Largest and                  observations fall within two
            Smallest value                             standard   deviations   of   the
                                                       mean.
                                                  3.   Approximately 99.7% of all
                                                       observations fall within three
                                                       standard   deviations   of   the
                                                       mean.
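The empirical rule can be checked on data with a small Python sketch. The function name, the use of the sample standard deviation, and the simulated bell-shaped sample (mean 100, standard deviation 15, fixed seed) are assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def share_within(data, k):
    """Proportion of observations within k standard deviations of the mean."""
    center = mean(data)
    spread = stdev(data)  # sample standard deviation
    inside = [x for x in data if abs(x - center) <= k * spread]
    return len(inside) / len(data)


# Simulated bell-shaped data; the parameters are arbitrary.
rng = random.Random(42)
sample = [rng.gauss(100, 15) for _ in range(10_000)]

shares = {k: share_within(sample, k) for k in (1, 2, 3)}
print(shares)
```

For a bell-shaped sample the three proportions should land close to 68%, 95%, and 99.7%; for a skewed sample they can differ noticeably.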
                                           CHEBYSHEV’S THEOREM
                                              •   A more general interpretation of
                                                  the standard deviation, which
                                                  applies to all shapes of histograms.
                                              •   The proportion of observations in
                                                  any sample or population that lie
                                                  within k standard deviations of the
                                                   mean is at least 1 - 1/k^2, for k > 1.
        ➢   Mean absolute deviation
            (MAD)  –    the  mean
            absolute deviation of a
            dataset is the average            •
        For Skewed histogram                       •   Percentile – the Pth percentile is
            ➢ When k=2, Chebyshev’s                    the value for which P percent are
                  theorem states that at               less than that value and (100 – P)%
                  least three-quarters (75%)           are greater than that value.
                  of all   observations lie        •   Quartile – measures of relative
                  within   two     standard            standing for dividing dataset into
                  deviations of the mean               quarters.
            ➢     When k=3, Chebyshev’s            •   Q1 – first/lower Quartile; Q2 –
                  theorem states that at               Second/middle       quartile;   Q3   –
                  least eight-ninths (88.9%)           Third/upper quartile
                  of all   observations lie
                  within   three   standard
                  deviations of the mean.
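Chebyshev's bound (at least 1 - 1/k^2 of observations lie within k standard deviations of the mean, for k > 1) can be evaluated with a one-line function. The function name is an assumption; the checks reproduce the k=2 and k=3 cases listed above.

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(k, f"{chebyshev_lower_bound(k):.1%}")
```

Because the bound holds for any shape of histogram, it is much looser than the empirical rule, which applies only to bell-shaped data.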
COEFFICIENT OF VARIATION (CV) – of a set           PERCENTILE
of observations is the standard deviation of
the observations divided by their mean:
CV = σ / μ (population) or cv = s / x̄ (sample)
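A minimal Python sketch of the measures of variability covered in these notes: the range, the mean absolute deviation (the average distance between each data point and the mean), and the coefficient of variation. The function names and the data are assumptions for illustration, and the sample standard deviation is used for the CV.

```python
from statistics import mean, stdev


def value_range(data):
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


def mean_absolute_deviation(data):
    """Average absolute distance between each data point and the mean."""
    center = mean(data)
    return sum(abs(x - center) for x in data) / len(data)


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean."""
    return stdev(data) / mean(data)


# Hypothetical observations.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(value_range(data), mean_absolute_deviation(data), coefficient_of_variation(data))
```

The CV is unit-free, so it can be used to compare variability across data sets measured on different scales.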
                                                   INTERQUARTILE RANGE – measures the
                                                   spread of the middle 50% of the
                                                   observations. Large values of this
                                                   statistic mean that the first and third
                                                   quartiles are far apart, indicating a high
                                                   level of variability
KURTOSIS – A measure of the tailedness of
a distribution. Tailedness is how often
outliers occur.
                                                   BOX PLOTS – this technique graphs five
                                                   statistics: the minimum and maximum
                                                   observations, and the first, second, and
                                                   third quartiles. It also depicts other
                                                   features of a set of data.
                                                   -   The three vertical lines of the box
                                                       are the first, second, and third
                                                       quartiles. The lines extending to the
                                                       left and right are called whiskers.
                                                   -   Any points that lie outside the
                                                       whiskers are called outliers. The
MEASURES OF RELATIVE STANDING
                                                       whiskers extend outward to the
   •  Describe the position of particular
                                                       smaller of 1.5 times the interquartile
        values relative to the entire data
                                                       range or to the most extreme point
        set.
                                                       that is not an outlier.
    •   Observed in median           =      50th
        percentile = 2nd quartile
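Quartiles, the interquartile range, and the 1.5 x IQR rule for flagging outliers can be sketched with Python's standard library. The function names and data are assumptions; `statistics.quantiles` with `method="exclusive"` locates each quartile at position (n + 1)P/100 in the ordered data, which is one common convention among several.

```python
from statistics import quantiles


def quartiles(data):
    """Return (Q1, Q2, Q3); Q2 is the median, i.e. the 50th percentile."""
    q1, q2, q3 = quantiles(data, n=4, method="exclusive")
    return q1, q2, q3


def interquartile_range(data):
    """Spread of the middle 50% of the observations: Q3 - Q1."""
    q1, _, q3 = quartiles(data)
    return q3 - q1


def outliers(data):
    """Observations more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical observations.
print(quartiles([1, 2, 3, 4, 5, 6, 7]))
print(outliers([1, 2, 3, 4, 5, 6, 7, 50]))
```

Different software packages interpolate percentiles slightly differently, so quartiles from a spreadsheet may not match these values exactly.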
OUTLIERS                                    LEAST SQUARES METHOD
- Are unusually       large   or   small       - Used to produce a straight-line
    observations. Because an outlier is            equation
    considerably removed from the              -   Produces a straight line drawn
    main body of the data set, its                 through the points so that the sum
    validity is suspect.                           of squared deviations between the
-   Outliers should be checked to                  points and the line is minimized.
    determine that they are not the
    result of an error in recording their
    values.
-   Can   also   represent   unusual
    observations  that   should be
    investigated.
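The least squares method described in these notes can be sketched in Python. The function name and the data are assumptions for illustration; the slope and intercept use the standard closed-form solution that minimizes the sum of squared vertical deviations between the points and the line.

```python
from statistics import mean


def least_squares_line(x, y):
    """Return (intercept, slope) of the line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical points that lie exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = least_squares_line(x, y)
print(intercept, slope)
```

A single outlier can pull the fitted line noticeably, which is one reason outliers should be checked before fitting.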
DESCRIPTIVE TECHNIQUE
MEASURES OF LINEAR RELATIONSHIP
    •   Related to scatter diagram,
        which shows the relationship
        between two interval variables.
        It is limited to showing the
        direction and strength of the
        linear relationship
            ➢   Covariance
                                                           ➢   Coefficient        of
                                                               Determination (r^2) –
            ➢   Coefficient           of
                                                               measures the amount
                Correlation
                                                               of variation in the
                                                               dependent       variable
                                                               that is explained by the
                                                               variation  in    the
                                                               independent variable.
                                                               – we calculate it by
                                                               squaring          the
coefficient   of
correlation
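A minimal Python sketch of the three measures of linear relationship named above. The function names and data are assumptions for illustration; the sample covariance (dividing by n - 1) is used, and the coefficient of determination is obtained by squaring the coefficient of correlation, as stated in the notes.

```python
import math
from statistics import mean


def covariance(x, y):
    """Sample covariance of two equally long lists of interval data."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Coefficient of correlation: covariance divided by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def coefficient_of_determination(x, y):
    """r^2: share of variation in y explained by variation in x."""
    return correlation(x, y) ** 2


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(covariance(x, y), correlation(x, y), coefficient_of_determination(x, y))
```

The sign of the covariance and of the correlation gives the direction of the relationship, while the correlation, which always lies between -1 and +1, also indicates its strength.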