ASHOKA WOMENS ENGINEERING COLLEGE (AUTONOMOUS)
UNIT -1
 MEASURES OF CENTRAL TENDENCY
    A measure of central tendency is a single value that represents the center point of a dataset. This value can
also be referred to as “the central location” of a dataset.
    In statistics, there are three common measures of central tendency:
     The mean
     The median
     The mode
   1)   Mean:
       The mean represents the average value of the dataset.
       It can be calculated as the sum of all the values in the dataset divided by the number of values.
        Unless stated otherwise, "the mean" refers to the arithmetic mean.
       Some other measures of mean used to find the central tendency are as follows:
             Geometric Mean
             Harmonic Mean
             Weighted Mean
        The formula to calculate the mean of n values x1, x2, …, xn is given by:
               Mean (x̅) = (x1 + x2 + … + xn)/n   (or)   x̅ = ∑x/n
   2) Median:
     The median is the middle value of the dataset when the values are arranged in ascending or
       descending order.
       When the dataset contains an even number of values, then the median value of the dataset can be found
       by taking the mean of the middle two values.
      Consider the given dataset with the odd number of observations arranged in descending order – 23, 21,
       18, 16, 15, 13, 12, 10, 9, 7, 6, 5, and 2
       Here 12 is the middle or median number that has 6 values above it and 6 values below it.
      Now, consider another example with an even number of observations that are arranged in descending
       order – 40, 38, 35, 33, 32, 30, 29, 27, 26, 24, 23, 22, 19, and 17
     When you look at the given dataset, the two middle values obtained are 27 and 29.
       Now, find out the mean value for these two numbers.
        i.e.,(27+29)/2 =28
      Therefore, the median for the given data distribution is 28.
   3) Mode:
     The mode represents the most frequently occurring value in the dataset.
    Sometimes the dataset may contain multiple modes and in some cases, it does not contain any mode at
      all.
    Consider the given dataset 5, 4, 2, 3, 2, 1, 5, 4, 5
Since the mode is the most common value, the most frequently repeated value in the given dataset is 5 (it
occurs three times).
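A quick sketch of all three measures (assuming Python 3 with its standard statistics module; the data are the mode example above):

import statistics

data = [5, 4, 2, 3, 2, 1, 5, 4, 5]

# Mean: sum of the values divided by their count.
print(statistics.mean(data))     # 3.444...

# Median: middle value of the sorted data (mean of the two middle
# values when the count is even).
print(statistics.median(data))   # 4

# Mode: the most frequently occurring value.
print(statistics.mode(data))     # 5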
Harmonic Mean:
 The harmonic mean is another measure of central tendency, particularly useful when dealing with rates,
   ratios, or similar situations.
 It is defined as the reciprocal of the arithmetic mean of the reciprocals of a set of values.
 The harmonic mean (HM) of n numbers x1, x2, x3, …, xn is given by:
          HM = n / (1/x1 + 1/x2 + … + 1/xn)
         Example :
           Suppose we have the sequence 1, 3, 5, 7. To find the harmonic mean, we take the reciprocals of
           these terms: 1, 1/3, 1/5, 1/7. Next, we divide the number of terms (4) by the sum of these
           reciprocals (1 + 1/3 + 1/5 + 1/7). Thus, the harmonic mean = 4 / (1 + 1/3 + 1/5 + 1/7) ≈ 2.3864.
Geometric Mean
  The geometric mean is another measure of central tendency that is particularly useful when dealing with
    values that are influenced by exponential growth or decay.
   It is the nth root of the product of n values.
   The geometric mean (GM) of n numbers is given by:
          GM = (x1 · x2 · … · xn)^(1/n)
           Example: For the values 2, 4, 8, GM = (2 × 4 × 8)^(1/3) = 64^(1/3) = 4.
Weighted Mean:
    The weighted mean is another measure of central tendency. It is the sum of the products of each value
       and its weight divided by the sum of the weights.
    The weighted mean (WM) of n numbers is given by:
          WM = (w1x1 + w2x2 + … + wnxn) / (w1 + w2 + … + wn) = ∑wx / ∑w
           Example: For the marks 70, 80, 90 with weights 1, 2, 3,
           WM = (1×70 + 2×80 + 3×90) / (1 + 2 + 3) = 500/6 ≈ 83.33.
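A minimal Python sketch of these three means (statistics.geometric_mean requires Python 3.8 or later; the data are taken from the examples above):

import statistics

print(statistics.harmonic_mean([1, 3, 5, 7]))   # ≈ 2.3864
print(statistics.geometric_mean([2, 4, 8]))     # 4.0

# Weighted mean straight from its definition: sum(w*x) / sum(w).
values = [70, 80, 90]
weights = [1, 2, 3]
print(sum(w * x for w, x in zip(weights, values)) / sum(weights))   # ≈ 83.33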
 MEASURES OF DISPERSION
      Measures of Dispersion measure the scattering of the data.
      It tells us how the values are distributed in the data set.
       In statistics, measures of dispersion are parameters that describe how widely the values of a dataset
         are spread out.
      These measures of dispersion capture variation between different values of the data.
Types of Measures of Dispersion:
There are two main types of dispersion methods in statistics which are:
           Absolute Measure of Dispersion
           Relative Measure of Dispersion
   1) Absolute Measure of Dispersion:
       The measures of dispersion that are measured and expressed in the units of data themselves are called
Absolute Measure of Dispersion. For example – Meters, Dollars, Kg, etc.
The types of absolute measures of dispersion are:
        1. Range: It is simply the difference between the maximum value and the minimum value
           given in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 − 1 = 6
        2. Variance: The average squared deviation from the mean of the given data set is
           known as the variance. Variance: σ² = ∑(X − μ)²/N
        3. Standard Deviation: The square root of the variance is known as the standard deviation,
           i.e. S.D. (σ) = √Variance.
        4. Quartiles and Quartile Deviation: The quartiles are values that divide a list of numbers
           into quarters. The quartile deviation is half of the distance between the third and the first
           quartile.
        5. Mean and Mean Deviation: The average of numbers is known as the mean and the
           arithmetic mean of the absolute deviations of the observations from a measure of central
           tendency is known as the mean deviation (also called mean absolute deviation).
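The absolute measures listed above can be computed with a short Python sketch (assuming Python 3.8+ for statistics.quantiles; the dataset is an arbitrary example):

import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: maximum minus minimum.
print(max(data) - min(data))                      # 7

# Population variance and standard deviation.
print(statistics.pvariance(data))                 # 4
print(statistics.pstdev(data))                    # 2.0

# Quartile deviation: half the distance between Q3 and Q1.
q1, q2, q3 = statistics.quantiles(data, n=4)
print((q3 - q1) / 2)

# Mean deviation: average absolute deviation from the mean.
m = statistics.mean(data)
print(sum(abs(x - m) for x in data) / len(data))  # 1.5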
   2) Relative Measure of Dispersion:
    The relative measures of dispersion are used to compare the distribution of two or more datasets.
    This measure compares values without units. Common relative dispersion methods include:
       1. Co-efficient of Range
       2. Co-efficient of Variation
       3. Co-efficient of Standard Deviation
       4. Co-efficient of Quartile Deviation
       5. Co-efficient of Mean Deviation
 STANDARD DEVIATION AND VARIANCE
   Standard deviation:
 Standard deviation is the square root of the average of squared deviations of the items from their mean.
   Symbolically it is represented by σ.
 The square root of the variance is known as the standard deviation, i.e. S.D. (σ) = √(σ²) = √Variance.
 The formula for the standard deviation (SD) is given below:
          σ = √( ∑(X − μ)² / N )
Variance:
    The average squared deviation from the mean of the given data set is known as the variance.
     The formula for the variance is as given below:
      Variance: σ² = ∑(X − μ)²/N
  Example:1
  Question: Find the Standard Deviation of the data set. X = {2, 3, 4, 5, 6}
   Solution:
Given: n = 5, and observations xi = {2, 3, 4, 5, 6}
We know,
Mean(μ) = (Sum of Observations)/(Number of Observations)
⇒ μ = (2 + 3 + 4 + 5 + 6)/ 5
⇒μ=4
Variance (σ²) = ∑(xᵢ − μ)²/n
⇒ σ² = (1/5)[(2 − 4)² + (3 − 4)² + (4 − 4)² + (5 − 4)² + (6 − 4)²]
⇒ σ² = 10/5 = 2
Thus, σ = √2 ≈ 1.414
 Example:2
Question: If a die is rolled, then find the variance and standard deviation of the possibilities.
Solution: When a die is rolled, there are 6 possible outcomes.
So the sample size n = 6 and the data set = {1, 2, 3, 4, 5, 6}.
To find the variance, first, we need to calculate the mean of the data set.
Mean, x̅ = (1+2+3+4+5+6)/6 = 3.5
We can put the value of data and mean in the formula to get;
σ² = ∑(xᵢ − x̅)²/n
σ² = (1/6)(6.25 + 2.25 + 0.25 + 0.25 + 2.25 + 6.25)
σ² ≈ 2.917
      Now, the standard deviation, σ = √2.917 = 1.708
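Both worked examples can be verified with a short sketch (assuming Python 3; pvariance and pstdev compute the population variance and standard deviation used above):

import statistics

# Example 1: X = {2, 3, 4, 5, 6}
print(statistics.pvariance([2, 3, 4, 5, 6]))   # 2
print(statistics.pstdev([2, 3, 4, 5, 6]))      # 1.414...

# Example 2: the six outcomes of a die roll
die = [1, 2, 3, 4, 5, 6]
print(statistics.pvariance(die))               # 2.9166...
print(statistics.pstdev(die))                  # 1.7078...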
     PROBABILITY:
      In day-to-day life there are many events about which one cannot be sure whether they will occur or
        not, yet one would like to know what chance there is of such an event occurring.
      Example:
         1. One may be interested to estimate whether it will rain today or not.
         2. One may like to evaluate the chance of getting a head in a given number of tosses of a coin.
         3. What is the chance that one hand holds all four aces in a game of cards among four players?
       The numerical evaluation of the chance of an event occurring is known as probability.
       Probability theory starts with basic concepts such as random experiments, sample spaces, events, and the
         probability of events.
       A random experiment is any process or action that results in one of several possible outcomes, like tossing
         a coin or rolling a die.
       The sample space is the set of all possible outcomes of an experiment, and an event is a specific outcome
         or a set of outcomes.
       Probability Formula:
              P(E) = Number of favorable outcomes / Total number of outcomes
              Note : where P(E) denotes the probability of an event E.
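A tiny illustration of the formula in plain Python, using a single die roll as the random experiment:

# Sample space for one roll of a die.
sample_space = {1, 2, 3, 4, 5, 6}

# Event E: rolling an even number.
event = {2, 4, 6}

# P(E) = number of favorable outcomes / total number of outcomes.
print(len(event) / len(sample_space))   # 0.5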
       CENTRAL MOMENTS
  In probability theory and statistics, a central moment is a moment of a probability distribution of a
   random variable about the random variable’s mean.
   In other words, it represents the expected value of a specified integer power of the deviation of the random
   variable from its mean.
 The n-th central moment of a real-valued random variable X is denoted μ_n and defined as
          μ_n = E[(X − E[X])^n]
   where E represents the expectation operator.
 Some key central moments include:
o Zeroth central moment (μ_0): Always equal to 1.
o First central moment (μ_1): Always 0 (distinct from the first raw moment or expected value).
o Second central moment (μ_2): Known as the variance (often denoted as σ^2), where σ represents the standard
  deviation.
o Third and fourth central moments: Used to define skewness and kurtosis, respectively.
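A brief numerical check of these properties (a sketch assuming NumPy is installed; scipy.stats.moment performs the same computation):

import numpy as np

x = np.array([2.0, 3.0, 4.0, 5.0, 6.0])

def central_moment(data, n):
    # mu_n = E[(X - E[X])^n], estimated from the sample.
    return np.mean((data - data.mean()) ** n)

print(central_moment(x, 0))   # 1.0 (zeroth central moment)
print(central_moment(x, 1))   # 0.0 (first central moment)
print(central_moment(x, 2))   # 2.0 (second central moment = variance)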
Random variables:
 A random variable is a numerical valued function defined on a sample space.
 A random variable is a variable which takes a definite set of values with a definite probability
   associated with each value.
 Random variables may be either discrete or continuous.
 A random variable is said to be discrete if it assumes only specified (countable) values in an interval;
    otherwise, it is continuous.
 We generally denote the random variables with capital letters such as X and Y.
Types of Random Variables:
               Discrete Random Variable
               Continuous Random Variable
    1) Discrete Random Variable: A discrete random variable can take only a countable number of
       distinct values such as 0, 1, 2, 3, 4, …
    Examples of discrete random variables include:
       The number of heads in the tossing of a coin.
              The number of accidents per year.
              The number of students that come to class on a given day.
              The number of points on the dice in its throw.
 2) Continuous Random Variable: A continuous random variable is one that can take an infinite
    number of possible values within a given range.
    Examples of continuous random variables include:
                  The heights
                  The weights
                  Temperatures
                  Life periods
CORRELATION:
       Correlation denotes the relationship between two variables.
        If a change in one variable brings about a change in the other variable, there is said to be a
          correlation between them.
        There are two kinds of correlation.
                    1) Positive Correlation
                    2) Negative Correlation.
         Positive Correlation means that the values of the two variables move in the same direction.
         Negative Correlation means that the values of the two variables move in opposite directions.
 LINEAR AND RANK CORRELATION:
     Linear correlation and rank correlation are two different measures used to quantify the strength and
      direction of the relationship between two variables in a dataset.
      Linear correlation measures the degree to which two variables are linearly related.
      Rank correlation assesses the monotonic relationship between variables, regardless of whether that
       relationship is linear.
 Linear Correlation:
 Linear correlation, often represented by the Pearson correlation coefficient (r), measures the strength and
direction of a linear relationship between two continuous variables.
 It ranges from -1 to 1, where:
   r = 1 indicates a perfect positive linear relationship.
   r = -1 indicates a perfect negative linear relationship.
  r = 0 indicates no linear relationship.
 The Pearson correlation coefficient is calculated as the covariance of the two variables divided by the product
   of their standard deviations:
          r = Cov(X, Y) / (σX · σY)
 Rank Correlation:
 Rank correlation measures the strength and direction of the relationship between two variables by comparing
their ranks.
 It is based on the ordinal position of data points rather than their actual values.
 Rank correlation coefficients, such as Spearman's rank correlation coefficient (ρ) and Kendall's tau (τ), are
commonly used for this purpose.
  - Spearman's rank correlation coefficient (ρ)
  - Measures the strength and direction of the monotonic relationship between two variables.
  - It ranges from -1 to 1, where:
    ρ = 1 indicates a perfect monotonic increasing relationship.
    ρ = -1 indicates a perfect monotonic decreasing relationship.
    ρ = 0 indicates no monotonic relationship.
 - Kendall's tau (τ):
   - Measures the degree of correspondence between the rankings of two variables.
    - It ranges from -1 to 1, with a similar interpretation to Spearman's rho (ρ).
 Linear correlation assesses the strength and direction of linear relationships between variables, while rank
 correlation evaluates the monotonic relationship between variables based on their ranks.
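A compact sketch of both kinds of coefficient (assuming SciPy is available; the paired data are invented purely for illustration):

from scipy.stats import pearsonr, spearmanr, kendalltau

x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]

print(pearsonr(x, y))    # Pearson's r: linear correlation
print(spearmanr(x, y))   # Spearman's rho: rank (monotonic) correlation
print(kendalltau(x, y))  # Kendall's tau: rank agreement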
 COVARIANCE AND CORRELATION
     Covariance and Correlation are two mathematical concepts which are commonly used in the field of
      probability and statistics.
     Both concepts describe the relationship between two variables.
 Covariance:
1. Covariance describes how a pair of random variables vary together, i.e. how a change in one variable is
   associated with a change in the other.
2. A positive covariance indicates that the variables tend to move in the same direction.
3. A negative covariance indicates that the variables tend to move in opposite directions.
4. It is used for the linear relationship between variables.
5. It gives the direction of the relationship between variables.
Correlation:
 Correlation measures the strength and direction of the linear relationship between two variables.
 It standardizes the covariance by dividing it by the product of the standard deviations of the two variables,
  resulting in a correlation coefficient (r) that ranges from -1 to 1.
 The most common type of correlation coefficient is the Pearson correlation coefficient, which is calculated as:
          r = ∑(x − x̅)(y − ȳ) / √( ∑(x − x̅)² · ∑(y − ȳ)² )
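A short numerical sketch of both quantities (assuming NumPy; note that np.cov uses the sample covariance with n - 1 in the denominator by default):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Sample covariance: the off-diagonal entry of the covariance matrix.
print(np.cov(x, y)[0, 1])

# Pearson correlation: covariance standardized by the two standard deviations.
print(np.corrcoef(x, y)[0, 1])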
 DISCRETE PROBABILITY DISTRIBUTIONS
 Bernoulli distribution:
 A Bernoulli distribution has only two possible outcomes, namely 1 (success) and 0 (failure), and a
 single trial.
 So the random variable X which has a Bernoulli distribution can take the value 1 with the probability of
  success, say p, and the value 0 with the probability of failure, say q = 1 − p.
        The probability mass function is given by:
               P(X = x) = p^x (1 − p)^(1−x),  x = 0, 1
 The probabilities of success and failure need not be equally likely. Consider the result of a fight between
  me and the Hulk: he is pretty much certain to win, so the probability of my success is 0.15 while that of
  my failure is 0.85. Here, the probability of success (p) is not the same as the probability of failure (q).
 The expected value of a random variable X from a Bernoulli distribution is found as follows:
                         E(X) = 1·p + 0·(1 − p) = p
 The variance of a random variable from a Bernoulli distribution is:
                         V(X) = E(X²) − [E(X)]² = p − p² = p(1 − p)
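A sketch of these facts with SciPy (assuming scipy is installed; p = 0.15 is the made-up fight example above):

from scipy.stats import bernoulli

p = 0.15
rv = bernoulli(p)

print(rv.pmf(1))   # P(X = 1) = p = 0.15
print(rv.pmf(0))   # P(X = 0) = 1 - p = 0.85
print(rv.mean())   # E(X) = p
print(rv.var())    # V(X) = p(1 - p) = 0.1275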
  Binomial Distribution:
    The binomial distribution is a discrete probability distribution.
    This distribution was discovered by the Swiss mathematician James Bernoulli.
    It is used in situations where an experiment results in one of two possibilities: success and failure.
    It expresses the probability of a given number of successes (p) and failures (q = 1 − p) in a series of
     independent trials.
    The binomial distribution is defined and given by the following probability function:
            P(X = x) = C(n, x) · p^x · q^(n−x),  x = 0, 1, 2, …, n
     where n = number of trials, x = number of successes, p = probability of success in a single trial, and q = 1 − p.
   Example
   Problem Statement:
     Eight coins are tossed at the same time. Find the probability of getting at least 6 heads.
     Solution:
     Let p = probability of getting a head = 1/2 and q = probability of getting a tail = 1/2, with n = 8.
     P(X ≥ 6) = P(6) + P(7) + P(8)
              = [C(8,6) + C(8,7) + C(8,8)] · (1/2)^8
              = (28 + 8 + 1)/256 = 37/256 ≈ 0.1445
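The same answer drops out of a two-line SciPy sketch (binom.sf(5, n, p) gives P(X > 5), i.e. P(X ≥ 6)):

from scipy.stats import binom

# Eight fair coins: n = 8, p = 0.5.
print(binom.sf(5, 8, 0.5))   # P(X >= 6) = 37/256 ≈ 0.1445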
  Poisson Distribution:
 The Poisson distribution is a discrete probability distribution that is widely used in statistical work.
 This distribution was developed by the French mathematician Dr. Simon Denis Poisson in 1837, and the
  distribution is named after him.
 The Poisson distribution is used in situations where the probability of an event occurring is small, i.e., the
  event rarely occurs.
 For instance, the probability of defective items in a manufacturing company is small, the probability of an
  earthquake occurring in a year is small, the probability of an accident on a road is small, and so on.
 All these are examples of events whose probability of occurrence is small.
 The Poisson distribution is defined and given by the following probability function:
            P(X = x) = e^(−λ) · λ^x / x!,  x = 0, 1, 2, …
     where λ = np is the mean number of occurrences.
    Problem Statement:
    A manufacturer of pins knows that, on average, 5% of his product is defective. He sells pins in boxes of
    100 and guarantees that not more than 4 pins will be defective. What is the probability that a box will
    meet the guaranteed quality?
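    Solution:
    Here n = 100 and p = 0.05, so λ = np = 100 × 0.05 = 5.
    P(X ≤ 4) = e^(−5) (5^0/0! + 5^1/1! + 5^2/2! + 5^3/3! + 5^4/4!)
             = e^(−5) (1 + 5 + 12.5 + 20.833 + 26.042)
             ≈ 0.44
    Hence the probability that a box meets the guaranteed quality is about 0.44.

This can be confirmed with a one-line SciPy check (poisson.cdf(4, 5) gives P(X ≤ 4) for λ = 5):

from scipy.stats import poisson

# Box of 100 pins with 5% defective on average: lambda = 100 * 0.05 = 5.
print(poisson.cdf(4, 5))   # P(X <= 4) ≈ 0.4405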
    CONTINUOUS PROBABILITY DISTRIBUTIONS
  Exponential Distribution:
 Exponential distribution or negative exponential distribution represents a probability distribution to describe
  the time between events in a Poisson process.
 In a Poisson process, events occur continuously and independently at a constant average rate.
 Exponential distribution is a particular case of the gamma distribution.
   Probability density function:
    The probability density function of the exponential distribution is given by:
            f(x; λ) = λ·e^(−λx) for x ≥ 0, and f(x; λ) = 0 for x < 0
     where
        λ = rate parameter
        x = random variable
   Cumulative distribution function:
    The cumulative distribution function of the exponential distribution is given by:
            F(x; λ) = 1 − e^(−λx) for x ≥ 0, and F(x; λ) = 0 for x < 0
     where
            λ = rate parameter
            x = random variable
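A short SciPy sketch of both functions (note that scipy.stats.expon is parameterised by scale = 1/λ rather than by λ directly; λ = 2 is an arbitrary example value):

from scipy.stats import expon

lam = 2.0
rv = expon(scale=1/lam)

print(rv.pdf(1.0))   # f(1) = 2 * e^(-2) ≈ 0.2707
print(rv.cdf(1.0))   # F(1) = 1 - e^(-2) ≈ 0.8647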
   Chi-square distribution:
 The chi-squared distribution (chi-square or χ²-distribution) with k degrees of freedom is the distribution of a
   sum of the squares of k independent standard normal random variables.
 It is one of the most widely used probability distributions in statistics. It is a special case of the gamma
  distribution.
   Chi-squared distribution is widely used by statisticians to compute the following:
            Estimation of a confidence interval for a population standard deviation of a normal distribution using a
             sample standard deviation.
           To check independence of two criteria of classification of multiple qualitative variables.
           To check the relationships between categorical variables.
    To study the sample variance where the underlying distribution is normal.
    To test deviations of differences between expected and observed frequencies.
     To conduct the chi-square test (a goodness-of-fit test).
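A minimal goodness-of-fit sketch (assuming SciPy; the observed die-roll counts are invented for illustration):

from scipy.stats import chisquare

# Observed counts for 60 rolls of a die; a fair die expects 10 per face.
observed = [8, 9, 12, 11, 9, 11]
expected = [10, 10, 10, 10, 10, 10]

stat, p_value = chisquare(observed, f_exp=expected)
print(stat, p_value)   # a large p-value gives no evidence against fairness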
       [Figure: chi-square distribution curves for different degrees of freedom.]
  Gaussian distribution:
 The Gaussian distribution, also known as the normal distribution, is one of the most important probability
distributions in statistics.
 It is a continuous probability distribution that is symmetric around its mean, forming a bell-shaped curve when
plotted.
The probability density function (PDF) of the Gaussian distribution is given by:
            f(x) = (1 / (σ·√(2π))) · e^(−(x − μ)² / (2σ²))
where μ is the mean and σ is the standard deviation.
   [Figure: bell-curve plot of the Gaussian distribution.]
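A sketch that evaluates the PDF directly (plain Python with the math module; μ = 0 and σ = 1, the standard normal, are assumed):

import math

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coef * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(gaussian_pdf(0.0))   # peak of the standard normal ≈ 0.3989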
  HYPOTHESIS TESTING
  Hypothesis testing is a statistical method used to make inferences about population parameters based on
    sample data.
  It involves formulating a hypothesis about the population parameter of interest and using sample data to
    determine whether there is enough evidence to reject the null hypothesis in favor of an alternative hypothesis.
   Statistical hypotheses are of two types:
        Null hypothesis (H0): the default claim that there is no difference or no effect.
        Alternative hypothesis (H1): the claim accepted when the evidence leads to rejecting H0.
 Hypothesis Testing for Means:
    Objective: To determine whether there is a significant difference between the mean of a sample and a
      hypothesized population mean.
    Example: Testing whether the mean weight of a sample of apples is significantly different from a
      hypothesized population mean of 150 grams.
    Common Tests: One-sample t-test, independent samples t-test, paired samples t-test.
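A one-sample t-test sketch for the apples example (assuming SciPy; the sample weights are invented):

from scipy.stats import ttest_1samp

# Hypothetical apple weights in grams; H0: population mean = 150.
weights = [152, 149, 151, 155, 148, 150, 153, 147]

stat, p_value = ttest_1samp(weights, popmean=150)
print(stat, p_value)   # reject H0 only if p_value falls below the chosen level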
 Hypothesis Testing for Proportions:
    Objective: To determine whether there is a significant difference between the proportion of successes in a
      sample and a hypothesized population proportion.
    Example: Testing whether the proportion of customers satisfied with a product is significantly different
      from a hypothesized population proportion of 0.70.
    Common Test: One-sample proportion z-test.
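A one-sample proportion z-test sketch (computed manually with scipy.stats.norm; the counts are invented):

import math
from scipy.stats import norm

# H0: population proportion p0 = 0.70; observed 160 successes in 210 trials.
p0, successes, n = 0.70, 160, 210
p_hat = successes / n

# z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(z, 2 * norm.sf(abs(z)))   # z statistic and two-sided p-value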
 Hypothesis Testing for Variances:
   Objective: To determine whether there is a significant difference between the variances of two populations.
   Example: Testing whether the variance of test scores in one school district is significantly different from
    the variance of test scores in another school district.
   Common Test: F-test for comparing variances.
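An F-test sketch for comparing two variances (computed manually with scipy.stats.f; the two score samples are invented):

import statistics
from scipy.stats import f

# Test scores from two hypothetical school districts.
a = [72, 85, 91, 64, 78, 88, 70, 82]
b = [75, 79, 81, 77, 80, 78, 76, 82]

# F statistic: ratio of the sample variances, larger over smaller.
var_a, var_b = statistics.variance(a), statistics.variance(b)
F = max(var_a, var_b) / min(var_a, var_b)

dfn = dfd = len(a) - 1
print(F, 2 * f.sf(F, dfn, dfd))   # F statistic and two-sided p-value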
 Hypothesis Testing for Correlations:
     Objective: To determine whether there is a significant linear relationship between two variables in a
      population.
     Example: Testing whether there is a significant correlation between hours of study and exam scores.
     Common Test: Pearson correlation coefficient test, Spearman's rank correlation coefficient test.
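A correlation-test sketch (SciPy's pearsonr returns both r and a p-value for H0: no linear relationship; the study-hours data are invented):

from scipy.stats import pearsonr

hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 58, 66, 71, 75, 78]

r, p_value = pearsonr(hours, scores)
print(r, p_value)   # a strong positive r with a small p-value here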