Name:                                                                              Group:
Exercise 1. A household survey of gas consumption yielded the following results:
            Gas consumption   Frequency ($f_i$)   Class midpoint ($x_i$)   Cum. freq.   Rel. cum. freq.
                 0-9                  1                   4.5                  1            0.01
                10-19                 2                  14.5                  3            0.03
                20-29                 1                  24.5                  4            0.04
                30-39                 5                  34.5                  9            0.09
                40-49                 8                  44.5                 17            0.17
                50-59                16                  54.5                 33            0.33
                60-69                19                  64.5                 52            0.52
                70-79                20                  74.5                 72            0.72
                80-89                17                  84.5                 89            0.89
                90-99                11                  94.5                100            1.00
                                                                                                      (1)
  1. Complete the above statistical table.
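The completed columns of the table can be reproduced with a short Python sketch. The variable names are illustrative assumptions; the class limits and frequencies are copied from the table.

```python
from itertools import accumulate

# Class limits and frequencies copied from the table above.
classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49),
           (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
freqs = [1, 2, 1, 5, 8, 16, 19, 20, 17, 11]

midpoints = [(lo + hi) / 2 for lo, hi in classes]        # x_i
cum_freqs = list(accumulate(freqs))                       # cumulative frequency
rel_cum_freqs = [c / cum_freqs[-1] for c in cum_freqs]    # relative cumulative frequency

for (lo, hi), f, x, c, r in zip(classes, freqs, midpoints, cum_freqs, rel_cum_freqs):
    print(f"{lo:>2}-{hi:<2}  {f:>3}  {x:>5}  {c:>4}  {r:.2f}")
```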
  2. What is the average gas consumption of these households? Calculate the variance, the standard deviation
     and the coefficient of variation.
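      Since the data are grouped into 10 classes, we have:
      $$\mu = \bar{X} = \frac{\sum_{i=1}^{10} f_i x_i}{\sum_{i=1}^{10} f_i} = \frac{6650}{100} = 66.5$$   (0.25)
      $$\sigma^2 = \mathrm{Var}(X) = \frac{1}{\sum_{i=1}^{10} f_i - 1} \sum_{i=1}^{10} f_i (x_i - \mu)^2 = \frac{37800}{99} = 381.82$$   (0.5)
      and
      $$\sigma = \sqrt{\frac{1}{\sum_{i=1}^{10} f_i - 1} \sum_{i=1}^{10} f_i (x_i - \mu)^2} = \sqrt{381.82} = 19.54.$$   (0.25)
      The coefficient of variation is given by
      $$CV = \frac{\sigma}{\mu} \times 100 = \frac{19.54}{66.5} \times 100 = 29.383\%$$   (0.25)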
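As a numerical check of the grouped-data formulas above, here is a minimal Python sketch. It assumes the class midpoints stand in for the observations and uses the same n - 1 divisor as the solution; the variable names are illustrative.

```python
from math import sqrt

midpoints = [4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [1, 2, 1, 5, 8, 16, 19, 20, 17, 11]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
# Sample variance: divide by n - 1, as in the solution above.
variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / (n - 1)
std = sqrt(variance)
cv = std / mean * 100  # coefficient of variation, in percent

print(mean, round(variance, 2), round(std, 2), round(cv, 2))
# 66.5 381.82 19.54 29.38
```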
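  3. Give $Q_1$, $Q_3$, $D_5$ and $P_5$.
      The first quartile $Q_1$ is located in the class 50-59 and the third quartile $Q_3$ in the class 80-89.
      With $L_1$ the lower limit of the class containing the quantile, $Cf$ the cumulative frequency before
      that class, $f$ its frequency and $w$ the class width, they are given by
      $$Q_1 = L_1 + \frac{\frac{n}{4} - Cf_{q_1}}{f_{q_1}}\, w = 50 + \frac{\frac{100}{4} - 17}{16} \times 10 = 55$$   (0.5)
      and
      $$Q_3 = L_1 + \frac{\frac{3n}{4} - Cf_{q_3}}{f_{q_3}}\, w = 80 + \frac{\frac{300}{4} - 72}{17} \times 10 = 81.765.$$   (0.5)
      Also, $D_5$ and $P_5$ are located in the classes 60-69 and 30-39 respectively, and are given by
      $$D_5 = L_1 + \frac{\frac{5n}{10} - Cf_{d_5}}{f_{d_5}}\, w = 60 + \frac{\frac{500}{10} - 33}{19} \times 10 = 68.947$$   (0.5)
      and
      $$P_5 = L_1 + \frac{\frac{5n}{100} - Cf_{p_5}}{f_{p_5}}\, w = 30 + \frac{\frac{500}{100} - 4}{5} \times 10 = 32.$$   (0.5)
      Notice that $D_5 = Q_2 = \tilde{X}$, the median.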
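The four interpolations above follow one pattern, which can be sketched in Python. The helper `grouped_quantile` is a hypothetical name introduced here; it assumes equal class widths and uses the lower class limits 0, 10, ..., 90 as in the solution.

```python
def grouped_quantile(k, parts, lower_limits, freqs, width):
    """Interpolated k-th quantile out of `parts` (0 < k <= parts) for grouped data."""
    target = k * sum(freqs) / parts          # rank k*n/parts
    cum_before = 0                           # cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= target:         # first class whose cumulative freq reaches the rank
            return lower + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("k must satisfy 0 < k <= parts")

lower_limits = list(range(0, 100, 10))
freqs = [1, 2, 1, 5, 8, 16, 19, 20, 17, 11]

q1 = grouped_quantile(1, 4, lower_limits, freqs, 10)     # first quartile
q3 = grouped_quantile(3, 4, lower_limits, freqs, 10)     # third quartile
d5 = grouped_quantile(5, 10, lower_limits, freqs, 10)    # fifth decile (median)
p5 = grouped_quantile(5, 100, lower_limits, freqs, 10)   # fifth percentile
print(q1, round(q3, 3), round(d5, 3), p5)
```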
  4. Plot the histogram and the ogive of the above distribution, then determine the mode and the median
     of the series graphically.
      [Figure: histogram of the distribution]                                                       (1)
      [Figure: ogive of the distribution]                                                           (1)
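  5. Give the modal class, then calculate the value of the mode.
      The highest frequency belongs to the class 70-79, which therefore contains the mode $\hat{X}$. With
      $\Delta_1$ and $\Delta_2$ the differences between the modal frequency and the frequencies of the
      preceding and following classes, it is given by
      $$\hat{X} = L_{mo} + \frac{\Delta_1}{\Delta_1 + \Delta_2}\, w = 70 + \frac{(20 - 19)}{(20 - 19) + (20 - 17)} \times 10 = 72.5$$   (0.75)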
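The mode interpolation can be checked with a small Python sketch. The function name `grouped_mode` is an assumption made for illustration, and a missing neighbouring class is treated as having frequency 0.

```python
def grouped_mode(lower_limits, freqs, width):
    """Interpolated mode for grouped data with equal class widths."""
    m = max(range(len(freqs)), key=freqs.__getitem__)        # index of the modal class
    prev_f = freqs[m - 1] if m > 0 else 0                    # frequency of preceding class
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0       # frequency of following class
    d1, d2 = freqs[m] - prev_f, freqs[m] - next_f
    return lower_limits[m] + d1 / (d1 + d2) * width

lower_limits = list(range(0, 100, 10))
freqs = [1, 2, 1, 5, 8, 16, 19, 20, 17, 11]

mode = grouped_mode(lower_limits, freqs, 10)
print(mode)  # 72.5
```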
Exercise 2. A statistician carries out a study on 200 executives of a company and seeks to study the
link between the age of the executives and the monthly salary (in thousands of DA) that they receive. He
presents his results in a contingency table, the statistical variable $X$ representing age and the statistical
variable $Y$ the monthly salary.
                         $X \backslash Y$   [18; 22[   [22; 26[   [26; 30[   [30; 34[   [34; 38[   $n_{i\cdot}$
                         [20; 28[              12          5          4          3          0          24
                         [28; 36[               7         11          9          4          1          32
                         [36; 44[               4          9         10          9          8          40
                         [44; 52[               1          6         12         11         15          45
                         [52; 60[               1          4         15         17         22          59
                         $n_{\cdot j}$         25         35         50         44         46         200
  1. Give the percentage of executives whose salary is greater than or equal to 26,000 DA.
      $$\frac{n_{\cdot 3} + n_{\cdot 4} + n_{\cdot 5}}{N} \times 100\% = \frac{50 + 44 + 46}{200} \times 100\% = 70\%$$   (0.5)
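2. Among executives aged at least 36, give the percentage of those who receive less than 30,000 DA.
   The reference group is the executives aged at least 36, of whom there are
   $n_{3\cdot} + n_{4\cdot} + n_{5\cdot} = 40 + 45 + 59 = 144$, so
   $$\frac{\sum_{i=3}^{5} \sum_{j=1}^{3} n_{ij}}{\sum_{i=3}^{5} n_{i\cdot}} \times 100\% = \frac{4 + 9 + 10 + 1 + 6 + 12 + 1 + 4 + 15}{144} \times 100\% = \frac{62}{144} \times 100\% \approx 43.06\%$$   (0.5)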
3. Give the conditional distribution of the age variable given that the salary received is in the class
   [26; 30[.
                        $X \mid Y \in [26; 30[$   [20; 28[   [28; 36[   [36; 44[   [44; 52[   [52; 60[
                        Frequency                    4          9         10         12         15        (0.5)
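The marginal and conditional readings of the contingency table can be sketched in Python as follows. The variable names are illustrative assumptions, and the relative frequencies in `cond_rel` are an addition that simply divides the conditional counts by the column total 50.

```python
# Rows: age classes of X; columns: salary classes of Y (counts from the table above).
table = [
    [12, 5, 4, 3, 0],
    [7, 11, 9, 4, 1],
    [4, 9, 10, 9, 8],
    [1, 6, 12, 11, 15],
    [1, 4, 15, 17, 22],
]
total = sum(map(sum, table))                       # N = 200
col_totals = [sum(col) for col in zip(*table)]     # column margins n_.j

# Percentage of executives in the last three salary classes (salary >= 26).
share_high = 100 * sum(col_totals[2:]) / total

# Conditional distribution of age given Y in [26; 30[ (third column).
cond_counts = [row[2] for row in table]
cond_rel = [c / col_totals[2] for c in cond_counts]

print(share_high, cond_counts, cond_rel)
```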
4. Draw the scatter plot of age class [20; 28[ against age class [52; 60[. Comment.
   [Figure: scatter plot of the two age classes]                                                    (1)
   The scatter plot shows a relatively strong negative linear relation between the two classes. (0.5)
5. Calculate the correlation coefficient and give the regression line between the two classes, then plot it
   and give the value of the coefficient of determination.
                                   $X_1$: $n_{ij}$ for $X \in [20; 28[$   12    5    4    3    0
                                   $X_2$: $n_{ij}$ for $X \in [52; 60[$    1    4   15   17   22
   The scatter plot suggests a linear correlation; hence we choose Pearson's correlation coefficient
   to measure it.
   We have
   $$\sum x_{1i} = 24;\quad \sum x_{2i} = 59;\quad \sum (x_{1i} - \bar{X}_1)^2 = 78.8;\quad \sum (x_{2i} - \bar{X}_2)^2 = 318.8;\quad \sum (x_{1i} - \bar{X}_1)(x_{2i} - \bar{X}_2) = -140.2.$$
   $$\rho(X_1, X_2) = \rho(X_2, X_1) = \frac{\mathrm{Cov}(X_1, X_2)}{\sigma_{X_1} \sigma_{X_2}} = \frac{\sum (x_{1i} - \bar{X}_1)(x_{2i} - \bar{X}_2)}{\sqrt{\sum (x_{1i} - \bar{X}_1)^2}\, \sqrt{\sum (x_{2i} - \bar{X}_2)^2}} = \frac{-140.2}{\sqrt{78.8}\, \sqrt{318.8}} = -0.88456$$   (1)
   which indicates a strong negative linear correlation between the two variables.
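The sums and the correlation coefficient above can be verified with a minimal Python sketch. The helper `centered_sums` is a hypothetical name introduced for this illustration.

```python
from math import sqrt

x1 = [12, 5, 4, 3, 0]     # counts for age class [20; 28[
x2 = [1, 4, 15, 17, 22]   # counts for age class [52; 60[

def centered_sums(u, v):
    """Return sum (u - mean_u)^2, sum (v - mean_v)^2 and the cross-product sum."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suu, svv, suv

s11, s22, s12 = centered_sums(x1, x2)
rho = s12 / sqrt(s11 * s22)   # Pearson correlation coefficient
print(round(s11, 1), round(s22, 1), round(s12, 1), round(rho, 5))
```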
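   Set
   $$X_1 = aX_2 + b + \varepsilon \quad \text{and} \quad X_2 = a'X_1 + b' + \varepsilon';$$
   we have
   $$a = \frac{\mathrm{Cov}(X_1, X_2)}{\mathrm{Var}(X_2)} = \frac{\sum (x_{1i} - \bar{X}_1)(x_{2i} - \bar{X}_2)}{\sum (x_{2i} - \bar{X}_2)^2} = \frac{-140.2}{318.8} = -0.43977;$$
   $$b = \bar{X}_1 - a\bar{X}_2 = \frac{24}{5} - (-0.43977) \times \frac{59}{5} = 9.9893$$
   and
   $$a' = \frac{\mathrm{Cov}(X_1, X_2)}{\mathrm{Var}(X_1)} = \frac{-140.2}{78.8} = -1.7792;$$
   $$b' = \bar{X}_2 - a'\bar{X}_1 = \frac{59}{5} - (-1.779) \times \frac{24}{5} = 20.339;$$
   thus
   $$X_1 = -0.43977 X_2 + 9.9893 + \varepsilon \quad \text{and} \quad X_2 = -1.7792 X_1 + 20.339 + \varepsilon'$$   (1.5)
   The coefficient of determination in both cases is
   $$R^2 = \frac{V_E(X_1)}{\mathrm{Var}(X_1)} = \frac{V_E(X_2)}{\mathrm{Var}(X_2)} = \rho(X_1, X_2)^2 = 0.78245$$   (0.5)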
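Both regression lines and the coefficient of determination can be reproduced with a short Python sketch, assuming ordinary least squares. The helper `least_squares` is a hypothetical name; the product of the two slopes equals the squared correlation, which gives $R^2$ directly.

```python
x1 = [12, 5, 4, 3, 0]     # counts for age class [20; 28[
x2 = [1, 4, 15, 17, 22]   # counts for age class [52; 60[

def least_squares(y, x):
    """Slope and intercept of the least-squares line y = slope * x + intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

a, b = least_squares(x1, x2)      # X1 regressed on X2
a2, b2 = least_squares(x2, x1)    # X2 regressed on X1
r2 = a * a2                       # slope product equals rho^2
print(round(a, 5), round(b, 4), round(a2, 4), round(b2, 3), round(r2, 5))
```

With the unrounded slope, the second intercept comes out near 20.340; the value 20.339 in the solution results from using the slope rounded to -1.779.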
Exercise 3. I. A diagnostic test for a disease correctly detects the disease in 90% of
the individuals who actually have it. If a person does not have the disease, the test
reports that he or she does not have it with probability 0.9. Only 1% of the population has the disease in
question. A person is chosen at random from the population and the diagnostic test indicates that she
has the disease.
  1. What is the conditional probability that she does, in fact, have the disease?
      Let $H$: "a person has the disease", so $P(H) = 1\% = 0.01$, and $E$: "the test shows a positive result".
      Thus $P(E \mid H) = 90\% = 0.9$ and $P(E \mid H^c) = 1 - 0.9 = 10\% = 0.1$.
      We want to find $P(H \mid E)$. Using Bayes' theorem, we have
      $$P(H \mid E) = \frac{P(E \mid H)\, P(H)}{P(E \mid H)\, P(H) + P(E \mid H^c)\, P(H^c)} = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.1 \times 0.99} = 0.083333$$   (1.5)
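The Bayes computation can be sketched in Python as follows. The function name `posterior` and the parameter names `sensitivity` and `specificity` are labels chosen here for $P(E \mid H)$ and $P(E^c \mid H^c)$; they do not appear in the exercise.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) by Bayes' theorem."""
    false_positive_rate = 1 - specificity                       # P(E | H^c)
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)  # P(E)
    return sensitivity * prior / evidence

p = posterior(prior=0.01, sensitivity=0.9, specificity=0.9)
print(round(p, 6))
```

Raising `prior` in the call shows how strongly the result depends on the prevalence of the disease.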
  2. Are you surprised by the answer? Would you call this diagnostic test reliable?
      Surprisingly, the test does not appear to be reliable: even with a positive result, the probability
      of actually having the disease is only about 8.3%.
II. $X$ and $Y$ are independent, discrete random variables whose probability functions are given in the
tables below:
                                     $x$            1     2     3            $y$            1     2     3
                                     $P(X = x)$    1/3   1/2   1/6           $P(Y = y)$    2/3   1/6   1/6
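  1. Compute the probability $P(X + Y = 4)$.
      $$\begin{aligned}
      P(X + Y = 4) &= P(X = 1, Y = 3) + P(X = 2, Y = 2) + P(X = 3, Y = 1) \\
      &= P(X = 1) P(Y = 3) + P(X = 2) P(Y = 2) + P(X = 3) P(Y = 1) \quad \text{($X$ and $Y$ are independent)} \\
      &= \frac{1}{3} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{6} + \frac{1}{6} \cdot \frac{2}{3} = 0.25
      \end{aligned}$$   (1.5)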
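  2. Compute the conditional probability $P(X \le 2 \mid X + Y = 4)$.
      $$\begin{aligned}
      P(X \le 2 \mid X + Y = 4) &= \frac{P(X \le 2, X + Y = 4)}{P(X + Y = 4)} = \frac{P(X = 1, Y = 3) + P(X = 2, Y = 2)}{P(X + Y = 4)} \\
      &= \frac{\frac{1}{3} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{6}}{0.25} = 0.556
      \end{aligned}$$   (1)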
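Both probabilities can be checked exactly with Python's `fractions` module. This is a minimal sketch that builds the joint law from the two tables under the stated independence; the dictionary names are illustrative.

```python
from fractions import Fraction as F
from itertools import product

px = {1: F(1, 3), 2: F(1, 2), 3: F(1, 6)}   # P(X = x)
py = {1: F(2, 3), 2: F(1, 6), 3: F(1, 6)}   # P(Y = y)

# Joint law under independence: P(X = x, Y = y) = P(X = x) P(Y = y).
joint = {(x, y): px[x] * py[y] for x, y in product(px, py)}

p_sum4 = sum(p for (x, y), p in joint.items() if x + y == 4)
p_both = sum(p for (x, y), p in joint.items() if x + y == 4 and x <= 2)
p_cond = p_both / p_sum4

print(p_sum4, p_cond)  # 1/4 5/9
```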
3. Give the expectation of $g(X, Y) = X^2 Y$.
   Since $X$ and $Y$ are independent, we have $E(g(X, Y)) = E(X^2 Y) = E(X^2)\, E(Y)$ with
   $$E(X^2) = 1^2 \cdot \frac{1}{3} + 2^2 \cdot \frac{1}{2} + 3^2 \cdot \frac{1}{6} = 3.8333$$
   $$E(Y) = 1 \cdot \frac{2}{3} + 2 \cdot \frac{1}{6} + 3 \cdot \frac{1}{6} = 1.5$$   (1.5)
   Therefore, $E(g(X, Y)) = 3.8333 \times 1.5 = 5.75$.
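The factorisation of the expectation can be cross-checked against a direct sum over the joint law. The sketch below uses exact fractions, and the variable names are assumptions made for illustration.

```python
from fractions import Fraction as F

px = {1: F(1, 3), 2: F(1, 2), 3: F(1, 6)}   # P(X = x)
py = {1: F(2, 3), 2: F(1, 6), 3: F(1, 6)}   # P(Y = y)

e_x2 = sum(x ** 2 * p for x, p in px.items())   # E(X^2)
e_y = sum(y * p for y, p in py.items())         # E(Y)
e_g = e_x2 * e_y                                # E(X^2) E(Y), valid under independence

# Direct computation of E(X^2 Y) over all pairs (x, y), without factorising.
e_g_direct = sum(x ** 2 * y * px[x] * py[y] for x in px for y in py)

print(e_x2, e_y, e_g, e_g_direct)  # 23/6 3/2 23/4 23/4
```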
4. Give the CDF of $\min(X, Y)$.
   Set $Z = \min(X, Y)$; then the CDF of $Z$ is given by
   $$\begin{aligned}
   F_Z(z) &= P(Z \le z) = 1 - P(Z > z) \\
   &= 1 - P(\min(X, Y) > z) \\
   &= 1 - P(X > z \text{ and } Y > z) \\
   &= 1 - P(X > z)\, P(Y > z) \\
   &= 1 - (1 - P(X \le z))(1 - P(Y \le z)) \\
   &= 1 - (1 - F_X(z))(1 - F_Y(z))
   \end{aligned}$$   (1.5)
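To illustrate the final formula, the following Python sketch evaluates $F_Z$ at the support points 1, 2 and 3 using the two probability tables. These numerical values are not part of the solution above, and the helper names `cdf` and `cdf_min` are hypothetical.

```python
from fractions import Fraction as F

px = {1: F(1, 3), 2: F(1, 2), 3: F(1, 6)}   # P(X = x)
py = {1: F(2, 3), 2: F(1, 6), 3: F(1, 6)}   # P(Y = y)

def cdf(pmf, z):
    """F(z) = P(V <= z) for a discrete variable with probability function pmf."""
    return sum(p for v, p in pmf.items() if v <= z)

def cdf_min(z):
    """F_Z(z) = 1 - (1 - F_X(z)) (1 - F_Y(z)) for Z = min(X, Y), X and Y independent."""
    return 1 - (1 - cdf(px, z)) * (1 - cdf(py, z))

values = {z: cdf_min(z) for z in (1, 2, 3)}
print(values)
```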