AN INTRODUCTION TO STATISTICS
Errors
• Error may be defined as a deviation from the absolute value or from
  the true average of a large number of results.
• Absolute error (E) is the difference between an observed or
  measured value (O) and the true (most probable) value (T), with
  regard to the sign. It is reported in the same units as the
  measurement:       E = O – T
Example 1
• A 2.62 g sample of material is analyzed and found to be 2.52 g.
• The absolute error (E) = 2.52 – 2.62 = –0.10 g.
• If the measured value is the average of several
  measurements, the error is called the mean error.
Example 2
In the titration of 10 ml of 0.1 N NaOH with laboratory-prepared
0.1 N HCl, the true value is 9.9 ml and the observed values are
10.1, 9.9, 9.8, 10.2 and 10.1 ml.
Mean of the observed values =
sum of the observed values / their number
= (10.1 + 9.9 + 9.8 + 10.2 + 10.1) / 5 = 10.02 ml.
Mean error = 10.02 – 9.9 = 0.12 ml.
• Alternatively, the mean error can be calculated as the average of the
   differences between the individual test results and the true value (with
   regard to the sign).
• The average difference = mean error
= (+0.2 + 0.0 – 0.1 + 0.3 + 0.2)/5 = +0.6/5 = +0.12 ml.
• Relative error is the absolute or mean error (E) expressed as a
  percentage (%) of the true value (T):  Relative error % = (E/T) x 100
• In Example 1:
Relative error % = (–0.10/2.62) x 100% = –3.8 %
• In Example 2:
Relative error % = (0.12/9.9) x 100% = 1.21 %
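The error calculations of Examples 1 and 2 can be sketched in Python as follows. The function names are illustrative choices, not part of the original notes.

```python
def absolute_error(observed, true_value):
    """Signed absolute error E = O - T, in the units of the measurement."""
    return observed - true_value


def mean_error(observations, true_value):
    """Mean of the observations minus the true value (sign retained)."""
    return sum(observations) / len(observations) - true_value


def relative_error_percent(error, true_value):
    """Absolute or mean error expressed as a percentage of the true value."""
    return error / true_value * 100


# Example 1: 2.62 g sample found to be 2.52 g
e1 = absolute_error(2.52, 2.62)
print(round(e1, 2), round(relative_error_percent(e1, 2.62), 1))

# Example 2: five titration readings, true value 9.9 ml
titres = [10.1, 9.9, 9.8, 10.2, 10.1]
e2 = mean_error(titres, 9.9)
print(round(e2, 2), round(relative_error_percent(e2, 9.9), 2))
```

Keeping the sign of the error shows whether the result is low (negative) or high (positive) relative to the true value.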
                            Types of Errors
(A) Determinate or systematic (constant) errors:
These can be determined and avoided. Some sources of these errors are:
1- Errors due to equipment:
Non-calibrated measuring devices (e.g. flasks, balances, etc.)
2- Errors due to impure reagents:
Use of chemical reagents or solvents containing impurities.
3- Personal errors:
Inaccuracy of the analyst (e.g. in observing colour changes).
4- Operational errors:
Such as loss of material during evaporation
or adsorption of moisture during experiments.
5- Errors of the method:
• Solubility of the precipitate in the mother liquor.
• Decomposition during ignition.
• Side reactions or incomplete reaction.
(B) Indeterminate (random, accidental or chance) errors:
These errors occur by accident or due to uncontrolled causes.
• They cannot be determined or corrected because they arise from the natural fluctuations that occur in all measurements.
• These errors follow a random distribution and obey the mathematical laws of probability.
• They should follow the “normal distribution curve”, or “Gaussian curve”.
• From this curve it is apparent that very large errors
are few and that there should be an equal
number of positive and negative errors.
                        Minimization of determinate errors
1- Calibration of apparatus:
2- Running a blank determination:
By carrying out a separate determination, in which the sample
is omitted, under exactly the same experimental conditions.
3- Running of parallel determination:
Duplicate or triplicate determinations.
4- Running a control determination:
By carrying out a determination, identical to that of the unknown, on a standard
substance:
  $\frac{\text{result found for standard}}{\text{result found for unknown}} = \frac{\text{weight of constituent in the standard}}{\text{weight of constituent in the unknown}}$
5- Use of independent method of analysis:
E.g. 1: Determination of HCl by
       titration with standard sodium carbonate (acid-base titrimetry), and by
       precipitation with AgNO3 and weighing (gravimetry).
See other examples in your notes.
6- Standard addition method: (to overcome matrix effects)
•   A known amount of the standard being determined is added to the sample, which is then analyzed
    for the total content.
•   The difference between the results for samples with and without the added standard
    gives the recovery of the added substance.
•   If the recovery is satisfactory, our confidence in the accuracy of the procedure is
    enhanced. If not, there is interference from excipients or combined
    drugs.
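A minimal sketch of the recovery calculation described above. The numerical values are hypothetical and serve only to illustrate the arithmetic; the function name is also an assumption.

```python
def percent_recovery(found_spiked, found_unspiked, amount_added):
    """Recovery of the added standard as a percentage of the amount added."""
    if amount_added <= 0:
        raise ValueError("amount_added must be positive")
    return (found_spiked - found_unspiked) / amount_added * 100


# Hypothetical figures (mg): 50.2 found without the added standard,
# 60.0 found after adding 10.0 of standard.
recovery = percent_recovery(60.0, 50.2, 10.0)
print(f"Recovery = {recovery:.1f} %")
```

A recovery close to 100 % suggests that the matrix does not interfere with the determination.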
7 – Internal standard (I.S.) method:
• Important in chromatographic determinations (e.g. HPLC).
• It involves adding a fixed amount of the I.S. to a series of known
   concentrations of the material to be tested.
• The ratios of the peak area (or peak height) of the material to that of the I.S.
   are plotted against the concentration values. This should give a
   straight line.
[Figure: External Standard Method. Quantitative analysis: chromatograms of standards (100, 75, 50, 25, 10 and 5 g/mL) and of the unknown. Calibration curve: peak height versus concentration.]
[Figure: Internal Standard Method. Quantitative analysis: chromatograms of the same standards and of the unknown, each showing the compound and the internal standard. Calibration curve: peak height ratio versus concentration.]
                             Accuracy
• It is defined as the concordance (agreement) of a measurement with the true
   or most probable value.
Determination of accuracy
• (i) Absolute method
  Accuracy is determined from the relative error:
• In Example 1: Relative error % = (–0.10/2.62) x 100% = –3.8 %
  Accuracy = 100.0 – 3.8 = 96.2 %.
• Or simply: Accuracy % = (O/T) x 100
  In Example 1: Accuracy = (2.52/2.62) x 100% = 96.2 %.
Example 3:
• In a practical exam of volumetric analysis, three students obtained the
  following results:
                  Student no.    Average        Correct
                                volume (ml)   volume (ml)
                       1            10.3          10.0
                       2            14.7          15.0
                       3            5.3           5.0
   Arrange the three students in order of their accuracy.
   Answer
   • Mean error (student 1) = 10.3 – 10.0 = 0.3 ml
   • Relative error (student 1) = 0.3/10.0 = 0.03
   • Relative error, % (student 1) = 0.03 x 100 = 3.0 %
   • Accuracy, % (student 1) = 100.0 – 3.0 = 97.0 %
   In the table, the relative errors are given as magnitudes (without sign).
   Student   Average       Correct       Mean         Relative     Relative    Accuracy,   Order
     no.     volume (ml)   volume (ml)   error (ml)   mean error   error, %    %
      1      10.3          10.0           0.3         0.03         3.0         97.0        2
      2      14.7          15.0          –0.3         0.02         2.0         98.0        1
      3       5.3           5.0           0.3         0.06         6.0         94.0        3
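The ranking in Example 3 can be reproduced with a short Python sketch. Accuracy is taken here as 100 minus the magnitude of the relative error in percent, as in the worked answer; the names are illustrative.

```python
# (student no., average volume in ml, correct volume in ml) from Example 3
results = [(1, 10.3, 10.0), (2, 14.7, 15.0), (3, 5.3, 5.0)]


def accuracy_percent(observed, true_value):
    """100 minus the magnitude of the relative error, in percent."""
    return 100.0 - abs(observed - true_value) / true_value * 100.0


# Sort from most accurate to least accurate
ranked = sorted(results, key=lambda r: accuracy_percent(r[1], r[2]), reverse=True)
for order, (student, observed, true_value) in enumerate(ranked, start=1):
    print(order, student, round(accuracy_percent(observed, true_value), 1))
```

The same mean error of 0.3 ml gives different accuracies because it is judged relative to each student's correct volume.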
• (ii) Comparative method
• The results obtained by the tested (proposed) method are compared
   with the results obtained by an accurate
   method (reported or official).
• Agreement between at least two different methods
   indicates high accuracy.
                               Precision
• It is defined as the concordance of a series of measurements.
• Alternatively: the agreement between several measurements of the same substance.
• It is usually expressed as: the standard deviation (S.D.) or relative standard
    deviation (R.S.D) of a series of measurements.
• Precision may be a measure of either the degree of reproducibility or
    repeatability of the analytical method.
• Reproducibility refers to the use of the analytical procedure in different
    laboratories, days, analysts or equipment.
•    Repeatability refers to the use of the analytical procedure within a
    laboratory over a short period of time using the same analyst with the
    same equipment.
• Accuracy expresses the correctness of a measurement
  and precision expresses the reproducibility or the
  repeatability. Reliability means both accuracy and
  precision.
• Precision always accompanies accuracy, but a high degree
  of precision does not imply accuracy.
This is represented in the Figure.
• Example 4:
• A substance was known to contain 49.10 ± 0.02 % of a constituent A. The
   results obtained by two analysts using the same substance and the same
   analytical method were as follows:
• Analyst (1):
• % A: 49.01, 49.25, 49.08, 49.14. The mean is 49.12 %
• and the results range from 49.01 % to 49.25 %.
• Analyst (2):
• % A: 49.40, 49.44, 49.42, 49.42. The mean is 49.42 %
• and the results range from 49.40 % to 49.44 %.
• We can see that:
• Analyst (1): the results are accurate (the mean is very close to the correct
  result) but less precise than the results of analyst (2) (random
  error).
• Analyst (2): the results are very precise but not accurate
  (systematic or constant error).
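The comparison in Example 4 can be checked numerically. This is a sketch that uses the mean error as a measure of accuracy and the range as a simple measure of precision; the names are illustrative.

```python
TRUE_VALUE = 49.10  # % of constituent A stated in Example 4

analysts = {
    1: [49.01, 49.25, 49.08, 49.14],
    2: [49.40, 49.44, 49.42, 49.42],
}


def summarize(values, true_value):
    """Return (mean, mean error, range) for one analyst's results."""
    mean = sum(values) / len(values)
    return mean, mean - true_value, max(values) - min(values)


for analyst, values in analysts.items():
    mean, error, spread = summarize(values, TRUE_VALUE)
    print(f"Analyst {analyst}: mean = {mean:.2f} %, "
          f"mean error = {error:+.2f} %, range = {spread:.2f} %")
```

Analyst 1 has the smaller mean error (better accuracy) while analyst 2 has the smaller range (better precision).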
Determination of precision
The precision of an analytical method is
determined by assaying a sufficient number of
aliquots of a homogeneous sample to be able to
calculate statistically valid estimates of standard
deviation or relative standard deviation
(coefficient of variation).
  Mean (X):
  It is the arithmetic average of all measured values:
  $\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum x_i}{n}$
  where n is the number of measurements and x1, x2, x3, etc. are the individual values.
The spread of the values is measured by the standard deviation (S.D.):
  $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
 • Variance = the square of the standard deviation = S2
• Relative standard deviation (RSD)
Is the S.D. expressed relative to the mean: $RSD = S / \bar{x}$
•     Coefficient of variation (C.V.)
Is the RSD expressed as a percentage: $C.V. = (S / \bar{x}) \times 100$
•   The analytical method is considered precise if the RSD% is equal to
    or below 1.5%.
•     The range (w) or “spread”: It is the difference between the
    highest measurement and the lowest one.
•     The median: It is the measurement in the middle of the arranged
    measurements, where the numbers of higher and lower
    measurements are equal. If the number of measurements is odd, e.g.
    7.1, 7.5, 7.6, 7.8, 7.9, the median is the middle value (7.6). If it is
    even, the median is the average of the two middle values.
                  Example 5:
• Analysis of a sample of iron ore gave the
  following % values:
• 7.08, 7.21, 7.12, 7.09, 7.16,
  7.14, 7.07, 7.14, 7.18, 7.11.
• Calculate the mean, standard deviation, the
  variance and coefficient of variation;
• Find also the median and the range for these
  data.
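• Mean = 71.30 / 10 = 7.13
• Standard deviation (S) = 0.045
• Variance (S2) = 0.002
• Coefficient of variation = (0.045 / 7.13) x 100 = 0.63 %
The arranged data are: 7.07, 7.08, 7.09, 7.11, 7.12, 7.14, 7.14, 7.16, 7.18, 7.21
  • The median is: (7.12 + 7.14) / 2 = 7.13
  • The range is: 7.21 – 7.07 = 0.14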
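The quantities asked for in Example 5 can be computed with Python's standard `statistics` module. This sketch assumes the sample standard deviation (n – 1 in the denominator), which is what `statistics.stdev` returns.

```python
import statistics

# Iron ore results (%) from Example 5
data = [7.08, 7.21, 7.12, 7.09, 7.16, 7.14, 7.07, 7.14, 7.18, 7.11]

mean = statistics.mean(data)
sd = statistics.stdev(data)            # sample S.D. (n - 1 in the denominator)
variance = statistics.variance(data)   # sample variance, S squared
cv = sd / mean * 100                   # coefficient of variation, %
median = statistics.median(data)       # average of the two middle values (n is even)
spread = max(data) - min(data)         # range (w)

print(f"mean = {mean:.2f}, S.D. = {sd:.3f}, variance = {variance:.4f}")
print(f"C.V. = {cv:.2f} %, median = {median:.2f}, range = {spread:.2f}")
```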
The mean of several readings is a more reliable estimate of the true mean than a single
observation.
The greater the number of measurements (n), the closer the mean is to the true mean.
The standard error of the mean (Sx) is given by: $S_{\bar{x}} = S / \sqrt{n}$
In the above example n = 10, so Sx = S/√10; if 100 measurements were made, Sx = S/√100, which is about 3.2 times smaller for the same S.
 Therefore, the precision is improved by increasing the number
 of measurements.
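A short sketch of how the standard error of the mean shrinks with n, using the Example 5 data for the standard deviation. It assumes that S stays the same when more measurements are made, which is an idealization.

```python
import math
import statistics

data = [7.08, 7.21, 7.12, 7.09, 7.16, 7.14, 7.07, 7.14, 7.18, 7.11]
sd = statistics.stdev(data)  # sample standard deviation of Example 5


def standard_error(sd, n):
    """Standard error of the mean for n measurements with standard deviation sd."""
    return sd / math.sqrt(n)


for n in (10, 100):
    print(n, round(standard_error(sd, n), 4))
```

Because the standard error falls only with the square root of n, ten times as many measurements improve it by a factor of about 3.2, not 10.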
               Rejection of a result (The Q test)
 • The Q test is used to decide whether an “outlier” is due to a
  determinate error or to an indeterminate error.
 • A decision has to be made whether to reject the result
  or to retain it. This can be done by the Q test.
The ratio Q is calculated by:
(1) Arranging the data in decreasing order.
(2) Finding the difference (a) between the suspect number (the outlier)
    and its nearest neighbour.
(3) Dividing this difference by the range (w), which is the difference
    between the highest number and the lowest number:
    Q = a / w
Figure 2: Illustration of the calculation of Q.
• The ratio is compared with the tabulated values of Q (see the Table).
• If it is equal to or greater than the tabulated value, the suspected observation
  can be rejected.
• If it is smaller than the tabulated value, the suspected observation should be
  retained.
                 Rejection quotient, Q, at different confidence levels
Example 6:
• The following values were obtained for the determination of Cd2+ in a sample of
   dust: 4.3, 4.1, 4.0 and 3.2 μg/g.
• Should the last value, 3.2 μg/g, be rejected?
By application of the Q test:
• Q calculated = a/w = (4.0 – 3.2)/(4.3 – 3.2) = 0.8/1.1 = 0.727
• Q from the table for a sample size of 4 = 0.829 > Q calculated (0.727)
• Hence the result 3.2 μg/g should be retained.
If three additional measurements were made with the results 4.2,
3.9 and 4.0 μg/g, then:
• Q calculated = (3.9 – 3.2)/(4.3 – 3.2) = 0.7/1.1 = 0.636
• Q from the table for a sample size of 7 = 0.568 < Q calculated
(0.636), so rejection of the value is justified.
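The Q test of Example 6 can be sketched in Python as below. The tabulated Q values are passed in by the caller (here the two values quoted in the example), since the full table is not reproduced; the function names are illustrative.

```python
def q_ratio(values, suspect):
    """Return Q = a / w for a suspect value that is the lowest or highest result."""
    ordered = sorted(values)
    if suspect == ordered[0]:
        gap = ordered[1] - ordered[0]       # distance to nearest neighbour
    elif suspect == ordered[-1]:
        gap = ordered[-1] - ordered[-2]
    else:
        raise ValueError("the suspect value must be the lowest or highest result")
    return gap / (ordered[-1] - ordered[0])  # divide by the range w


def reject(values, suspect, q_table):
    """True if the calculated Q is equal to or greater than the tabulated Q."""
    return q_ratio(values, suspect) >= q_table


first = [4.3, 4.1, 4.0, 3.2]
extended = first + [4.2, 3.9, 4.0]

# Tabulated Q values quoted in Example 6 for n = 4 and n = 7
print(round(q_ratio(first, 3.2), 3), reject(first, 3.2, 0.829))
print(round(q_ratio(extended, 3.2), 3), reject(extended, 3.2, 0.568))
```

The same suspect value is retained with four results but rejected with seven, because the tabulated Q decreases as the sample size grows.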
                        Correlation coefficient (r)
• To establish whether there is a linear relationship between two variables xi
   (concentration) and yi (instrumental signal), use the correlation coefficient (r).
• The value of r must lie between +1 and –1.
• The nearer it is to ±1, the greater the probability that a definite linear
   relationship exists between the variables x and y.
• Values close to +1 indicate positive correlation
• and values close to –1 indicate negative correlation.
• Values of r that tend towards zero indicate that x and y are not linearly
   related (they may be related in a non-linear fashion).
            Linear regression and correlation
• When using instrumental methods it is often necessary to carry
  out a calibration procedure by using a series of standards each
  having a known concentration of the analyte to be determined.
• A calibration curve is constructed by measuring the instrumental
  signal (e.g. absorbance, in spectrophotometry) for each standard
  and plotting this response against concentration.
• The concentration of the test sample may be determined from
  the calibration curve by graphical interpolation.
                          Linear regression
• The best straight line through the data points has to be estimated.
• This can be done by visual inspection of the calibration graph,
• but it is far more sensible to evaluate the best straight line by the linear
   regression equation (least squares method).
• The equation of a straight line is:   y = a + bx
• where y is the instrument response,
• x is the analyte concentration,
• a is the intercept,
• b is the slope.
• The slope “b” and the intercept “a” are given by the following
 two equations:
 $b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2}$
 $a = \bar{y} - b\bar{x}$
Example 7:
Quinine was determined by measuring the fluorescence intensity in H2SO4
solution. Standard solutions gave the fluorescence values listed as x (concentration) and y (fluorescence intensity) in the table below.
(a) Calculate by “the least squares method” the equation of the best straight
line for the calibration curve.
(b) What is the concentration of quinine in a sample which produces 16.1
arbitrary units of fluorescence intensity?
(c) Calculate the correlation coefficient r.
The terms for the determination of “r” in the
equation are found from the following table:
      xi          yi           xi2           yi2            xi yi
     0.00        0.00         0.00          0.00           0.00
     0.10        5.20         0.01         27.04           0.52
     0.20        9.90         0.04         98.01           1.98
     0.30       15.30         0.09         234.09          4.59
     0.40       19.10         0.16         364.81          7.64
   Σxi = 1.00  Σyi = 49.5  Σxi2 = 0.30  Σyi2 = 723.95  Σxi yi = 14.73
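 (a) From the above table the following values have been
    determined:
• By substituting the values in the equations for b and a, we have
 $b = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - (\sum x_i)^2} = \frac{5 \times 14.73 - 1.00 \times 49.5}{5 \times 0.30 - (1.00)^2} = \frac{24.15}{0.5} = 48.3$
 $\bar{x} = \frac{\sum x_i}{n} = \frac{1.00}{5} = 0.2 \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{49.5}{5} = 9.9$
 $a = \bar{y} - b\bar{x} = 9.9 - (48.3 \times 0.2) = 0.24$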
     • So the equation of the straight line is
     $y = 48.3x + 0.24$
• (b) From the equation, the unknown concentration of quinine which
 produces 16.1 arbitrary units of fluorescence intensity is:
         $16.10 = 48.3x + 0.24$
         $x = \frac{15.86}{48.30} = 0.328\ \mu\text{g mL}^{-1}$
 • (c) $(\sum x_i)^2 = 1.000$ and $(\sum y_i)^2 = 2450.25$
 • Substituting the above values in the equation, we have
 $r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\{[n \sum x_i^2 - (\sum x_i)^2][n \sum y_i^2 - (\sum y_i)^2]\}^{1/2}}$
 $r = \frac{5 \times 14.73 - 1.00 \times 49.5}{\{[5 \times 0.30 - 1.000][5 \times 723.95 - 2450.25]\}^{1/2}} = \frac{24.15}{(584.75)^{1/2}} = 0.9987$
• Hence r = 0.9987, so a linear relationship exists between
 fluorescence intensity and concentration.
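The least squares calculation of Example 7 can be sketched in plain Python using the same summation formulas for b, a and r. The function name `least_squares` is an illustrative choice.

```python
import math


def least_squares(x, y):
    """Return (slope b, intercept a, correlation coefficient r) for y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(u * v for u, v in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n                      # a = mean(y) - b * mean(x)
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return b, a, r


# Quinine standards from Example 7
conc = [0.00, 0.10, 0.20, 0.30, 0.40]
signal = [0.00, 5.20, 9.90, 15.30, 19.10]

b, a, r = least_squares(conc, signal)
unknown = (16.1 - a) / b   # concentration giving 16.1 fluorescence units
print(f"b = {b:.1f}, a = {a:.2f}, r = {r:.4f}, x = {unknown:.3f}")
```

Applied to the aspirin data of the next example, the same function returns b = 0.0575 and a = 0.131.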
• Example: Using the data for the spectrophotometric analysis of aspirin
 in the following table, calculate the correlation coefficient, slope and
 intercept.
      Concentration in µg     Xi      2       4       6       8       10
      Absorbance              Yi     0.250   0.360   0.470   0.590   0.710
• Solution
                 X             X2           Y             Y2        XY
                 2              4          0.250     0.0625         0.5
                 4             16          0.360     0.1296        1.44
                 6             36          0.470     0.2209        2.82
                 8             64          0.590     0.3481        4.72
                 10            100         0.710     0.5041         7.1
              Σx = 30       Σx2 = 220     Σy = 2.380   Σy2 = 1.2652   Σxy = 16.58
$r = \frac{5 \times 16.58 - 30 \times 2.38}{\{[5 \times 220 - (30)^2][5 \times 1.2652 - (2.38)^2]\}^{1/2}} = \frac{11.5}{\{200 \times 0.6616\}^{1/2}} = 0.9997$
$b = \frac{11.5}{200} = 0.0575$
$a = \frac{2.38}{5} - 0.0575 \times \frac{30}{5} = 0.476 - 0.345 = 0.131$
                                 Sampling
• Sampling is the process of extracting from a large quantity of material a small
   portion that is truly representative of the composition of the whole material.
Sampling methods are:
(1) Those in which all the material is examined
This is normally impracticable, expensive and time-consuming.
(2) Casual sampling
It is totally unscientific, as it is done only for a particular purpose and without
     planning.
(3) Methods in which portions of the material are selected based upon
   statistical probabilities
These are the most reliable.
• Sampling procedure
• Bulk materials are non-homogeneous and not uniformly distributed.
• A number of increments are taken randomly from the bulk material.
• The combination of these increments forms the gross sample, which is too large to be analyzed.
• It must be divided into sub-samples, from which the analytical sample is obtained.
             Validation of Analytical Procedures
It is the process by which it is established, by laboratory study, that
the method meets the requirements for the intended analytical applications.
It is a systematic evaluation of the method.
            Analytical validation parameters
1. SPECIFICITY and SELECTIVITY
• “Specificity” is the ability to determine the analyte unequivocally
   in the presence of components that may be expected to be present,
   such as impurities, degradation products and matrix components.
• “Selectivity” is used when the method is not specific for just one analyte but
   is selective for a group of analytes having common properties.
e.g.: The barium chloride test for sulphate is specific.
• The EDTA chelation method is not specific for calcium; it is only selective.
    2. LINEARITY AND RANGE
• The linearity of an analytical method is its ability to produce results that are directly,
   or through a well-defined mathematical transformation, proportional to the concentration of analyte in samples within a given
   range.
• The range of an analytical method is the interval between the upper and lower
   concentration levels of analyte that can be determined with a suitable level of precision,
   accuracy and linearity.
• The range is normally expressed in the same units as the test results.
• For the establishment of linearity, a minimum of five concentrations should normally be
   used.
• Test results should be evaluated by appropriate statistical methods (e.g., by the least
   squares method). The correlation coefficient, intercept and slope of the regression line
   should be obtained.
3. DETECTION LIMIT: (LOD)
• It is the lowest amount of analyte in a sample that can be
   detected, but not necessarily quantitated, under the stated
   experimental conditions.
4. QUANTITATION LIMIT: (LOQ)
• It is the lowest amount of analyte in a sample that can be
   determined (quantitated) with acceptable precision and accuracy
   under the stated experimental conditions.
• The quantitation limit is expressed as the concentration of analyte
   (e.g., percentage, parts per million) in the sample.
  5. ACCURACY:
• It is the closeness of test results obtained by the method to the true value.
   Accuracy should be assessed using a minimum of 9 determinations over a
   minimum of 3 concentration levels covering the specified range (e.g. 3
   concentrations x 3 replicates each).
• It is reported as the percent recovery, or as
 the difference between the mean and the accepted true value.
  6. PRECISION:
• It is the degree of agreement among individual test results when the
   method is applied repeatedly to multiple samplings of a homogeneous
   sample.
• It is usually expressed as the standard deviation (SD) or relative standard
  deviation (RSD) of a series of measurements.
• Precision may be a measure of either the degree of reproducibility or of
  repeatability of the analytical method.
• Reproducibility refers to: the use of the analytical procedure in
  different laboratories, as in a collaborative study.
• Intermediate precision expresses within-laboratory variation, as
  on different days (inter-day precision), or with different analysts or
  equipment within the same laboratory.
• Repeatability refers to the use of the analytical procedure within a
  laboratory over a short period of time (intra-day precision) using the
  same analyst with the same equipment.
7. Ruggedness:
• It is the degree of reproducibility of test results obtained by the analysis of
   the same samples under a variety of conditions, such as different
   laboratories, different analysts, different instruments, different lots of
   reagents, different elapsed assay times, different assay temperatures,
   different days, etc.
8. Robustness:
• It is a measure of the method's capacity to remain unaffected by small
   variations in method parameters.
• It provides an indication of the method's reliability during normal usage.
• Examples of parameters that may be varied:
stability of analytical solutions and extraction time.
In LC: mobile phase composition, different columns and temperature.
  9. System Suitability (S.S):
• S.S tests are based on the concept that the equipment, electronics,
   analytical operations and samples to be analyzed constitute an integral
   system that can be evaluated as such.
• S.S depends on the type of method.
  10. Simplicity, Time cycle and Cost:
• Simplicity means that the method can be carried out in a minimum
   number of steps, without highly skilled personnel, and with readily available
   reagents and equipment.
• Time cycle is the time taken to complete the analysis.
   It is important, as is the cost of the analysis, especially for industrial
  laboratories.