Statistics Summer Course

Uploaded by

Fatma Moustafa
AN INTRODUCTION TO STATISTICS

Errors
• Error may be defined as a deviation from the absolute value or from the true average of a large number of results.
• Absolute error (E) is the difference between an observed or measured value (O) and the true (most probable) value (T), with regard to the sign. It is reported in the same units as the measurement:
E = O – T
Example 1
• If a 2.62 g sample of material is found by analysis to weigh 2.52 g,
• the absolute error (E) = 2.52 – 2.62 = -0.10 g.
• If the measured value is the average of several measurements, the error is called the mean error.
Example 2
In the titration of 10 ml of 0.1 N NaOH with laboratory-prepared 0.1 N HCl, the true value is 9.9 ml and the observed values are 10.1, 9.9, 9.8, 10.2 and 10.1 ml.
So, Mean of the observed values =
summation of observed values / their number
= (10.1 + 9.9 + 9.8 + 10.2 + 10.1) / 5 = 10.02 ml.
And mean error = 10.02 – 9.9 = 0.12 ml.
• Alternatively, the mean error can be calculated by taking the average of the differences between the individual test results and the true value (with regard to the sign).
• The average of the differences = Mean error
= (+0.2 + 0.0 - 0.1 + 0.3 + 0.2)/5 = +0.6/5 = 0.12 ml.


• Relative error: It is the absolute or mean error (E) expressed as a percentage (%) of the true value (T):
Relative Error % = (E / T) x 100%
• In Example 1:
Relative Error % = (-0.10/2.62) x 100% = -3.8 %
• In Example 2:
Relative Error % = (0.12/9.9) x 100% = 1.21%
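
The error definitions above can be checked with a short Python sketch. The function names are illustrative choices, not part of the course material. The numbers are those of Examples 1 and 2.

```python
def absolute_error(observed, true_value):
    """E = O - T, keeping the sign."""
    return observed - true_value


def mean_error(observations, true_value):
    """Error of the mean of several observations relative to the true value."""
    mean_observed = sum(observations) / len(observations)
    return mean_observed - true_value


def relative_error_percent(error, true_value):
    """Absolute or mean error expressed as a percentage of the true value."""
    return error / true_value * 100


# Example 1: a 2.62 g sample found to weigh 2.52 g
e1 = absolute_error(2.52, 2.62)
rel1 = relative_error_percent(e1, 2.62)

# Example 2: five titration readings against a true value of 9.9 ml
titration_ml = [10.1, 9.9, 9.8, 10.2, 10.1]
e2 = mean_error(titration_ml, 9.9)
rel2 = relative_error_percent(e2, 9.9)

print(f"Example 1: E = {e1:.2f} g, relative error = {rel1:.1f} %")
print(f"Example 2: mean error = {e2:.2f} ml, relative error = {rel2:.2f} %")
```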


Types of Errors
(A) Determinate or systematic (constant) errors:
These can be determined and avoided. Some sources of these errors are:
1- Errors due to equipment:
Non-calibrated measuring devices (e.g. flasks, weight balances etc.)
2- Errors due to impure reagents:
By using chemical reagents or solvents containing impurities.
3- Personal errors:
Inaccuracy of the analyst (e.g. in the observation of colour changes).
4- Operational errors:
Such as loss of material during evaporation or adsorption of moisture during experiments.
5- Errors of the method:
• Solubility of the precipitate in the mother liquor.
• Decomposition during ignition.
• Side reactions or incomplete reaction.
(B) Indeterminate (random, accidental or chance) errors:
These errors occur by accident or due to uncontrolled causes.
• They cannot be determined or corrected because of the natural fluctuations that occur in all measurements.
• These errors follow a random distribution and the mathematical laws of probability.
• These errors should follow the “normal distribution curve”, or “Gaussian curve”.
• It is apparent that very large errors are few and that there should be an equal number of positive and negative errors.


Minimization of determinate errors
1- Calibration of apparatus.
2- Running a blank determination:
By carrying out a separate determination in which the sample is omitted, under exactly the same experimental conditions.
3- Running parallel determinations:
Duplicate or triplicate determinations.
4- Running a control determination:
By carrying out a determination identical to that of the unknown, using a standard substance. The weight of constituent in the unknown is then found from the proportion:

(Result found for standard) / (Result found for unknown) = (Weight of constituent in the standard) / (Weight of constituent in the unknown)
5- Use of an independent method of analysis:
E.g. 1: Determination of HCl by titration with standard sodium carbonate (acid-base titrimetry) and by precipitation with AgNO3 and weighing (gravimetry).
See other examples in your notes.

6- Standard addition method (to solve matrix effects):
• A known amount of a standard of the substance being determined is added to the sample, which is then analyzed for the total content.
• The difference between the results for samples with and without the added standard gives the recovery of the added substance.
• If the recovery is satisfactory, our confidence in the accuracy of the procedure is enhanced. If not, there is interference from other excipients or combined drugs.
7- Internal standard (I.S.) method:
• Important in chromatographic determinations (e.g. HPLC).
• It involves adding a fixed amount of the I.S. to a series of known concentrations of the material to be tested.
• The ratios of the peak area or peak height of the material to that of the I.S. are plotted against the concentration values. This should give a straight line.
[Figure: Quantitative analysis, calibration curve (External Standard Method). Peak height is plotted against concentration for standards labelled 100, 75, 50, 25, 10 and 5 g/mL and for the unknown.]


[Figure: Quantitative analysis, calibration curve (Internal Standard Method). The peak height ratio of compound to internal standard is plotted against concentration for standards labelled 100, 75, 50, 25, 10 and 5 g/mL and for the unknown.]


Accuracy
• It is defined as the concordance (agreement) of a measurement with the true or most probable value.
Determination of accuracy
• (i) Absolute method
Accuracy is determined from the magnitude of the relative error:
Accuracy % = 100 – |Relative Error %|
• In Example 1: Relative Error % = (-0.10/2.62) x 100% = -3.8 %
Accuracy = 100.0 – 3.8 = 96.2 %.
• Or simply: Accuracy % = (O / T) x 100
In Example 1: Accuracy = (2.52/2.62) x 100% = 96.2 %.


Example 3:
• In a practical exam of volumetric analysis, three students obtained the following results:

Student no.   Average volume (ml)   Correct volume (ml)
1             10.3                  10.0
2             14.7                  15.0
3             5.3                   5.0

Arrange the three students in order of their accuracy.

Answer
• Mean error (student 1) = 10.3 – 10.0 = 0.3 ml
• Relative error (student 1) = 0.3/10.0 = 0.03
• Relative error, % (student 1) = 0.03 x 100 = 3.0 %
• Accuracy, % (student 1) = 100.0 – 3.0 = 97.0 %
The accuracy is calculated from the magnitude of the relative error, so the negative error of student 2 also lowers the accuracy.

Student   Average       Correct       Mean error   Relative   Relative    Accuracy,   Order
no.       volume (ml)   volume (ml)   (ml)         error      error, %    %
1         10.3          10.0          0.3          0.03       3.0         97.0        2
2         14.7          15.0          -0.3         -0.02      -2.0        98.0        1
3         5.3           5.0           0.3          0.06       6.0         94.0        3
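
The ranking in Example 3 can be reproduced with a few lines of Python. This is a minimal sketch that assumes accuracy is taken as 100 minus the magnitude of the percentage relative error, as in the absolute method above. The variable and function names are illustrative.

```python
# (average volume, correct volume) in ml for each student, from Example 3
results = {1: (10.3, 10.0), 2: (14.7, 15.0), 3: (5.3, 5.0)}


def accuracy_percent(measured, correct):
    """Accuracy % = 100 - |relative error %|."""
    relative_error_pct = (measured - correct) / correct * 100
    return 100.0 - abs(relative_error_pct)


accuracy = {student: accuracy_percent(m, c) for student, (m, c) in results.items()}

# Most accurate student first
ranking = sorted(accuracy, key=accuracy.get, reverse=True)

for order, student in enumerate(ranking, start=1):
    print(f"Order {order}: student {student}, accuracy = {accuracy[student]:.1f} %")
```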

• (ii) Comparative method
• By comparison between the results obtained by the tested or suggested method and the results obtained by an accurate method (reported or official).
• Agreement between at least two different methods indicates high accuracy.
Precision
• It is defined as the concordance of a series of measurements,
• or the agreement between several measurements of the same substance.
• It is usually expressed as the standard deviation (S.D.) or relative standard deviation (R.S.D.) of a series of measurements.
• Precision may be a measure of either the degree of reproducibility or the repeatability of the analytical method.
• Reproducibility refers to the use of the analytical procedure in different laboratories, on different days, or with different analysts or equipment.
• Repeatability refers to the use of the analytical procedure within a laboratory over a short period of time using the same analyst with the same equipment.
• Accuracy expresses the correctness of a measurement, and precision expresses the reproducibility or the repeatability. Reliability means both accuracy and precision.
• Precision always accompanies accuracy, but a high degree of precision does not imply accuracy.
This is illustrated in the accompanying figure.
• Example 4:
• A substance was known to contain 49.10 ± 0.02 % of a constituent A. The results obtained by two analysts using the same substance and the same analytical method were as follows:
• Analyst (1):
• % A: 49.01, 49.25, 49.08, 49.14. The mean is 49.12% and the results range from 49.01% to 49.25%.
• Analyst (2):
• % A: 49.40, 49.44, 49.42, 49.42. The mean is 49.42% and the results range from 49.40% to 49.44%.
• We can see that:
• Analyst (1): results are accurate (very close to the correct result), but less precise than the results of analyst (2) (random error).
• Analyst (2): results are very precise but not accurate (systematic or constant error).
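
The contrast between the two analysts can be put in numbers with a small Python sketch. It uses the data of Example 4. Taking the offset of the mean from 49.10 % as a measure of accuracy and the spread (highest minus lowest) as a rough measure of precision is a simplification chosen here for illustration.

```python
TRUE_VALUE = 49.10  # % of constituent A stated in Example 4

analyst_results = {
    1: [49.01, 49.25, 49.08, 49.14],
    2: [49.40, 49.44, 49.42, 49.42],
}

summary = {}
for analyst, values in analyst_results.items():
    mean = sum(values) / len(values)
    summary[analyst] = {
        "mean": mean,
        "bias": mean - TRUE_VALUE,            # offset from the true value (accuracy)
        "spread": max(values) - min(values),  # highest minus lowest (precision)
    }

for analyst, s in summary.items():
    print(f"Analyst {analyst}: mean = {s['mean']:.2f} %, "
          f"bias = {s['bias']:+.2f} %, spread = {s['spread']:.2f} %")
```

Analyst 1 shows the smaller bias and analyst 2 the smaller spread, which matches the conclusion drawn above.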
Determination of precision
The precision of an analytical method is determined by assaying a sufficient number of aliquots of a homogeneous sample to be able to calculate statistically valid estimates of the standard deviation or relative standard deviation (coefficient of variation).
Mean (x̄):
It is the arithmetic average of all measured values:
x̄ = (x1 + x2 + x3 + … + xn) / n
where n is the number of measurements x1, x2, x3, … etc.
The spread of the values is measured by the standard deviation (S.D.).
• Variance = the square of the standard deviation = S²
• Relative standard deviation (RSD): the S.D. expressed relative to the mean.
• Coefficient of variation (C.V.): the RSD expressed as a percentage.
• The analytical method is considered precise if the RSD% is equal to or below 1.5%.
• The range (w) or “spread”: It is the difference between the highest measurement and the lowest one.
• The median: It is the measurement in the middle of the arranged measurements, where the numbers of higher and lower measurements are equal. If the number of measurements is odd, the median is the middle value, e.g. 7.6 for the values 7.1, 7.5, 7.6, 7.8, 7.9. If the number is even, it is the average of the two middle values.
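
The formulas for the standard deviation and the quantities derived from it did not survive in this copy of the slides. The block below gives the standard textbook definitions as a reference, assuming the sample form with n − 1 in the denominator, which is the usual choice for a small series of analytical measurements. The symbols s, x_i and x̄ follow the text above.

```latex
\begin{align*}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i
  && \text{mean} \\
s &= \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
  && \text{standard deviation (S.D.)} \\
s^2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
  && \text{variance} \\
\mathrm{RSD} &= \frac{s}{\bar{x}}
  && \text{relative standard deviation} \\
\mathrm{C.V.} &= \frac{s}{\bar{x}} \times 100\,\%
  && \text{coefficient of variation}
\end{align*}
```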
Example 5:
• Analysis of a sample of iron ore gave the following % values:
• 7.08, 7.21, 7.12, 7.09, 7.16, 7.14, 7.07, 7.14, 7.18, 7.11.
• Calculate the mean, the standard deviation, the variance and the coefficient of variation.
• Find also the median and the range for these data.

• Variance (S²) = 0.002
The arranged data are: 7.07, 7.08, 7.09, 7.11, 7.12, 7.14, 7.14, 7.16, 7.18, 7.21
• The median is: (7.12 + 7.14) / 2 = 7.13
• The range is: 7.21 – 7.07 = 0.14
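
The quantities asked for in Example 5 can be computed with Python's standard `statistics` module. This is a sketch that assumes the sample (n − 1) form of the standard deviation, which is what `statistics.stdev` and `statistics.variance` implement. The variable names are illustrative.

```python
import statistics

# % iron values from Example 5
iron_percent = [7.08, 7.21, 7.12, 7.09, 7.16, 7.14, 7.07, 7.14, 7.18, 7.11]

mean = statistics.mean(iron_percent)
std_dev = statistics.stdev(iron_percent)       # sample S.D. (n - 1 denominator)
variance = statistics.variance(iron_percent)   # square of the sample S.D.
cv_percent = std_dev / mean * 100              # coefficient of variation (RSD %)
median = statistics.median(iron_percent)
data_range = max(iron_percent) - min(iron_percent)

print(f"mean     = {mean:.2f}")
print(f"S.D.     = {std_dev:.3f}")
print(f"variance = {variance:.3f}")
print(f"C.V.     = {cv_percent:.2f} %")
print(f"median   = {median:.2f}")
print(f"range    = {data_range:.2f}")
```

The computed C.V. is well below the 1.5 % limit quoted earlier for a precise method.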
The mean of several readings is a more reliable estimate of the true mean than a single observation.
The greater the number of measurements (n), the closer the mean is to the true mean.
The standard error of the mean (Sx̄) is given by:
Sx̄ = S / √n
It can be evaluated for the above example (n = 10) and for the case in which 100 measurements are made: the larger n gives the smaller standard error.
Therefore, the precision is improved by increasing the number of measurements.
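
The effect of n on the standard error can be illustrated with the iron ore data of Example 5. The sketch below assumes, hypothetically, that a series of 100 measurements would have the same standard deviation as the 10 values actually reported. The worked numbers on the original slide are not available in this copy.

```python
import math
import statistics

iron_percent = [7.08, 7.21, 7.12, 7.09, 7.16, 7.14, 7.07, 7.14, 7.18, 7.11]
std_dev = statistics.stdev(iron_percent)  # sample S.D. of the 10 values


def standard_error(s, n):
    """Standard error of the mean: S / sqrt(n)."""
    return s / math.sqrt(n)


sem_10 = standard_error(std_dev, 10)
# Assumption: the same S.D. is obtained from 100 measurements
sem_100 = standard_error(std_dev, 100)

print(f"n = 10:  standard error = {sem_10:.4f}")
print(f"n = 100: standard error = {sem_100:.4f}")
```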
Rejection of a result (The Q test)
• The Q test is used to determine whether an “outlier” is due to a determinate error or to an indeterminate error.
• A decision has to be made whether to reject the result or to retain it. This can be done by the Q test.
The ratio Q is calculated by:
(1) Arranging the data in decreasing order.
(2) Finding the difference (a) between the suspect number (the outlier) and its nearest neighbour.
(3) Dividing this difference by the range (w), which is the difference between the highest number and the lowest number:
Q = a / w

Figure 2: Illustration of the calculation of Q.


• The ratio is compared with the tabulated values of Q (see the Table).
• If it is equal to or greater than the tabulated value, the suspected observation can be rejected.
• If it is smaller than the tabulated value, the suspected observation should be retained.

Rejection quotient, Q, at different confidence levels


Example 6:
• The following values were obtained for the determination of Cd2+ in a sample of dust: 4.3, 4.1, 4.0 and 3.2 μg/g.
• Should the last value, 3.2 μg/g, be rejected?
By application of the Q test:
Q = a / w = (4.0 – 3.2) / (4.3 – 3.2) = 0.8 / 1.1 = 0.727
• Q from the table for a sample size of 4 = 0.829 > Q calculated (0.727)
• Hence the result 3.2 μg/g should be retained.

If three additional measurements were made with the results 4.2, 3.9 and 4.0 μg/g, then:
Q = (3.9 – 3.2) / (4.3 – 3.2) = 0.7 / 1.1 = 0.636
Q from the table for a sample size of 7 = 0.568 < Q calculated (0.636), so rejection of the value is justified.
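
The Q test procedure can be sketched in Python as follows. The function names are illustrative. The critical values 0.829 (n = 4) and 0.568 (n = 7) are the ones quoted in Example 6 and are passed in by the caller rather than looked up, since the full table is not reproduced here.

```python
def q_ratio(values, suspect):
    """Q = a / w for a suspect value lying at either end of the data."""
    ordered = sorted(values)
    w = ordered[-1] - ordered[0]          # range: highest minus lowest
    if suspect == ordered[0]:
        a = ordered[1] - ordered[0]       # gap to nearest neighbour (low end)
    elif suspect == ordered[-1]:
        a = ordered[-1] - ordered[-2]     # gap to nearest neighbour (high end)
    else:
        raise ValueError("the suspect value must be the lowest or highest result")
    return a / w


def should_reject(values, suspect, q_table):
    """Reject when the calculated Q is equal to or greater than the tabulated Q."""
    return q_ratio(values, suspect) >= q_table


# Example 6: Cd2+ in dust, in ug/g
four_values = [4.3, 4.1, 4.0, 3.2]
seven_values = four_values + [4.2, 3.9, 4.0]

q4 = q_ratio(four_values, 3.2)
q7 = q_ratio(seven_values, 3.2)

print(f"n = 4: Q = {q4:.3f}, reject = {should_reject(four_values, 3.2, 0.829)}")
print(f"n = 7: Q = {q7:.3f}, reject = {should_reject(seven_values, 3.2, 0.568)}")
```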
Correlation coefficient (r)
• To establish whether there is a linear relationship between two variables xi (concentration) and yi (instrumental signal), use the correlation coefficient.
• The value of (r) must lie between +1 and –1.
• The nearer it is to ±1, the greater the probability that a definite linear relationship exists between the variables x and y.
• Values close to +1 indicate positive correlation,
• and values close to –1 indicate negative correlation.
• A value of (r) that tends towards zero indicates that x and y are not linearly related (they may be related in a non-linear fashion).
Linear regression and correlation
• When using instrumental methods it is often necessary to carry
out a calibration procedure by using a series of standards each
having a known concentration of the analyte to be determined.
• A calibration curve is constructed by measuring the instrumental
signal (e.g. absorbance, in spectrophotometry) for each standard
and plotting this response against concentration.
• The concentration of the test sample may be determined from
the calibration curve by graphical interpolation.
Linear regression
• The best straight line through the data points has to be estimated.
• This can be done by visual inspection of the calibration graph,
• but it is far more sensible to evaluate the best straight line by the linear regression equation (least squares method).
• The equation of a straight line is: y = a + bx
• where y is the instrument response,
• x is the analyte concentration,
• a is the intercept,
• b is the slope.
• The slope “b” and the intercept “a” are given by the following two equations:

$$b = \frac{n\sum x_1 y_1 - \sum x_1 \sum y_1}{n\sum x_1^2 - \left(\sum x_1\right)^2}$$

$$a = \bar{y} - b\,\bar{x}$$
Example 7:
Quinine was determined by measuring the fluorescence intensity in H2SO4 solution. Standard solutions gave the fluorescence values listed in the x1 (concentration) and y1 (fluorescence intensity) columns of the table below.
(a) Calculate by “the least squares method” the equation of the best straight line for the calibration curve.
(b) What is the concentration of quinine in a sample which produces 16.1 arbitrary units of fluorescence intensity?
(c) Calculate the correlation coefficient r.
The terms for the determination of “r” in the equation are found from the following table:

x1      y1       x1²     y1²       x1y1
0.00    0.00     0.00    0.00      0.00
0.10    5.20     0.01    27.04     0.52
0.20    9.90     0.04    98.01     1.98
0.30    15.30    0.09    234.09    4.59
0.40    19.10    0.16    364.81    7.64
Σx1 = 1.00   Σy1 = 49.5   Σx1² = 0.30   Σy1² = 723.95   Σx1y1 = 14.73
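(a) From the above table the required sums have been determined.
• By substituting the values in the equations for the slope and the intercept, we have:

$$b = \frac{n\sum x_1 y_1 - \sum x_1 \sum y_1}{n\sum x_1^2 - \left(\sum x_1\right)^2} = \frac{5 \times 14.73 - 1.00 \times 49.5}{5 \times 0.30 - (1.00)^2} = \frac{24.15}{0.5} = 48.3$$

$$\bar{x} = \frac{\sum x_1}{n} = \frac{1.00}{5} = 0.2 \qquad \bar{y} = \frac{\sum y_1}{n} = \frac{49.5}{5} = 9.9$$

$$a = \bar{y} - b\,\bar{x} = 9.9 - (48.3 \times 0.2) = 0.24$$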
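• So the equation of the straight line is:

$$y = 48.3x + 0.24$$

(b) From the equation, the unknown concentration of quinine which produces 16.1 arbitrary units of fluorescence intensity is:

$$16.10 = 48.3x + 0.24$$

$$x = \frac{15.86}{48.30} = 0.328\ \mu\text{g mL}^{-1}$$

(c) For the correlation coefficient:

$$\left(\sum x_1\right)^2 = 1.000 \qquad \left(\sum y_1\right)^2 = 2450.25$$

• Substituting the above values in the equation, we have:

$$r = \frac{n\sum x_1 y_1 - \sum x_1 \sum y_1}{\left\{\left[n\sum x_1^2 - \left(\sum x_1\right)^2\right]\left[n\sum y_1^2 - \left(\sum y_1\right)^2\right]\right\}^{1/2}}$$

$$r = \frac{5 \times 14.73 - 1.00 \times 49.5}{\left\{\left[5 \times 0.30 - 1.000\right]\left[5 \times 723.95 - 2450.25\right]\right\}^{1/2}} = \frac{24.15}{(584.75)^{1/2}} = 0.9987$$

• Hence, with r = 0.9987, a linear relation exists between fluorescence intensity and concentration.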
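
The least squares calculation of Example 7 can be reproduced with a short Python sketch that follows the same sum-based formulas for b, a and r. The function name `least_squares` is an illustrative choice. The data are the quinine standards from the table.

```python
from math import sqrt


def least_squares(x, y):
    """Return (intercept a, slope b, correlation coefficient r) from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_yy = sum(yi * yi for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    numerator = n * sum_xy - sum_x * sum_y
    b = numerator / (n * sum_xx - sum_x ** 2)
    a = sum_y / n - b * sum_x / n            # a = y_mean - b * x_mean
    r = numerator / sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return a, b, r


# Quinine standards: concentration and fluorescence intensity
conc = [0.00, 0.10, 0.20, 0.30, 0.40]
fluorescence = [0.00, 5.20, 9.90, 15.30, 19.10]

a, b, r = least_squares(conc, fluorescence)

# Part (b): invert y = a + b*x for a reading of 16.1 arbitrary units
x_unknown = (16.1 - a) / b

print(f"y = {b:.1f}x + {a:.2f}, r = {r:.4f}")
print(f"unknown concentration = {x_unknown:.3f}")
```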


• Example: Using the data for the spectrophotometric analysis of aspirin in the following table, calculate the correlation coefficient, slope and intercept.

Concentration in µg (Xi)   2       4       6       8       10
Absorbance (Yi)            0.250   0.360   0.470   0.590   0.710

• Solution

X     X²     Y       Y²       XY
2     4      0.250   0.0625   0.5
4     16     0.360   0.1296   1.44
6     36     0.470   0.2209   2.82
8     64     0.590   0.3481   4.72
10    100    0.710   0.5041   7.1
Σx = 30   Σx² = 220   Σy = 2.380   Σy² = 1.2652   Σxy = 16.58

$$r = \frac{5 \times 16.58 - 30 \times 2.38}{\left\{\left[5 \times 220 - (30)^2\right]\left[5 \times 1.2652 - (2.38)^2\right]\right\}^{1/2}} = \frac{11.5}{\{200 \times 0.6616\}^{1/2}} = 0.9997$$

$$b = \frac{11.5}{200} = 0.0575$$

$$a = \frac{2.38}{5} - 0.0575 \times \frac{30}{5} = 0.476 - 0.345 = 0.131$$
Sampling
• Sampling is the process of extracting a small portion from a large quantity of material such that it is truly representative of the composition of the whole material.
Sampling methods are:
(1) Those in which all the material is examined
This is normally impracticable, expensive and time consuming.
(2) Casual sampling
It is totally unscientific, as it is done for a particular purpose only, without planning.
(3) Methods in which portions of the material are selected based upon statistical probabilities
This is the most reliable approach.
• Sampling procedure
• Bulk materials are non-homogeneous and not uniformly distributed.
• A number of increments is taken randomly from the bulk material.
• The combination of these increments forms the gross sample, which is too large to be analyzed.
• It must be divided into sub-samples, from which the analytical sample is obtained.
Validation of Analytical Procedures
It is the process by which it is established, by laboratory study, that the method meets the requirements for the intended analytical applications.
It is a systematic evaluation of the method.
Analytical validation parameters
1. SPECIFICITY and SELECTIVITY
“Specificity” is the ability to determine the analyte clearly and definitely in the presence of components that may be expected to be present, such as impurities, degradation products and matrix components.
“Selectivity” is used when the method is not specific for just one analyte but is selective for a group of analytes having common properties.
e.g. The barium chloride test for sulphate is specific.
The EDTA chelation method is not specific for calcium; it is just selective.
2. LINEARITY AND RANGE
• The linearity of an analytical method is its ability to produce results that are directly or indirectly proportional to the concentration of analyte in samples within a given range.
• The range of an analytical method is the interval between the upper and lower concentration levels of analyte that can be determined with a suitable level of precision, accuracy and linearity.
• The range is normally expressed in the same units as the test results.
• For the establishment of linearity, a minimum of five concentrations should normally be used.
• Test results should be evaluated by appropriate statistical methods (e.g. by the least squares method). The correlation coefficient, intercept and slope of the regression line should be obtained.
3. DETECTION LIMIT (LOD):
• It is the lowest amount of analyte in a sample that can be detected, but not necessarily quantitated, under the stated experimental conditions.
4. QUANTITATION LIMIT (LOQ):
• It is the lowest amount of analyte in a sample that can be determined (quantitated) with acceptable precision and accuracy under the stated experimental conditions.
• The quantitation limit is expressed as the concentration of analyte (e.g. percentage, parts per million) in the sample.
5. ACCURACY:
• It is the closeness of the test results obtained by the method to the true value. Accuracy should be assessed using a minimum of 9 determinations over a minimum of 3 concentration levels covering the specified range (e.g. 3 concentrations with 3 replicates each).
• It is reported as the percent recovery, or as the difference between the mean and the accepted true value.

6. PRECISION:
• It is the degree of agreement among individual test results when the method is applied repeatedly to multiple samplings of a homogeneous sample.
• It is usually expressed as the standard deviation (SD) or relative standard deviation (RSD) of a series of measurements.
• Precision may be a measure of either the degree of reproducibility or of repeatability of the analytical method.
• Reproducibility refers to the use of the analytical procedure in different laboratories, as in a collaborative study.
• Intermediate precision expresses within-laboratory variation, as on different days (inter-day precision), or with different analysts or equipment within the same laboratory.
• Repeatability refers to the use of the analytical procedure within a laboratory over a short period of time (intra-day precision) using the same analyst with the same equipment.
7. Ruggedness:
• It is the degree of reproducibility of test results obtained by the analysis of the same samples under a variety of conditions, such as different laboratories, different analysts, different instruments, different lots of reagents, different elapsed assay times, different assay temperatures, different days, etc.
8. Robustness:
• It is a measure of the capacity of the method to remain unaffected by small variations in method parameters.
• It provides an indication of the reliability of the method during normal usage.
• For example: stability of analytical solutions and extraction time.
In LC: mobile phase composition, different columns, and temperature.
9. System Suitability (S.S.):
• S.S. tests are based on the concept that the equipment, electronics, analytical operations and samples to be analyzed constitute an integral system that can be evaluated as such.
• S.S. depends on the type of method.
10. Simplicity, Time cycle and Cost:
• Simplicity means that the method can be carried out in a minimum number of steps, without highly skilled personnel, and with easily available reagents and equipment.
• Time cycle is the time taken to complete the analysis.
It, as well as the cost of the analysis, is especially important for industrial establishments.
