Biometry
INTRODUCTION
In biology, including agricultural sciences, “the laws of nature” are not that simple.
Biological phenomena often show variation that obscures the law we want to establish.
For instance, if we treat two fields in an identical way, the yields obtained from the same
variety of a crop will not be the same. Similarly, two cows of the same age, breed and
body weight fed with the same type and amount of fodder will produce different amounts
of milk. Thus, variation is a typical feature of biological data.
The broad scope of statistics makes it difficult to define. It was developed to deal with
problems in which, for the individual observations, laws of cause and effect are not
apparent to the observer and where an objective approach is needed. In such problems,
there must always be some uncertainty about any inference based on a limited number of
observations.
The word statistics also refers to numerical and quantitative data such as statistics of
births, deaths, marriage, production, yield, etc. The application of statistical methods to
the solution of biological problems is biometry, biological statistics or bio-statistics. The
word biometry comes from two Greek roots: 'bios' meaning life and 'metron' meaning 'to
measure'. Thus, biometry literally means the measurement of life.
A history of statistics throws considerable light on the nature of twentieth-century statistics.
Historical perspective is also important in pointing to the needs and pressures, which
created it.
The term statistics is an old one. Statistics must have started as a state arithmetic
technique to assist a ruler who needed to know the wealth and number of his subjects in
order to levy a tax or wage a war. Presumably all cultures that intentionally record history
also recorded statistics. We know that Caesar Augustus sent out a decree that the entire
world should be taxed. Consequently, he required that all persons report to the nearest
statistician, in that day the tax collector. One result of this was that Jesus was born in
Bethlehem rather than Nazareth. William the Conqueror ordered a survey of the lands of
England for purposes of taxation and military service. This was called the Domesday
Book.
The normal curve or normal curve of error has been very important in the development of
statistics. The equation of this curve was first published in 1733 by de Moivre. De
Moivre had no idea of applying his result to experimental observations and his paper
remained unknown until Karl Pearson found it in a library in 1924. However, the same
result was later developed by two mathematical astronomers, Laplace (1749-1827) and
Gauss (1777-1855) independently of one another.
Charles Darwin (1809-1882), a biologist, received the second volume of Lyell’s book
while on the Beagle. Darwin formed his theories later and he may have been stimulated
by his reading of this book. Darwin’s work was largely biometrical or statistical in nature
and he certainly renewed enthusiasm in biology. Gregor Mendel (1822-1884) too, with
his studies of plant hybrids published in 1866, had a biometrical or statistical problem.
In the nineteenth century, the need for a sounder basis for statistics became apparent. Karl
Pearson (1857-1936), initially a mathematical physicist, applied his mathematics to
evolution as a result of the enthusiasm in biology created by Darwin. Pearson spent
nearly half a century in serious statistical research. In addition, he founded the journal
Biometrika and a school of statistics; as a result, the study of statistics gained impetus.
While Pearson was concerned with large samples, large-sample theory proved to be
somewhat inadequate for experimenters with necessarily small samples. Among these
was W. S. Gosset (1876-1937), a student of Karl Pearson and a scientist of the Guinness
firm of brewers. Gosset’s mathematics appears to have been insufficient to the task of
finding exact distributions of the sample standard deviation, of the ratio of the sample
mean to the sample standard deviation, and of the correlation coefficient, statistics with
which he was particularly concerned. Consequently, he resorted to drawing shuffled
cards, computing and compiling empirical frequency distributions. Papers on the results
appeared in Biometrika in 1908 under the name of Student, Gosset's pseudonym. Today
Student’s t is a basic tool of statisticians and experimenters. Now that the use of Student’s
t distribution is so widespread, it is interesting to note that the German astronomer,
Helmert, had obtained it mathematically as early as 1875.
R.A. Fisher (1890-1962) was influenced by Karl Pearson and Student and made
numerous and important contributions to statistics. He and his students gave considerable
impetus to the use of statistical procedures in many fields, particularly in agriculture,
biology and genetics.
To close this brief history, Abraham Wald (1902-1950) contributed two books, Sequential
Analysis and Statistical Decision Functions. Thus, it is in the 20th century that most of the
statistical methods presently used have been developed.
Currently, statistics is used as an analytical tool in many fields of research.
Variable: A property with respect to which individuals in a sample differ in some way,
e.g. length, weight, height, color, etc. Characteristics that show variation are called
random variables.
The basic method of collecting the observations in a sample is called simple random
sampling. This is where any observation has the same probability of being collected, e.g.
giving each student in a class an equal chance of being selected for measurement. The aim is always to
sample in a manner that does not create a bias in favour of any observation being
selected. Nearly all applied statistical procedures that are concerned with using samples
to make inferences (i.e. draw conclusions) about populations assume some form of
random sampling. If the sampling is not random, then we are never sure as to what
population is represented by our sample. When random sampling from clearly defined
populations is not possible, then interpretation of standard methods of estimation
becomes more difficult.
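To make this concrete, here is a minimal Python sketch of a simple random sample; the list of heights and the sample size are invented for illustration.

```python
import random

# Invented population: heights (cm) of the 12 students in a class
heights = [152, 160, 171, 148, 166, 158, 175, 169, 155, 162, 150, 173]

random.seed(1)                       # arbitrary seed, for a reproducible draw
sample = random.sample(heights, 5)   # every student has the same selection chance
print(sample)
```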
Populations must be defined at the start of any study and this definition should include the
spatial and temporal limits to the population. Our formal statistical inference is restricted
to these limits. For example, if we sample from a population of animals at a certain
location in December 2010, then our inference is restricted to that location in December
2010. We cannot infer what the population might be like at any other time or in any other
place, although we can speculate or make predictions.
Parameter: A population value, which we generally do not know but would like to infer
(estimate). For example, the national average yield of maize in 2010 in Ethiopia.
Parameters are designated using Greek letters such as µ, σ, π, etc.
Statistics: Sample estimates of population values (parameters), designated using
Latin letters such as x̄, s, p, etc.
The population parameters cannot be measured directly because the populations are
usually too large, i.e. they contain too many observations for practical measurement. It is
important to remember that population parameters are usually considered to be fixed, but
unknown, values so they are not random variables and do not have probability
distributions. Sample statistics are random variables, because their values depend on the
outcome of the sampling experiment, and therefore they do have probability distributions,
called sampling distributions.
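A small simulation can make this distinction concrete: the population below is invented, its mean is one fixed number (the parameter), and the sample mean computed from repeated random samples is a random variable with a sampling distribution.

```python
import random
import statistics

random.seed(42)
# Invented population of 1000 plot yields; in practice it is never fully observed
population = [random.gauss(25, 4) for _ in range(1000)]
mu = statistics.mean(population)          # the parameter: a single fixed value

# The sample mean changes from draw to draw: it has a sampling distribution
sample_means = [statistics.mean(random.sample(population, 10))
                for _ in range(500)]

print(f"parameter mu = {mu:.2f} (fixed)")
print(f"std. dev. of the sample means = {statistics.stdev(sample_means):.2f}")
```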
1.3 Types of Statistics
1. Descriptive (deductive) statistics: Methods used to describe a set of data without
involving generalization. They deal with the presentation of research data or any
numerical information and help in summarizing and organizing data so as to make
them readable for users, e.g. mean, median, mode, standard deviation, etc.
2. Inferential (inductive) statistics: Methods used to draw conclusions (make inferences)
about a population on the basis of a sample taken from it, e.g. estimation of
parameters and testing of hypotheses.
6.1 Introduction
The procedure for research is generally known as the scientific method, which, although
difficult to define precisely, usually involves the following steps:
a. Formulation of hypothesis: a tentative explanation or solution
b. Planning an experiment to objectively test the hypothesis
c. Careful observation and collection of the data
d. Analysis and interpretation of the experimental results.
The term experimental design refers to five interrelated activities required in the investigation. These are:
a. Formulating statistical hypothesis and making plans for laying out, collection and
analysis of data
b. Stating the decision rules to be followed in testing statistical hypothesis
c. Collecting data according to plan
d. Analyzing data according to plan
e. Making decisions based on decision rules
In the study of the effect of different rations on milk production, the ration is a treatment and
the animal is the experimental unit; while in the study of different fertilizer rates on yield of
maize, the fertilizer rates are the treatments and a plot of land is the experimental unit.
Therefore, every possible effort should be made to reduce the experimental error.
Usually three replications are taken as the minimum number for standard experiments.
Two procedures for calculating the number of replications are described below:
Procedure 1: If the coefficient of variation (CV) and the desired standard error of the
mean (SEM) are known, the number of replications is N = (CV/SEM)².
Procedure 2: If CV & SEM are not known, then the number of replications can be
arrived at using the principle that the precision of treatment comparisons increases if the
experimental error is kept to minimum. The experimental error can be kept to the
minimum by providing more degrees of freedom for the experimental error. In other
words, a lower number of degrees of freedom for experimental error results in enlarged
experimental error. Based on this principle, the number of degrees of freedom for error
should not be less than 15 (not less than 10 in any case).
When ‘t’ treatments are replicated ‘r’ times, the error is based on (t-1) (r-1) degrees of
freedom in Randomized Complete Block Design (RCBD) and t(r-1) in Completely
Randomized Design (CRD), which should not be less than 15.
No. of treatments: 2 3 4 5 6
Minimum no. of replications (CRD): 9 6 5 4 4
Minimum no. of replications (RCBD): 16 9 6 5 4
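The table above follows directly from the degrees-of-freedom rule; a minimal sketch that reproduces it, assuming the threshold of 15 error degrees of freedom quoted above:

```python
def min_reps(t, design, min_df=15):
    """Smallest number of replications r giving at least min_df error df."""
    r = 2
    while True:
        df = t * (r - 1) if design == "CRD" else (t - 1) * (r - 1)
        if df >= min_df:
            return r
        r += 1

for t in range(2, 7):
    print(t, min_reps(t, "CRD"), min_reps(t, "RCBD"))
# prints 2 9 16 / 3 6 9 / 4 5 6 / 5 4 5 / 6 4 4, matching the table
```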
Increasing either number of replications or plot size can improve precision, but the
improvement achieved by doubling plot size is almost always less than the improvement
achieved by doubling replications.
Randomization
Assigning the treatments to the experimental units in such a way that any unit has an equal
chance of receiving any treatment, i.e. every treatment should have an equal chance of
being assigned to any experimental unit. Thus, a particular treatment should not be
consistently favored or disfavored.
Purposes of randomization
a) To eliminate bias: randomization ensures that no treatment is favored or
discriminated against through systematic assignment to units in a design
b) To ensure independence among the observations. This is necessary to provide valid
significance tests.
Confounding
It occurs when the differences due to experimental treatments, i.e. the contrast specified
in your hypothesis, cannot be separated from other factors that might be causing the
observed differences. For example, suppose you wished to test the effect of a particular hormone on
some behavioral response of sheep. You create two groups of sheep, males and females,
and inject the hormone into the male sheep and leave the females as the control group.
Even if other aspects of the design are sound, differences between the means of the two
groups cannot be definitely attributed to effects of the hormone alone. The two groups are
also different in gender and this may also be, at least partly, determining the behavioral
responses of the sheep. In this example, the effects of hormone are confounded with the
effects of gender.
Controls
A control is a part of the experiment that is not subjected to the factor or factors studied,
but otherwise encounters exactly the same conditions as the experimental units treated
with the investigated factor(s). For example, when investigating the effect of spraying
micronutrients, the control crop should also be sprayed with the same amount of water,
but without the micronutrients. The reason for working this way is to make sure
that it is only the effect of the substance of interest that is investigated. Otherwise, it
may not be possible to draw conclusions about the reason(s) for the outcome of the
experiment.
Local Control
The principle of local control is another important principle of experimental designs.
Under it the extraneous factor, the known source of variability, is made to vary
deliberately over as wide a range as necessary and this needs to be done in such a way
that the variability it causes can be measured and hence eliminated from the experimental
error. In other words, according to the principle of local control, we first divide the field
into several homogeneous parts, known as blocks, and then each such block is divided
into parts equal to the number of treatments. Then the treatments are randomly assigned
to these parts of a block. Dividing the field into several homogenous parts is known as
‘blocking’. In general, blocks are the levels at which we hold an extraneous factor fixed,
so that we can measure its contribution to the total variability of the data by means of
analysis of variance. In brief, through the principle of local control we can eliminate the
variability due to extraneous factor(s) from the experimental error.
Analysis of variance is used in all fields of research where data are quantitatively
measured. It is used to estimate and test hypotheses about population variances and to
estimate and test hypotheses about population means.
Additive effect
Note that the effect of treatments is constant over replications and the effect of
replications is constant over treatments.
Multiplicative effect
                 Raw data                          Log10-transformed data
Treatment        I      II     Effect (II-I)       I      II     Effect (II-I)
A                10     20     10                  1.00   1.30   0.30
B                30     60     30                  1.48   1.78   0.30
Treatment effect
(B-A)            20     40                         0.48   0.48
Here, the treatment effect is not constant over replications and the effect of replications is
not constant over the treatments.
The multiplicative effects are often encountered in experiments designed to evaluate the
incidence of diseases and insects. This happens because the changes in insect and disease
incidence usually follow a pattern that is in multiple of the initial incidence. When effects
are multiplicative, the logarithmic transformations of the data show the additive effect.
Thus, in such cases conduct the analysis and mean separation using the transformed data,
and in tables present transformed means in parenthesis alongside their back transformed
values out of parenthesis.
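The log10 columns of the table above can be reproduced in a few lines; the sketch below uses the same values and shows that the replication effect becomes constant (additive) on the log scale.

```python
import math

# Values from the multiplicative example: {treatment: (rep I, rep II)}
data = {"A": (10, 20), "B": (30, 60)}

for trt, (r1, r2) in data.items():
    raw_effect = r2 - r1                          # differs: 10 vs 30
    log_effect = math.log10(r2) - math.log10(r1)  # constant: 0.30
    print(f"{trt}: raw = {raw_effect}, log10 = {log_effect:.2f}")
```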
Heterogeneity of variances occurs usually when some treatments have errors that are
exceptionally higher or lower than others.
Data such as number of infested plants per plot usually follow the Poisson distribution and
data such as percent survival of insects or percent plants infected with a disease assume
the binomial distribution.
Treatment effect = Individual treatment mean - Grand mean; e.g. treatment A effect =
6.42-5.965 = 0.455, so on.
Block Effect = Individual block mean - Grand mean; e.g. Block 2 effect = 5.6-5.965 = -
0.365, etc.
Estimated Y value = Grand mean + treatment effect + block effect, e.g. estimated value
for block 1 & treatment 1 = 5.965 + 0.455 + 0.935 = 7.355. Similarly calculate for the
other treatment combinations.
Table of residuals
____________________________________________________________________
Block
Drug 1 2 3 4 5
_____________________________________________________________________
A -0.255 0.045 0.270 -0.080 0.020
B 0.045 -0.255 -0.030 0.120 0.120
C 0.105 0.105 -0.070 -0.320 0.180
D 0.105 0.105 -0.170 0.280 -0.320
_____________________________________________________________________
Plot the residuals from the lowest to the highest with its corresponding frequency and see
its distribution if it is normally distributed or not. If it is not normally distributed, data
transformation may be needed.
In many cases, moderate departures from normality do not have a serious effect on the
validity of the results. Most experimental data in agricultural research satisfy the
above assumptions. However, if the data violate the basic assumptions, the following
measures can be taken: consider deleting an outlier, transform the data using an appropriate
transformation method, or use an appropriate non-parametric method of analysis.
6.5 Fixed and random effects models
In general, there are two types of models in ANOVA: Fixed effects models & Random
Effects Models. If the experiment were to be repeated the same treatments would be
included and the goal is to estimate the treatment means and the mean difference. In such
a situation the model is called a fixed effect model. In other situations, the treatments in a
particular experiment may be a random sample from a large population of similar
treatments. The goal here is to estimate the variation among the treatment means and we
are not interested in the means themselves. If the experiment were to be repeated a
different sample of treatments would be included. In this situation the model is called a
random effect model.
Examples:
− Fixed: A scientist develops three new fungicides. His interest is in these fungicides
only. Random: A scientist is interested in the way a fungicide works. He selects at
random three fungicides from a group of similar fungicides to study the action.
− Fixed: Measure the rate of production of five particular machines. Random: Choose
five machines to represent machines as a class.
− Fixed: Conduct an experiment to obtain information about four specific soil types.
Random: Select at random four soil types to represent all soil types.
Random effects models are more common in sample surveys while in designed
experiments, the treatment effects are fixed.
In CRD, the treatments are assigned completely at random over the whole experimental
area so that each experimental unit has the same chance of receiving any one treatment.
In CRD, any difference among the experimental units (plots) receiving the same
treatment is considered as experimental error.
Uses:
1. It is useful when the experimental units (plots) are essentially homogeneous and
where environmental effects are relatively easy to control, e.g. laboratory and
greenhouse experiments. For field experiments where there is generally larger
variation among experimental plots like in soil fertility, slope, etc. the CRD is
rarely used.
2. It is useful if we suspect that a large fraction of the units may not respond or may be
lost during the experiment, because it is easy to handle missing data in the analysis
of variance, unlike in other designs.
3. It is useful for experiments in which the total number of experimental units is
limited, because it provides maximum degrees of freedom for error
Advantages
1. It is flexible in that the number of treatments and replications can vary, i.e. the
number of replications need not be the same from one treatment to another
2. The statistical analysis is simple even with unequal replications and it is not
complicated by loss of data or missing observations
3. Loss of information due to missing data is small as compared to other designs
4. The design provides the maximum degree of freedom for estimating the
experimental error. This improves the precision of the experiment and is
important with small experiments where degrees of freedom for experimental
error are less than 20.
Disadvantage:
The main objection to the CRD is that it is often inefficient as there is no way of
controlling the experimental error. Since randomization is unrestricted the experimental
error includes the entire variation over the experimental units except that due to
treatment.
In this design, treatments are assigned to the experimental units completely at random.
Assume that we want to do a pot-experiment on the effect of inoculation of 6-strains of
rhizobia on nodulation of common bean using five replications.
Randomization can be done by using either lottery method or table of random numbers.
A. Lottery Method
1. Arrange 30 pots of equal size filled with the same type of soil and assign numbers from
1 to 30 in convenient order.
2. Obtain 30 identical slips of paper and label 5 of them with treatment A and 5 each with
treatments B, C, D, E and F (6 treatments × 5 replications). Place
the slips in a box or hat, mix thoroughly and pick a piece of paper at random; the
treatment labeled on this paper is assigned to unit 1 (pot 1). Without returning the first
slip to the box, select another slip; the treatment named on this slip is assigned to unit
2 (pot 2). Continue this way until all 30 slips of paper have been drawn.
B. Use of Table of Random Numbers
1. Arrange 30 pots of equal size filled with the same type of soil and assign numbers
from 1 to 30 in convenient order.
2. Locate a starting point in the table of random numbers by closing your eyes and pointing a
finger to any position in the table.
3. Moving up to down or right to left from the starting point, record the first 30 three-digit
random numbers in sequence (avoid ties). Rank the random numbers from the smallest
(1) to the largest (30). The ranks will represent the pot numbers and assign treatment A
to the first five pot numbers, B to the next five pot numbers, etc.
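Both schemes amount to a random permutation of the 30 pots. A minimal Python equivalent of the lottery method (the seed is arbitrary, chosen only for reproducibility):

```python
import random

treatments = ["A", "B", "C", "D", "E", "F"]   # the 6 rhizobium strains
slips = treatments * 5                        # 30 "slips", 5 per treatment

random.seed(7)           # arbitrary seed, for a reproducible layout
random.shuffle(slips)    # equivalent to drawing slips from a hat without return

for pot, trt in enumerate(slips, start=1):
    print(f"pot {pot:2d} -> treatment {trt}")
```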
Raw data of nitrogen content of common bean inoculated with 6 rhizobium strains.
Treatments are designated with letters and nitrogen content (mg) in parenthesis.
1A 2E 3C 4B 5A 6D
(19.4) (14.3) (17.0) (17.7) (32.6) (20.7)
12 B 11C 10A 9E 8D 7E
(24.8) (19.4) (27) (11.8) (21.0) (14.4)
13F 14A 15F 16D 17B 18C
(17.3) (32.1) (19.4) (20.5) (27.9) (9.1)
24D 23F 22B 21E 20C 19D
(18.6) (19.1) (25.2) (11.6) (11.9) (18.8)
25C 26E 27B 28F 29A 30F
(15.8) (14.2) (24.3) (16.9) (33.0) (20.8)
2. Arrange the data by treatments and calculate the treatment totals (Ti) and Grand total
(G)
3. Using Yij = jth observation on the ith treatment; Ti= Treatment total; n = (r x t), the total
number of experimental unit (pots), calculate the correction factor and the various sum of
squares
− C.F. = G²/(r × t) = (596.6)²/(5 × 6) = 11864.38
− Total Sum of Squares (TSS) = ΣΣy²ij − C.F. (sum of the squares of all
observations − C.F.)
= [(19.4)² + (17.7)² + …… + (20.8)²] − 11864.38 = 12994.36 − 11864.38 = 1129.98
− Treatment Sum of Squares (SST) = ΣT²i/r − C.F.
= [(144.1)² + (119.9)² + ….. + (93.5)²]/5 − 11864.38 = 847.05
− Error Sum of Squares (SSE) = Total SS- Treatment SS= 1129.98-847.05= 282.93
In CRD, treatment sum of squares are usually called between or among groups sum of
squares while the sum of squares among individuals treated alike is called within group
or error sum of squares.
4. Calculate the mean squares (MS) for treatment and error by dividing each sum of
squares by the corresponding degree of freedom
− Treatment MS = SST/(t − 1) = 847.05/(6 − 1) = 169.41
− Error MS = Error SS/[t(r − 1)] = 282.93/[6(5 − 1)] = 11.79
5. Calculate F-value for testing significance of treatment effects
− F-calculated = Treatment MS/Error MS = 169.41/11.79 = 14.37
6. Obtain the tabulated F-value using treatment degree of freedom (d. f.) as numerator
(n1) and error d. f. as denominator (n2) at 5% and 1% level of significance
F (5, 24) at 5% = 2.60; F (5, 24) at 1% = 3.90
7. Summarize all the values computed on the above steps in the ANOVA- table for quick
assessment of results.
________________________________________________________________________
Source of DF SS MS Computed F Table F
Variation 5% 1%
________________________________________________________________________
Treatment (among strains) (t-1) = 5 847.05 169.41 14.37** 2.60 3.90
Error (within strains) t(r-1) = 24 282.93 11.79
Total (rt-1) = 29 1129.98
________________________________________________________________________
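The hand computation in steps 3 to 7 can be checked with a short script; the data below are the nitrogen contents from the layout above, grouped by strain.

```python
# Check of steps 3-7: one-way (CRD) ANOVA of the rhizobium nitrogen data.
data = {
    "A": [19.4, 32.6, 27.0, 32.1, 33.0],
    "B": [17.7, 24.8, 27.9, 25.2, 24.3],
    "C": [17.0, 19.4, 9.1, 11.9, 15.8],
    "D": [20.7, 21.0, 20.5, 18.6, 18.8],
    "E": [14.3, 11.8, 14.4, 11.6, 14.2],
    "F": [17.3, 19.4, 19.1, 16.9, 20.8],
}
t, r = len(data), 5                       # treatments, replications
G = sum(sum(v) for v in data.values())    # grand total (596.6)

cf = G ** 2 / (r * t)                                     # correction factor
tss = sum(y ** 2 for v in data.values() for y in v) - cf  # total SS
sst = sum(sum(v) ** 2 for v in data.values()) / r - cf    # treatment SS
sse = tss - sst                                           # error SS

mst = sst / (t - 1)                       # treatment mean square
mse = sse / (t * (r - 1))                 # error mean square
print(f"F = {mst / mse:.2f}")             # about 14.4, as in the hand computation
```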
8. Compare the calculated F- value with table F- value and decide on significance among
the treatment effects using the following rules:
a) If F-calculated > F table at 1% level of significance, the difference between
treatments is highly significant. Put two asterisks on F-calculated
b) If F-calculated > F table at 5% level of significance, but ≤ F table at 1%, the
difference between treatments is significant. Put one asterisk on F-calculated.
c) If F-calculated ≤ F table at 5% level of significance, the differences among
treatments are non-significant. Put NS on the F-calculated value in the ANOVA
table.
Note that a non-significant F test in the analysis of variance indicates the failure of the
experiment to detect any difference among treatments. It does not, in any way, prove that
all treatments are the same. The failure to detect treatment differences based on a non-
significant F-test could be the result of either a very small or nil treatment difference, a
very large experimental error, or both. Thus, whenever the F-test is non-significant, the
researcher should examine the size of the experimental error and the numerical difference
among the treatment means. If both values are large, the trial may be repeated and efforts
should be made to reduce the experimental error so that the differences among treatments,
if any, can be detected. On the other hand, if both values are small, the difference among
treatments is probably too small to be of any economic value and, thus, no additional
trials are needed.
For the above example, the computed F value of 14.37 is larger than the tabulated F value
at the 1% level of significance of 3.90. Hence, the treatment difference is said to be
highly significant. In other words, chances are less than 1 in 100 that all the observed
differences among the six treatment means could be due to chance. It should also be
noted that such a significant F test verifies the existence of some differences among the
treatments tested but does not specify the particular pair (or pairs) of treatments that
differ significantly. To obtain this information, procedures for comparing treatment
means are used.
9. Compute the Coefficient of Variation (CV) and standard error (SE) of the treatment
means
- CV = (√Error MS / Grand mean) × 100 = (√11.79 / 19.89) × 100 = 17.3%,
where Grand mean = G/(rt) = 596.6/30 = 19.89
- SE± = √(MSE/r) = √(11.79/5) = 1.53 mg
Coefficient of Variation indicates the degree of precision with which the treatments are
compared and it is a good index of the reliability of the experiment. The smaller the CV,
the more reliable the experiment is. The CV values greatly vary with the type of
experiment, experimental material or the character measured (e.g. data on days to
flowering have a smaller CV than number of nodules in common bean, as within-treatment
variation is usually smaller in the former parameter than in the latter). In field experiments,
CVs of up to 30% are common, and lower CVs are usually expected for laboratory and
greenhouse experiments than for field experiments.
Example: Twenty rats were assigned equally at random to four feed types. Unfortunately,
one of the rats died due to an unknown reason. The data are rat body weights in g after being
raised on these diets for 10 days. We would like to know whether the weights of rats are the
same for all four diets at the 5% level.
________________________________________________________________________
Feed 1 Feed 2 Feed 3 Feed 4
________________________________________________________________________
60.8 68.7 102.6 87.9
57.0 67.7 102.1 84.2
65.0 74.0 100.2 83.1
58.6 66.3 96.5 85.7
61.7 69.8 90.3
________________________________________________________________________
Ti 303.1 346.5 401.4 431.2
ni 5 5 4 5
- C.F. = G²/n = (1482.2)²/19 = 115627.20
- Total Sum of Squares (TSS) = ΣΣy²ij − C.F. (sum of the squares of all
observations − C.F.)
= [(60.8)² + (57.0)² + …… + (90.3)²] − 115627.20 = 4354.698
- Treatment Sum of Squares (SST) = Σ(T²i/ni) − C.F.
- CV = (√MSE / Grand mean) × 100 = (√8.557 / (1482.2/19)) × 100 = 3.75%
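For unequal replication, a package routine gives the same F-test directly; a sketch using scipy (assuming it is installed) with the rat data:

```python
from scipy import stats

feed1 = [60.8, 57.0, 65.0, 58.6, 61.7]
feed2 = [68.7, 67.7, 74.0, 66.3, 69.8]
feed3 = [102.6, 102.1, 100.2, 96.5]       # the dead rat: only 4 observations
feed4 = [87.9, 84.2, 83.1, 85.7, 90.3]

f_value, p_value = stats.f_oneway(feed1, feed2, feed3, feed4)
print(f"F = {f_value:.1f}, p = {p_value:.3g}")
# The diets differ at the 5% level if p < 0.05
```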
In such situations, designs and layouts can be constructed so that the portion of variability
attributable to the known source can be measured and excluded from experimental error.
Thus, differences among treatment means will contain no contribution from the known
source.
Randomized Complete Block Design (RCBD) can be used when the experimental units
can be meaningfully grouped, the number of units in a group being equal to the number
of treatments. Such a group is called a block, and the number of blocks equals the number
of replications. Each treatment appears an equal number of times, usually once, in each
block, and each block contains all the treatments.
Blocking (grouping) can be done based on soil heterogeneity in a fertilizer or variety
trials; initial body weight, age, sex, and breed of animals; slope of the field, etc.
During the course of the experiment, all units in a block must be treated as uniformly as
possible. For example,
- if planting, weeding, fertilizer application, harvesting, data recording, etc, operations
cannot be done in one day due to some problems, then all plots in any one block
should be done at the same time.
- if different individuals have to make observations of the experimental plots, then one
individual should make all the observations in a block.
This practice helps to control variation within blocks, and thus variation among blocks is
mathematically removed from experimental error.
Advantages:
1. Precision: More precision is obtained than with CRD because grouping
experimental units into blocks reduces the magnitude of experimental error.
2. Flexibility: Theoretically there is no restriction on the number of treatments or
replications. If extra replication is desired for certain treatments, it can be applied
to two or more units per block.
3. Ease of analysis: The statistical analysis of the data is simple. If, as a result of
chance or mishap, the data from a complete block or for certain treatments are
unusable, the data may be omitted without complicating the analysis. If data from
individual units (plots) are missing, they can be estimated easily so that simplicity
of calculation is not lost.
Disadvantages:
The main disadvantage of RCBD is that when the number of treatments is large (>15),
variation among experimental units within a block becomes large, resulting in a large
error term. In such situations, other designs such as incomplete block designs should be
used.
Step 1: Divide the experimental area (unit) into r-equal blocks, where r is the number of
replications, following the blocking technique. Blocking should be done against the
gradient such as slope, soil fertility, etc.
Step 2: Sub-divide the first block into t-equal experimental plots, where t is the number of
treatments and assign t treatments at random to t-plots using any of the randomization
scheme (random numbers or lottery).
Step 3: Repeat step 2 for each of the remaining blocks.
[Layout diagram omitted: blocks are arranged across the environmental gradient]
The major difference between CRD and RCBD is that in CRD, randomization is done
without any restriction to all experimental units but in RCBD, all treatments must appear
in each block and different randomization is done for each block (randomization is done
within blocks).
The linear model for RCBD is Yij = µ + τi + βj + εij,
where, Yij = the observation on the jth block and the ith treatment; µ = common mean
effect; τi = effect of treatment i; βj = effect of block j; and εij = experimental error for
treatment i in block j.
Step 2. Arrange the data by treatments and blocks and calculate treatment totals (Ti),
Block (rep) totals (Bj) and Grand total (G).
Example: Oil content of linseed treated at different stages of growth with N-fertilizes.
Step 3: Compute the correction factor (C.F.) and sum of squares using r as number of
blocks, t as number of treatments, Ti as total of treatment i, and Bj as total of block j.
a. C.F. = G²/(rt) = (132.7)²/(4 × 6) = 733.72
b. Total Sum of Squares (TSS) = ΣY²ij − C.F. = (4.4)² + (3.3)² + …. + (6.7)² − C.F.
= 788.23 − 733.72 = 54.51
c. Block Sum of Squares (SSB) = ΣB²j/t − C.F.
= [(31.6)² + (30.6)² + (36.0)² + (34.5)²]/6 − 733.72 = 736.86 − 733.72 = 3.14
d. Treatment Sum of Squares (SST) = ΣT²i/r − C.F.
= [(20.4)² + (17.2)² + (16)² + .... + (28.1)²]/4 − 733.72 = 765.37 − 733.72 = 31.65
e. Error Sum of Squares (SSE) = Total SS – SSB – SST
= 54.51 – 3.14 – 31.65 = 19.72
Step 4: Compute the mean squares for block, treatment and error by dividing each sum of
squares by its corresponding d.f.
Block Mean Square (MSB) = SSB/(r − 1) = 3.14/3 = 1.05
Treatment Mean Square (MST) = SST/(t − 1) = 31.65/5 = 6.33
Error Mean Square (MSE) = SSE/[(r − 1)(t − 1)] = 19.72/15 = 1.31
Step 5: Compute the F-value for testing block and treatment differences.
F-block = MSB/MSE for block; F-treatment = MST/MSE for treatments.
F-block = 1.05/1.31 = 0.80; F-treatment = 6.33/1.31 = 4.83
Step 6: Read table F- and compare the computed F-value with tabulated F-value and
make decision.
- F-table for comparing block effects, use block d. f. as numerator (n1) and error d.f. as
denominator (n2); F (3, 15) at 5% = 3.29 and at 1% = 5.42.
- F-table for comparing treatment effects, use treatment d.f. as numerator and error d.f.
as denominator F (5, 15) at 5% = 2.90 and at 1% = 4.56. If the calculated F-value is
greater than the tabulated F-value for treatments at 1%, it means that there is a highly
significant (real) difference among treatment means.
In the above example, calculated F-value for treatments (4.83) is greater than the
tabulated F-value at 1% level of significance (4.56). Thus, there is a highly significant
difference among the stages of application on nitrogen content of the linseed.
Step 7: Compute the standard error of the mean and coefficient of variability.
- Standard error (SE±) = √(MSE/r) = √(1.31/4) = 0.57 g
- CV = (√MSE / Grand mean) × 100 = (√1.31 / 5.53) × 100 = 20.7%
Step 8: Summarize the results of computations in analysis of variance table for quick
assessment of the result.
Step 1: Determine the level of significance of block variation by computing F-value for
block and test its significance.
F-block = MSB/MSE = 1.05/1.31 = 0.80
By comparing it with tabulated F-value at n1 (r-1) d.f. and n2 error d.f. (r-1) (t-1) = F (3,
15) at 5% = 3.29.
If the computed F-value is greater than the tabulated F-value, blocking is said to be
effective in reducing experimental error. Also the scope of an experiment may have been
increased when blocks are significantly different since the treatments have been tested
over a wider range of experimental conditions.
On the other hand, if block effects are small (calculated F for block < tabulated F value),
it indicates either that the experimenter was not successful in reducing error variance by
grouping of individual units (blocking) or that the units were essentially homogeneous to
start with.
Step 2: Determine the magnitude of the reduction in experimental error due to blocking
by computing the Relative Efficiency (R.E.) as compared to CRD.
R.E. = [(r − 1)MSB + r(t − 1)MSE] / [(rt − 1)MSE]
Where MSB = block mean square; MSE = error mean square; r = number of replications;
t = number of treatments
R.E. = [(4 − 1)(1.05) + 4(6 − 1)(1.31)] / [(4 × 6 − 1)(1.31)] = 29.35/30.13 = 0.97
If error degree of freedom of RCBD is less than 20, the R.E. should be multiplied by the
adjustment factor (k) to consider the loss in precision resulting from fewer degrees of
freedom.
K = [(r − 1)(t − 1) + 1][t(r − 1) + 3] / {[(r − 1)(t − 1) + 3][t(r − 1) + 1]}
K = [(4 − 1)(6 − 1) + 1][6(4 − 1) + 3] / {[(4 − 1)(6 − 1) + 3][6(4 − 1) + 1]}
= [(3 × 5) + 1][(6 × 3) + 3] / {[(3 × 5) + 3][(6 × 3) + 1]} = (16 × 21)/(18 × 19) = 336/342 = 0.98
Adjusted R.E. = 0.97 × K(0.98) = 0.95.
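These formulas translate directly into code; a minimal sketch reproducing the example values (the small differences from the hand results are rounding):

```python
def relative_efficiency(msb, mse, r, t):
    """R.E. of RCBD relative to CRD, plus the small-df adjustment factor K."""
    re = ((r - 1) * msb + r * (t - 1) * mse) / ((r * t - 1) * mse)
    k = ((r - 1) * (t - 1) + 1) * (t * (r - 1) + 3) / (
        ((r - 1) * (t - 1) + 3) * (t * (r - 1) + 1))
    return re, k

re, k = relative_efficiency(msb=1.05, mse=1.31, r=4, t=6)
print(f"R.E. = {re:.2f}, K = {k:.2f}, adjusted R.E. = {re * k:.2f}")
# about 0.97, 0.98 and 0.96 (0.95 when the rounded hand values are multiplied)
```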
Sometimes data for certain units may be missing or become unusable. For example,
- when an animal becomes sick or dies but not due to treatment
- when rodents destroy a plot in field
- when a flask breaks in laboratory
- when there is an obvious recording error
A method is available for estimating such data. Note that an estimate of a missing value
does not supply additional information to the experimenter; it only facilitates the analysis
of the remaining data.
The estimated value is entered in the table with the observed values and the analysis of
variance is performed as usual with one d. f. being subtracted from both total and error d.
f. because the estimated value makes no contribution to the error sum of squares.
When all of the missing values are in the same block or treatment, the simplest solution is
to proceed as if that block or treatment had not been included in the experiment.
Example: In the table given below are yields (kg) of 4-varieties of maize (Al-composite,
Rarree-1, Bukuri, Katumani) in 4-replications planted in RCBD on a plot size of 10m x
10 m of which one plot yield is missing. Estimate the missing value and analyze the data
Varieties          I       II      III     IV      Treatment total (Ti)
Bukuri 18.5 15.7 16.2 14.1 64.5
Katumani 11.7 - 12.9 14.4 39(To)
Rarree-1 15.4 16.6 15.5 20.3 67.8
Al-composite 16.5 18.6 12.7 15.7 63.5
Block total 62.1 50.9 (Bo) 57.3 64.5
Go (Grand total) = 234.8
Solution
a. Estimate the missing value
Y = (rBo + tTo − Go) / [(r − 1)(t − 1)]
Y = [(4 × 50.9) + (4 × 39) − 234.8] / [(4 − 1)(4 − 1)] = 13.9
(A code sketch of this estimator is given after this example.)
b. Enter the estimated value and carry out the analysis following the usual procedure:
- Corrected treatment total = 39 + 13.9 = 52.9
- Corrected block total = 50.9 + 13.9 = 64.8
- Corrected grand total = 234.8 + 13.9 = 248.7
c. Analysis of variance
1. C.F. = G²/(rt) = (248.7)²/(4 × 4) = 3865.73
2. TSS = ΣΣY²ij − C.F. = (18.5)² + (11.7)² + ..... + (13.9)² + ........ + (15.7)² − C.F. = 79.18
3. Treatment SS = ΣT²i/r − C.F. = [(64.5)² + (52.9)² + (67.8)² + (63.5)²]/4 − 3865.73 = 31.21
4. Block SS = ΣB²j/t − C.F. = [(62.1)² + (64.8)² + (57.3)² + (64.5)²]/4 − 3865.73 = 9.02
d. Compute the correction factor for bias (B) for treatment sum of squares as the
treatment SS is biased upwards.
B = [Bo − (t − 1)Y]² / [t(t − 1)]
B = [50.9 − (4 − 1) × 13.9]² / [4 × (4 − 1)] = [50.9 − 41.7]²/12 = 7.05
e. Subtract the computed B value from Total SS & Treatment SS
- Adjusted Treatment SS = Treatment SS – B
= 31.21 – 7.05= 24.16
- Adjusted Total SS = Total SS-B
= 79.18 – 7.05 = 72.13
f. Subtract 1 from error d. f. and total d. f. and complete the analysis of variance table.
CV = (√MSE / Grand mean) × 100 = (√4.9 / 15.65) × 100 = 14.1%, where the grand
mean = 234.8/15 = 15.65.
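The missing-value estimator from step (a) is a one-liner; a sketch with the numbers of the maize example:

```python
def missing_value_rcbd(r, t, Bo, To, Go):
    """Estimate a single missing plot value in an RCBD."""
    return (r * Bo + t * To - Go) / ((r - 1) * (t - 1))

y = missing_value_rcbd(r=4, t=4, Bo=50.9, To=39.0, Go=234.8)
print(f"estimated missing yield = {y:.1f} kg")   # 13.9
```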
The major feature of the Latin Square Design is its capacity to simultaneously handle two
known sources of variation among experimental units unlike Randomized Complete
Block Design (RCBD), which treats only one known source of variation.
The two-directional blocking in a Latin Square Design is commonly referred to as row
blocking and column blocking. In a Latin Square Design the number of treatments is equal
to the number of replications; that is why it is called a Latin square.
Advantages:
- Greater precision is obtained than Completely Randomized Design & Randomized
Complete Block Design (RCBD) because it is possible to estimate variation among
row blocks as well as among column blocks and remove them from the experimental
error.
Disadvantages:
- As the number of treatments is equal to the number of replications, when the number
of treatments is large the design becomes impractical to handle. On the other hand,
when the number of treatments is small, the degree of freedom associated with the
experimental error becomes too small for the error to be reliably estimated. Thus, in
practice the Latin Square Design is applicable for experiments in which the number
of treatments is not less than four and not more than eight.
- Randomization is relatively difficult.
Step 1: To randomize a five treatment Latin Square Design, select a sample of 5 x 5 Latin
square plan from appendix of statistical books. We can also create our own basic plan and
the only requirement is that each treatment must appear only once in each row and
column. For our example, the basic plan can be:
A B C D E
B A E C D
C D A E B
D E B A C
E C D B A
Step 2: Randomize the row arrangement of the plan selected in step 1, following one of
the randomization schemes (either using lottery method or table of random numbers).
- Select from table of random numbers, five three digit random numbers avoiding ties
if any
Random numbers: 628 846 475 902 452
Rank: (3) (4) (2) (5) (1)
- Rank the selected random numbers from the lowest (1) to the highest (5)
- Use the ranks to represent the existing row number of the selected plan and the
sequence to represent the row number of the new plan. For our example, the third row
of the selected plan (rank 3) becomes the first row (sequence) of the new plan, the
fourth becomes the second row, etc.
1 2 3 4 5
3 C D A E B
4 D E B A C
2 B A E C D
5 E C D B A
1 A B C D E
Step 3: Randomize the column arrangement using the same procedure. Select five three
digit random numbers.
Random numbers: 792 032 947 293 196
Rank: (4) (1) (5) (3) (2)
The rank will be used to represent the column number of the above plan (row arranged)
in step 2. For our example, the fourth column of the plan obtained in step 2 above
becomes the first column of the final plan, the first column of the plan becomes the second, etc.
Final layout:
E C B A D
A D C B E
C B D E A
B E A D C
D A E C B
Note that each treatment occurs only once in each row and column
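The two randomization steps amount to shuffling the row order and then the column order of the basic plan; permuting whole rows and columns preserves the Latin square property. A minimal sketch (seed arbitrary):

```python
import random

basic_plan = [list(row) for row in
              ["ABCDE", "BAECD", "CDAEB", "DEBAC", "ECDBA"]]

random.seed(3)                          # arbitrary seed
rows = random.sample(range(5), 5)       # random new row order
cols = random.sample(range(5), 5)       # random new column order
final = [[basic_plan[i][j] for j in cols] for i in rows]

for row in final:
    print(" ".join(row))
# each treatment still appears exactly once per row and per column
```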
Sample layout of three treatments each replicated three times in Latin Square Design
The linear model is Yijk = µ + τi + βj + ϒk + εijk, where,
Yijk = the observation on the ith treatment, jth row & kth column
µ = Common mean effect
τi = Effect of treatment i
βj = Effect of row j
ϒk = Effect of column k
εijk = Experiment error (residual) effect
Example: Grain yield of three maize hybrids (A, B, and D) and a check variety, C, from
an experiment with Latin Square Design.
STEPS OF ANALYSIS
Step 1: Arrange the raw data according to their row and column designation, with the
corresponding treatments clearly specified for each observation and compute row total
(R), column total (C), the grand total (G) and the treatment totals (T).
Step 2: Compute the correction factor and the various sums of squares.
Row SS = ΣR²/t − C.F. = [(5.62)² + (5.35)² + (5.225)² + (5.170)²]/4 − 28.53 = 0.03
Column SS = ΣC²/t − C.F. = [(6.35)² + (4.395)² + (6.145)² + (4.475)²]/4 − 28.53 = 0.83
Treatment SS = ΣT²/t − C.F. = [(5.855)² + (5.885)² + (4.270)² + (5.355)²]/4 − 28.53 = 0.43
Error SS =Total SS–Row SS–Column SS–Treatment SS=1.41–0.03–0.83–0.43 = 0.12
Step 3: Compute the mean squares for each source of variation by dividing the sum of
squares by its corresponding degrees of freedom.
Compute the CV as: CV = (√Error MS / Grand mean) × 100 = (√0.02 / 1.335) × 100 = 10.6%
Note that although the F-test on the analysis of variance indicates significant differences
among the mean yields of the 4-maize varieties tested, it does not identify the specific
pairs or groups of varieties that differed significantly. For example, the F-test is not able
to answer the question whether every one of the three hybrids gave significantly higher
yield than that of the check variety. To answer these questions, the procedure for mean
comparison should be used.
As in RCBD, where the efficiency of one way blocking indicates the gain in precision
relative to CRD, the efficiencies of both row and column blocking in a Latin Square
Design indicate the gain in precision relative to either the CRD or RCBD, the procedures
are:
i. Compute the F-value for testing the row & column effects; and test their
significance
When the error d. f. in the Latin Square analysis of variance is < 20, the R.E. value
should be multiplied by the adjustment factor (K) defined as:
The results indicate that the additional column blocking made possible by the use of Latin
Square Design is estimated to have increased the experimental precision over that of
RCBD by 290%, whereas the additional row-blocking in the LS design did not increase
precision over the RCBD with column as blocks. Hence, for the above trial, a RCBD with
column as blocks would have been as efficient as a Latin Square Design.
The analysis of variance is performed in the usual manner after entering the estimated
value with one degree of freedom being subtracted from total and error degrees of
freedom for each missing value.
As in the case of RCBD, the treatment sum of squares is biased upward by:
Bias (B) = [Go − Ro − Co − (t − 1)To]² / [(t − 1)(t − 2)]²
Where Go, Ro, Co, To and t are as described above. Then B is subtracted from treatment
SS & total SS.
Example: Yield (kg) of five rice varieties tested in Latin Square Design from plot size of
100 m2.
Estimate the missing value, complete the analysis of variance and compare the variety C
with D, and A with E at 5% level of significance using LSD test.
Row SS = ΣR²/t − C.F. = [(52.5)² + (43.0)² + ...... + (42)²]/5 − 1892.25 = 31.6
Column SS = ΣC²/t − C.F. = [(40.0)² + (43.5)² + ........ + (43.0)²]/5 − 1892.25 = 6.60
Treatment SS = ΣT²/t − C.F. = [(41.0)² + (33.0)² + (48.5)² + ... + (61.0)²]/5 − 1892.25 = 107.6
Error SS = Total SS–Row SS–Column SS – Treatment SS = 180.0–31.6–6.6–107.6= 34.2
d. Compute the correction factor for bias (B) for treatment sum of squares as the
treatment SS is biased upwards.
Bias (B) = [Go − Ro − Co − (t − 1)To]² / [(t − 1)(t − 2)]²
= [206 − 41 − 32 − (5 − 1)(37)]² / [(5 − 1)(5 − 2)]² = 1.56
e. Subtract the computed B value from Total SS & Treatment SS
Adjusted Treatment SS = Treatment SS – B = 107.6-1.56 = 106.04
Adjusted Total SS = Total SS-B = 180.0-1.56 = 178.44
f. Subtract 1 from error d. f. and total d. f. and complete the analysis of variance table.
Difference between the treatment means of C & D = (37/4 − 34/5) = 9.25 − 6.8 = 2.45. Since
the difference is less than the LSD value, there is no significant difference between
treatments C & D.
To compare the treatments A & E both with equal replication (without missing value)
LSD5% = t0.025(11) × sd, where sd = √(2MSE/r) = √(2 × 3.11/5) = 1.115
LSD5% = 2.201 × 1.115 = 2.45 kg
Difference between the treatment means of A & E = (61/5 − 41/5) = 12.2 − 8.2 = 4.00. Since
the difference is greater than the LSD value, there is a significant difference between
treatments A & E.
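The LSD computation for two equally replicated treatments can be sketched as follows, using the t-value and error mean square quoted in the example:

```python
import math

def lsd(t_value, mse, r):
    """Least significant difference for two means, each with r replications."""
    sd = math.sqrt(2 * mse / r)   # standard error of a difference of means
    return t_value * sd

value = lsd(t_value=2.201, mse=3.11, r=5)   # t(0.025, 11 df) from the example
print(f"LSD(5%) = {value:.2f} kg")           # 2.45
mean_A, mean_E = 61 / 5, 41 / 5
print("significant" if abs(mean_A - mean_E) > value else "not significant")
```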
Theoretically, the complete block designs (where each block contains all the treatments)
such as Randomized Complete Block and Latin Square are applicable to experiments with
any number of treatments. However, these complete block designs become less efficient as
the number of treatments increases, mainly because block size increases proportionally with
the number of treatments which in turn increases experimental error.
An alternative set of designs for single factor experiments having a large number of
treatments are the incomplete block designs. For example, plant breeders are often interested
in making comparisons among a large number of selections in a single trial. For such trials,
we use incomplete block designs. As the name implies, the experimental units in these
designs are grouped into blocks which are smaller than a complete replication of the
treatments. However, the improved precision with the use of incomplete block designs
(where the blocks do not contain all the treatments) is achieved at some cost. The major
ones are:
- inflexible number of treatments or replications or both.
- unequal degree of precision in the comparison of treatment means.
- complex data analysis.
Although there is no concrete rule as to how large the number of treatments should be
before the use of an incomplete block design, the following points may be helpful:
b. Computing facilities and services: Data analysis of an incomplete block design is more
complex than that for a complete block design. Thus, the use of an incomplete block
design should be considered only as a last resort.
The lattice designs are the most commonly used incomplete block designs in agricultural
experiments. There is sufficient flexibility in the design to make its applications simpler than
most of the other incomplete block designs. There are two kinds of lattices: balanced lattice
and partially balanced lattice designs.
Note that treatment A occurs with treatment B, with treatment C and with treatment D only
once.
b. The number of treatments (t) must be a perfect square such as 16, 25, 36, 49, 64, etc.
c. The block size (k) is equal to the square root of the number of treatments, i.e. k = √t.
d. The number of replications (r) is one more than the block size, i.e. r = k + 1. That is,
the number of replications required is 6 for 25 treatments, 7 for 36 treatments, and so
on.
As balanced lattices require a large number of replications, they are not commonly used.
The partially balanced lattice design is more or less similar to the balanced lattice design,
but it allows for a more flexible choice of the number of replications. The partially
balanced lattice design requires that the number of treatments must be a perfect square and
the block size is equal to the square root of the number of treatments. However, any
number of replications can be used in partially balanced lattice design. The partially
balanced lattice design with two replications is called simple lattice, with three replications
is triple lattice and with four replications is quadruple lattice, and so on. However, such
flexibility in the number of replications results in the loss of symmetry in the arrangement
of the treatments over blocks (i.e. some treatment pairs never appear together in the same
incomplete block). Consequently, the treatment pairs that are tested in the same incomplete
block are compared with higher level of precision than for those that are not tested in the
same incomplete block. Thus, partially balanced designs are more difficult to analyze
statistically, and several different standard errors may be possible.
Example 1 (Lattice with adjustment factor): Field arrangement and broad-leaved weed
kill (%) in tef fields of Debre Zeit research center by 16 herbicides tested in a 4 × 4 Triple
Lattice Design (herbicide numbers in parentheses).
________________________________________________________________________
Replication  Block  % kill                           Block total (B)   M     Cb
_________________________________________________________________________
1 1 75(15) 57(16) 71(13) 77(14) 280 789 -51
2 78(12) 66(11) 68(10) 45(9) 257 716 -55
3 40(6) 64(5) 49(8) 42(7) 195 608 23
4 59(3) 53(1) 46(2) 54(4) 212 642 6
Rep total (R1) 944 -77
______________________________________________________________________
2 1 53(16) 66(4) 57(12) 47(8) 223 663 -6
2 80(14) 48(6) 73(10) 52(2) 253 700 -59
3 36(7) 63(11) 67(15) 47(3) 213 676 37
4 68(13) 60(1) 50(9) 76(5) 254 716 -46
Rep total (R2) 943 -74
_________________________________________________________________________
3 1 66(15) 46(2) 58(12) 69(5) 239 754 37
2 46(4) 40(7) 59(13) 55(10) 200 678 78
3 43(9) 55(3) 50(8) 68(14) 216 670 22
4 60(11) 58(1) 48(16) 47(6) 213 653 14
Rep total (R3) 868 151
Analysis of variance
Step 1: Calculate the block total (B), the replication total (R) and Grand Total (G) as shown above.
Step 2: Calculate the treatment totals (T) by summing the values of each treatment from the three
replications.
Step 3: Using r as number of replications and k as block size, compute the total sum of squares,
replication sum of squares, treatment (unadjusted) sum of squares as:
- Correction Factor (C.F.) = G²/(rk²) = (2755)²/(3 × 16) = 158125.5
- Total Sum of Squares = (75)² + (57)² + .... + (47)² − C.F. = 164233 − 158125.5 = 6107.5
- Replication SS = ΣR²/k² − C.F. = [(944)² + (943)² + (868)²]/16 − 158125.5
= 158363.06 − 158125.5 = 237.6
- Treatment (unadjusted) SS = ΣT²/r − C.F. = [(171)² + (144)² + .... + (158)²]/3 − 158125.5
= 163029 − 158125.5 = 4903.5
Step 4: For each block, calculate block adjustment factor (Cb) as:
Cb = M − rB, where M is the sum of treatment totals for all treatments appearing in that particular
block, B is the block total and r is the number of replications. For example, block 2 of replication 2
contains treatments 14, 6, 10, and 2. Hence, the M value for block 2 of replication 2 is: M = T14 +
T6 + T10 + T2 = 225 + 135 + 196 + 144 = 700, and the corresponding Cb value is: Cb = 700 − (3 ×
253) = −59. The Cb values for the blocks are presented in the above table. Note that the Cb
values over all replications should add to zero (i.e. −77 + (−74) + 151 = 0).
Step 7: Calculate the intra-block error mean square (MS) and block (adj.) mean square (MS) as:
- Intra-block error MS = Intra-block error SS / [(k − 1)(rk − k − 1)] = 434.13/21 = 20.67
- Block (adjusted) MS = Block (adjusted) SS / [r(k − 1)] = 532.27/9 = 59.14
Note that if the adjusted block mean square is less than intra-block error mean square, no further
adjustment is done for treatment. In this case, the F-test for significance of treatment effect is made
in the usual manner as the ratio of treatment (unadjusted) mean square and intra-block error mean
square and steps 8 to 13 can be ignored. For our example, the MSB value of 59.14 is greater than
the MSE value of 20.67, thus, the adjustment factor is computed.
Step 8: Calculate the adjustment factor A. For a triple lattice design, the formula is:
A = (Eb − Ee) / [k(r − 1)Eb] = (59.14 − 20.67) / [4(3 − 1)(59.14)] = 0.0813
where Eb is the block (adj.) mean square, Ee is the intra-block error mean square, and k is the block
size.
Step 9: For each treatment, calculate the adjusted treatment total (T') as:
T' = T + A·ΣCb, where the summation runs over all blocks in which that particular treatment
appears. For example, the adjusted treatment totals for treatment numbers 1 and 2 are computed as:
The adjusted treatment totals (T') and their respective means (adjusted treatment total divided by the
number of replications (3) are presented along with the unadjusted treatment totals (T) in the table
above.
Step 10: Compute the adjusted Treatment Sum of Squares as:
Treatment (adjusted) SS = Treatment (unadjusted) SS − Ak(r − 1) × [(r × SSBun)/((r − 1)(1 + kA)) − SSB],
where SSBun is the unadjusted block sum of squares and SSB is the block (adjusted) sum of squares.
Step 12: Compute the F-value for testing the significance of treatment difference and compare the
computed F-value with the tabulated F-value with k² − 1 = 15 degrees of freedom as numerator and
(k − 1)(rk − k − 1) = 21 as denominator.
Step 13: Compare the computed F-value with the table F-value
F0.05 (15, 21) = 2.18
F0.01 (15, 21) = 3.03
Since the computed F-value of 12.93 is greater than the tabulated F-value at 1% level of
significance (3.03), the differences among the herbicide means are highly significant.
Step 14: Enter all values computed in the analysis of variance table
___________________________________________________________________________
Source of Degrees of Sum of Mean Computed Tabulated F
Variation freedom squares squares F 5% 1%
____________________________________________________________________________
Replication           (r-1) = 2              237.6      118.8
Block (adj.)          r(k-1) = 9             532.27     59.14
Herbicide (unadj.)    k²-1 = 15              4903.5     326.9
Intra-block error     (k-1)(rk-k-1) = 21     434.13     20.67
Herbicide (adj.)      k²-1 = (15)            (4010.2)   267.35     12.93     2.18   3.03
Total                 rk²-1 = 47             6107.5
____________________________________________________________________________
Step 15: Compute the corresponding Coefficient of Variation (CV) as: CV = (√Ee′ / Grand mean) ×
100, where Ee′ is the effective error mean square computed in Step 16 below.
Step 16: Compute the gain in precision of the triple lattice relative to that of Randomized Complete
Block Design as:
- % Relative precision = {(SSB + SSE) / [(k² − 1)(r − 1)]} / Ee′ × 100; where SSB is Block (adj.) SS,
SSE is intra-block error SS, r is the number of replications, and k is the block size.
- Ee′ (effective error mean square) = [1 + rkA/(k + 1)] × Ee, where r is the number of replications,
k is block size, A is the adjustment factor, and Ee is the intra-block error mean square.
- Erb = (532.27 + 434.13) / [(4² − 1)(3 − 1)] = 32.2
- Ee′ = [1 + (3 × 4 × 0.0813)/(4 + 1)] × 20.67 = 24.7
Thus, the relative precision = (32.2/24.7) × 100 = 130.4%.
This indicates that the precision of this experiment was increased by about 30.4% by using the triple
lattice instead of Randomized Complete Block Design.
LSD1% = t0.005(21) × √[(2 × 20.67)/3 × (1 + 3 × 0.0813)] = 2.831 × 3.88 = 10.97%
Randomization
A. Convenience
For identification purposes, sometimes the checks are assigned at the start, at the end or at
certain intervals in the block.
b1 = P1 P2 A P3 B P4 D P5 C P6 P7 ..... P10
b2 = D P1 P2 C P3 P5 A B ….....
etc.
One of the blocks can have 10 test cultures + 4 checks, while another has 12 test cultures +
4 checks. Missing test entries (Pi) do not create a problem in the analysis, as the analysis can
be done with the existing genotypes. But when checks are missing, the analysis becomes
complicated.
Assume we want to test 16 new rice genotypes (test cultures) = P1, P2, ..., P16 in block size
of 4 with 4 checks to screen for early maturity.
b (number of blocks) = 4
c (number of checks) = 4 (A B C D).
Block size = 4 test cultures + 4 checks = 8
Days to maturity of rice genotypes
Block 1
P1 A P2 B P3 C P4 D
120 83 100 77 90 70 85 65
Block 2
P5 B P6 C P7 A P8 D
88 76 130 71 105 84 110 64
Block 3
P9 D P10 A P11 B P12 C
102 63 140 86 135 78 138 69
Block 4
P13 A P14 D P15 C P16 B
84 82 90 63 95 68 103 75
Blocks   ni   Block        Total of test cultures   No. of test cultures   Block         Ti × Be   Be × Block
              total (Bj)   in a block (Bti)         in a block (Ti)        effect (Be)             total
B1       8    690          395                      4                       0.375         1.5       258.75
B2       8    728          433                      4                       0.375         1.5       273.00
B3       8    811          515                      4                       0.625         2.5       506.87
B4       8    660          372                      4                      -1.375        -5.5      -907.5
Σ        32   2889                                                          0             0         131.12
- b1 = (1/4) × (690 − 293.5 − 395) = 0.375
- b2 = (1/4) × (728 − 293.5 − 433) = 0.375
- b3 = (1/4) × (811 − 293.5 − 515) = 0.625
- b4 = (1/4) × (660 − 293.5 − 372) = −1.375
Where ni is the number of entries (test cultures + checks) in each block = 4 + 4 = 8, and ∑ni = N = 32.
Grand total = ∑Bj (sum of block totals) = sum of all observations = 2889.0
Adjusted grand mean = [2889 − (4 − 1)(293.5) − 0]/(4 + 16) = (2889 − 880.5)/20 = 100.42
There are as many check effects as there are checks (4 in this case).
Step 3.4. Compute the adjusted progeny value for the ith progeny (Pi) as: observed (unadjusted) progeny value − effect of the block in which the ith progeny occurs.
P1 (adjusted) = P1 − (block effect) = 120 − 0.375 = 119.62, etc.
Step 3.5. Estimate the progeny effect as: adjusted progeny value − adjusted grand mean.
For example, the effect of progeny 1 = 119.62 − 100.42 = 19.20; etc.
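A short script makes the block-effect and adjustment arithmetic above easy to reproduce. This is a minimal sketch in plain Python; the variable names are ours, and 293.5 is the per-block check contribution used in the worked example (the sum of the four check totals divided by the number of blocks).

```python
block_totals = {"B1": 690, "B2": 728, "B3": 811, "B4": 660}   # Bj
test_totals  = {"B1": 395, "B2": 433, "B3": 515, "B4": 372}   # Bti
check_part = 293.5       # (335 + 306 + 278 + 255) / 4, from the check totals

# Block effect: bj = (Bj - check part - Bti) / number of checks
block_effects = {j: (block_totals[j] - check_part - test_totals[j]) / 4
                 for j in block_totals}       # 0.375, 0.375, 0.625, -1.375

grand_total = sum(block_totals.values())                          # 2889
adj_grand_mean = (grand_total - (4 - 1) * check_part) / (4 + 16)  # ~100.42

# Adjusted progeny value: observed value minus the effect of its block
P1_adjusted = 120 - block_effects["B1"]                           # ~119.62
print(block_effects, round(adj_grand_mean, 2), round(P1_adjusted, 2))
```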
Analysis of variance
1. Correction Factor (C.F.) = G²/N = (Grand total)²/(No. of observations) = (2889)²/32 = 260822.53
2. Total SS = ∑Y² − C.F. = sum of the squares of all observations − C.F. = (120)² + (83)² + ... + (75)² − 260822.53 = 276621 − 260822.53 = 15798.47
3. Crude block SS = ∑(Bj²/ni) = [(690)² + (728)² + (811)² + (660)²]/8 = 262425.62
4. True block SS = Crude block SS − C.F. = 262425.62 − 260822.53 = 1603.09
5. Adjusted SS due to entries (C + P) = (Adjusted grand mean × Observed grand total) + (∑ Block effect × corresponding block total) + (∑ Check effect × corresponding check total) + (∑ Progeny effect × corresponding progeny value) − (Crude block sum of squares)
= (100.42 × 2889) + 131.12 + (−30850.58) + 17216 − 262425.62 = 14184.3
6. Unadjusted SS due to entries (test cultures) = ∑(each progeny value)² − (∑ progeny values)²/P = 189557 − (1715)²/16 = 189557 − 183826.56 = 5730.44
where P = No. of progenies/test cultures = 16
ii. To compare two progenies/test materials occurring in the same block at the 5% level of significance:
LSD5% = t0.025(9) × sd = t0.025(9) × √(2 × MSE) = 2.262 × √(2 × 2.20) = 4.74 days
iii. To compare two test cultures (progenies) occurring in different blocks at the 5% level of significance:
LSD5% = t0.025(9) × √[2 × MSE × (1 + 1/c)], where c = No. of checks
= 2.262 × √[2 × 2.20 × (1 + 1/4)] = 5.3 days
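These two LSDs can be checked with a minimal sketch in Python, using the error mean square (2.20) and the tabulated t-value (2.262 at 9 error d.f.) from the example; the variable names are ours.

```python
import math

MSE, t, c = 2.20, 2.262, 4          # error MS, table t-value, no. of checks

lsd_same_block = t * math.sqrt(2 * MSE)               # ~4.74 days
lsd_diff_block = t * math.sqrt(2 * MSE * (1 + 1 / c)) # ~5.3 days
print(round(lsd_same_block, 2), round(lsd_diff_block, 2))
```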
9. FACTORIAL EXPERIMENTS
Factorial experiments are experiments in which two or more factors are studied together. A factor is a kind of treatment, and in a factorial experiment each factor supplies several treatments. In a factorial experiment, the treatments consist of combinations of two or more factors, each at two or more levels.
A factorial experiment can be done in CRD, RCBD or Latin Square Design as long as the treatments allow. Thus, the term factorial describes a specific way in which the treatments are formed; it does not refer to the experimental design used, e.g. nitrogen & phosphorus rates:
N = 0, 50, 100, 150 kg/ha
P = 0, 50, 100, 150 kg/ha
Kinds (noug cake, groundnut cake) and levels of protein supplement (25%, 50%, 75%).
The term level refers to the several treatments within any factor, e.g. if 5 varieties of sorghum are tested using 3 different row spacings, the experiment is called a 5 × 3 factorial experiment with 5 levels of the variety factor (A) and 3 levels of the spacing factor (B). An experiment involving 3 factors (variety, N-rate, weeding method), each at 2 levels, is referred to as a 2 × 2 × 2 or 2³ factorial; 3 refers to the number of factors and 2 to the number of levels. Here we have 8 treatment combinations: variety (x, y), N-rate (0, 50 kg/ha) and weeding (with or without weeding). A 2³ × 3 is a four-factor experiment in which three factors have 2 levels each and the 4th factor has 3 levels.
If the above 2³ factorial experiment is done in RCBD, the correct description of the experiment is a 2³ factorial experiment in RCBD.
Interaction
Sometimes the factors act independently of each other. By this we mean that changing the level of one factor produces the same effect at all levels of another factor. Often, however, the effects of two or more factors are not independent. Interaction occurs when the effect of one factor changes as the level of the other factor changes, e.g. if the effect of 50 kg N on variety x is 10 Q/ha and its effect on variety y is 15 Q/ha, then there is interaction. When factors interact, the factors are not independent, and a single-factor experiment will lead to incomplete or misleading information. However, if there is no interaction, the factors under consideration act independently of each other, and results from separate single-factor experiments are equivalent to those from a factorial experiment.
Example: A tall maize variety might out yield a short variety in high fertilizer rates due
to high dry matter production.
Interaction is the failure of the differences in response to changes in levels of one factor
to be the same at all levels of another factor or when the effect of one factor changes as
the level of the other factor changes.
Simple effects
Consider a 2 × 2 example with the following cell means: variety X yields 1 at N0 and 1 at N1, while variety Y yields 2 at N0 and 4 at N1.
- Simple effect of variety at N0: 2 − 1 = 1
- Simple effect of variety at N1: 4 − 1 = 3
- Simple effect of N on variety X: 1 − 1 = 0
- Simple effect of N on variety Y: 4 − 2 = 2
Interaction
It is calculated as the average of difference between simple effects of A at the two levels
of B or the difference between the simple effects of B at the two levels of A.
= ½ (simple effect of A at b1 − simple effect of A at b0)
= ½ [(a1b1 − a0b1) − (a1b0 − a0b0)] = ½ [(4 − 1) − (2 − 1)] = 1
or
= ½ (simple effect of B at a1 − simple effect of B at a0)
= ½ [(a1b1 − a1b0) − (a0b1 − a0b0)] = ½ [(4 − 2) − (1 − 1)] = 1
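A minimal sketch in Python of these calculations, using the four cell means implied by the simple effects above (the dictionary layout is ours):

```python
# Cell means: (variety, N level) -> yield
means = {("X", "N0"): 1, ("X", "N1"): 1,
         ("Y", "N0"): 2, ("Y", "N1"): 4}

# Simple effects
variety_at_N0 = means[("Y", "N0")] - means[("X", "N0")]   # 1
variety_at_N1 = means[("Y", "N1")] - means[("X", "N1")]   # 3
N_on_X = means[("X", "N1")] - means[("X", "N0")]          # 0
N_on_Y = means[("Y", "N1")] - means[("Y", "N0")]          # 2

# Interaction: half the difference between the two simple effects;
# both routes give the same answer.
interaction = (variety_at_N1 - variety_at_N0) / 2          # 1
print(interaction, (N_on_Y - N_on_X) / 2)                  # 1.0 1.0
```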
The experiment was conducted using a Randomized Complete Block Design with four blocks of six plots each.
Block-I
T2P2 T2P1 T1P1 T2P3 T1P3 T1P2
8.3 11.0 11.5 15.7 18.2 17.1
Block-II
T2P1 T2P2 T2P3 T1P2 T1P1 T1P3
11.2 10.5 16.7 17.6 13.6 17.6
Block-III
T1P2 T1P1 T2P1 T1P3 T2P3 T2P2
17.6 14.3 12.1 18.2 16.6 9.1
Block-IV
T1P3 T2P2 T2P3 T2P1 T1P2 T1P1
18.9 12.8 17.5 12.6 18.1 14.5
1. Construct a two-way table for the factors and calculate the factor A totals, factor B totals and the grand total.
_____________________________________________________________________
                             Phosphorus (Factor B)
                     _________________________________________________
Variety (Factor A)    P1       P2       P3       Factor A total (A)
_____________________________________________________________________
T1 (indeterminate)    53.9     70.4     72.9     197.2
T2 (determinate)      46.9     40.7     66.5     154.1
Factor B total (B)    100.8    111.1    139.4    Grand total (G) = 351.3
_____________________________________________________________________
Block totals
Block    I       II      III     IV
Total    81.8    87.2    87.9    94.4
- C.F. = G²/(rab) = (351.3)²/(4 × 2 × 3) = 5142.15, where r is the number of replications, a is the number of levels of factor A and b is the number of levels of factor B.
- Factor A (variety) SS = ∑A²/(rb) − C.F. = [(197.2)² + (154.1)²]/(4 × 3) − 5142.15 = 77.40
- Factor B (P-rate) SS = ∑B²/(ra) − C.F. = [(100.8)² + (111.1)² + (139.4)²]/(4 × 2) − 5142.15 = 99.87
- A × B SS = Treatment SS − Factor A SS − Factor B SS = 221.38 − 77.40 − 99.87 = 44.11
ANOVA TABLE
______________________________________________________________________________
Source           DF                     SS       MS       F-calcul.   F-table
                                                                      5%     1%
______________________________________________________________________________
Block            r − 1 = 3              13.32    4.44     7.65**      3.29   5.42
Variety (V)      a − 1 = 1              77.40    77.40    133.45**    4.54   8.68
Phosphorus (P)   b − 1 = 2              99.87    49.93    86.09**     3.68   6.36
V × P            (a − 1)(b − 1) = 2     44.11    22.05    38.03**     3.68   6.36
Error            (r − 1)(ab − 1) = 15   8.68     0.58
Total            rab − 1 = 23           243.38
______________________________________________________________________________
CV = (√Error MS / Grand mean) × 100 = (√0.58 / 14.64) × 100 = 5.2%
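As a check on the sum-of-squares arithmetic for this 2 × 3 factorial, here is a minimal sketch in plain Python working from the factor totals above; the treatment SS (221.38) is taken from the example, and the variable names are ours.

```python
r, a, b = 4, 2, 3                    # replications, levels of A, levels of B
G = 351.3                            # grand total
A_totals = [197.2, 154.1]            # variety (factor A) totals
B_totals = [100.8, 111.1, 139.4]     # phosphorus (factor B) totals
treatment_SS = 221.38                # from the treatment totals in the example

CF = G**2 / (r * a * b)                               # ~5142.15
A_SS = sum(t**2 for t in A_totals) / (r * b) - CF     # ~77.40
B_SS = sum(t**2 for t in B_totals) / (r * a) - CF     # ~99.87
AB_SS = treatment_SS - A_SS - B_SS                    # ~44.11
print(round(CF, 2), round(A_SS, 2), round(B_SS, 2), round(AB_SS, 2))
```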
Interpretation of a factorial experiment
The interpretation of the results of a factorial experiment depends on the outcome of the significance tests. If the factor A × factor B interaction is significant, the main effects have no real meaning, whether significant or not. In our case, since the A × B interaction is highly significant, the results of the experiment are best summarized in a two-way table of means of the various A × B combinations. If the interaction is not significant, then all of the information in the trial is contained in the significant main effects. In this case, the results may be summarized in tables of means for the factors with significant main effects.
Mean Comparisons
_____________________________________________________________________
Phosphorus (B)
_________________________________________________
Variety (A) P1 P2 P3 Variety mean
_____________________________________________________________________
T1 (indeterminate) 13.47 17.60 18.22 16.43
T2 (determinate) 11.72 10.17 16.62 12.84
Standard error of mean differences (sd)
- sd to compare any two factor A means: sd(A) = √[2 × MSE/(rb)] = √[2 × 0.58/(4 × 3)] = 0.31 Q
- sd to compare any two factor B means: sd(B) = √[2 × MSE/(ra)] = √[2 × 0.58/(4 × 2)] = 0.38 Q
- sd to compare any two factor-combination (treatment) means: sd(AB) = √(2 × MSE/r) = √(2 × 0.58/4) = 0.54 Q
9.5 Split-Plot Design
Split-plot design is frequently used for factorial experiments where the nature of the experimental material makes it difficult to handle all factor combinations in the same way. The principle underlying the design is that the levels of one factor are assigned at random to large experimental units. The large units are then divided into smaller units, and the levels of the second factor are assigned at random to the small units within the large units.
The large units are called the whole units or main-plots whereas the small units are called
the split-plots or sub-plots (units). Thus, each main plot becomes a block for the sub-plot
treatments. In split-plot design, the main plot factor effects are estimated from larger
units, while the sub-plot factor effects and the interactions of the main-plot and sub-plot
factors are estimated from small units.
As there are two sizes of experimental units, there are two types of experimental error: one for the main-plot factor and the other for the sub-plot factor. Generally, the error associated with the sub-plots is smaller than that for the main plots, since the error degrees of freedom for the main plots are usually fewer than those for the sub-plots.
In split-plot design, the precision for the measurement of the effect of main plot factor is
sacrificed to improve the precision of the measurement of the sub-plot factors.
c. When greater precision is desired for comparisons of certain factors than of others.
Since, in a split-plot design, plot size and precision of measurement of the effects are not the same for both factors, the assignment of a particular factor to either the main plot or the sub-plot is extremely important.
b. Relative size of the main effect: If the main effect of one factor (factor A) is expected to be much larger and easier to detect than that of factor B, then factor A can be assigned to the main unit and factor B to the sub-unit. For instance, in fertilizer and variety experiments, the researcher may assign variety to the sub-unit and fertilizer rate to the main unit, because he expects the fertilizer effect to be much larger and easier to detect than the varietal effect.
Advantages
a. It permits the efficient use of some factors, which require large experimental units
in combination with other factors, which require small experimental units.
b. It provides increased precision in comparison of some of the factors (sub-plot
factors).
c. It permits the introduction of new treatments into an experiment that is already in progress.
Disadvantages:
a. Statistical analysis is complicated because different factors have different error
mean squares.
b. Low precision for the main-plot factor can result in large differences being non-significant, while small differences in the sub-plot factor may be statistically significant even though they are of no practical importance.
There are two separate randomization processes in a split-plot design: one for the main-plot factor and another for the sub-plot factor.
In each block, the main plot factors are first randomly applied to the main plots followed
by random assignment of the sub-plot factors. Each of the randomization is done by any
of the randomization schemes.
Example: An experiment was designed to test the effect of feeding four forage crops (Rhodes grass, Vetch, Alfalfa and Oat) on weight gain (kg/month) of two breeds of cows (Zebu, Holstein). At the start of the experiment, it was assumed that the breeds of cows would respond differently to the feedstuffs; therefore, it was decided to use a factorial experiment. The objective of the experiment was to compare the effects of the forage crops as precisely as possible. Therefore, the experimenter assigned the breeds of animals to the main plots and the four forage crops to the sub-plots. The experiment was replicated in three blocks (barns), with the initial body weight of the animals as the blocking factor.
Procedures of randomization
Step 1: Divide the experimental area into r = 3 blocks, and divide each block into two main plots. Then randomly assign the two breeds of animals (H, Z) within each of the blocks. Note that the arrangement of the main-plot factor can follow any of the designs: CRD, RCBD or Latin Square.
Step 2: Divide each main plot (unit) into 4 sub-plots (units) and randomly assign the four feedstuffs (A, V, O, R) within each of the six main plots (units).
Note:
Each main-plot treatment is tested r times, where r is the number of blocks, while each sub-plot treatment is tested a × r times, where a is the number of levels of factor A and r is the number of blocks. This is the primary reason for the greater precision for the sub-plot factor compared with the main-plot factor.
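The two-stage randomization just described can be sketched in a few lines of Python; this is an illustrative sketch only (the names blocks, breeds and feeds are ours):

```python
import random

blocks = ["I", "II", "III"]
breeds = ["H", "Z"]              # main-plot factor
feeds = ["A", "V", "O", "R"]     # sub-plot factor

layout = {}
for block in blocks:
    # Stage 1: randomize the main-plot factor within the block
    main_order = random.sample(breeds, len(breeds))
    # Stage 2: randomize the sub-plot factor within each main plot
    layout[block] = {breed: random.sample(feeds, len(feeds))
                     for breed in main_order}
print(layout)
```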
The layout and the weight gain (kg/month) of the animals under each feed are given below:
Block I Block II Block III
H Z H Z Z H
A R O V O V
25.9 15.5 18.0 22.7 13.2 28.4
V A A O A A
25.3 18.9 26.7 13.5 19.6 27.6
O O V R V R
19.3 13.8 24.8 15.0 22.3 25.4
R V R A R O
22.2 21.0 24.2 18.3 15.2 20.5
The F-test (ANOVA) shows whether there is a significant difference among treatments or not, but it does not show which means differ from each other. There are many ways to compare the means of treatments tested in an experiment. One of these is pair comparison, the simplest and most commonly used type of comparison in agricultural research.
A. Planned pair comparison: In which the specific pair of treatments to be compared are
identified before the start of the experiment, e.g. comparing the control treatment with
each of the other treatments.
The most commonly used test procedures for pair comparison in agricultural research are the Least Significant Difference (LSD) and Tukey's test, which are suitable for planned pair comparisons, and Duncan's Multiple Range Test (DMRT), which is applicable to unplanned pair comparisons.
The LSD test is the simplest and most commonly used procedure for making pair comparisons. The procedure provides a single value at a prescribed level of significance, which serves as the boundary between significant and non-significant differences between any pair of treatment means. That is, two treatments are declared significantly different at a prescribed level of significance if their mean difference exceeds the computed LSD value; otherwise, they are not significantly different.
The LSD test is not valid for comparing all possible pairs of means, especially when the number of treatments is large. This is so because the number of possible pairs of treatment means increases rapidly as the number of treatments increases. In experiments where no real difference exists among the treatments, the numerical difference between the largest and smallest treatment means is still expected to exceed the LSD value when the number of treatments is large.
To avoid this problem, the LSD test is used only when the F-test for treatment effect is
significant and the number of treatments is not too large (less than six).
The procedure for applying the LSD test to compare any two treatment means:
1. Rank the treatment means from the largest to the smallest in the column and from the smallest to the largest in the rows.
2. Compute all possible differences between the pairs of treatment means to be compared.
3. Compute the LSD value at the α level of significance:
LSDα = tα/2(n) × sd
where sd is the standard error of the treatment mean difference and tα/2(n) is the tabulated t-value at the α/2 level of significance with n error degrees of freedom.
Example: Oil content (g) of linseed treated at different stages of growth with N-fertilizer, tested in RCBD with four replications and an error mean square of 1.31.
LSD5% = t0.025(15) × √(2 × MSE/r) = 2.131 × √(2 × 1.31/4) = 1.72 g
LSD1% = t0.005(15) × √(2 × MSE/r) = 2.947 × √(2 × 1.31/4) = 2.39 g
where MSE is the error mean square and r is the number of replications.
4. Compare each mean difference (d) from step 2 with the LSD value computed in step 3, using the following rules:
- if |d| > the LSD value at the 1% level of significance, there is a highly significant difference between the two treatment means compared (put two asterisks on the difference);
- if |d| > the LSD value at the 5% level of significance but ≤ the LSD value at the 1% level of significance, there is a significant difference between the two treatment means compared (put one asterisk on the difference);
- if |d| ≤ the LSD value at the 5% level of significance, the two treatment means compared are not significantly different (put n.s.).
Thus, the differences between T6 & T3, T6 & T2, T4 & T3, T4 & T2 are highly
significant; while the differences between T3 & T5, T1 & T6, T2 & T5 are significant.
Note that there are t(t − 1)/2 possible (unplanned) pair comparisons and (t − 1) planned pair comparisons, where t is the number of treatments. In the above example, 15 unplanned pair comparisons and five planned pair comparisons are possible.
Table __. Mean oil content of linseed treated with nitrogen fertilizer at different stages
___________________________________________
Stage of application Oil content (g)
___________________________________________
Seedling 5.10
Early blooming 4.30
Half-blooming 4.00
Full- blooming 6.70
Ripening 6.05
Unfertilized 7.03
___________________________________________
LSD(0.05) 1.72 g
CV (%) 20.7
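The full set of pairwise LSD comparisons for this table can be generated mechanically. Here is a minimal sketch in Python; it assumes T1–T6 correspond to the rows of the table in order (Seedling through Unfertilized) and uses the t-values and MSE from the example.

```python
import math
from itertools import combinations

means = {"T1": 5.10, "T2": 4.30, "T3": 4.00,
         "T4": 6.70, "T5": 6.05, "T6": 7.03}
MSE, r = 1.31, 4
sd = math.sqrt(2 * MSE / r)
lsd5, lsd1 = 2.131 * sd, 2.947 * sd        # ~1.72 g and ~2.39 g

for (ti, mi), (tj, mj) in combinations(means.items(), 2):
    d = abs(mi - mj)
    flag = "**" if d > lsd1 else ("*" if d > lsd5 else "ns")
    print(f"{ti} vs {tj}: |d| = {d:.2f} {flag}")
```

The sketch prints all 15 unplanned comparisons and reproduces the significance pattern described above.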
Duncan's Multiple Range Test (DMRT) is the most widely used test for making all possible pair comparisons. The procedure for applying the DMRT is similar to that of the LSD test, but it requires progressively larger values for significance between treatment means as they are more widely separated in the array. The test is more appropriate when the total number of treatments is large. It involves the calculation of the shortest significant difference (SSD).
The SSD is calculated for all possible relative positions (P) between the treatment means when the means are arranged in order of magnitude (in decreasing or increasing order).
Procedure
Step 1: Arrange all the treatment means in increasing or decreasing order. Data such as
milk & crop yield are usually arranged from the highest to the lowest.
Example: Yields (kg/plot) of wheat varieties grown in 4 by 4 Latin Square Design with
error mean square of 0.45:
B (12.3) A (12.00) C (10.8) D (6.7)
Step 2: Calculate sd (the standard error of the treatment mean difference) as:
sd = √(2 × MSE/r) = √(2 × 0.45/4) = 0.47 kg
Step 3: Calculate the shortest significant difference (SSD) for relative positions (P) in the
array of means. Since we have four treatment means they can be 2, 3 and 4 distance apart.
B and A are 2 distance apart (P = 2); B and C are 3 distance apart (P = 3); B and D are 4
distance apart (P = 4); A and D are 3 distance apart (P = 3); etc.
For the above example, the R values with error d. f. of 6 at 1% level of significance are
found from R-table.
P =       2      3      4
R0.01 =   5.24   5.51   5.65
SSD =     1.74   1.83   1.88

SSD (Shortest Significant Difference) = (R × sd)/√2
SSD at P = 2: (5.24 × 0.47)/√2 = 1.74
SSD at P = 3: (5.51 × 0.47)/√2 = 1.83
SSD at P = 4: (5.65 × 0.47)/√2 = 1.88
Note that SSD values increase as the distance between treatments (P) to be compared
increases.
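Before stepping through the comparisons, the SSD values can be checked with a minimal Python sketch, using the tabulated R values from the example (sd is rounded to 0.47 kg, as in the text):

```python
import math

sd = round(math.sqrt(2 * 0.45 / 4), 2)        # 0.47 kg, rounded as in the text
R = {2: 5.24, 3: 5.51, 4: 5.65}               # R-table values at 1%, 6 error d.f.

SSD = {p: R[p] * sd / math.sqrt(2) for p in R}   # 1.74, 1.83, 1.88
print({p: round(v, 2) for p, v in SSD.items()})
```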
Step 4: Test the differences between treatment means in the following order:
− Largest − smallest = 12.3 − 6.7 = 5.6, compared with the SSD at P = 4 (1.88); d (5.6) > 1.88; thus, the difference is significant at the 1% level of significance.
− Largest − 2nd smallest = 12.3 − 10.8 = 1.5, compared with the SSD at P = 3 (1.83); d (1.5) < 1.83; thus, the difference is non-significant at the 1% level of significance.
− Largest − 2nd largest = 12.3 − 12.0 = 0.3 < SSD at P = 2 (1.74); thus, the difference is non-significant at the 1% level of significance.
− 2nd largest − smallest = 12.0 − 6.7 = 5.3, compared with the SSD at P = 3 (1.83); d (5.3) > 1.83; significant at the 1% level of significance.
− 2nd smallest − smallest = 10.8 − 6.7 = 4.1, compared with the SSD at P = 2 (1.74); d (4.1) > 1.74; significant.
− etc.
Note that the SSD value at P = 2 equals the LSD value.
________________________________________________________
           B (12.3)        A (12.0)        C (10.8)
________________________________________________________
D (6.7)    5.6** (P=4)     5.3** (P=3)     4.1** (P=2)
C (10.8)   1.5ns (P=3)     1.2ns (P=2)     -
A (12.0)   0.3ns (P=2)     -
B (12.3)   -
________________________________________________________
Treatments B & D, A & D, C & D are significantly different at 1%, while treatments B &
C, B & A, and A & C are not significantly different at 1% level of significance.
Step 5: Present the test results in one of the following two ways.
A. Use the line notation if the sequence of the results can be arranged according to their ranks.
B. Use the alphabet notation if the desired sequence of the results is not based on their ranks; this is the more commonly used notation.
Any two means underscored by the same line are not significantly different at the 1% level of significance according to DMRT.
B(12.3) A(12.0) C(10.8)   D(6.7)
(Here B, A and C are underscored by a single line, while D stands alone.)
The alphabet notation can be derived from the line notation simply by assigning the same letter to all treatment means connected by the same horizontal line. It is usual practice to assign the letter a to the first line, b to the second line, c to the third, and so on. Note that the letter a can go to the largest or the smallest treatment mean, depending on the order of arrangement.
Table Mean yields of wheat varieties planted at Debrezeit Agricultural Research Center.
Tukey's test is more conservative than the LSD test because it requires a larger treatment mean difference for significance. It is computed in a manner similar to the LSD test, except that the standard error of the mean is used instead of the standard error of the mean difference (sd).
Example: The following analysis of variance table is from a CRD with six varieties replicated four times in a glasshouse (mean rust incidence).
__________________________________________________________________
Source     d.f.            MS         F-cal.     F-table (5%)
__________________________________________________________________
Variety    (t − 1) = 5     2976.44    24.80**    2.77
Error      t(r − 1) = 18   120.00
__________________________________________________________________
Variety: 1 2 3 4 5 6
Mean stem rust incidence (%): 50.3 69.0 24.0 94.0 75.0 95.3
CD = qα × √(MSE/r) = 4.495 × √(120/4) = 24.62%
Difference between means
________________________________________________________________________
            24.0(3)   50.3(1)   69.0(2)   75(5)     94(4)    95.3(6)
________________________________________________________________________
95.3(6)     71.3*     45.0*     26.3*     20.3ns    1.3ns    -
94(4)       70.0*     43.7*     25.0*     19.0ns    -
75(5)       51.0*     24.7*     6.0ns     -
69(2)       45.0*     18.7ns    -
50.3(1)     26.3*     -
24.0(3)     -
________________________________________________________________________
Thus, differences between varieties 6&3, 4&3, 5&3, 2&3, etc. are significant while
differences between varieties 2&1, 5&2, etc. are non-significant.
In applying the LSD test and DMRT, it is important that the appropriate standard error of the mean difference (sd) be used. sd is affected by the experimental design used, the number of replications of the two treatments being compared, and the specific type of means to be compared.
A). In CRD, RCBD and Latin Square Design, where the number of replications is equal for all treatments, the sd for any pair of treatment means is computed as:
sd = √(2 × MSE/r)
where MSE is the mean square for error and r is the number of replications common to all treatments.
Thus, LSDα = tα/2(n) × √(2 × MSE/r), where n is the error degrees of freedom.
B). When the two treatments do not have the same number of replications in CRD, sd is computed as:
sd = √[MSE × (1/ri + 1/rj)]
where MSE is the mean square for error, and ri and rj are the numbers of replications of the two treatment means (i & j) to be compared.
Thus, LSDα = tα/2(n) × √[MSE × (1/ri + 1/rj)]
Example: CRD with unequal replications; effect of 4 types of feedstuff on weight gain of chicks.
Treatment A = given to 5 chicks (5 replications), mean = 43.8 g
Treatment B = given to 4 chicks (4 replications), mean = 73.0 g
Treatment C = given to 3 chicks (3 replications), mean = 73.33 g
Treatment D = given to 5 chicks (5 replications), mean = 142.8 g
Given an error mean square of 843.1 and 13 error degrees of freedom, test whether there is a significant difference between treatments B & D (a worked sketch follows below).
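A minimal sketch of the requested test in Python; the tabulated value t0.025(13) = 2.160 is taken from a standard t-table, and the variable names are ours.

```python
import math

MSE = 843.1
mean_B, r_B = 73.0, 4            # treatment B: mean and replications
mean_D, r_D = 142.8, 5           # treatment D: mean and replications

sd = math.sqrt(MSE * (1 / r_B + 1 / r_D))     # ~19.5 g
lsd5 = 2.160 * sd                             # ~42.1 g
d = abs(mean_D - mean_B)                      # 69.8 g
print(d > lsd5)   # True: B and D differ significantly at the 5% level
```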
C). sd for comparing a treatment with a single missing value with any other treatment without missing values:
a) For RCBD: sd = √{MSE × [2/r + t/(r(r − 1)(t − 1))]}. Thus, LSDα = tα/2(error d.f.) × sd, where MSE is the mean square for error, t is the number of treatments and r is the number of replications.
b) For Latin Square Design: sd = √{MSE × [2/r + 1/((t − 1)(t − 2))]}. Thus, LSDα = tα/2(error d.f.) × sd, where MSE is the mean square for error of the analysis of variance of the Latin Square Design with a single missing value and r is the number of replications.
Covariance analysis can be applied to any number of covariates and to any type of functional relationship between variables. In this section, however, we will deal with the case of a single covariate whose relationship to the character of primary interest is linear.
The experimental error is reduced and the precision for comparing treatments is increased, e.g. in a cattle feeding experiment to compare the effects of several rations on weight gain, the animals assigned to any one block may vary in initial weight. Now, if the initial weight is correlated with the gain in weight, a portion of the experimental error for gain can be the result of differences in initial weight. By covariance analysis, the contribution that can be attributed to differences in initial weight can be computed and eliminated from the experimental error.
The following data show the ascorbic acid content of ten varieties of common bean. From previous experience it was known that an increase in maturity resulted in a decrease in vitamin C content. Since all varieties did not reach the same level of maturity on the same day, it was not possible to harvest all plots at the same stage of maturity. Hence, the percentage of dry matter based on 100 g of freshly harvested beans was recorded as an index of maturity and used as a covariate.
Ascorbic acid content (ASAC, mg/100 g of seed) and percentage of dry matter (% DM) for common bean varieties:
___________________________________________________________
Block I Block 2 Block 3
_____________ _____________ ______________
Variety % DM ASAC % DM ASAC % DM ASAC
____________________________________________________________
1 34 93 33 95 35 92
2 40 47 40 51 51 33
3 32 81 30 100 34 72
4 38 67 38 74 40 65
5 25 119 24 128 25 125
6 30 106 29 111 32 99
7 33 106 34 107 35 97
8 34 61 31 83 31 94
9 31 80 30 106 35 77
10 21 149 25 151 23 170
_______________________________________________________________
Conduct the analysis of covariance & calculate standard error of mean difference.
Steps of Analysis
________________________________________________________________________
Source              D. F.   SS of ASAC (Y)   SS of % DM (X)   Sum of cross products (XY)
________________________________________________________________________
Block 2 545.3 42.47 -75.23
Treatment 9 25689.0 972.70 -4633.23
Error 18 1608.7 86.20 -251.77
Treatment + Error 27 27297.7 1058.90 -4885.00
________________________________________________________________________
2. Analyse the covariance
C.F. = (Gx × Gy)/(r × t) = (973 × 2839)/(3 × 10) = 92078.23
Total sum of products = ∑(xi × yi) − C.F. = (34 × 93) + (40 × 47) + ... + (23 × 170) − 92078.23 = 87118 − 92078.23 = −4960.23
R.E. = (Error unadjusted MS of Y) / {(Error adjusted MS of Y) × [1 + (Treatment MS of X)/(Error SS of X)]} × 100
= (1608.7/18) / {51.4 × [1 + (972.7/9)/86.2]} × 100
= (89.37/115.84) × 100 = 77.15%
Thus, the result indicates that the use of % dry matter as the covariate has not increased the precision of the comparison of ascorbic acid content over what would have been obtained had the ANOVA been done without covariance.
Mean comparison
sd to compare two adjusted treatment means:
sd = √{Adjusted error mean square of Y × [2/r + (x̄i − x̄j)²/(Error SS of X)]}
where x̄i and x̄j are the covariate means of the ith and jth treatments, and r is the number of replications common to both treatments.
For instance, to compare the means of T1 & T2:
sd = √{51.4 × [2/3 + (43.67 − 34)²/86.2]} = √90.02 = 9.49
where 34 and 43.67 are the covariate means of the 1st and 2nd treatments, and 3 is the number of replications common to both treatments.
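A minimal sketch in Python of the covariance bookkeeping above (adjusted error mean square, relative efficiency, and the sd for two adjusted means); the variable names are ours, and small rounding differences from the hand calculation are expected.

```python
import math

Eyy, Exx, Exy = 1608.7, 86.2, -251.77   # error SS of Y, of X, and cross products
Tx_MS = 972.7 / 9                       # treatment MS of X

adj_error_SS = Eyy - Exy**2 / Exx       # ~873.3
adj_error_MS = adj_error_SS / 17        # ~51.4 (error d.f. reduced by one)

RE = (Eyy / 18) / (adj_error_MS * (1 + Tx_MS / Exx)) * 100   # ~77.2 %

# sd for comparing T1 and T2 (covariate means 34 and 43.67, r = 3)
sd = math.sqrt(adj_error_MS * (2 / 3 + (43.67 - 34)**2 / Exx))   # ~9.49
print(round(adj_error_MS, 1), round(RE, 1), round(sd, 2))
```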
For regression analysis, it is important to clearly distinguish between the dependent and
independent variables.
Examples:
- Weight gain in animals depends on feed
- Number of growth rings in a tree depends on age of the tree
- Grain yield of maize depends on a fertilizer rate
In the above cases, weight gain, number of growth rings and grain yield are dependent
variables, while feed, age and fertilizer rates are independent variables.
Correlation analysis, on the other hand, provides a measure of the degree of association
between the variables, e.g. the association between height and weight of students; body
weight of cows and milk production; grain yield of maize and thousand kernel weight.
Linear Relationships
The relationship between any two variables (independent and dependent) is linear if the change in y per unit change in x is constant throughout the range of x under consideration.
When there is more than one independent variable, say k independent variables (x1, x2, ..., xk), the simple linear regression equation y = α + βx can be extended to the multiple linear functional form:
y = α + β1x1 + β2x2 + ... + βkxk
where α is the y-intercept (the value of y when all x's are 0), and β1, β2, ..., βk are the partial regression coefficients associated with the independent variables.
The simple linear regression analysis deals with the estimation and tests of significance concerning the two parameters α and β in the equation:
Y = α + βx
The data required for the application of simple linear regression are n pairs (with n > 2) of y and x values.
b = ∑xy/∑x² = 70.8/262 = 0.27 cm/day; a = ȳ − b × x̄
This is the estimated linear functional relationship between age (days) and wing length (cm). Thus, wing length increases by 0.27 cm every day.
s²yx = [∑y² − (∑xy)²/∑x²]/(n − 2) = [19.66 − (70.80)²/262]/(13 − 2) = 0.05
The residual mean square denotes the variance of y after taking into account the dependence of y on x.
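A minimal Python sketch of these two estimates, from the summary quantities ∑xy = 70.8, ∑x² = 262, ∑y² = 19.66 and n = 13 used in the example:

```python
Sxy, Sxx, Syy, n = 70.8, 262.0, 19.66, 13

b = Sxy / Sxx                                    # ~0.27 cm/day
residual_var = (Syy - Sxy**2 / Sxx) / (n - 2)    # ~0.05
print(round(b, 2), round(residual_var, 2))
```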
Step 4: Compare the calculated tb value with the tabulated t-value at the α/2 level of significance with n − 2 = 13 − 2 = 11 d.f., where n is the number of pairs of observations.
The simple linear correlation analysis deals with the estimation and test of significance of
the simple linear correlation coefficient (r), which is a measure of the degree of linear
association between two variables x and y (there is no need to have a dependent and
independent variable).
The value of r lies within the range −1 to +1, with the extreme values indicating perfect linear association and the mid-value of 0 indicating no linear association between the two variables. The value of r is negative when a positive change in one variable is associated with a negative change in the other, and positive when the values of the two variables change in the same direction (both increase or both decrease).
Even though a zero r value indicates the absence of linear association between two variables, it does not indicate the absence of any association between them: it is possible for the two variables to have a non-linear association, such as a quadratic form. The procedure for the estimation and test of significance of a simple linear correlation coefficient between two variables x and y is:
Step 1: Compute the means (x̄, ȳ), the sums of squares of the deviates (∑x² and ∑y²), and the sum of cross products of the deviates (∑xy) of the two variables.
Step 2: Compute the simple linear correlation coefficient for the above example as:
r = ∑xy/√(∑x² × ∑y²) = 70.8/√(262 × 19.66) = 70.80/71.77 = 0.98
Step 3: Test the significance of the simple linear correlation coefficient (r) by comparing
the computed r-value with the tabulated r-value at n-2 d.f. The simple linear correlation
coefficient (r) is declared significant at α level of significance if the absolute value of the
computed r-value > the corresponding tabulated r-value.
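The computation of r in Step 2 is a single line of code given the same summary quantities; a minimal sketch:

```python
import math

Sxy, Sxx, Syy = 70.8, 262.0, 19.66
r = Sxy / math.sqrt(Sxx * Syy)    # ~0.98
print(round(r, 2))
```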
The simple linear regression and correlation analyses are applicable only in cases with one independent variable. However, in many situations Y may depend on more than one independent variable. Linear regression analysis involving more than one independent variable is called multiple linear regression. The relationship of the dependent variable Y to the k independent variables X1, X2, ..., Xk can be expressed as:
Y = α + β1X1 + β2X2 + ... + βkXk
The data required for the application of multiple linear regression analysis involving k independent variables are n(k + 1) observations, where n is the number of cases (n > 2).
Linear regression involving two independent variables can be expressed as: Y = α + β1X1
+ β2X2 where β1 & β2 are partial regression coefficients. β1 measures a change in Y for
unit change in X1, if X2 is held constant. Similarly, β2 measures the rate of change in Y
for a unit change in X2 where X1 is held constant. α (sometimes designated as β0) is the
value of Y when both X1 & X2 are zero.
Example: The following data show the weight gain, initial body weight and age of five chicks fed a certain type of ration for a month.
X1     X2     Y      y²     x1²   x2²     x1y    x2y     x1x2
5      10     5      1.96   0     19.36   0      6.16    0
5      15     6      0.16   0     0.36    0      -0.24   0
5      12     7      0.36   0     5.76    0      -1.44   0
4      15     8      2.56   1     0.36    -1.6   0.96    -0.6
6      20     6      0.16   1     31.36   -0.4   -2.24   5.6
Sum    25     72     32     5.2   2       57.2   -2      3.2    5
Mean   5      14.4   6.4
b1 = [(∑x2²)(∑x1y) − (∑x1x2)(∑x2y)] / [(∑x1²)(∑x2²) − (∑x1x2)²] = [(57.2 × −2) − (5 × 3.2)] / [(2 × 57.2) − (5)²] = −130.4/89.4 = −1.46
b2 = [(∑x1²)(∑x2y) − (∑x1x2)(∑x1y)] / [(∑x1²)(∑x2²) − (∑x1x2)²] = [(2 × 3.2) − (5 × −2)] / [(2 × 57.2) − (5)²] = 16.4/89.4 = 0.18
a = Ȳ − b1x̄1 − b2x̄2 = 6.4 − (−1.46)(5) − (0.18)(14.4) = 11.1
Thus, the estimated multiple linear regression equation of weight gain (g) on initial age (days, X1) and initial body weight (g, X2) is: Ŷ = 11.1 − 1.46X1 + 0.18X2, for 4 ≤ X1 ≤ 6 and 10 ≤ X2 ≤ 20.
Step 4: Compute:
The sum of squares due to regression (SSR) = ∑ bi(∑xiy) = (−1.46)(−2) + (0.18)(3.2) = 3.496
Residual (error) SS = ∑y² − SSR = 5.2 − 3.496 = 1.704
Coefficient of determination (R²) = SSR/∑y² = 3.496/5.2 = 0.67
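A minimal Python sketch of the whole multiple-regression calculation above, starting from the sums of squares and cross products in the table (variable names are ours):

```python
Sx1x1, Sx2x2, Sx1x2 = 2.0, 57.2, 5.0     # SS and cross products of the x's
Sx1y, Sx2y, Syy = -2.0, 3.2, 5.2         # cross products with y and SS of y
y_bar, x1_bar, x2_bar = 6.4, 5.0, 14.4   # means

den = Sx1x1 * Sx2x2 - Sx1x2**2                    # 89.4
b1 = (Sx2x2 * Sx1y - Sx1x2 * Sx2y) / den          # ~-1.46
b2 = (Sx1x1 * Sx2y - Sx1x2 * Sx1y) / den          # ~0.18
a = y_bar - b1 * x1_bar - b2 * x2_bar             # ~11.1

SSR = b1 * Sx1y + b2 * Sx2y                       # ~3.50
R2 = SSR / Syy                                    # ~0.67
print(round(b1, 2), round(b2, 2), round(a, 1), round(R2, 2))
```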
The larger the R² value, the more useful the regression equation is in characterizing Y. On the other hand, if the value of R² is low, the estimated linear regression equation may not be useful even if the F-test is significant. For example, an R² value of 0.26, even if significant, indicates that only 26% of the total variation in the dependent variable (Y) is explained by the linear function of the independent variables considered.