Biometry

1. INTRODUCTION

1.1 Definition and Brief History of Statistics

In biology, including the agricultural sciences, “the laws of nature” are not that simple.
Biological phenomena often show variation that obscures the law we want to establish.
For instance, if we treat two fields in an identical way, the yields obtained from the same
variety of a crop will not be the same. Similarly, two cows of the same age, breed and
body weight fed with the same type and amount of fodder will produce different amounts
of milk. Thus, variation is a typical feature of biological data.

The breadth of statistics makes it difficult to define. It was developed to deal with
problems in which, for the individual observations, laws of cause and effect are not
apparent to the observer and where an objective approach is needed. In such problems,
there must always be some uncertainty about any inference based on a limited number of
observations.

Statistics is defined as the study of numerical data based on variation in nature. It is a
science which deals with the collection, classification, tabulation, summary and analysis
of quantitative data or numerical facts. Statistics is the science of creating, developing and
applying techniques such that the uncertainty of inductive inferences may be evaluated.

The word statistics also refers to numerical and quantitative data such as statistics of
births, deaths, marriages, production, yield, etc. The application of statistical methods to
the solution of biological problems is called biometry, biological statistics or biostatistics.
The word biometry comes from two Greek roots: 'bios' meaning 'life' and 'metron'
meaning 'to measure'. Thus, biometry literally means the measurement of life.

A history of statistics throws considerable light on the nature of twentieth-century
statistics. Historical perspective is also important in pointing to the needs and pressures
that created it.

The term statistics is an old one. Statistics must have started as a state arithmetic
to assist a ruler who needed to know the wealth and number of his subjects in
order to levy a tax or wage a war. Presumably all cultures that intentionally recorded
history also recorded statistics. We know that Caesar Augustus sent out a decree that the
entire world should be taxed. Consequently, he required that all persons report to the
nearest statistician, in that day the tax collector. One result of this was that Jesus was born
in Bethlehem rather than Nazareth. William the Conqueror ordered a survey of the lands
of England for purposes of taxation and military service. This was called the Domesday
Book.

Several centuries after the Domesday Book, we find an application of empirical
probability in ship insurance, which seems to have been available to Flemish shipping in
the fourteenth century. This can have been little more than speculation or gambling, but it
developed into the very respectable form of statistics called insurance. Gambling, in the
form of games of chance, led to the theory of probability, originated by Pascal and
Fermat about the middle of the seventeenth century because of their interest in the
gambling experiences of the Chevalier de Méré. To the statistician and the experimental
scientist, the theory contains much of practical use for the processing of data.

The normal curve or normal curve of error has been very important in the development of
statistics. The equation of this curve was first published in 1733 by de Moivre. De
Moivre had no idea of applying his result to experimental observations and his paper
remained unknown until Karl Pearson found it in a library in 1924. However, the same
result was later developed by two mathematical astronomers, Laplace (1749-1827) and
Gauss (1777-1855) independently of one another.

Charles Darwin (1809-1882), a biologist, received the second volume of Lyell’s book
while on the Beagle. Darwin formed his theories later and he may have been stimulated
by his reading of this book. Darwin’s work was largely biometrical or statistical in nature
and he certainly renewed enthusiasm in biology. Gregor Mendel (1822-1884) too, with
his studies of plant hybrids published in 1866, had a biometrical or statistical problem.

In the nineteenth century, the need for a sounder basis for statistics became apparent. Karl
Pearson (1857-1936), initially a mathematical physicist, applied his mathematics to
evolution as a result of the enthusiasm in biology created by Darwin. Pearson spent
nearly half a century in serious statistical research. In addition, he founded the journal
Biometrika and a school of statistics; as a result, the study of statistics gained impetus.

While Pearson was concerned with large samples, large-sample theory proved to be
somewhat inadequate for experimenters with necessarily small samples. Among these
was W. S. Gosset (1876-1937), a student of Karl Pearson and a scientist of the Guinness
firm of brewers. Gosset's mathematics appears to have been insufficient to the task of
finding exact distributions of the sample standard deviation, of the ratio of the sample
mean to the sample standard deviation, and of the correlation coefficient, statistics with
which he was particularly concerned. Consequently, he resorted to drawing shuffled
cards, computing, and compiling empirical frequency distributions. Papers on the results
appeared in Biometrika in 1908 under the name 'Student', Gosset's pseudonym. Today
Student's t is a basic tool of statisticians and experimenters. Now that the use of Student's
t distribution is so widespread, it is interesting to note that the German astronomer
Helmert had obtained it mathematically as early as 1875.

R.A. Fisher (1890-1962) was influenced by Karl Pearson and Student and made
numerous and important contributions to statistics. He and his students gave considerable
impetus to the use of statistical procedures in many fields, particularly in agriculture,
biology and genetics.

To close this brief history, Abraham Wald (1902-1950) contributed two books, Sequential
Analysis and Statistical Decision Functions. Thus, it is in the 20th century that most of the
statistical methods presently used have been developed.
Currently, statistics is used as an analytical tool in many fields of research.

1.2 Definition of Some Basic Terms

Data: Qualitative or quantitative information taken on a certain character, for example,
data on height, weight, color, etc.

Variable: A property with respect to which individuals in a sample differ in some way,
e.g. length, weight, height, color, etc. Characteristics which show variation are called
random variables.

Variables can be:

1. Measurement variables: Variables that can be expressed in a numerical order. They
are of two types:
a) Continuous variables: Variables which can assume an infinite number of values
between any two fixed points, e.g. height, grain yield, scores of students, etc.
b) Discontinuous/meristic/discrete variables: Variables that take only certain fixed
numerical values with no intermediate values in between, e.g. number of seeds per
pod, number of plants in a quadrat, number of students taking the course biometry,
etc.
2. Ranked variables: Some variables cannot be measured but at least can be ordered or
ranked by their magnitude, e.g. disease score: no infection (0), mild infection (1), high
infection (2), severe infection (3).
Here we cannot say that the difference between 1 and 2 is identical to the
difference between 2 and 3. Such an assumption is made only for measurement variables.

3. Attribute or nominal variables: Variables that cannot be measured but are
expressed qualitatively, e.g. color of common bean seed: white, black, red; sex of
animals: male or female; blood group of humans: A, AB, O, etc.
When such attribute data are combined with frequency (number of occurrences), they can
be treated statistically. Such data are called enumeration data. Suppose we have 18 mixed
bean seeds; they can be grouped as black (3), white (5) and red (10).

Variate (datum): A single reading, score or observation of a given variable. If we
measure the height of 5 plants from a plot, each of the 5 readings of height will be a
variate: 10 cm, 5 cm, 20 cm, 25 cm, 15 cm.

Population and sample

Biologists usually wish to make inferences (draw conclusions) about a population, which
is defined as the collection of all the possible observations of interest. A sample is a
subset of the population selected by a specific procedure. In an experiment, we can rarely
include the whole population because doing so is costly, time consuming and sometimes
impossible. The idea is then to use the sample data to make statements (decisions) about
the population. Thus, the collection of observations we take from the population is called
a sample, and the number of observations in the sample is called the sample size (usually
indicated by the symbol n).

The basic method of collecting the observations in a sample is called simple random
sampling. This is where any observation has the same probability of being collected, e.g.
giving each student in a class an equal chance of being measured for height. The aim is always to
sample in a manner that does not create a bias in favour of any observation being
selected. Nearly all applied statistical procedures that are concerned with using samples
to make inferences (i.e. draw conclusions) about populations assume some form of
random sampling. If the sampling is not random, then we are never sure as to what
population is represented by our sample. When random sampling from clearly defined
populations is not possible, then interpretation of standard methods of estimation
becomes more difficult.

Populations must be defined at the start of any study and this definition should include the
spatial and temporal limits to the population. Our formal statistical inference is restricted
to these limits. For example, if we sample from a population of animals at a certain
location in December 2010, then our inference is restricted to that location in December
2010. We cannot infer what the population might be like at any other time or in any other
place, although we can speculate or make predictions.

Parameter: A population value, which we generally do not know but would like to infer
(estimate), e.g. the national average yield of maize in Ethiopia in 2010.
Parameters are designated using Greek letters such as µ, σ, π, etc.

Statistics: Sample estimates of population values (parameters), designated using
Latin letters such as x̄, s, p, etc.

The population parameters cannot be measured directly because the populations are
usually too large, i.e. they contain too many observations for practical measurement. It is
important to remember that population parameters are usually considered to be fixed, but
unknown, values so they are not random variables and do not have probability
distributions. Sample statistics are random variables, because their values depend on the
outcome of the sampling experiment, and therefore they do have probability distributions,
called sampling distributions.
1.3 Types of Statistics

1. Descriptive (deductive) statistics: Methods used to describe a set of data without
involving generalization. They deal with the presentation of research data or any
numerical information and help in summarizing and organizing data so as to make
them readable for users, e.g. mean, median, mode, standard deviation, etc.

2. Inferential (inductive) statistics: Statistics that helps in drawing conclusions about
the whole (population) based on data from some of its parts (samples).

6. PRINCIPLES OF EXPERIMENTAL DESIGN

6.1 Introduction

In research, a scientist identifies solutions to problems through experimentation. Research
can be broadly defined as a systematic investigation into a subject to discover new facts
or principles, or to confirm or deny the results of previous findings. Such investigation
helps in decision making, such as recommending a new procedure, a new fertilizer rate, a
new pesticide, etc.

The procedure for research is generally known as the scientific method, which, although
difficult to define precisely, usually involves the following steps:
a. Formulation of hypothesis: a tentative explanation or solution
b. Planning an experiment to objectively test the hypothesis
c. Careful observation and collection of the data
d. Analysis and interpretation of the experimental results.

The experiment is an important tool of research. Some important characteristics of well-
planned experiments are:
a. Simplicity: The selection of treatments and the experimental arrangement should be
as simple as possible and consistent with the objectives of the experiment.
b. Degree of precision: There should be a high probability that the experiment will be
able to measure differences with the degree of precision the experimenter desires.
This implies an appropriate design and sufficient replication.
c. Absence of systematic error: The experiment must be planned to ensure that
experimental units receiving one treatment differ in no systematic way from those
receiving another treatment, so that an unbiased estimate of each treatment effect
can be obtained.
d. Range of validity of conclusions: Conclusions should have as wide a range of
validity as possible. An experiment replicated in time and space would increase the
range of validity of the conclusions that could be drawn from it. A factorial set of
treatments is another way of increasing the range of validity of an experiment. In a
factorial experiment, the effects of one factor are evaluated under varying levels of
a second factor.
e. Calculation of degree of uncertainty: In any experiment, there is always some
degree of uncertainty as to the validity of the conclusions. The experiment should
be designed so that it is possible to calculate the probability of obtaining the
observed results by chance alone.

6.2 Design of Experiments

The term refers to five interrelated activities required in an investigation:
a. Formulating statistical hypotheses and making plans for the layout, collection and
analysis of data
b. Stating the decision rules to be followed in testing the statistical hypotheses
c. Collecting data according to plan
d. Analyzing data according to plan
e. Making decisions based on the decision rules

Purposes of experimental designs:
a. To provide estimates of treatment effects or differences among treatment effects
b. To provide an efficient way of testing hypotheses about the response to treatments
c. To assess the reliability of estimates and assumptions
d. To estimate the variability of the experimental material
e. To increase precision by eliminating extraneous external sources of variation from
the comparisons of interest
f. To provide a systematic and efficient pattern of conducting an experiment

6.3 Concepts Commonly Used in Experimental Design

Treatment: An amount of material or a method that is to be tested in the experiment,
such as crop varieties, insecticides, feedstuffs, fertilizer rates, methods of land
preparation, irrigation frequencies, etc.

Experimental unit: The object on which the treatment is applied to observe an effect,
e.g. cows, plots of land, petri dishes, pots, etc.

In the study of the effect of different rations on milk production, the ration is the
treatment and the animal is the experimental unit, while in the study of different fertilizer
rates on the yield of maize, the fertilizer rates are the treatments and a plot of land is the
experimental unit.

Experimental error: A measure of the variation which exists among observations on
experimental units treated alike. Variation generally comes from two main sources:
1. Inherent variability that exists in the experimental material to which treatments
are applied.
2. Lack of uniformity in the physical conduct of an experiment or failure to
standardize the experimental techniques such as lack of accuracy in measurement,
recording data on different days, etc.

Therefore, every possible effort should be made to reduce the experimental error.

Methods aimed at reducing the experimental error:
a) Increase the size of the experiment, either through the provision of more replicates or
by the inclusion of additional treatments
b) Refine the experimental technique
- have uniformity in the application of treatments, such as spreading fertilizer equally,
recording data on the same day, etc.
- control external influences so that all treatments produce their effects under
comparable conditions, e.g. protect against diseases, insects, etc., as their effects are
not uniform on all plots.
c) Blocking: Dividing the field into several homogeneous parts. Blocks are the levels at
which we hold an extraneous factor fixed, so that we can measure its contribution to the
total variability of the data by means of the analysis of variance.

Replication: When a treatment appears more than once in an experiment, it is said to be
replicated. The functions of replication are:
a) It provides an estimate of experimental error because it provides several observations
on experimental units receiving the same treatment. For an experiment in which each
treatment appears only once, no estimate of experimental error is possible, and when
there is no method of estimating the experimental error, there is no way to determine
whether observed differences indicate real differences or are due to inherent
variability.
b) It improves the precision of an experiment: as the number of replicates increases, the
estimates of population means (the observed treatment means) become closer to the
true values.
c) It increases the scope of inference and conclusions of the experiment: field
experiments are normally repeated over years and locations because conditions vary
from year to year and location to location. The purpose of replication in space and
time is to increase the scope of inference; the results of an experiment are applicable
only to conditions that are similar to those under which it was conducted.

Factors determining the number of replications are:
a) The degree of precision required: the higher the precision desired, the greater the
number of replicates (the smaller the departure from the null hypothesis to be
measured or detected, the greater the number of replicates)
b) Variability of the experimental units: certain experimental materials are more variable
than others. For the same precision, less replication is required on uniform
experimental units (soils) than on variable experimental units (soils).
c) The number of treatments: more replications are needed with few treatments than
with many treatments
d) The type of experimental design also affects the precision of an experiment and the
required number of replications, e.g. in the Latin Square Design the number of
replications must equal the number of treatments, while in the balanced lattice
design the number of replications is one more than the square root of the number of
treatments, i.e. r = √t + 1, where r is the number of replications and t is the number
of treatments.
e) Availability of funds and time also determines the number of replicates: if there are
adequate funds and time, more replications can be used.

Usually, three replications are taken as the minimum number for standard experiments.

Identifying the Number of Replications

Two procedures for calculating the number of replications are described below:

Procedure 1: This method takes into consideration the variability of the experimental
material and field, which is measured in terms of the coefficient of variation (CV) and the
standard error of the means (SEM). To calculate the number of replications we can use
the following formula:

N = (CV/SEM)²

where N = number of replications; CV = coefficient of variation; SEM = standard error
of the means.

Example: CV (%) = 12; SEM = ±6; N = (12/6)² = 4
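To make the arithmetic concrete, here is a minimal Python sketch of Procedure 1; the function name replications_needed is illustrative, not from the text:

    # Number of replications from CV and SEM: N = (CV/SEM)^2, rounded up.
    import math

    def replications_needed(cv_percent, sem):
        return math.ceil((cv_percent / sem) ** 2)

    print(replications_needed(12, 6))  # 4, matching the worked example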

Procedure 2: If CV and SEM are not known, then the number of replications can be
arrived at using the principle that the precision of treatment comparisons increases if the
experimental error is kept to a minimum. The experimental error can be kept to a
minimum by providing more degrees of freedom for the experimental error; in other
words, a lower number of degrees of freedom for experimental error results in an
enlarged experimental error. Based on this principle, the number of degrees of freedom
for error should not be less than 15 (and in no case less than 10).

When ‘t’ treatments are replicated ‘r’ times, the error is based on (t-1)(r-1) degrees of
freedom in the Randomized Complete Block Design (RCBD) and on t(r-1) in the
Completely Randomized Design (CRD); in either case this should not be less than 15.

No. of treatments: 2 3 4 5 6
Minimum no. of replications (CRD): 9 6 5 4 4
Minimum no. of replications (RCBD): 16 9 6 5 4
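The table above can be reproduced with a short Python sketch that searches for the smallest r giving at least 15 error degrees of freedom; the function name min_reps is illustrative:

    # Smallest r with error d.f. >= 15: t(r-1) for CRD, (t-1)(r-1) for RCBD.
    def min_reps(t, design="CRD", min_df=15):
        r = 2
        while True:
            df = t * (r - 1) if design == "CRD" else (t - 1) * (r - 1)
            if df >= min_df:
                return r
            r += 1

    for t in (2, 3, 4, 5, 6):
        print(t, min_reps(t, "CRD"), min_reps(t, "RCBD"))
    # Reproduces the table: CRD 9 6 5 4 4; RCBD 16 9 6 5 4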

Increasing either the number of replications or the plot size can improve precision, but the
improvement achieved by doubling plot size is almost always less than the improvement
achieved by doubling the number of replications.
Randomization
Randomization is assigning the treatments to the experimental units in such a way that
any unit has an equal chance of receiving any treatment, i.e. every treatment should have
an equal chance of being assigned to any experimental unit. Thus, a particular treatment
should not be consistently favored or disfavored.

Purposes of randomization
a) To eliminate bias: randomization ensures that no treatment is favored or
discriminated against by systematic assignment to units in a design
b) To ensure independence among the observations, which is necessary to provide
valid significance tests.

Randomization is usually done by using tables of random numbers or by drawing cards
or lots.

Confounding
Confounding occurs when the differences due to experimental treatments, i.e. the contrast
specified in your hypothesis, cannot be separated from other factors that might be causing
the observed differences. For example, suppose you wished to test the effect of a
particular hormone on some behavioral response of sheep. You create two groups of
sheep, males and females, and inject the hormone into the male sheep, leaving the females
as the control group. Even if other aspects of the design are sound, differences between
the means of the two groups cannot be definitely attributed to effects of the hormone
alone. The two groups also differ in gender, and this may also be, at least partly,
determining the behavioral responses of the sheep. In this example, the effects of the
hormone are confounded with the effects of gender.

Controls
A control is a part of the experiment that is not affected by the factor or factors studied,
but otherwise encounters exactly the same circumstances as the experimental units treated
with the investigated factor(s). For example, when investigating the effect of spraying
micronutrients, the crop in the control should also be sprayed with the same amount of
water, just without the micronutrients. The reason for working this way is to make sure
that only the effect of the substance of interest is investigated. Otherwise, it may not be
possible to draw conclusions about the reason(s) for the outcome of the experiment.

Local Control
The principle of local control is another important principle of experimental design.
Under it, the extraneous factor, the known source of variability, is made to vary
deliberately over as wide a range as necessary, and this needs to be done in such a way
that the variability it causes can be measured and hence eliminated from the experimental
error. In other words, according to the principle of local control, we first divide the field
into several homogeneous parts, known as blocks, and then each such block is divided
into parts equal to the number of treatments. The treatments are then randomly assigned
to these parts of a block. Dividing the field into several homogeneous parts is known as
'blocking'. In general, blocks are the levels at which we hold an extraneous factor fixed,
so that we can measure its contribution to the total variability of the data by means of the
analysis of variance. In brief, through the principle of local control we can eliminate the
variability due to extraneous factor(s) from the experimental error.

Degrees of Freedom (d.f.)

The number of observations in a sample that are “free to vary”, once the mean is known,
when determining the variance. After the mean is determined, only n-1 observations are
free to vary because, knowing the mean and n-1 of the observations, the last observation
is fixed. A simple example: say we have a sample of observations with values 3, 4 and 5.
We know the sample mean (4) and we wish to estimate the variance. Knowing the mean
and one of the observations does not tell us what the other two must be. But if we know
the mean and two of the observations (e.g. 3 and 4), the final observation is fixed (it must
be 5). So, knowing the mean, only two observations (n-1) are free to vary. As a general
rule, the d.f. is the number of observations minus the number of parameters estimated in
the formula for the variance.
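The 3, 4, 5 example can be checked with a few lines of Python (purely illustrative):

    # Knowing the mean of n observations fixes the last one.
    obs = [3, 4, 5]
    mean = sum(obs) / len(obs)                 # 4.0
    last = len(obs) * mean - sum(obs[:2])      # the third value must be 5
    variance = sum((x - mean) ** 2 for x in obs) / (len(obs) - 1)  # n-1 d.f.
    print(last, variance)                      # 5.0 1.0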

6.4 Analysis of Variance

Analysis of variance was first introduced by R.A. Fisher in the 1920s. It is defined as an
arithmetic technique whereby the total variation in a set of data is partitioned into several
components.

Analysis of variance is used in all fields of research where data are quantitatively
measured, and it is used to estimate and test hypotheses about population variances and
population means.

6.4.1 General procedures in analysis of variance


− State the model: a symbolic representation that describes the data under consideration
− State the hypothesis
− Make the calculations
− Construct the analysis of variance table, which summarizes the calculations in tabular
form for quick assessment of the results
− Make a decision: decide either to accept or reject the hypothesis

6.4.2 General assumptions underlying the analysis of variance


I. The treatment and replication effects should be additive
The effect of treatment for all replications and the effect of replication for all treatments
should remain constant. A hypothetical set of data with additive and multiplicative effects
of treatments and replications is shown below.

Additive effect

Treatment                Replication I   Replication II   Replication effect (I-II)
A                        180             120              60
B                        160             100              60
Treatment effect (A-B)   20              20

Note that the effect of treatments is constant over replications and the effect of
replications is constant over treatments.

Multiplicative effect

                         Raw data                       Log10 data
Treatment                I      II     Effect (II-I)    I      II     Effect (II-I)
A                        10     20     10               1.00   1.30   0.30
B                        30     60     30               1.48   1.78   0.30
Treatment effect (B-A)   20     40                      0.48   0.48

Here, the treatment effect is not constant over replications and the effect of replications
is not constant over the treatments.

Multiplicative effects are often encountered in experiments designed to evaluate the
incidence of diseases and insects. This happens because changes in insect and disease
incidence usually follow a pattern that is a multiple of the initial incidence. When effects
are multiplicative, a logarithmic transformation of the data makes the effects additive.
Thus, in such cases conduct the analysis and mean separation on the transformed data,
and in tables present the transformed means in parentheses alongside their back-
transformed values outside the parentheses.
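A minimal Python sketch of the table above shows how the log10 transformation turns the multiplicative effects into additive ones:

    # Raw replication effects differ (10 vs 30); on the log10 scale both are 0.30.
    import math

    data = {"A": (10, 20), "B": (30, 60)}      # treatment: (rep I, rep II)
    for trt, (r1, r2) in data.items():
        print(trt, r2 - r1, f"{math.log10(r2) - math.log10(r1):.2f}")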

II. Experimental errors (residuals) must be independent

The experimental error of one treatment should not be related to or dependent upon that
of another treatment. One example where residuals may be dependent is in the study of
pesticides: a high level of pesticide on one plot of land may spread to adjacent plots,
thereby changing the conditions on those plots. The assumption of independence of
errors is usually assured by the use of proper randomization, i.e. treatments are applied
(assigned) at random to experimental units. However, in systematic designs, where the
treatments are assigned systematically instead of randomly, the assumption of
independence of errors is usually violated. The simplest way to check the independence
of errors is to check whether the experimental layout is random or not.

III. Experimental errors (variances) must be homogeneous (homoscedasticity)

The variances of the treatments must be the same. When some treatments have errors that
are exceptionally higher or lower than others, this is called heteroscedasticity. This type
of heterogeneity of variances is usually associated with data whose distribution is not
normal. Count data, such as the number of infected plants per plot, usually follow a
Poisson distribution, where the variance equals the mean (s² = x̄).

IV. Experimental errors must be normally distributed

It is assumed that the observations within each ‘group’ or treatment combination come
from a normally distributed population.

Data such as the number of infested plants per plot usually follow the Poisson
distribution, and data such as percent survival of insects or percent of plants infected
with a disease follow the binomial distribution.

Calculation of the residuals:

Data of four treatments tested in 5 blocks

Table of observed Y values


________________________________________________________________________
Block
Drug 1 2 3 4 5 Mean Treat. effect
________________________________________________________________________
A 7.1 6.1 6.9 5.6 6.4 6.42 0.455
B 6.7 5.1 5.9 5.1 5.8 5.72 -0.245
C 7.1 5.8 6.2 5.0 6.2 6.06 0.095
D 6.7 5.4 5.7 5.2 5.3 5.66 -0.305
Mean 6.9 5.6 6.175 5.225 5.925 5.965

Block effect 0.935 -0.365 0.210 -0.740 -0.040


________________________________________________________________________

Treatment effect = Individual treatment mean - Grand mean; e.g. treatment A effect =
6.42 - 5.965 = 0.455, and so on.
Block effect = Individual block mean - Grand mean; e.g. block 2 effect = 5.6 - 5.965 =
-0.365, etc.
Estimated Y value = Grand mean + treatment effect + block effect; e.g. the estimated
value for block 1 and treatment A = 5.965 + 0.455 + 0.935 = 7.355. Similarly calculate
for the other treatment combinations.

Table of Estimated Y values


________________________________________________________________________
Block
Drug 1 2 3 4 5
________________________________________________________________________
A 7.355 6.055 6.630 5.680 6.380
B 6.655 5.355 5.930 4.980 5.680
C 6.995 5.695 6.270 5.320 6.020
D 6.595 5.295 5.870 4.920 5.620
________________________________________________________________________

Residuals = Observed Y - Estimated Y, e.g. the residual for treatment C in block 2 =
5.8 - 5.695 = 0.105. Similarly calculate for the other treatment combinations.

Table of residuals
____________________________________________________________________
Block
Drug 1 2 3 4 5
_____________________________________________________________________
A -0.255 0.045 0.270 -0.080 0.020
B 0.045 -0.255 -0.030 0.120 0.120
C 0.105 0.105 -0.070 -0.320 0.180
D 0.105 0.105 -0.170 0.280 -0.320
_____________________________________________________________________

Plot the residuals from the lowest to the highest with their corresponding frequencies and
examine the distribution to see whether it is normal or not. If it is not normally
distributed, a data transformation may be needed.
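The residual table above can be reproduced with the following Python sketch, a direct translation of the three formulas (variable names are illustrative):

    # Residual = observed - (grand mean + treatment effect + block effect).
    y = {
        "A": [7.1, 6.1, 6.9, 5.6, 6.4],
        "B": [6.7, 5.1, 5.9, 5.1, 5.8],
        "C": [7.1, 5.8, 6.2, 5.0, 6.2],
        "D": [6.7, 5.4, 5.7, 5.2, 5.3],
    }
    drugs, blocks = list(y), range(5)
    grand = sum(sum(v) for v in y.values()) / (len(y) * 5)
    trt_eff = {d: sum(y[d]) / 5 - grand for d in drugs}
    blk_eff = [sum(y[d][j] for d in drugs) / len(y) - grand for j in blocks]
    resid = {d: [y[d][j] - (grand + trt_eff[d] + blk_eff[j]) for j in blocks]
             for d in drugs}
    print(round(resid["C"][1], 3))   # 0.105, as in the table of residuals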

If the functional relationship (Poisson or binomial) of the data is known, a transformation
can be applied that will make the errors nearly normal, and the analysis of variance can
then be done on the transformed data.

In many cases, moderate departures from normality do not have a serious effect on the
validity of the results. Most experimental data in agricultural research satisfy the above
assumptions. However, if the data violate the basic assumptions, the following measures
can be taken: consider deleting an outlier, transform the data using an appropriate
transformation method, or use an appropriate non-parametric method of analysis.
6.5 Fixed and random effects models

In general, there are two types of models in ANOVA: fixed effects models and random
effects models. If the experiment were to be repeated with the same treatments included,
and the goal is to estimate the treatment means and the mean differences, the model is
called a fixed effects model. In other situations, the treatments in a particular experiment
may be a random sample from a large population of similar treatments. The goal here is
to estimate the variation among the treatment means, and we are not interested in the
means themselves. If the experiment were to be repeated, a different sample of treatments
would be included. In this situation the model is called a random effects model.

Examples:
− Fixed: A scientist develops three new fungicides. His interest is in these fungicides
only. Random: A scientist is interested in the way a fungicide works. He selects at
random three fungicides from a group of similar fungicides to study the action.
− Fixed: Measure the rate of production of five particular machines. Random: Choose
five machines to represent machines as a class.
− Fixed: Conduct an experiment to obtain information about four specific soil types.
Random: Select at random four soil types to represent all soil types.

Random effects models are more common in sample surveys while in designed
experiments, the treatment effects are fixed.

7. EXPERIMENTAL DESIGNS FOR SINGLE FACTOR EXPERIMENTS

7.1 Completely Randomized Design (CRD)

7.1.1 Uses, advantages and disadvantages

In CRD, the treatments are assigned completely at random over the whole experimental
area so that each experimental unit has the same chance of receiving any one treatment.
In CRD, any difference among the experimental units (plots) receiving the same
treatment is considered as experimental error.

Uses:
1. It is useful when the experimental units (plots) are essentially homogeneous and
where environmental effects are relatively easy to control, e.g. laboratory and
greenhouse experiments. For field experiments, where there is generally larger
variation among experimental plots in soil fertility, slope, etc., the CRD is rarely
used.
2. It is useful if we suspect that a large fraction of the units may not respond or may be
lost during the experiment, because it is easy to handle missing data in the analysis
of variance, unlike in other designs.
3. It is useful for experiments in which the total number of experimental units is
limited, because it provides the maximum degrees of freedom for error.

Advantages
1. It is flexible in that the number of treatments and replications can vary, i.e. the
number of replications need not be the same from one treatment to another
2. The statistical analysis is simple even with unequal replications, and it is not
complicated by loss of data or missing observations
3. Loss of information due to missing data is small compared to other designs
4. The design provides the maximum degrees of freedom for estimating the
experimental error. This improves the precision of the experiment and is
important for small experiments where the degrees of freedom for experimental
error are less than 20.

Disadvantage:
The main objection to the CRD is that it is often inefficient, as there is no way of
controlling the experimental error. Since randomization is unrestricted, the experimental
error includes the entire variation over the experimental units except that due to
treatments.

7.1.2 Randomization and layout

In this design, treatments are assigned to the experimental units completely at random.
Assume that we want to do a pot experiment on the effect of inoculation with 6 strains of
rhizobia on nodulation of common bean, using five replications.

Randomization can be done by using either lottery method or table of random numbers.

A. Lottery Method
1. Arrange 30 pots of equal size filled with the same type of soil and assign numbers from
1 to 30 in a convenient order.
2. Obtain 30 identical slips of paper; label 5 of them with treatment A, 5 with treatment
B, and 5 each with C, D, E and F (6 treatments × 5 replications). Place the slips in a
box or hat, mix thoroughly and pick a slip at random; the treatment labeled on this
slip is assigned to unit 1 (pot 1). Without returning the first slip to the box, select
another slip; the treatment named on this slip is assigned to unit 2 (pot 2). Continue
this way until all 30 slips of paper have been drawn.
B. Use of Table of Random Numbers
1. Arrange 30 pots of equal size filled with the same type of soil and assign numbers
from 1 to 30 in a convenient order.
2. Locate a starting point in the table of random numbers by closing your eyes and
pointing a finger at any position in the table.
3. Moving from top to bottom or right to left from the starting point, record the first 30
three-digit random numbers in sequence (avoiding ties). Rank the random numbers
from the smallest (1) to the largest (30). The ranks represent the pot numbers; assign
treatment A to the first five pot numbers, B to the next five pot numbers, etc.
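In practice, both methods can be replaced by a short Python sketch that shuffles the treatment labels over the 30 pots (random.shuffle stands in for the lottery or the random-number table):

    # CRD randomization: 6 treatments (A-F) x 5 replications over 30 pots.
    import random

    labels = [t for t in "ABCDEF" for _ in range(5)]   # 5 slips per treatment
    random.shuffle(labels)                             # the "lottery" draw
    for pot, trt in enumerate(labels, start=1):
        print(f"pot {pot}: treatment {trt}")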

Example: Six treatments each replicated five times in CRD

Raw data of nitrogen content of common bean inoculated with 6 rhizobium strains.
Treatments are designated with letters and nitrogen content (mg) is given in parentheses.

1A      2E      3C      4B      5A      6D
(19.4)  (14.3)  (17.0)  (17.7)  (32.6)  (20.7)
12B     11C     10A     9E      8D      7E
(24.8)  (19.4)  (27.0)  (11.8)  (21.0)  (14.4)
13F     14A     15F     16D     17B     18C
(17.3)  (32.1)  (19.4)  (20.5)  (27.9)  (9.1)
24D     23F     22B     21E     20C     19D
(18.6)  (19.1)  (25.2)  (11.6)  (11.9)  (18.8)
25C     26E     27B     28F     29A     30F
(15.8)  (14.2)  (24.3)  (16.9)  (33.0)  (20.8)

7.1.3 Analysis of variance of CRD with equal replication

1. State the model

The model for CRD expresses the relationship between the response to the treatment and
the effect of other factors unaccounted for:

Yij = µ + τi + εij

where Yij = the jth observation on the ith treatment; µ = general mean; τi = effect of
treatment i; and εij = experimental error (effect due to chance).

2. Arrange the data by treatments and calculate the treatment totals (Ti) and Grand total
(G)

Nitrogen content of common bean inoculated with 6 rhizobium strains (mg)
___________________________________________________
Treatment (R-strain)   Reps                            Treatment total (Ti)
___________________________________________________
3 Dok 1       19.4   32.6   27.0   32.1   33.0         144.1
3 Dok 5       17.7   24.8   27.9   25.2   24.3         119.9
3 Dok 4       17.0   19.4    9.1   11.9   15.8          73.2
3 Dok 7       20.7   21.0   20.5   18.8   18.6          99.6
3 Dok 15      14.3   14.4   11.8   11.6   14.2          66.3
Composite     17.3   19.4   19.1   16.9   20.8          93.5
___________________________________________________
Grand total (G)                                        596.6

3. Using Yij = the jth observation on the ith treatment, Ti = treatment total, and
n = (r × t) = the total number of experimental units (pots), calculate the correction factor
and the various sums of squares:

- C.F. = G²/(r × t) = (596.6)²/(5 × 6) = 11864.38
- Total Sum of Squares (TSS) = ΣΣYij² - C.F. (sum of the squares of all observations - C.F.)
  = [(19.4)² + (17.7)² + ... + (20.8)²] - 11864.38 = 12994.36 - 11864.38 = 1129.98
- Treatment Sum of Squares (SST) = (ΣTi²)/r - C.F.
  = [(144.1)² + (119.9)² + ... + (93.5)²]/5 - 11864.38 = 847.05
- Error Sum of Squares (SSE) = TSS - SST = 1129.98 - 847.05 = 282.93

In CRD, the treatment sum of squares is usually called the between- or among-groups
sum of squares, while the sum of squares among individuals treated alike is called the
within-group or error sum of squares.

4. Calculate the mean squares (MS) for treatment and error by dividing each sum of
squares by the corresponding degrees of freedom:

- Treatment MS = SST/(t - 1) = 847.05/(6 - 1) = 169.41
- Error MS = SSE/[t(r - 1)] = 282.93/[6(5 - 1)] = 11.79

5. Calculate the F-value for testing the significance of treatment effects:

- F-calculated = Treatment MS/Error MS = 169.41/11.79 = 14.37

6. Obtain the tabulated F-value using the treatment degrees of freedom (d.f.) as numerator
(n1) and the error d.f. as denominator (n2) at the 5% and 1% levels of significance:
F(5, 24) at 5% = 2.60; F(5, 24) at 1% = 3.90
7. Summarize all the values computed on the above steps in the ANOVA- table for quick
assessment of results.
________________________________________________________________________
Source of DF SS MS Computed F Table F
Variation 5% 1%
________________________________________________________________________
Treatment (among strains) (t-1) = 5 847.05 169.41 14.37** 2.60 3.90
Error (within strains) t(r-1) = 24 282.93 11.79
Total (rt-1) = 29 1129.98
________________________________________________________________________
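The computations in steps 3 to 6 can be reproduced with a minimal Python sketch (variable names are illustrative; small rounding differences from the hand calculation are possible):

    # CRD ANOVA for the rhizobium data (t = 6 treatments, r = 5 replications).
    data = {
        "3 Dok 1":   [19.4, 32.6, 27.0, 32.1, 33.0],
        "3 Dok 5":   [17.7, 24.8, 27.9, 25.2, 24.3],
        "3 Dok 4":   [17.0, 19.4,  9.1, 11.9, 15.8],
        "3 Dok 7":   [20.7, 21.0, 20.5, 18.8, 18.6],
        "3 Dok 15":  [14.3, 14.4, 11.8, 11.6, 14.2],
        "Composite": [17.3, 19.4, 19.1, 16.9, 20.8],
    }
    t, r = len(data), 5
    G = sum(sum(v) for v in data.values())
    CF = G ** 2 / (r * t)
    TSS = sum(x ** 2 for v in data.values() for x in v) - CF
    SST = sum(sum(v) ** 2 for v in data.values()) / r - CF
    SSE = TSS - SST
    F = (SST / (t - 1)) / (SSE / (t * (r - 1)))
    print(round(SST, 2), round(SSE, 2), round(F, 2))   # 847.05 282.93 14.37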

8. Compare the calculated F-value with the table F-value and decide on significance
among the treatment effects using the following rules:
a) If F-calculated > F-table at the 1% level of significance, the difference between
treatments is highly significant. Put two asterisks on F-calculated.
b) If F-calculated > F-table at the 5% level of significance, but ≤ F-table at 1%, the
difference between treatments is significant. Put one asterisk on F-calculated.
c) If F-calculated ≤ F-table at the 5% level of significance, the differences among
treatments are non-significant. Put NS on the F-calculated value in the ANOVA
table.
Note that a non-significant F-test in the analysis of variance indicates the failure of the
experiment to detect any difference among treatments. It does not, in any way, prove that
all treatments are the same. The failure to detect treatment differences based on a non-
significant F-test could be the result of either a very small or nil treatment difference or a
very large experimental error, or both. Thus, whenever the F-test is non-significant, the
researcher should examine the size of the experimental error and the numerical
differences among the treatment means. If both values are large, the trial may be repeated
and efforts should be made to reduce the experimental error so that the differences among
treatments, if any, can be detected. On the other hand, if both values are small, the
differences among treatments are probably too small to be of any economic value and,
thus, no additional trials are needed.

For the above example, the computed F value of 14.37 is larger than the tabulated F value
at the 1% level of significance of 3.90. Hence, the treatment difference is said to be
highly significant. In other words, chances are less than 1 in 100 that all the observed
differences among the six treatment means could be due to chance. It should also be
noted that such a significant F test verifies the existence of some differences among the
treatments tested but does not specify the particular pair (or pairs) of treatments that
differ significantly. To obtain this information, procedures for comparing treatment
means are used.

9. Compute the coefficient of variation (CV) and the standard error (SE) of the treatment
means:

- CV = (√(Error MS)/Grand mean) × 100 = (√11.79/19.89) × 100 = 17.3%
- SE± = √(MSE/r) = √(11.79/5) = 1.53 mg

The coefficient of variation indicates the degree of precision with which the treatments
are compared and is a good index of the reliability of the experiment. The smaller the CV,
the more reliable the experiment. CV values vary greatly with the type of experiment, the
experimental material and the character measured (e.g. data on days to flowering have a
smaller CV than the number of nodules in common bean, as within-treatment variation is
usually smaller in the former parameter than in the latter). In field experiments CVs up to
30% are common, and lower CVs are usually expected for laboratory and greenhouse
experiments than for field experiments.

7.1.4 Analysis of variance of CRD with unequal replication

Example: Twenty rats were assigned equally at random to four feed types. Unfortunately,
one of the rats died of unknown causes. The data are rat body weights in g after the rats
were raised on these diets for 10 days. We would like to know whether the weights of rats
are the same for all four diets at the 5% level.

________________________________________________________________________
Feed 1 Feed 2 Feed 3 Feed 4
________________________________________________________________________
60.8 68.7 102.6 87.9
57.0 67.7 102.1 84.2
65.0 74.0 100.2 83.1
58.6 66.3 96.5 85.7
61.7 69.8 90.3
________________________________________________________________________
Ti 303.1 346.5 401.4 431.2
ni 5 5 4 5

- C.F. = G²/n = (1482.2)²/19 = 115627.20
- Total Sum of Squares (TSS) = ΣΣYij² - C.F. (sum of the squares of all observations - C.F.)
  = [(60.8)² + (57.0)² + ... + (90.3)²] - 115627.20 = 4354.698
- Treatment Sum of Squares (SST) = Σ(Ti²/ni) - C.F.
  = (303.1)²/5 + (346.5)²/5 + (401.4)²/4 + (431.2)²/5 - 115627.20 = 4226.348
- Error Sum of Squares (SSE) = TSS - SST = 4354.698 - 4226.348 = 128.35
________________________________________________________________________
Source of DF SS MS Computed F F-table (1%)
Variation
________________________________________________________________________
Treatment (among groups) (t-1) = 3 4226.348 1408.788 164.6** 5.42
Error (within groups)
(total d.f. – treatment d.f.) (18-3) = 15 128.35 8.557
Total (n-1) = 18 4354.698
________________________________________________________________________
** Highly significant

CV = (√MSE/Grand mean) × 100 = (√8.557/(1482.2/19)) × 100 = 3.75%
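The unequal-replication analysis can be reproduced with the following Python sketch (each treatment total is squared and divided by its own ni):

    # CRD ANOVA with unequal replication (rat feeding data, n = 19).
    feeds = {
        "Feed 1": [60.8, 57.0, 65.0, 58.6, 61.7],
        "Feed 2": [68.7, 67.7, 74.0, 66.3, 69.8],
        "Feed 3": [102.6, 102.1, 100.2, 96.5],
        "Feed 4": [87.9, 84.2, 83.1, 85.7, 90.3],
    }
    n = sum(len(v) for v in feeds.values())
    G = sum(sum(v) for v in feeds.values())
    CF = G ** 2 / n
    TSS = sum(x ** 2 for v in feeds.values() for x in v) - CF
    SST = sum(sum(v) ** 2 / len(v) for v in feeds.values()) - CF
    SSE = TSS - SST
    F = (SST / 3) / (SSE / (n - 1 - 3))
    print(round(SST, 3), round(SSE, 2), round(F, 1))   # 4226.348 128.35 164.6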

7.2 Randomized Complete Block Design (RCBD)

7.2.1 Uses, advantages and disadvantages

It is the most frequently used experimental design in field experiments. The Completely
Randomized Design is appropriate when no sources of variation other than treatment
effects are known or anticipated, i.e. the experimental units should be similar. However,
in many experiments certain experimental units, even if treated alike, will behave
differently; e.g. in field experiments, adjacent plots are more alike in response than those
some distance apart, and the heaviest animals in a group of the same age may show a
different rate of weight gain than lighter animals.

In such situations, designs and layouts can be constructed so that the portion of variability
attributable to the known source can be measured and excluded from the experimental
error. Thus, differences among treatment means will contain no contribution from the
known source.

The Randomized Complete Block Design (RCBD) can be used when the experimental
units can be meaningfully grouped, the number of units in a group being equal to the
number of treatments. Such a group is called a block, and the number of blocks equals the
number of replications. Each treatment appears an equal number of times, usually once,
in each block, and each block contains all the treatments.

Blocking (grouping) can be done based on soil heterogeneity in fertilizer or variety trials;
initial body weight, age, sex and breed of animals; slope of the field, etc.

The primary purpose of blocking is to reduce experimental error by eliminating the
contribution of known sources of variation among experimental units. By blocking,
variability within each block is minimized and variability among blocks is maximized.

During the course of the experiment, all units in a block must be treated as uniformly as
possible. For example,
- if planting, weeding, fertilizer application, harvesting, data recording, etc, operations
cannot be done in one day due to some problems, then all plots in any one block
should be done at the same time.
- if different individuals have to make observations of the experimental plots, then one
individual should make all the observations in a block.

This practice helps to control variation within blocks, and thus variation among blocks is
mathematically removed from experimental error.

Advantages:
1. Precision: More precision is obtained than with the CRD because grouping
experimental units into blocks reduces the magnitude of the experimental error.
2. Flexibility: Theoretically there is no restriction on the number of treatments or
replications. If extra replication is desired for certain treatments, it can be applied
to two or more units per block.
3. Ease of analysis: The statistical analysis of the data is simple. If, as a result of
chance or mishap, the data from a complete block or for certain treatments are
unusable, the data may be omitted without complicating the analysis. If data from
individual units (plots) are missing, they can be estimated easily so that the
simplicity of calculation is not lost.

Disadvantages:
The main disadvantage of RCBD is that when the number of treatments is large (>15),
variation among experimental units within a block becomes large, resulting in a large
error term. In such situations, other designs such as incomplete block designs should be
used.

7.2.2 Randomization and layout

Step 1: Divide the experimental area (unit) into r equal blocks, where r is the number of
replications, following the blocking technique. Blocking should be done against the
gradient, such as slope, soil fertility, etc.

Step 2: Sub-divide the first block into t equal experimental plots, where t is the number of
treatments, and assign the t treatments at random to the t plots using any of the
randomization schemes (random numbers or lottery).
Step 3: Repeat step 2 for each of the remaining blocks.

Layout example: three treatments each replicated four times, with Block 1 to Block 4
arranged along the environmental gradient.

The major difference between CRD and RCBD is that in CRD randomization is done
without any restriction over all the experimental units, whereas in RCBD all treatments
must appear in each block and a separate randomization is done for each block
(randomization is done within blocks).
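A Python sketch of the difference (a fresh shuffle per block for RCBD, rather than one shuffle over all units as in CRD):

    # RCBD randomization: randomize the treatment order within each block.
    import random

    treatments = ["A", "B", "C"]
    for block in range(1, 5):          # 4 blocks (replications)
        order = treatments[:]
        random.shuffle(order)          # separate randomization per block
        print(f"Block {block}: {order}")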

7.2.3 Analysis of variance of RCBD

Step 1. State the model

The linear model for RCBD:

Yij = µ + τi + βj + εij

where Yij = the observation on the ith treatment in the jth block; µ = common mean
effect; τi = effect of treatment i; βj = effect of block j; and εij = experimental error for
treatment i in block j.

Step 2. Arrange the data by treatments and blocks and calculate treatment totals (Ti),
Block (rep) totals (Bj) and Grand total (G).

Example: Oil content of linseed treated at different stages of growth with N-fertilizer.

Oil content (g) from samples of 20 g seed

Treatment (stage          Block 1  Block 2  Block 3  Block 4  Treat. total (Ti)
of application)
Seeding                   4.4      5.9      6.0      4.1      20.4
Early blooming            3.3      1.9      4.9      7.1      17.2
Half blooming             4.4      4.0      4.5      3.1      16.0
Full blooming             6.8      6.6      7.0      6.4      26.8
Ripening                  6.3      4.9      5.9      7.1      24.2
Unfertilized (control)    6.4      7.3      7.7      6.7      28.1
Block total (Bj)          31.6     30.6     36.0     34.5
Grand total (G)                                               132.7

From this table we can test:

1. Does fertilizer application have an effect on oil content? Compare treatment 6
(control) versus treatments 1-5.
2. Which stage of application is best for increasing the oil content of linseed? Compare
the differences among treatments 1, 2, 3, 4 and 5.

Step 3: Compute the correction factor (C.F.) and the sums of squares, using r as the
number of blocks, t as the number of treatments, Ti as the total of treatment i, and Bj as
the total of block j.

a. C.F. = G²/(rt) = (132.7)²/(4 × 6) = 733.72
b. Total Sum of Squares (TSS) = ΣYij² - C.F. = (4.4)² + (3.3)² + ... + (6.7)² - C.F.
   = 788.23 - 733.72 = 54.51
c. Block Sum of Squares (SSB) = (ΣBj²)/t - C.F.
   = [(31.6)² + (30.6)² + (36.0)² + (34.5)²]/6 - 733.72 = 736.86 - 733.72 = 3.14
d. Treatment Sum of Squares (SST) = (ΣTi²)/r - C.F.
   = [(20.4)² + (17.2)² + (16.0)² + ... + (28.1)²]/4 - 733.72 = 765.37 - 733.72 = 31.65
e. Error Sum of Squares (SSE) = TSS - SSB - SST = 54.51 - 3.14 - 31.65 = 19.72

Step 4: Compute the mean squares for block, treatment and error by dividing each sum of
squares by its corresponding d.f.

Block Mean Square (MSB) = SSB/(r - 1) = 3.14/3 = 1.05
Treatment Mean Square (MST) = SST/(t - 1) = 31.65/5 = 6.33
Error Mean Square (MSE) = SSE/[(r - 1)(t - 1)] = 19.72/15 = 1.31

Step 5: Compute the F-values for testing block and treatment differences.

F-block = MSB/MSE = 1.05/1.31 = 0.80
F-treatment = MST/MSE = 6.33/1.31 = 4.83

Step 6: Read the F-table, compare the computed F-values with the tabulated F-values and
make a decision.
- For comparing block effects, use block d.f. as numerator (n1) and error d.f. as
denominator (n2): F(3, 15) at 5% = 3.29 and at 1% = 5.42.
- For comparing treatment effects, use treatment d.f. as numerator and error d.f. as
denominator: F(5, 15) at 5% = 2.90 and at 1% = 4.56. If the calculated F-value is
greater than the tabulated F-value for treatments at 1%, there is a highly significant
(real) difference among treatment means.

In the above example, the calculated F-value for treatments (4.83) is greater than the
tabulated F-value at the 1% level of significance (4.56). Thus, there is a highly significant
difference among the stages of application on the oil content of the linseed.

Step 7: Compute the standard error of the mean and the coefficient of variability.

SE± = √(MSE/r) = √(1.31/4) = 0.57 g
CV = (√MSE/Grand mean) × 100 = (√1.31/5.53) × 100 = 20.7%

Step 8: Summarize the results of the computations in an analysis of variance table for
quick assessment of the result.

Analysis of variance for oil content of the linseed

Source of     Degrees of        Sum of    Mean      Computed   Tabulated F
variation     freedom           squares   squares   F          5%     1%
Block         (r-1) = 3         3.14      1.05      0.80       3.29   5.42
Treatment     (t-1) = 5         31.65     6.33      4.83**     2.90   4.56
Error         (r-1)(t-1) = 15   19.72     1.31
Total         (rt-1) = 23       54.51
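Steps 3 to 5 for this example can be checked with the following Python sketch (illustrative names; computed without rounding intermediates, the treatment F comes out 4.82 rather than the 4.83 above):

    # RCBD ANOVA for the linseed oil-content data (t = 6, r = 4).
    oil = {
        "Seeding":        [4.4, 5.9, 6.0, 4.1],
        "Early blooming": [3.3, 1.9, 4.9, 7.1],
        "Half blooming":  [4.4, 4.0, 4.5, 3.1],
        "Full blooming":  [6.8, 6.6, 7.0, 6.4],
        "Ripening":       [6.3, 4.9, 5.9, 7.1],
        "Unfertilized":   [6.4, 7.3, 7.7, 6.7],
    }
    t, r = len(oil), 4
    G = sum(sum(v) for v in oil.values())
    CF = G ** 2 / (r * t)
    TSS = sum(x ** 2 for v in oil.values() for x in v) - CF
    SSB = sum(sum(v[j] for v in oil.values()) ** 2 for j in range(r)) / t - CF
    SST = sum(sum(v) ** 2 for v in oil.values()) / r - CF
    SSE = TSS - SSB - SST
    MSB, MST, MSE = SSB / (r - 1), SST / (t - 1), SSE / ((r - 1) * (t - 1))
    print(round(MSB / MSE, 2), round(MST / MSE, 2))   # 0.8 4.82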

7.2.4 Block efficiency

Blocking maximizes the differences among blocks and keeps the differences among plots
of the same block (within blocks) as small as possible. Thus, the result of every RCBD
experiment should be examined to see whether this objective has been achieved. The
procedure to measure block efficiency is:

Step 1: Determine the level of significance of the block variation by computing the
F-value for blocks and testing its significance:

F-block = MSB/MSE = 1.05/1.31 = 0.80

Compare it with the tabulated F-value with n1 = block d.f. (r-1) and n2 = error d.f.
(r-1)(t-1): F(3, 15) at 5% = 3.29.

If the computed F-value is greater than the tabulated F-value, blocking is said to be
effective in reducing experimental error. Also the scope of an experiment may have been
increased when blocks are significantly different since the treatments have been tested
over a wider range of experimental conditions.

On the other hand, if block effects are small (calculated F for block < tabulated F value),
it indicates either that the experimenter was not successful in reducing error variance by
grouping of individual units (blocking) or that the units were essentially homogeneous to
start with.

Step 2: Determine the magnitude of the reduction in experimental error due to blocking
by computing the Relative Efficiency (R.E.) as compared to CRD.

R.E. =
(r − 1)MSB + r (t − 1) MSE
(rt − 1)MSE
Where MSB = block mean square; MSE = error mean square; r = number of replications;
t = number of treatments

R.E. (RCBD to CRD) = [(4 − 1)(1.05) + 4(6 − 1)(1.31)] / [(4 × 6 − 1)(1.31)]
                   = [(3 × 1.05) + (4 × 5 × 1.31)] / (23 × 1.31) = (3.15 + 26.2)/30.13 = 29.35/30.13 = 0.97

If the error degrees of freedom of the RCBD are less than 20, the R.E. should be multiplied
by the adjustment factor (K) to account for the loss in precision resulting from fewer
degrees of freedom:

K = {[(r − 1)(t − 1) + 1][t(r − 1) + 3]} / {[(r − 1)(t − 1) + 3][t(r − 1) + 1]}

K = {[(4 − 1)(6 − 1) + 1][6(4 − 1) + 3]} / {[(4 − 1)(6 − 1) + 3][6(4 − 1) + 1]}
  = [(3 × 5 + 1)(6 × 3 + 3)] / [(3 × 5 + 3)(6 × 3 + 1)] = (16 × 21)/(18 × 19) = 336/342 = 0.98

Adjusted R.E. = 0.97 × K = 0.97 × 0.98 = 0.95

In this case, a little information is sacrificed in theory by using the Randomized Complete
Block Design, since 95 replicates of a completely randomized design would give as much
information as 100 replications (blocks) of a Randomized Complete Block Design.

7.2.5 Missing Data in RCBD

Sometimes data for certain units may be missing or become unusable. For example,
- when an animal becomes sick or dies but not due to treatment
- when rodents destroy a plot in field
- when a flask breaks in laboratory
- when there is an obvious recording error

A method is available for estimating such data. Note that an estimate of a missing value
does not supply additional information to the experimenter; it only facilitates the analysis
of the remaining data.

Case 1: When a single value is missing


When a single value is missing in RCBD, an estimate of the missing value can be
calculated as:

Y = (rBo + tTo − Go) / [(r − 1)(t − 1)],  where:

Y = estimate of the missing value
t = No. of treatments
r = No. of replications or blocks
Bo = total of observed values in the block (replication) containing the missing value
To = total of observed values in the treatment containing the missing value
Go = grand total of all observed values.

The estimated value is entered in the table with the observed values and the analysis of
variance is performed as usual with one d. f. being subtracted from both total and error d.
f. because the estimated value makes no contribution to the error sum of squares.

When all of the missing values are in the same block or treatment, the simplest solution is
to treat that block or treatment as if it had not been included in the experiment.
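The estimate is a one-line function in code; a minimal sketch (hypothetical function name),
which reproduces the maize example that follows:

def missing_value_rcbd(r, t, Bo, To, Go):
    """Estimate of a single missing value in an RCBD (formula above)."""
    return (r * Bo + t * To - Go) / ((r - 1) * (t - 1))

print(missing_value_rcbd(r=4, t=4, Bo=50.9, To=39.0, Go=234.8))   # = 13.87 ~ 13.9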
Example: In the table given below are yields (kg) of 4-varieties of maize (Al-composite,
Rarree-1, Bukuri, Katumani) in 4-replications planted in RCBD on a plot size of 10m x
10 m of which one plot yield is missing. Estimate the missing value and analyze the data

                         Block                               Treatment
Varieties        I       II         III      IV              total (Ti)
Bukuri          18.5    15.7       16.2     14.1             64.5
Katumani        11.7     -         12.9     14.4             39.0 (To)
Rarree-1        15.4    16.6       15.5     20.3             67.8
Al-composite    16.5    18.6       12.7     15.7             63.5
Block total     62.1    50.9 (Bo)  57.3     64.5
Go (Grand total) = 234.8

Solution
a. Estimate the missing value
Y = (rBo + tTo − Go) / [(r − 1)(t − 1)]

Y = [(4 × 50.9) + (4 × 39) − 234.8] / [(4 − 1)(4 − 1)] = 124.8/9 = 13.9
b. Enter the estimated value and carry out the analysis following the usual procedure:
- Corrected treatment total = 39 + 13.9 = 52.9
- Corrected block total = 50.9 + 13.9 = 64.8
- Corrected grand total = 234.8 + 13.9 = 248.7
c. Analysis of variance

1. C.F. = (248.7)²/rt = (248.7)²/(4 × 4) = 3865.73

2. Total SS = ΣΣ Yij² − C.F. = (18.5)² + (11.7)² + ..... + (13.9)² + ........ + (15.7)² − C.F. = 79.18

3. Treatment SS = Σ Ti²/r − C.F. = [(64.5)² + (52.9)² + (67.8)² + (63.5)²]/4 − 3865.73 = 31.21

4. Block SS = Σ Bj²/t − C.F. = [(62.1)² + (64.8)² + (57.3)² + (64.5)²]/4 − 3865.73 = 9.02

5. Error SS = Total SS − Treatment SS − Block SS = 79.18 − 31.21 − 9.02 = 38.95

d. Compute the correction factor for bias (B) for treatment sum of squares as the
treatment SS is biased upwards.
B = [Bo − (t − 1)Y]² / [t(t − 1)]

where Bo = total of observed values in the block (replication) containing the missing value,
and Y = the estimated value.

B = [50.9 − (4 − 1) × 13.9]² / [4 × (4 − 1)] = (50.9 − 41.7)²/12 = (9.2)²/12 = 7.05
e. Subtract the computed B value from Total SS & Treatment SS
- Adjusted Treatment SS = Treatment SS – B
= 31.21 – 7.05= 24.16
- Adjusted Total SS = Total SS-B
= 79.18 – 7.05 = 72.13
f. Subtract 1 from error d. f. and total d. f. and complete the analysis of variance table.

Source of    Degrees of          Sum of    Mean      Computed   Table F
variation    freedom             squares   squares   F          5%     1%
Block        3                   9.02      3.01      0.61ns     4.07   7.59
Treatment    3                   24.16     8.05      1.64ns     4.07   7.59
Error        (t-1)(r-1)-1 = 8    38.95     4.9
Total        rt-1-1 = 14         72.13

CV = (√MSE / Grand mean) × 100 = [√4.9 / (234.8/15)] × 100 = (2.21/15.65) × 100 = 14.1%

7.3 Latin Square Design

7.3.1 Uses, advantages and disadvantages

The major feature of the Latin Square Design is its capacity to simultaneously handle two
known sources of variation among experimental units unlike Randomized Complete
Block Design (RCBD), which treats only one known source of variation.

The two-directional blocking in a Latin Square Design is commonly referred to as row
blocking and column blocking. In a Latin Square Design the number of treatments is equal
to the number of replications; that is why it is called a Latin Square.

(Diagram: row and column blocking oriented against two gradients, e.g. a soil fertility
gradient in one direction and a slope gradient in the perpendicular direction.)

Advantages:
- Greater precision is obtained than Completely Randomized Design & Randomized
Complete Block Design (RCBD) because it is possible to estimate variation among
row blocks as well as among column blocks and remove them from the experimental
error.
Disadvantages:
- As the number of treatments is equal to the number of replications, when the number
of treatments is large the design becomes impractical to handle. On the other hand,
when the number of treatments is small, the degree of freedom associated with the
experimental error becomes too small for the error to be reliably estimated. Thus, in
practice the Latin Square Design is applicable for experiments in which the number
of treatments is not less than four and not more than eight.
- Randomization is relatively difficult.

7.3.2 Randomization and layout

Step 1: To randomize a five treatment Latin Square Design, select a sample of 5 x 5 Latin
square plan from appendix of statistical books. We can also create our own basic plan and
the only requirement is that each treatment must appear only once in each row and
column. For our example, the basic plan can be:

A B C D E
B A E C D
C D A E B
D E B A C
E C D B A

Step 2: Randomize the row arrangement of the plan selected in step 1, following one of
the randomization schemes (either using lottery method or table of random numbers).

- Select from table of random numbers, five three digit random numbers avoiding ties
if any
Random numbers: 628 846 475 902 452
Rank: (3) (4) (2) (5) (1)
- Rank the selected random numbers from the lowest (1) to the highest (5)
- Use the ranks to represent the existing row number of the selected plan and the
sequence to represent the row number of the new plan. For our example, the third row
of the selected plan (rank 3) becomes the first row (sequence) of the new plan, the
fourth becomes the second row, etc.
1 2 3 4 5
3 C D A E B
4 D E B A C
2 B A E C D
5 E C D B A
1 A B C D E

Step 3: Randomize the column arrangement using the same procedure. Select five three
digit random numbers.
Random numbers: 792 032 947 293 196
Rank: (4) (1) (5) (3) (2)

The rank will be used to represent the column number of the above plan (row arranged)
in step 2. For our example, the fourth column of the plan obtained in step 2 above
becomes the first column of the final plan, the first column of the plan becomes 2, etc.

Final layout:

E C B A D
A D C B E
C B D E A
B E A D C
D A E C B

Note that each treatment occurs only once in each row and column

(Figure: sample layout of three treatments, each replicated three times, in a Latin Square Design.)
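The two-step randomization can also be done programmatically. The sketch below builds a
cyclic basic plan instead of taking one from a statistical appendix (either satisfies the
once-per-row-and-column requirement) and then randomizes rows and columns, mirroring steps
2 and 3:

import random

def randomize_latin_square(t):
    """Cyclic t x t basic plan, then random row order and column order."""
    letters = [chr(ord('A') + i) for i in range(t)]
    plan = [[letters[(i + j) % t] for j in range(t)] for i in range(t)]
    random.shuffle(plan)                       # step 2: randomize rows
    cols = random.sample(range(t), t)          # step 3: randomize columns
    return [[row[c] for c in cols] for row in plan]

for row in randomize_latin_square(5):
    print(' '.join(row))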

7.3.3 Analysis of variance


There are four sources of variation in Latin Square Design, two more than that of CRD
and one more than that for the RCBD. The sources of variation are row, column,
treatment and experimental error.

The linear model for Latin Square Design:


Yijk = µ + τi + βj + ϒk + εijk

where,
Yijk = the observation on the ith treatment, jth row & kth column
µ = Common mean effect
τi = Effect of treatment i
βj = Effect of row j
ϒk = Effect of column k
εijk = Experiment error (residual) effect

Example: Grain yield of three maize hybrids (A, B, and D) and a check variety, C, from
an experiment with Latin Square Design.

Grain yield (t/ha)

Row number C1 C2 C3 C4 Row total (R)


R1 1.640(B) 1.210(D) 1.425(C) 1.345(A) 5.620
R2 1.475(C) 1.185(A) 1.400(D) 1.290(B) 5.350
R3 1.670(A) 0.710(C) 1.665(B) 1.180(D) 5.225
R4 1.565(D) 1.290(B) 1.655(A) 0.660(C) 5.170
Column total(C) 6.350 4.395 6.145 4.475
Grand total (G) 21.365

Treatment Total (T) Mean


A 5.855 1.464
B 5.885 1.471
C 4.270 1.068
D 5.355 1.339

STEPS OF ANALYSIS

Step 1: Arrange the raw data according to their row and column designation, with the
corresponding treatments clearly specified for each observation and compute row total
(R), column total (C), the grand total (G) and the treatment totals (T).

Step 2: Compute the C.F. and the various sums of squares.

C.F. = G²/t² = (21.365)²/16 = 28.53

Total SS = Σy² − C.F. = (1.640)² + (1.210)² + .... + (0.660)² − 28.53 = 1.41

Row SS = ΣR²/t − C.F. = [(5.620)² + (5.350)² + (5.225)² + (5.170)²]/4 − 28.53 = 0.03

Column SS = ΣC²/t − C.F. = [(6.350)² + (4.395)² + (6.145)² + (4.475)²]/4 − 28.53 = 0.83

Treatment SS = ΣT²/t − C.F. = [(5.855)² + (5.885)² + (4.270)² + (5.355)²]/4 − 28.53 = 0.43

Error SS = Total SS − Row SS − Column SS − Treatment SS = 1.41 − 0.03 − 0.83 − 0.43 = 0.12

Step 3: Compute the mean squares for each source of variation by dividing each sum of
squares by its corresponding degrees of freedom.

Row MS = Row SS/(t − 1) = 0.03/3 = 0.01;  Column MS = Column SS/(t − 1) = 0.83/3 = 0.276
Treatment MS = Treatment SS/(t − 1) = 0.43/3 = 0.143
Error MS = Error SS/[(t − 1)(t − 2)] = 0.12/(3 × 2) = 0.02
Step 4: Compute the F-value for testing the treatment effect and read the tabulated F-value:

F-calculated = Treatment MS / Error MS = 0.143/0.02 = 7.15
As the computed F-value (7.15) is higher than the tabulated F-value at 5% level of
significance (4.76), but lower than the tabulated F-value at the 1% level (9.78), the
treatment difference is significant at the 5% level of significance.

Compute the CV as: CV = (√Error MS / Grand mean) × 100 = (√0.02 / 1.335) × 100 = 10.6%
Note that although the F-test on the analysis of variance indicates significant differences
among the mean yields of the 4-maize varieties tested, it does not identify the specific
pairs or groups of varieties that differed significantly. For example, the F-test is not able
to answer the question whether every one of the three hybrids gave significantly higher
yield than that of the check variety. To answer these questions, the procedure for mean
comparison should be used.

Step 5: Summarize the results of the analysis in ANOVA table

Source      D.F.             SS     MS      Computed F   Table F
                                                         5%     1%
Row         (t-1) = 3        0.03   0.01    0.50ns       4.76   9.78
Column      (t-1) = 3        0.83   0.275   13.75**      4.76   9.78
Treatment   (t-1) = 3        0.43   0.143   7.15*        4.76   9.78
Error       (t-1)(t-2) = 6   0.12   0.02
Total       (t²-1) = 15      1.41
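The analysis can be verified with a short script. The sketch below encodes the 4 × 4 maize
table directly (codes 0–3 for treatments A–D); note that with the unrounded sums of squares
the computed F comes out near 6.6 rather than the 7.15 obtained above from the rounded
values:

import numpy as np

obs = np.array([[1.640, 1.210, 1.425, 1.345],
                [1.475, 1.185, 1.400, 1.290],
                [1.670, 0.710, 1.665, 1.180],
                [1.565, 1.290, 1.655, 0.660]])   # obs[row, column]
trt = np.array([[1, 3, 2, 0],
                [2, 0, 3, 1],
                [0, 2, 1, 3],
                [3, 1, 0, 2]])                   # 0=A, 1=B, 2=C, 3=D
t = obs.shape[0]
CF = obs.sum()**2 / t**2
total_ss = (obs**2).sum() - CF
row_ss = (obs.sum(axis=1)**2).sum() / t - CF
col_ss = (obs.sum(axis=0)**2).sum() / t - CF
trt_ss = sum(obs[trt == k].sum()**2 for k in range(t)) / t - CF
error_ss = total_ss - row_ss - col_ss - trt_ss
mse = error_ss / ((t - 1) * (t - 2))
print("F-treatment =", (trt_ss / (t - 1)) / mse)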
7.3.4 Relative efficiency

As in RCBD, where the efficiency of one-way blocking indicates the gain in precision
relative to CRD, the efficiencies of both row and column blocking in a Latin Square
Design indicate the gain in precision relative to either the CRD or the RCBD. The
procedures are:
i. Compute the F-value for testing the row & column effects; and test their
significance

F (row) = Row MS / Error MS = 0.01/0.02 = 0.50;  F (column) = Column MS / Error MS = 0.275/0.02 = 13.75**
ii. Compute the relative efficiency of LS design relative to CRD & RCBD

The relative efficiency of the LS design as compared to CRD:

R.E. (CRD) = [MS row + MS column + (t − 1) × MS error] / [(t + 1) × MS error]
           = [0.01 + 0.275 + (4 − 1) × 0.02] / [(4 + 1) × 0.02] = 0.345/0.100 = 3.45
This indicates that the use of Latin Square Design in the present example is estimated to
increase the experimental precision by 245% as compared to CRD. This result implies
that if the CRD is used an estimated 2.45 times more replication would have been
required to detect the treatment difference of the same magnitude as that detected with
the Latin Square Design.

The R.E. of the LS design as compared to RCBD can be computed in two ways:

- When row is used as the blocking factor:
  R.E. (row) = [MS row + (t − 1) × MS error] / [t × MS error]
             = [0.01 + (4 − 1) × 0.02] / (4 × 0.02) = 0.07/0.08 = 0.875

- When column is used as the blocking factor:
  R.E. (column) = [MS column + (t − 1) × MS error] / [t × MS error]
                = [0.275 + (4 − 1) × 0.02] / (4 × 0.02) = 0.335/0.08 = 4.19

When the error d.f. in the Latin Square analysis of variance is < 20, the R.E. value
should be multiplied by the adjustment factor (K) defined as:

K = {[(t − 1)(t − 2) + 1][(t − 1)² + 3]} / {[(t − 1)(t − 2) + 3][(t − 1)² + 1]}
  = [(4 − 1)(4 − 2) + 1](9 + 3) / {[(4 − 1)(4 − 2) + 3](9 + 1)} = (7 × 12)/(9 × 10) = 84/90 = 0.93

The adjusted R.E. values are computed as:

R.E. (row) = 0.875 × 0.93 = 0.81
R.E. (column) = 4.19 × 0.93 = 3.90

The results indicate that the additional column blocking made possible by the use of Latin
Square Design is estimated to have increased the experimental precision over that of
RCBD by 290%, whereas the additional row-blocking in the LS design did not increase
precision over the RCBD with column as blocks. Hence, for the above trial, a RCBD with
column as blocks would have been as efficient as a Latin Square Design.

7.3.5 Missing data


The formula for a single missing observation is:

Y = [t(Ro + Co + To) − 2Go] / [(t − 1)(t − 2)]

where Ro, Co and To are the totals of the observed values for the row, column and
treatment containing the missing value, respectively; Go is the grand total of the
observed values; and t is the number of treatments.

The analysis of variance is performed in the usual manner after entering the estimated
value with one degree of freedom being subtracted from total and error degrees of
freedom for each missing value.

As in the case of RCBD, the treatment sum of squares is biased upward by:

Bias (B) = [Go − Ro − Co − (t − 1)To]² / [(t − 1)(t − 2)]²

where Go, Ro, Co, To and t are as described above. Then B is subtracted from the treatment
SS and the total SS.
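Both formulas are one-liners; a minimal sketch (hypothetical function names), checked
against the rice example that follows:

def missing_value_latin(t, Ro, Co, To, Go):
    """Estimate of a single missing value in a Latin square (formula above)."""
    return (t * (Ro + Co + To) - 2 * Go) / ((t - 1) * (t - 2))

def bias_latin(t, Ro, Co, To, Go):
    """Upward bias of the treatment SS caused by the estimated value."""
    return (Go - Ro - Co - (t - 1) * To)**2 / ((t - 1) * (t - 2))**2

print(missing_value_latin(5, 41, 32, 37, 206))   # 11.5
print(bias_latin(5, 41, 32, 37, 206))            # ~1.56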

Example: Yield (kg) of five rice varieties tested in Latin Square Design from plot size of
100 m2.

E(12) C (-) B(11) A (10) D (8)


A (7) D (8) C(8) B(7) E (13)
C (12) B (6) D(7) E (11) A (9)
B (4) E (10) A (7) D (6) C (8)
D (5) A (8) E (15) C(9) B (5)

Estimate the missing value, complete the analysis of variance and compare the variety C
with D, and A with E at 5% level of significance using LSD test.

a. Estimate the missing value:

Y = [t(Ro + Co + To) − 2Go] / [(t − 1)(t − 2)]
  = [5(41 + 32 + 37) − 2 × 206] / [(5 − 1)(5 − 2)] = (550 − 412)/12 = 11.5
b. Enter the estimated value and carry out the analysis following the usual procedure:
- Corrected row total = 41.0 + 11.5 = 52.5
- Corrected column total = 32.0 + 11.5 = 43.5
- Corrected treatment total = 37+ 11.5 = 48.5
- Corrected grand total = 206.0 + 11.5 = 217.50
c. Compute the C.F. and the various sums of squares:

C.F. = G²/t² = (217.5)²/25 = 1892.25

Total SS = Σy² − C.F. = (12.0)² + (11.5)² + .... + (5.0)² − 1892.25 = 180.0

Row SS = ΣR²/t − C.F. = [(52.5)² + (43.0)² + ...... + (42.0)²]/5 − 1892.25 = 31.6

Column SS = ΣC²/t − C.F. = [(40.0)² + (43.5)² + ........ + (43.0)²]/5 − 1892.25 = 6.60

Treatment SS = ΣT²/t − C.F. = [(41.0)² + (33.0)² + (48.5)² + ... + (61.0)²]/5 − 1892.25 = 107.6

Error SS = Total SS − Row SS − Column SS − Treatment SS = 180.0 − 31.6 − 6.6 − 107.6 = 34.2

d. Compute the correction factor for bias (B), as the treatment sum of squares is biased
upwards:

Bias (B) = [Go − Ro − Co − (t − 1)To]² / [(t − 1)(t − 2)]²
         = [206 − 41 − 32 − (5 − 1) × 37]² / [(5 − 1)(5 − 2)]² = (−15)²/144 = 1.56
e. Subtract the computed B value from Total SS & Treatment SS
Adjusted Treatment SS = Treatment SS – B = 107.6-1.56 = 106.04
Adjusted Total SS = Total SS-B = 180.0-1.56 = 178.44

f. Subtract 1 from error d. f. and total d. f. and complete the analysis of variance table.

Source      D.F.                SS       MS      Computed F   Table F
                                                              5%     1%
Row         (5-1) = 4           31.6     7.90    2.54         3.36   5.67
Column      (5-1) = 4           6.6      1.65    0.53         3.36   5.67
Treatment   (5-1) = 4           106.04   26.51   8.52**       3.36   5.67
Error       (5-1)(5-2)-1 = 11   34.2     3.11
Total       (t²-1)-1 = 23       178.44

CV = (√MSE / Grand mean) × 100 = [√3.11 / (206/24)] × 100 = (1.76/8.58) × 100 = 20.5%
To compare treatments C and D (C contains the estimated value), LSD5% = t0.025(11) × sd, where

sd = √{MSE × [2/t + 1/((t − 1)(t − 2))]} = √{3.11 × [2/5 + 1/((5 − 1)(5 − 2))]} = 1.226

LSD5% = 2.201 × 1.226 = 2.70 kg

The difference between the treatment means of C and D is 37/4 − 34/5 = 9.25 − 6.80 = 2.45 kg.
Since this difference is less than the LSD value, there is no significant difference between
treatments C and D.

To compare treatments A and E, both with equal replication (no missing value):

LSD5% = t0.025(11) × sd, where sd = √(2MSE/r) = √(2 × 3.11/5) = 1.115

LSD5% = 2.201 × 1.115 = 2.45 kg

The difference between the treatment means of E and A is 61/5 − 41/5 = 12.2 − 8.2 = 4.00 kg.
Since this difference is greater than the LSD value, there is a significant difference between
treatments A and E.
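A quick numerical check of the two standard errors used above (the extra 1/[(t − 1)(t − 2)]
term enters only for the pair that involves the estimated value):

from math import sqrt

mse, t, t_crit = 3.11, 5, 2.201          # t_crit = t(0.025, 11 d.f.)
sd_missing = sqrt(mse * (2 / t + 1 / ((t - 1) * (t - 2))))   # pair with a missing value
sd_equal = sqrt(2 * mse / t)                                 # pair with full data (r = t here)
print("LSD (C vs D) =", t_crit * sd_missing)   # ~2.70 kg
print("LSD (A vs E) =", t_crit * sd_equal)     # ~2.45 kg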

8. INCOMPLETE BLOCK DESIGNS

Theoretically, the complete block designs (where each block contains all the treatments)
such as Randomized Complete Block and Latin Square are applicable to experiments with
any number of treatments. However, these complete block designs become less efficient as
the number of treatments increases, mainly because block size increases proportionally with
the number of treatments which in turn increases experimental error.

An alternative set of designs for single factor experiments having a large number of
treatments are the incomplete block designs. For example, plant breeders are often interested
in making comparisons among a large number of selections in a single trial. For such trials,
we use incomplete block designs. As the name implies, the experimental units in these
designs are grouped into blocks which are smaller than a complete replication of the
treatments. However, the improved precision with the use of an incomplete block designs
(where the blocks do not contain all the treatments) is achieved with some costs. The major
ones are:
- inflexible number of treatments or replications or both.
- unequal degree of precision in the comparison of treatment means.
- complex data analysis.
Although there is no concrete rule as to how large the number of treatments should be
before the use of an incomplete block design, the following points may be helpful:

a. Variability in the experimental material: The advantage of an incomplete block design


over complete block designs is enhanced by an increased variability in the
experimental material. In general, whenever the block size in Randomized Complete
Block Design is too large to maintain reasonable level of uniformity among
experimental units within the same block, the use of an incomplete block design
should be seriously considered.

b. Computing facilities and services: Data analysis of an incomplete block design is more
complex than that for a complete block design. Thus, the use of an incomplete block
design should be considered only as a last resort.

8.1. Lattice Designs

The lattice designs are the most commonly used incomplete block designs in agricultural
experiments. There is sufficient flexibility in the design to make its applications simpler than
most of the other incomplete block designs. There are two kinds of lattices: balanced lattice
and partially balanced lattice designs.

Field arrangement & randomization:


- Blocks in the same replication should be made as nearly alike as possible
- Randomize the order of blocks within replication using a separate randomization in
each replication.
- Randomize treatment code numbers separately in each block.

8.1.1 Balanced lattices

The balanced lattice design is characterized by the following basic features:


a. Each treatment occurs together in the same block with every other treatment exactly once.
As a result, the statistical analysis is relatively simple and each pair of treatments is
compared with the same precision. For example, consider the following design for
four treatments in blocks of size two.

Block Rep I Rep II Rep III

(1) A B (1) A C (1) A D


(2) C D (2) B D (2) B C

Note that treatment A occurs with treatment B, with treatment C and with treatment D only
once.
b. The number of treatments (t) must be a perfect square, such as 16, 25, 36, 49, 64, etc.
c. The block size (k) is equal to the square root of the number of treatments, i.e. k = √t.
d. The number of replications (r) is one more than the block size, i.e. r = k + 1. That is,
the number of replications required is 6 for 25 treatments, 7 for 36 treatments, and so
on.
As balanced lattices require large number of replications, they are not commonly used.

8.1.2 Partially balanced lattices

The partially balanced lattice design is more or less similar to the balanced lattice design,
but it allows for a more flexible choice of the number of replications. The partially
balanced lattice design requires that the number of treatments must be a perfect square and
the block size is equal to the square root of the number of treatments. However, any
number of replications can be used in partially balanced lattice design. The partially
balanced lattice design with two replications is called simple lattice, with three replications
is triple lattice and with four replications is quadruple lattice, and so on. However, such
flexibility in the number of replications results in the loss of symmetry in the arrangement
of the treatments over blocks (i.e. some treatment pairs never appear together in the same
incomplete block). Consequently, the treatment pairs that are tested in the same incomplete
block are compared with higher level of precision than for those that are not tested in the
same incomplete block. Thus, partially balanced designs are more difficult to analyze
statistically, and several different standard errors may be possible.

Example 1 (Lattice with adjustment factor): Field arrangement and broad-leaved weed
kill (%) in tef fields of the Debre Zeit research center for 16 herbicides tested in a 4 × 4
Triple Lattice Design (herbicide numbers in parentheses).

________________________________________________________________________
Block
Replication Block % kill total (B) M Cb
_________________________________________________________________________
1 1 75(15) 57(16) 71(13) 77(14) 280 789 -51
2 78(12) 66(11) 68(10) 45(9) 257 716 -55
3 40(6) 64(5) 49(8) 42(7) 195 608 23
4 59(3) 53(1) 46(2) 54(4) 212 642 6
Rep total (R1) 944 -77
______________________________________________________________________
2 1 53(16) 66(4) 57(12) 47(8) 223 663 -6
2 80(14) 48(6) 73(10) 52(2) 253 700 -59
3 36(7) 63(11) 67(15) 47(3) 213 676 37
4 68(13) 60(1) 50(9) 76(5) 254 716 -46
Rep total (R2) 943 -74
_________________________________________________________________________
3 1 66(15) 46(2) 58(12) 69(5) 239 754 37
2 46(4) 40(7) 59(13) 55(10) 200 678 78
3 43(9) 55(3) 50(8) 68(14) 216 670 22
4 60(11) 58(1) 48(16) 47(6) 213 653 14
Rep total (R3) 868 151

Grand Total (G) 2755 0


_________________________________________________________________________

Analysis of variance

Step 1: Calculate the block total (B), the replication total (R) and Grand Total (G) as shown above.
Step 2: Calculate the treatment totals (T) by summing the values of each treatment from the three
replications.

Treat.   Treatment   Sum    A × Cb          Adjusted treatment      Adjusted treatment
No.      total (T)   Cb     (0.0813 × Cb)   total T' = T + A(ΣCb)   mean (T'/3)
1        171         -26    -2.11           168.89                  56.29
2        144         -16    -1.30           142.70                  47.57
3        161         65     5.28            166.28                  55.43
4        166         78     6.34            172.34                  57.45
5        209         14     1.14            210.14                  70.05
6        135         -22    -1.79           133.21                  44.40
7        118         138    11.22           129.22                  43.07
8        146         39     3.17            149.17                  49.72
9        138         -79    -6.42           131.58                  43.86
10       196         -36    -2.93           193.07                  64.36
11       189         -4     -0.32           188.67                  62.89
12       193         -24    -1.95           191.05                  63.68
13       198         -19    -1.54           196.45                  65.48
14       225         -88    -7.15           217.85                  72.61
15       208         23     1.87            209.87                  69.96
16       158         -43    -3.49           154.50                  51.50

Step 3: Using r as the number of replications and k as the block size, compute the total sum of
squares, replication sum of squares, and treatment (unadjusted) sum of squares as:

- Correction Factor (C.F.) = G²/(rk²) = (2755)²/(3 × 16) = 158125.5

- Total Sum of Squares = (75)² + (57)² + .... + (47)² − C.F. = 164233 − 158125.5 = 6107.5

- Replication Sum of Squares = ΣR²/k² − C.F. = [(944)² + (943)² + (868)²]/16 − 158125.5
  = 158363.06 − 158125.5 = 237.6

- Treatment (unadjusted) Sum of Squares = ΣT²/r − C.F.
  = [(171)² + (144)² + .... + (158)²]/3 − 158125.5 = 163029 − 158125.5 = 4903.5
3
Step 4: For each block, calculate block adjustment factor (Cb) as:

Cb = M -rB where M is the sum of treatment totals for all treatments appearing in that particular
block, B is the block total and r is the number of replications. For example, block 2 of replication 2
contains treatments 14, 6, 10, and 2. Hence, the M value for block 2 of replication 2 is: M = T14 +
T6 + T10 + T2 = 225 + 135 + 196 + 144 = 700 and the corresponding Cb value is: Cb = 700 - (3 x
253) = -59. The Cb values for the blocks are presented in the above table. Note that the sum of Cb
values over all replications should add to zero (i. e. -77 + -74 + 151 = 0).

Step 5: Calculate the Block (adjusted) Sum of Squares as:

Block (adj.) SS = ΣCb²/[k × r × (r − 1)] − ΣRc²/[(k²)(r)(r − 1)]

where Rc denotes the replication totals of the Cb values.

= [(−51)² + (−55)² + ..... + (14)²]/(4 × 3 × 2) − [(−77)² + (−74)² + (151)²]/(16 × 3 × 2)
= 888.58 − 356.31 = 532.27

Step 6: Calculate the intra-block error sum of squares as:

Intra-block error SS = Total SS − Rep. SS − Treatment (unadjusted) SS − Block (adjusted) SS
                     = 6107.5 − 237.6 − 4903.5 − 532.27 = 434.13

Step 7: Calculate the intra-block error mean square (MS) and block (adj.) mean square (MS) as:

Intra-block error MS = Intra-block error SS / [(k − 1)(rk − k − 1)] = 434.13/21 = 20.67

Block (adjusted) MS = Block (adjusted) SS / [(r)(k − 1)] = 532.27/9 = 59.14
Note that if the adjusted block mean square is less than intra-block error mean square, no further
adjustment is done for treatment. In this case, the F-test for significance of treatment effect is made
in the usual manner as the ratio of treatment (unadjusted) mean square and intra-block error mean
square and steps 8 to 13 can be ignored. For our example, the MSB value of 59.14 is greater than
the MSE value of 20.67, thus, the adjustment factor is computed.

Step 8: Calculate the adjustment factor A. For a triple lattice design, the formula is:

A = (Eb − Ee) / [k(r − 1)(Eb)] = (59.14 − 20.67) / (4 × 2 × 59.14) = 0.0813

where Eb is the block (adj.) mean square, Ee is the intra-block error mean square, and k is
the block size.

Step 9: For each treatment, calculate the adjusted treatment total (T') as:

T' = T + A(ΣCb), where the summation runs over all blocks in which that particular treatment
appears. For example, the adjusted treatment totals for treatments 1 and 2 are computed as:

T'1 = 171 + 0.0813 × (6 − 46 + 14) = 168.89
T'2 = 144 + 0.0813 × (6 − 59 + 37) = 142.70
etc.

The adjusted treatment totals (T') and their respective means (adjusted treatment total divided by the
number of replications, 3) are presented along with the unadjusted treatment totals (T) in the table
above.
Step 10: Compute the adjusted Treatment Sum of Squares as:

Treatment (adj.) SS = Treatment (unadj.) SS − Ak(r − 1) × [(r × SSBun)/((r − 1)(1 + kA)) − SSB]

where SSBun (the unadjusted block sum of squares) = ΣB²/k − C.F. − SSR; B is the block total
and SSR is the Replication Sum of Squares.

SSBun = [(280)² + (257)² + ..... + (213)²]/4 − 158125.5 − 237.6
      = 160046.75 − 158125.5 − 237.6 = 1683.65

Thus, the adjusted treatment sum of squares
= 4903.5 − (0.0813 × 4 × 2) × [(3 × 1683.65)/(2 × (1 + 4 × 0.0813)) − 532.27] = 4010.20

Step 11: Compute the treatment (adjusted) mean square as:

Treatment (adj.) MS = Treatment (adjusted) SS / (k² − 1) = 4010.2/15 = 267.35

Step 12: Compute the F-value for testing the significance of the treatment difference, and compare the
computed F-value with the tabulated F-value with k² − 1 = 15 degrees of freedom as numerator and
(k − 1)(rk − k − 1) = 21 as denominator.

Computed F = Treatment (adj.) MS / Intra-block error MS = 267.35/20.67 = 12.93

Step 13: Compare the computed F-value with the table F-value
F0.05 (15, 21) = 2.18
F0.01 (15, 21) = 3.03
Since the computed F-value of 12.93 is greater than the tabulated F-value at 1% level of
significance (3.03), the differences among the herbicide means are highly significant.

Step 14: Enter all values computed in the analysis of variance table.

Source of           Degrees of            Sum of     Mean      Computed   Tabulated F
variation           freedom               squares    squares   F          5%     1%
Replication         (r-1) = 2             237.6      118.8
Block (adj.)        r(k-1) = 9            532.27     59.14
Herbicide (unadj.)  k²-1 = 15             4903.5     326.9
Intra-block error   (k-1)(rk-k-1) = 21    434.13     20.67
Herbicide (adj.)    k²-1 = (15)           (4010.2)   267.35    12.93**    2.18   3.03
Total               rk²-1 = 47            6107.5
Step 15: Compute the corresponding Coefficient of Variation (CV) as:

CV = (√Intra-block error MS / Grand mean) × 100 = (√20.67 / 57.4) × 100 = 7.9%

Note that the grand mean is the same for the adjusted and unadjusted treatment totals.

Step 16: Compute the gain in precision of the triple lattice relative to that of the Randomized
Complete Block Design as:

% Relative precision = {[(SSB + SSE)/((k² − 1)(r − 1))] / Ee'} × 100

where SSB is the Block (adj.) SS, SSE is the intra-block error SS, r is the number of
replications, and k is the block size.

Ee' (effective error mean square) = [1 + rkA/(k + 1)] × Ee, where A is the adjustment factor
and Ee is the intra-block error mean square.

(532.27 + 434.13)/[(4² − 1)(3 − 1)] = 966.4/30 = 32.2
Ee' = [1 + (3 × 4 × 0.0813)/(4 + 1)] × 20.67 = 24.7

Thus, the relative precision = (32.2/24.7) × 100 = 130.4%

This indicates that the precision of this experiment was increased by about 30.4% by using the triple
lattice instead of Randomized Complete Block Design.

To compare between adjusted treatment mean differences:

sd = √[(2Ee/r)(1 + rA)], where Ee is the intra-block error mean square, r is the number
of replications and A is the adjustment factor.

LSD1% = t0.005(21) × √{(2 × 20.67/3) × [1 + (3 × 0.0813)]} = 2.831 × 4.14 = 11.72%

8.2 Augmented Block Design

Augmented Block Design is used:

- when there is a large number of entries/genotypes
- when there is not enough seed of the test entries
- when there are not sufficient funds and land for replicated trials

It is not a powerful design, but it is used for preliminary screening of genotypes/entries.
There is no assumption of block homogeneity. The new (test) material is not replicated; each
test entry appears only once in the experiment, while the check varieties/entries appear once
in every block (i.e. they are replicated as many times as the number of blocks).

- Error d.f. = (b − 1)(c − 1), where b = number of blocks and c = number of checks.

Thus, the minimum number of checks is 2, because the error d.f. should be 12 or more for a
valid comparison; if the number of checks is 1, the error d.f. becomes zero, which is not valid.

Randomization and Layout


Divide the experimental field into blocks. The block sizes need not be equal, i.e. some
blocks may contain 10 entries while others contain 12. Suppose we have 50 test
cultures/progenies and 4 checks; we need at least 5 blocks, since the error d.f. = (c −
1)(b − 1) should be ≥ 12. The higher the number of blocks, the higher the precision. If we
use 5 equal blocks, the block size will be 14 (10 test cultures + 4 checks), but block size
may vary.

Randomization

A. Convenience

For identification purposes, the checks are sometimes assigned at the start, at the end, or at
certain intervals within the block.

Block 1 = P1 P2 P3 P4 …….. P10 A B C D


Block 2 = P11 P12 ...... P20 B C D A
Block 3 = P21 P22 ..... P30 C D A B
.
.
.
etc.
where P1, P2, .......... P30 are test entries and A, B, C, D are checks.

B. Randomly allocate the checks in a block.

b1 = P1 P2 A P3 B P4 D P5 C P6 P7 ..... P10
b2 = D P1 P2 C P3 P5 A B ….....
etc.

One of the blocks can contain 10 test cultures + 4 checks, while another contains 12 test
cultures + 4 checks. Missing test entries (Pi) do not create problems in the analysis, since
the analysis can be done with the existing genotypes. But when checks are missing, the
analysis becomes complicated.

8.2.2 Augmented block design with equal block size

Assume we want to test 16 new rice genotypes (test cultures) P1, P2, ..., P16, with 4 test
cultures per block plus 4 checks, to screen for early maturity.
b (number of blocks) = 4
c (number of checks) = 4 (A, B, C, D)
Block size = 4 test cultures + 4 checks = 8
Days to maturity of rice genotypes
Block 1
P1 A P2 B P3 C P4 D
120 83 100 77 90 70 85 65

Block 2
P5 B P6 C P7 A P8 D
88 76 130 71 105 84 110 64

Block 3
P9 D P10 A P11 B P12 C
102 63 140 86 135 78 138 69

Block 4
P13 A P14 D P15 C P16 B
84 82 90 63 95 68 103 75

Steps of Analysis of Variance


Preliminary steps
1. Calculate block total for each block.
b1 = 690 (120 + 83 + …. + 65); b2 = 728; b3 = 811; b4 = 660
2. Calculate total of progenies/test cultures in a particular block.
P block1 = 395; P block2= 433; P block3 = 515; P block4 = 372
3. Construct a check × block two-way table and calculate the check totals, check means,
check effects, sum of check totals and total of check means.

Check   b1   b2   b3   b4   Check   Check   Check effect          Check total ×
                            total   mean    (check mean −         check effect
                                            adjusted grand mean)
A       83   84   86   82   335     83.75   -16.67                -5584.45
B       77   76   78   75   306     76.50   -23.92                -7319.52
C       70   71   69   68   278     69.50   -30.92                -8595.76
D       65   64   63   63   255     63.75   -36.67                -9350.85
Sum                         1174                                  -30850.58
Total of check means = 293.5

Block   ni   Block total   Total of test cultures   No. of test cultures   Block effect   Ti × Be   Be × block
             (Bj)          in the block (Bti)       in the block (Ti)      (Be)                     total
B1      8    690           395                      4                      0.375          1.5       258.75
B2      8    728           433                      4                      0.375          1.5       273.00
B3      8    811           515                      4                      0.625          2.5       506.87
B4      8    660           372                      4                      -1.375         -5.5      -907.50
Σ       32   2889                                                          0              0         131.12

b1 = (1/4) × (690 − 293.5 − 395) = 0.375
b2 = (1/4) × (728 − 293.5 − 433) = 0.375
b3 = (1/4) × (811 − 293.5 − 515) = 0.625
b4 = (1/4) × (660 − 293.5 − 372) = -1.375

where ni is the number of entries (test cultures + checks) in each block = 4 + 4 = 8, and Σni = N = 32.

3.1. Estimation of block effects:

bi = [1/(number of checks, c)] × (total of the ith block − total of all check means − total of
all test cultures/progenies in the block);  Σbi = 0

3.2. Calculate the adjusted grand mean as:

Adjusted grand mean = [1/(no. of checks + no. of test cultures)] × [grand total − (b − 1) ×
(total of all check means) − Σ (no. of test cultures in a block × corresponding block effect)]

Grand total = ΣBi (sum of block totals) = sum of all observations = 2889.0

Adjusted grand mean = [1/(4 + 16)] × [2889 − (4 − 1)(293.5) − 0] = (2889 − 880.5)/20 = 100.42

Step 3.3. Estimate the check effects (Ci):

Ci = check mean − adjusted grand mean
C1 = 83.75 − 100.42 = -16.67;  C2 = 76.50 − 100.42 = -23.92
C3 = 69.50 − 100.42 = -30.92;  C4 = 63.75 − 100.42 = -36.67

There are as many check effects as the number of checks (4 in this case).
Step 3.4. Compute the adjusted progeny value for the ith progeny (Pi) as: observed
(unadjusted) progeny value − effect of the block in which the ith progeny occurs.

P1 (adjusted) = P1 − (block effect) = 120 − 0.375 = 119.625, etc.

Progeny/       Observed       Block     Adjusted         Progeny effect       Observed progeny
test culture   progeny        effect    progeny value    (adjusted value −    value × progeny
no. (Pi)       value (Po)     (bi)      (Po − bi)        adjusted grand mean) effect
1              120            0.375     119.625          19.205               2304.6
2              100            0.375     99.625           -0.795               -79.5
3              90             0.375     89.625           -10.795              -971.6
4              85             0.375     84.625           -15.795              -1343.0
5              88             0.375     87.625           -12.795              -1126.0
6              130            0.375     129.625          29.205               3796.7
7              105            0.375     104.625          4.205                441.5
8              110            0.375     109.625          9.205                1012.6
9              102            0.625     101.375          0.955                97.4
10             140            0.625     139.375          38.955               5453.7
11             135            0.625     134.375          33.955               4583.9
12             138            0.625     137.375          36.955               5099.8
13             84             -1.375    85.375           -15.045              -1264.0
14             90             -1.375    91.375           -9.045               -814.1
15             95             -1.375    96.375           -4.045               -384.3
16             103            -1.375    104.375          3.955                407.4
Sum            1715                                                           17216

Step 3.5. Estimate the progeny effect as: adjusted progeny value − adjusted grand mean.
For example, the progeny effect for progeny 1 = 119.625 − 100.42 = 19.205; etc.
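Steps 3.1 and 3.2 are straightforward to script. A minimal sketch (hypothetical function
name) for the equal-block-size case of this example, i.e. the same number of test cultures
in every block:

def augmented_effects(block_totals, test_totals, check_means, c, p):
    """Block effects and adjusted grand mean for an augmented design
    with c checks, p test cultures, and equal block sizes."""
    tcm = sum(check_means)                       # total of check means
    b = len(block_totals)
    eff = [(B - tcm - P) / c for B, P in zip(block_totals, test_totals)]
    per_block = p // b                           # test cultures per block (equal sizes assumed)
    gm = (sum(block_totals) - (b - 1) * tcm
          - sum(per_block * e for e in eff)) / (c + p)
    return eff, gm

eff, gm = augmented_effects([690, 728, 811, 660], [395, 433, 515, 372],
                            [83.75, 76.50, 69.50, 63.75], c=4, p=16)
print(eff, gm)   # [0.375, 0.375, 0.625, -1.375], ~100.42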

Analysis of variance

1. Correction Factor (C.F.) = G²/N = (grand total)²/(no. of observations) = (2889)²/32 = 260822.53

2. Total SS = ΣY² − C.F. = sum of the squares of all observations − C.F.
   = (120)² + (83)² + ... + (75)² − 260822.53 = 276621 − 260822.53 = 15798.47

3. Crude block SS = Σ(Bi²/ni) = (690)²/8 + (728)²/8 + (811)²/8 + (660)²/8 = 262425.62

4. True block SS = Crude block SS − C.F. = 262425.62 − 260822.53 = 1603.09
5. Adjusted SS due to entries (C + P) = (adjusted grand mean × observed grand total) +
   [Σ (block effect × corresponding block total) + Σ (check effect × corresponding check total) +
   Σ (progeny effect × corresponding observed progeny value)] − crude block sum of squares
   = (100.42 × 2889) + [131.12 + (-30850.58) + 17216] − 262425.62 = 14184.3
6. Unadjusted SS due to entries = Σ(each check total)²/b + crude SS due to progenies − C.F.
   = (A² + B² + C² + D²)/b + (P1² + P2² + ... + P16²) − C.F.,  where b = no. of blocks (4)
   = [(335)² + (306)² + (278)² + (255)²]/4 + [(120)² + (100)² + .... + (103)²] − 260822.53
   = 87042.5 + 189557 − 260822.53 = 15776.97

7. Partition the SS due to entries (C + P) into components:

7.1. SS due to checks = ΣCi²/b − (ΣCi)²/(bc), where Ci here denotes the check totals,
   b = no. of blocks and c = no. of checks (4)
   = [(335)² + (306)² + (278)² + (255)²]/4 − (335 + 306 + 278 + 255)²/(4 × 4)
   = 348170/4 − (1174)²/16 = 87042.5 − 86142.25 = 900.25

7.2. SS due to test cultures/progenies = ΣPi² − (ΣPi)²/p, where p = no. of progenies/test
   cultures (16)
   = (120)² + (100)² + ..... + (103)² − (1715)²/16
   = 189557 − 183826.56 = 5730.44

7.3. SS due to checks × test cultures/progenies = unadjusted SS due to entries (C + P) −
   SS due to checks − SS due to test cultures/progenies
   = 15776.97 − 900.25 − 5730.44 = 9146.28

8. SS due to error = Total SS − true block SS − SS due to entries (adjusted)
   = 15798.47 − 1603.09 − 14184.3 = 11.08
9. Summarize the results of the analysis in an ANOVA table.

Source                    D.F.              SS         MS        F-cal.      F-table
                                                                             5%      1%
Block                     (b-1) = 3         1603.09    534.36
Adjusted entries          (C+P-1) = 19      14184.3    746.54
Unadjusted entries        (C+P-1) = 19      15776.97   830.37
. Checks                  (C-1) = 3         900.25     300.08    243.97**    3.86    6.99
. Test cultures           (P-1) = 15        5730.44    382.03    310.59**    3.01    4.96
. Test cultures × checks  1                 9146.28    9146.28   7436.00**   5.12    10.56
Error                     (b-1)(C-1) = 9    11.08      1.23
Total                     (N-1) = 31        15798.47

10. Mean Comparison

i. To compare any two check means at the 5% level of significance:

LSD5% = t0.025(9) × sd = 2.262 × √(2 × MSE/b) = 2.262 × √(2 × 2.2/4) = 2.37 days

ii. To compare two progenies/test materials occurring in the same block at the 5% level of
significance:

LSD5% = t0.025(9) × sd = 2.262 × √(2 × MSE) = 2.262 × √(2 × 2.2) = 4.74 days

iii. To compare two test cultures (progenies) occurring in different blocks at the 5% level of
significance:

LSD5% = t0.025(9) × √[2 × MSE × (1 + 1/c)] = 2.262 × √[2 × 2.2 × (1 + 1/4)] = 5.3 days;
c = no. of checks

iv. To compare a progeny/test culture with any check at the 5% level of significance:

LSD5% = t0.025(9) × √[MSE × (1 + 1/b + 1/c + 1/bc)];  b = no. of blocks, c = no. of checks
      = 2.262 × √[2.2 × (1 + 1/4 + 1/4 + 1/16)] = 2.262 × √3.4375 = 4.2 days

9. FACTORIAL EXPERIMENTS

9.1 Simple Effects, Main Effects and Interaction

Factorial experiments are experiments in which two or more factors are studied together.
A factor is a kind of treatment, and in a factorial experiment any factor supplies several
treatments. In a factorial experiment, the treatments consist of combinations of two or more
factors, each at two or more levels.

A factorial experiment can be done in CRD, RCBD or Latin Square Design as long as the
treatments allow. Thus, the term factorial describes the specific way in which the treatments
are formed; it does not refer to the experimental design used, e.g. nitrogen and
phosphorus rates:
N = 0, 50, 100, 150 kg/ha
P = 0, 50, 100, 150 kg/ha

Another example is kinds of protein supplement (noug cake, groundnut cake) combined with
levels of supplement (25%, 50%, 75%). The term level refers to the several treatments within
any factor; e.g. if 5 varieties of sorghum are tested using 3 different row spacings, the
experiment is called a 5 × 3 factorial experiment with 5 levels of the variety factor (A) and
three levels of the spacing factor (B). An experiment involving 3 factors (variety, N-rate,
weeding method), each at 2 levels, is referred to as a 2 × 2 × 2 or 2³ factorial; the exponent
3 refers to the number of factors and the base 2 to the number of levels. Here we have 8
treatment combinations: variety (x, y), N-rate (0, 50 kg/ha) and weeding (with or without
weeding). A 2³ × 3 experiment is a four-factor experiment in which three factors each have
2 levels and the 4th factor has 3 levels.

If the above 2³ factorial experiment is done in RCBD, the correct description of the
experiment is a 2³ factorial experiment in RCBD.

Interaction

Sometimes the factors act independently of each other. By this we mean that changing the
level of one factor produces the same effect at all levels of another factor. Often,
however, the effects of two or more factors are not independent. Interaction occurs when
the effect of one factor changes as the level of the other factor changes; e.g. if the effect
of 50 kg N on variety x is 10 Q/ha and its effect on variety y is 15 Q/ha, then there is
interaction. When factors interact, the factors are not independent, and a single-factor
experiment will lead to incomplete or misleading information. However, if there is no
interaction, it is concluded that the factors under consideration act independently of each
other; results from separate single-factor experiments are then equivalent to those from a
factorial experiment.

Example: A tall maize variety might out yield a short variety in high fertilizer rates due
to high dry matter production.

Interaction is the failure of the differences in response to changes in levels of one factor
to be the same at all levels of another factor or when the effect of one factor changes as
the level of the other factor changes.

2 × 2 factorial data of wheat yield (t/ha)

                            N-rate (kg/ha) (Factor B)
Variety (Factor A)      0 (b0)              50 (b1)             Simple effect of nitrogen on variety
X (a0)                  1.0                 1.0                 (a0b1 − a0b0) = 0
Y (a1)                  2.0                 4.0                 (a1b1 − a1b0) = 2
Simple effect
of variety              (a1b0 − a0b0) = 1   (a1b1 − a0b1) = 3

Simple effects
- Simple effect of variety at N0: 2-1 = 1
- Simple effect of variety at N1: 4-1 = 3
- Simple effect of N on variety X: 1-1 = 0
- Simple effect of N on variety Y: 4-2 = 2

Main effects are the averages of the simple effects:

- Main effect of variety = ½ (simple effect of A at b0 + simple effect of A at b1)
  = ½ [(a1b0 − a0b0) + (a1b1 − a0b1)] = ½ [(2 − 1) + (4 − 1)] = 2
- Main effect of nitrogen = ½ (simple effect of B at a0 + simple effect of B at a1)
  = ½ [(a0b1 − a0b0) + (a1b1 − a1b0)] = ½ [(1 − 1) + (4 − 2)] = 1

Interaction
It is calculated as half the difference between the simple effects of A at the two levels
of B, or half the difference between the simple effects of B at the two levels of A:
= ½ (simple effect of A at b1 − simple effect of A at b0)
= ½ [(a1b1 − a0b1) − (a1b0 − a0b0)] = ½ [(4 − 1) − (2 − 1)] = 1
or
= ½ (simple effect of B at a1 − simple effect of B at a0)
= ½ [(a1b1 − a1b0) − (a0b1 − a0b0)] = ½ [(4 − 2) − (1 − 1)] = 1
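The definitions can be verified with a few lines of code on the 2 × 2 wheat table above:

# means from the 2 x 2 table: y[variety][n_rate]
y = {'a0': {'b0': 1.0, 'b1': 1.0},
     'a1': {'b0': 2.0, 'b1': 4.0}}

simple_A_b0 = y['a1']['b0'] - y['a0']['b0']      # simple effect of variety at N0 = 1
simple_A_b1 = y['a1']['b1'] - y['a0']['b1']      # simple effect of variety at N1 = 3
main_A = (simple_A_b0 + simple_A_b1) / 2         # main effect of variety = 2
interaction = (simple_A_b1 - simple_A_b0) / 2    # interaction = 1
print(main_A, interaction)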

In factorial experiments, the following points should be noted:


a. An interaction effect between two factors can be measured only if the two factors
are tested together in the same experiment.
b. When interaction is absent, the simple effect of a factor is the same for all levels
of the other factors and equals to the main effect.
c. When interaction is present, the simple effect of a factor changes as the level of
the other factor changes.

Disadvantages of factorial experiments


a. As the number of factors increases, the size of the experiment becomes very large; e.g.
with 8 factors each at 2 levels, there are 2⁸ = 256 treatment combinations. Experiments
with this many treatments are costly to run.
b. Large factorial experiments are difficult to interpret especially when there are
interactions.

Uses of factorial experiments


a. In exploratory experiments, where the aim is to examine a large number of factors
to determine which ones are important and which are not.
b. To study relationships among several factors, to determine the presence and
magnitude of interaction
c. In experiments designed to lead to recommendations over a wide range of conditions.

9.2 Two Factor Factorial in Randomized Complete Block Design

Example: An agronomist wanted to study the effect of different rates of phosphorus


fertilizer on two varieties of common bean (Phaseolus vulgaris). He thought that the
varieties might respond differently to fertilizer so he decided to use a factorial experiment
with 2- factors: variety at two levels (T1 = Indeterminate T2 = Determinate) and
phosphorus rate at 3 levels (P1 = none, P2 = 25 kg/ha, P3 = 50 kg/ha).

Using the full factorial set of combinations, he had six treatments:


T1P1; T1P2; T1P3; T2P1; T2P2; T2P3,

He conducted this experiment using Randomized Complete Block Design with four
blocks of six plots each.

Field layout and yield of common bean (Q/ha)

Block-I
T2P2 T2P1 T1P1 T2P3 T1P3 T1P2
8.3 11.0 11.5 15.7 18.2 17.1

Block-II
T2P1 T2P2 T2P3 T1P2 T1P1 T1P3
11.2 10.5 16.7 17.6 13.6 17.6

Block-III
T1P2 T1P1 T2P1 T1P3 T2P3 T2P2
17.6 14.3 12.1 18.2 16.6 9.1

Block-IV
T1P3 T2P2 T2P3 T2P1 T1P2 T1P1
18.9 12.8 17.5 12.6 18.1 14.5

The linear model for Two Factor Randomized Block Design:


Yijk = µ + αi + βj + ϒk + αϒik + εijk
where, Yijk = the value of the response variable; µ = Common mean effect; αi = Effect of
factor A; βj = Effect of block; ϒk = Effect of factor B; αϒik = Interaction effect of factor A
& factor B; and εijk = Experiment error (residual) effect

Steps of Analysis of Variance

1. Construct two way table for factors and calculate factor A total, Factor B total and
grand total
_____________________________________________________________________
Phosphorus (Factor B)
_________________________________________________
Variety (Factor A) P1 P2 P3 Factor A total (A)
_____________________________________________________________________
T1 (indeterminate) 53.9 70.4 72.9 197.2
T2 (determinate) 46.9 40.7 66.5 154.1

Factor B total (B) 100.8 111.1 139.4 351.3(G)


_____________________________________________________________________

Block total
Block I II III IV
Total 81.8 87.2 87.9 94.4

2. Using r as the number of blocks, a as the levels of factor A and b as the levels of factor B,
compute the C.F., total SS, block SS, treatment SS and error SS.

- C.F. = G²/(rab) = (351.3)²/(4 × 2 × 3) = 5142.15

- Total SS = (8.3)² + (11.0)² + ... + (14.5)² − 5142.15 = 243.38

- Block SS = [(81.8)² + (87.2)² + (87.9)² + (94.4)²]/(ab) − C.F.
           = [(81.8)² + (87.2)² + (87.9)² + (94.4)²]/6 − 5142.15 = 13.32

- Treatment SS = [(53.9)² + (46.9)² + ... + (66.5)²]/r − C.F.
               = [(53.9)² + (46.9)² + ... + (66.5)²]/4 − 5142.15 = 221.38

- Error SS = Total SS − Block SS − Treatment SS = 243.38 − 13.32 − 221.38 = 8.68

3. Compute the three factorial components of the treatment SS [partition the treatment SS
into factor A SS, factor B SS, and A × B (interaction) SS].

- Factor A (variety) SS = ΣA²/(rb) − C.F. = [(197.2)² + (154.1)²]/(4 × 3) − 5142.15 = 77.40

- Factor B (P-rate) SS = ΣB²/(ra) − C.F. = [(100.8)² + (111.1)² + (139.4)²]/(4 × 2) − 5142.15 = 99.87

- A × B SS = Treatment SS − Factor A SS − Factor B SS = 221.38 − 77.40 − 99.87 = 44.11

ANOVA TABLE

Source           DF                 SS       MS      F-calcul.   F-table
                                                                 5%     1%
Block            r-1 = 3            13.32    4.44    7.65**      3.29   5.42
Variety (V)      a-1 = 1            77.40    77.40   133.45**    4.54   8.68
Phosphorus (P)   b-1 = 2            99.87    49.93   86.09**     3.68   6.36
V × P            (a-1)(b-1) = 2     44.11    22.05   38.03**     3.68   6.36
Error            (r-1)(ab-1) = 15   8.68     0.58
Total            rab-1 = 23         243.38

CV = (√Error MS / Grand mean) × 100 = (√0.58 / 14.64) × 100 = 5.2%
Interpretation of a factorial experiment

The interpretation of the results of a factorial experiment depends on the outcome of the
significance tests. If the factor A × factor B interaction is significant, the main effects have
no real meaning, whether significant or not. In our case, since the A × B interaction is highly
significant, the results of the experiment are best summarized in a two-way table of means of
the various A × B combinations. If the interaction is not significant, then all of the
information in the trial is contained in the significant main effects. In that case the results
may be summarized in tables of means for the factors with significant main effects.

Mean Comparisons

There are three types of means in a two factor factorial experiment.


- Factor A means
- Factor B means
- Factor combinations (AB) or treatment means

Variety × Phosphorus rate means

_____________________________________________________________________
Phosphorus (B)
_________________________________________________
Variety (A) P1 P2 P3 Variety mean
_____________________________________________________________________
T1 (indeterminate) 13.47 17.60 18.22 16.43
T2 (determinate) 11.72 10.17 16.62 12.84

Phosphorus mean 12.59 13.88 17.42


_____________________________________________________________________


Standard error of mean differences (sd):

- sd to compare any two factor A means: sd(A) = √[2 × MSE/(rb)] = √[2 × 0.58/(4 × 3)] = 0.31 Q
- sd to compare any two factor B means: sd(B) = √[2 × MSE/(ra)] = √[2 × 0.58/(4 × 2)] = 0.38 Q
- sd to compare any two factor combination (treatment) means: sd(AB) = √(2 × MSE/r)
  = √(2 × 0.58/4) = 0.54 Q
9.5 Split-Plot Design

9.5.1 Uses, advantages and disadvantages

Split-plot design is frequently used for factorial experiments where the nature of the
experimental material makes it difficult to handle all factor combinations in uniform plots.
The principle underlying it is that the levels of one factor are assigned at random to large
experimental units. The large units are then divided into smaller units, and the levels of the
second factor are assigned at random to the small units within the large units.

The large units are called the whole units or main-plots whereas the small units are called
the split-plots or sub-plots (units). Thus, each main plot becomes a block for the sub-plot
treatments. In split-plot design, the main plot factor effects are estimated from larger
units, while the sub-plot factor effects and the interactions of the main-plot and sub-plot
factors are estimated from small units.

As there are two sizes of experimental units, there are two types of experimental error,
one for the main plot factor and the other for the sub-plot factor. Generally, the error
associated with the sub-plots is smaller than that for the whole plots due to the fact that
error degrees of freedom for the main plot are usually less than those for the sub-plots.

In split-plot design, the precision for the measurement of the effect of main plot factor is
sacrificed to improve the precision of the measurement of the sub-plot factors.

Situations in which to use a split-plot design:


a. When the level of one or more of the factors require larger amounts of
experimental units than another. For instance, in field experiments, one of the
factors could be method of land preparation (tractor, oxen, hand) and method of
fertilizer application (broad cast, drill). These factors usually require larger
experimental plots (units). The other factor could be varieties which can be
compared using smaller units (plots). In this case methods of land preparation and
fertilizer application can be assigned to main-plots and the varieties to the sub-
plots.

b. When an additional factor is to be incorporated in an experiment to increase its


scope. For example, if the major purpose of an experiment is to compare the
effect of several vaccines as a protectant against infection from certain disease of
animals, to increase the scope of the experiment, several breeds of animals can be
included which are known to differ in their resistance to disease. Here, the breeds
of animals could be arranged in main units and the vaccines to the subunits.

c. When greater precision is desired for comparison of certain factors than others.
Since in a split-plot design, plot size and precision of measurement of the effects are not
the same for both factors, the assignment of a particular factor to either the main-plot or
to the sub-plot is extremely important.

Guidelines to apply factors either to main-plots or sub-plots:


a. Degree of precision required: Factors which require greater degree of precision
should be assigned to the sub-plot. For example, animal breeder testing three
breeds of dairy cows under different types of feed stuff, will assign the breeds of
animals to sub-units and the feed stuffs to the main unit. On the other hand,
animal nutritionist may assign the feeds to the sub-units and the breeds of animals
to the main-units as he is more interested on feed stuffs than breeds.

b. Relative size of the main effect: If the main effect of one factor (factor A) is
expected to be much larger and easier to detect than that of factor B, then factor A can be
assigned to the main unit and factor B to the sub-unit. For instance, in fertilizer
and variety experiments, the researcher may assign variety to the sub-unit and
fertilizer rate to the main unit, because he expects the fertilizer effect to be much
larger and easier to detect than the varietal effect.

c. Management practice: The factors, which require smaller amounts of


experimental material, should be assigned to sub-plots. For example, in an
experiment to evaluate the frequency of irrigation (5, 10, 15 days), on
performance of different tree seedlings on nursery, the irrigation frequency factor
could be assigned to the main plot and the different tree species to the sub-plots to
minimize water movement to adjacent plots.

Advantages
a. It permits the efficient use of some factors, which require large experimental units
in combination with other factors, which require small experimental units.
b. It provides increased precision in comparison of some of the factors (sub-plot
factors).
c. It promotes the introduction of new treatments into an experiment, which is
already in progress.

Disadvantages:
a. Statistical analysis is complicated because different factors have different error
mean squares.
b. Low precision for the main-plot factor can result in large differences being non-
significant, while small differences on the sub-plot factor may be statistically
significant even though they are of no practical importance.

9.5.2 Randomization and layout

There are two separate randomization process in split-plot design, one for the main plot
factor and another for the sub-plot factor.
In each block, the main plot factors are first randomly applied to the main plots followed
by random assignment of the sub-plot factors. Each of the randomization is done by any
of the randomization schemes.

Example: An experiment was designed to test the effect of feeding four forage crops
(Rhodes grass, Vetch, Alfalfa and Oat) on weight gain (kg/month) of the two breeds of
cows (Zebu, Holstein). At the start of the experiment, it was assumed that breeds of cows
would respond differently to the feed stuffs. Therefore, it was decided to use factorial
experiment. The objective of the experiment was to compare the effect of forage crops as
precisely as possible. Therefore, the experimenter assigned the breeds of animals to the
main-plot and the four forage crops to the sub-plots. The experiment was replicated in
three blocks (barns) based on initial body weight of animals as a blocking factor.

Procedures of randomization
Step 1: Divide the experimental area into r = 3 blocks, and divide each block into two
main plots. Then randomly assign the two breeds of animals (H, Z) in each of the blocks.

Note that the arrangement of the main-plot factor can follow any of the designs: CRD,
RCBD and LATIN square.

Step 2: Divide each of the main plot (unit) into 4-sub plots (units) and randomly assign
the four feed stuffs (A, V, O, R) to each of the six-main plots (units).

Note:
Each main-plot factor level is tested r times, where r is the number of blocks, while each
sub-plot factor level is tested a × r times, where a is the number of levels of factor A and
r is the number of blocks. This is the primary reason for the greater precision for the
sub-plot factors as compared with the main-plot factors.
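As a computational aside, the two-stage randomization can be sketched in a few lines of
Python (an illustrative sketch, assuming the three blocks, two breeds and four feed stuffs
of the example above):

import random

blocks = 3
main_levels = ["H", "Z"]           # main-plot factor: breeds
sub_levels = ["A", "V", "O", "R"]  # sub-plot factor: feed stuffs

layout = []
for b in range(blocks):
    mains = main_levels[:]
    random.shuffle(mains)          # first randomization: main plots within each block
    plan = []
    for m in mains:
        subs = sub_levels[:]
        random.shuffle(subs)       # second randomization: sub-plots within each main plot
        plan.append((m, subs))
    layout.append(plan)

for b, plan in enumerate(layout, start=1):
    print("Block", b, plan)

Each run produces one admissible layout; the main-plot arrangement here corresponds to an
RCBD, as in the example.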

The layout and the weight gains (kg/month) of the animals under each feeding are given below:
Block I Block II Block III
H Z H Z Z H
A R O V O V
25.9 15.5 18.0 22.7 13.2 28.4

V A A O A A
25.3 18.9 26.7 13.5 19.6 27.6

O O V R V R
19.3 13.8 24.8 15.0 22.3 25.4

R V R A R O
22.2 21.0 24.2 18.3 15.2 20.5

- Main-plot factor is breed of animals: Holstein (H), Zebu (Z).


- Split-plot (units) factor is feed stuffs: Alfalfa (A), Vetch (V), Rhodes grass (R), Oat (O)
10. COMPARISON OF TREATMENT MEANS

The F-test (ANOVA) shows whether there is a significant difference among treatments or
not, but it does not show which means differ from each other. There are many
ways to compare the means of treatments tested in an experiment. One of these is pair
comparison, the simplest and most commonly used comparison in agricultural research.

There are two types of pair comparisons:

A. Planned pair comparison: In which the specific pair of treatments to be compared are
identified before the start of the experiment, e.g. comparing the control treatment with
each of the other treatments.

B. Unplanned pair comparison: In which no specific comparison is chosen in advance.
Instead, every possible pair of treatment means is compared to identify pairs of
treatments that are significantly different, e.g. variety trials.

The most commonly used test procedures for pair comparison in agricultural research are
the Least Significant Difference (LSD) test and Tukey's test, which are suitable for
planned pair comparisons, and Duncan's Multiple Range Test (DMRT), which is applicable to
unplanned pair comparisons.

10.1 Least Significant Difference (LSD) Test

The LSD test is the simplest and most commonly used procedure for making pair comparisons.
The procedure provides a single value at a prescribed level of significance, which serves
as the boundary between significant and non-significant differences between any pair of
treatment means. That is, two treatments are declared significantly different at a
prescribed level of significance if their mean difference exceeds the computed LSD value;
otherwise they are not significantly different.

The LSD test is not valid for comparing all possible pairs of means, especially when the
number of treatments is large. This is because the number of possible pairs of
treatment means increases rapidly as the number of treatments increases. In experiments
where no real difference exists among the treatments, the numerical difference between
the largest and smallest treatment means is still expected to exceed the LSD value when
the number of treatments is large.

To avoid this problem, the LSD test is used only when the F-test for treatment effect is
significant and the number of treatments is not too large (less than six).

The procedure for applying the LSD test to compare any two treatment means:
1. Rank the treatment means from the largest to the smallest in the column and from the
smallest to the largest in rows.
2. Compute all possible differences between the pairs of treatment means to be compared.
3. Compute the LSD value at the α level of significance:
LSDα = tα/2(n) × sd
where sd is the standard error of the treatment mean difference and tα/2(n) is the
tabulated t-value at the α/2 level of significance with n error degrees of freedom.

Example: Oil content (g) of linseed treated at different stages of growth with N-fertilizer,
tested in an RCBD with four replications and an error mean square of 1.31.

LSD5% = t0.025(15) × √(2MSE/r) = 2.131 × √(2 × 1.31/4) = 1.72 g
LSD1% = t0.005(15) × √(2MSE/r) = 2.947 × √(2 × 1.31/4) = 2.39 g

where MSE is the error mean square and r is the number of replications.

4. Compare each mean difference (d) from step 2 with the LSD values computed in step 3
using the following rules:
- if |d| > the LSD value at the 1% level of significance, there is a highly significant
difference between the two treatment means compared (put two asterisks on the difference);
- if |d| > the LSD value at the 5% level but ≤ the LSD value at the 1% level, there is a
significant difference between the two treatment means compared (put one asterisk on the
difference);
- if |d| ≤ the LSD value at the 5% level, the two treatment means compared are not
significantly different (put ns).

No   Treatment (stage of N application)   Treatment mean (g)
1    Seedling                             5.10
2    Early blooming                       4.30
3    Half blooming                        4.00
4    Full blooming                        6.70
5    Ripening                             6.05
6    Unfertilized (control)               7.03

Treatments   4.00    4.30    5.10    6.05    6.70    7.03
             (T3)    (T2)    (T1)    (T5)    (T4)    (T6)
7.03 (T6)    3.03**  2.73**  1.93*   0.98ns  0.33ns  -
6.70 (T4)    2.70**  2.40**  1.60ns  0.65ns  -
6.05 (T5)    2.05*   1.75*   0.95ns  -
5.10 (T1)    1.10ns  0.80ns  -
4.30 (T2)    0.30ns  -
4.00 (T3)    -

Thus, the differences between T6 & T3, T6 & T2, T4 & T3, T4 & T2 are highly
significant; while the differences between T3 & T5, T1 & T6, T2 & T5 are significant.

Note that there are t(t − 1)/2 possible (unplanned) pair comparisons and (t − 1) planned
pair comparisons, where t is the number of treatments. In the above example, 15 unplanned
pair comparisons and five planned pair comparisons are possible.

Presentation of data using LSD

Table __. Mean oil content of linseed treated with nitrogen fertilizer at different stages
___________________________________________
Stage of application Oil content (g)
___________________________________________
Seedling 5.10
Early blooming 4.30
Half-blooming 4.00
Full- blooming 6.70
Ripening 6.05
Unfertilized 7.03
___________________________________________
LSD(0.05) 1.72 g
CV (%) 20.7
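The LSD computations above can be reproduced with a short Python sketch (illustrative
only; scipy's t.ppf supplies the tabulated t-value, and the means are those of the
linseed example):

from math import sqrt
from scipy.stats import t

mse, r, df = 1.31, 4, 15
means = {"T1": 5.10, "T2": 4.30, "T3": 4.00, "T4": 6.70, "T5": 6.05, "T6": 7.03}

sd = sqrt(2 * mse / r)               # standard error of a mean difference
lsd5 = t.ppf(1 - 0.05 / 2, df) * sd  # about 1.72 g
lsd1 = t.ppf(1 - 0.01 / 2, df) * sd  # about 2.39 g

items = sorted(means.items(), key=lambda kv: kv[1], reverse=True)
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        d = items[i][1] - items[j][1]
        flag = "**" if d > lsd1 else ("*" if d > lsd5 else "ns")
        print(items[i][0], "vs", items[j][0], round(d, 2), flag)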

10.2 Duncan’s Multiple Range Test (DMRT)

The DMRT is the most widely used procedure for making all possible pair comparisons. The
procedure for applying the DMRT is similar to the LSD test, but it requires progressively
larger values for significance between treatment means as they become more widely
separated in the array.

The test is more appropriate when the total number of treatments is large. It involves the
calculation of the shortest significant difference (SSD).

The SSD is calculated for all possible relative positions (P) between the treatment means
when the means are arranged in order of magnitude (in decreasing or increasing order).

Procedure
Step 1: Arrange all the treatment means in increasing or decreasing order. Data such as
milk yield and crop yield are usually arranged from the highest to the lowest.

Example: Yields (kg/plot) of wheat varieties grown in 4 by 4 Latin Square Design with
error mean square of 0.45:
B (12.3) A (12.00) C (10.8) D (6.7)

Step 2: Calculate sd (the standard error of the treatment mean difference) as:
sd = √(2MSE/r) = √(2 × 0.45/4) = 0.47 kg
Step 3: Calculate the shortest significant difference (SSD) for the relative positions (P)
in the array of means. Since we have four treatment means, they can be 2, 3 or 4 positions
apart: B and A are 2 apart (P = 2); B and C are 3 apart (P = 3); B and D are 4 apart
(P = 4); A and D are 3 apart (P = 3); etc.

For the above example, the R values with 6 error d.f. at the 1% level of significance are
found from the R-table.

P= 2 3 4
R0.01 = 5.24 5.51 5.65
SSD = 1.74 1.83 1.88

P = the distance in ranks between the pairs of treatment means to be compared.

R = the significant studentized range value at the error d.f. (6).


SSD (Shortest Significant Difference) = (R × sd)/√2

SSD at P = 2 = (5.24 × 0.47)/√2 = 1.74
SSD at P = 3 = (5.51 × 0.47)/√2 = 1.83
SSD at P = 4 = (5.65 × 0.47)/√2 = 1.88

Note that SSD values increase as the distance between treatments (P) to be compared
increases.

Step 4: Test the differences between treatment means in the following order:
− Largest – smallest = 12.3 – 6.7 = 5.6, compared with the SSD at P = 4 (1.88); d (5.6)
> SSD at P = 4 (1.88); thus the difference is significant at the 1% level of
significance.
− Largest – 2nd smallest = 12.3 – 10.8 = 1.5, compared with the SSD at P = 3 (1.83); d
(1.5) < SSD at P = 3 (1.83); thus the difference is non-significant at the 1% level of
significance.
− Largest – 2nd largest = 12.3 – 12.0 = 0.3 < SSD at P = 2 (1.74); thus the difference
is non-significant at the 1% level of significance.
− 2nd largest – smallest = 12.0 – 6.7 = 5.3, compared with the SSD at P = 3 (1.83); d (5.3)
> SSD at P = 3 (1.83); significant at the 1% level of significance.
− 2nd smallest – smallest = 10.8 – 6.7 = 4.1, compared with the SSD at P = 2 (1.74); d
(4.1) > SSD at P = 2 (1.74); significant.
− etc.
Note that the SSD value at P = 2 is equal to the LSD value.
________________________________________________________
            B (12.3)      A (12.0)      C (10.8)     D (6.7)
________________________________________________________
D (6.7)     5.6** (P=4)   5.3** (P=3)   4.1** (P=2)  -
C (10.8)    1.5ns (P=3)   1.2ns (P=2)   -
A (12.0)    0.3ns (P=2)   -
B (12.3)    -
________________________________________________________

SSD (P = 4) = 1.88; SSD (P =3) = 1.83; SSD (P =2) = 1.74

Treatments B & D, A & D, C & D are significantly different at 1%, while treatments B &
C, B & A, and A & C are not significantly different at 1% level of significance.
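A minimal Python sketch of the SSD computation, assuming the tabulated R values quoted
above (MSE = 0.45, r = 4, error d.f. = 6):

from math import sqrt

mse, r = 0.45, 4
R = {2: 5.24, 3: 5.51, 4: 5.65}       # significant studentized ranges at 1%, from the R-table

sd = round(sqrt(2 * mse / r), 2)      # 0.47 kg, rounded as in the worked example
ssd = {p: round(Rp * sd / sqrt(2), 2) for p, Rp in R.items()}
print(ssd)                            # {2: 1.74, 3: 1.83, 4: 1.88}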

Step 5: Present the test result in one of the following two ways.
A. Use the line notation if the sequence of results can be arranged according to their
ranks.
B. Use the alphabet notation, which is the more commonly used form, if the desired
sequence of the results is not based on their ranks.

Any two means underscored by the same line are not significantly different at 1% level of
significance according to DMRT.
B(12.3) A(12.0) C(10.8) D(6.7)

The alphabet notation can be derived from line notation simply by assigning the same
alphabet to all treatment means connected by the same horizontal line. It is usual practice
to assign letter a for the first line, b for second line, c for third and so on. Note that letter a
can be for the largest or smallest treatment mean depending on the rank of arrangement.

Presentation of data using DMRT

Table __. Mean yields of wheat varieties planted at Debrezeit Agricultural Research Center.

Variety Yield (kg)


A 12.0 a
B 12.3 a
C 10.8 a
D 6.7 b
Note that we have to put a footnote below the table stating that any two means in the
same column followed by the same letter are not significantly different at the 1% level of
significance according to DMRT. Note also that LSD and DMRT are not used in the same
table; use either of them, depending on the appropriateness of the test.

10.3 Tukey’s Test

Tukey's test is more conservative than the LSD test because it requires the largest
treatment mean difference for significance.

It is computed in a manner similar to the LSD test, except that the standard error of the
mean is used instead of the standard error of the mean difference (sd), and the
studentized range table (q-table) is used in place of the t-table.

The procedure involves:

1. Select a value from the q-table, which depends on the number of means (n) and the
error degrees of freedom (v).
2. Compute the Critical Difference (CD) as CD = qα(n, v) × √(MSE/r), where MSE is the
error mean square, n is the number of means to be compared, v is the error degrees of
freedom and r is the number of replications.
3. For any pair of means, if the absolute value of the difference |d| > the critical
difference, the difference is judged significant at the prescribed level of significance.

Example: The following analysis of variance table is from a CRD with six varieties
replicated four times in a glasshouse (mean rust incidence).
__________________________________________________________________
Source     d.f.          MS        F-cal.    F-table (5%)
__________________________________________________________________
Variety    (t-1) = 5     2976.44   24.80**   2.77
Error      t(r-1) = 18   120.00
__________________________________________________________________

Variety: 1 2 3 4 5 6
Mean stem rust incidence (%): 50.3 69.0 24.0 94.0 75.0 95.3

n = 6; v = 18; q0.05(6, 18) = 4.495

CD = q0.05(6, 18) × √(MSE/r) = 4.495 × √(120/4) = 24.62%
Difference between means
________________________________________________________________________
            24.0(3)   50.3(1)   69.0(2)   75(5)     94(4)    95.3(6)
________________________________________________________________________
95.3(6)     71.3*     45.0*     26.3*     20.3ns    1.3ns    -
94(4)       70.0*     43.7*     25.0*     19.0ns    -
75(5)       51.0*     24.7*     6.0ns     -
69(2)       45.0*     18.7ns    -
50.3(1)     26.3*     -
24.0(3)     -
________________________________________________________________________

Thus, differences between varieties 6&3, 4&3, 5&3, 2&3, etc. are significant while
differences between varieties 2&1, 5&2, etc. are non-significant.
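A sketch of the same Tukey comparison in Python, assuming q0.05(6, 18) = 4.495 taken
from the q-table:

from math import sqrt
from itertools import combinations

mse, r, q = 120.0, 4, 4.495
means = {1: 50.3, 2: 69.0, 3: 24.0, 4: 94.0, 5: 75.0, 6: 95.3}

cd = q * sqrt(mse / r)   # critical difference, about 24.62
for (vi, mi), (vj, mj) in combinations(means.items(), 2):
    d = abs(mi - mj)
    print(vi, "vs", vj, round(d, 1), "significant" if d > cd else "ns")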

10.4. Pair Comparisons with Missing Data

In applying the LSD test and DMRT, it is important that the appropriate standard error of
the mean difference (sd) is used. sd is affected by the experimental design used, the
number of replications of the two treatments being compared, and the specific type of
means to be compared.

A). In CRD, RCBD and Latin Square Designs, where the number of replications is the same
for all treatments, the sd for any pair of treatment means is computed as:

sd = √(2MSE/r)

where MSE is the error mean square and r is the number of replications common to all
treatments.

Thus, LSDα = tα/2(n) × √(2MSE/r), where n is the error degrees of freedom.

B). When the two treatments do not have the same number of replications in a CRD, sd is
computed as:

sd = √[MSE(1/ri + 1/rj)]

where MSE is the error mean square and ri and rj are the numbers of replications of the
two treatment means (i and j) to be compared.

Thus, LSDα = tα/2(n) × √[MSE(1/ri + 1/rj)]
Example: CRD with unequal replications; effect of 4 types of feed stuff on weight gain
of chicks.
Treatment A = given to 5 chicks (5 replications), mean = 43.8 g
Treatment B = given to 4 chicks (4 replications), mean = 73.0 g
Treatment C = given to 3 chicks (3 replications), mean = 73.33 g
Treatment D = given to 5 chicks (5 replications), mean = 142.8 g

To compare treatment B with treatment D: given an error mean square of 843.1 and 13 error
degrees of freedom, test whether there is a significant difference between treatments B
and D.

Difference between treatment means: d = (D – B) = 142.8 – 73.0 = 69.8 g

LSD5% = t0.025(13) × √[MSE(1/rB + 1/rD)] = 2.160 × √[843.1 × (1/4 + 1/5)] = 42.07 g
LSD1% = t0.005(13) × √[MSE(1/rB + 1/rD)] = 3.012 × √[843.1 × (1/4 + 1/5)] = 58.67 g

Compare the treatment mean difference d with the calculated LSD values. Since d (69.8) >
the LSD value at 1% (58.67), there is a highly significant difference between treatments B
and D; that is, treatment D significantly increased the weight of the chicks as compared
with treatment B.
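The same test can be sketched in Python (scipy is used only for the tabulated t-values;
the figures are those of the chick example):

from math import sqrt
from scipy.stats import t

mse, df = 843.1, 13
rB, rD = 4, 5
d = 142.8 - 73.0                     # observed mean difference, 69.8 g

sd = sqrt(mse * (1 / rB + 1 / rD))   # standard error of the difference
lsd5 = t.ppf(1 - 0.05 / 2, df) * sd  # about 42.07 g
lsd1 = t.ppf(1 - 0.01 / 2, df) * sd  # about 58.67 g
print(d > lsd1)                      # True: highly significant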

C). sd for comparing the treatment with a single missing value with any other treatment
without missing values:

a) For RCBD: sd = √{MSE[2/r + t/(r(r − 1)(t − 1))]}. Thus, LSDα = tα/2(error d.f.) × sd,
where MSE is the mean square of error, t is the number of treatments and r is the number
of replications.

b) For Latin Square Design: sd = √{MSE[2/r + 1/((t − 1)(t − 2))]}. Thus, LSDα =
tα/2(error d.f.) × sd, where MSE is the error mean square of the analysis of variance of
the Latin Square Design with a single missing value and r is the number of replications.

11. ANALYSIS OF COVARIANCE

The analysis of covariance simultaneously examines the variance and covariance of
selected variables so that the character of primary interest is more accurately
characterized than by the use of analysis of variance alone. Analysis of covariance
requires measurement of the character of interest and of one or more additional
variable(s) known as covariate(s). It also requires that the functional relationship of
the covariates (x) with the character of primary interest (y) is known beforehand.
Examples: Consider a wheat variety trial in which weed infestation is used as a covariate.
With a known functional relationship between weed incidence and grain yield (the
character of primary interest), the covariance analysis can adjust grain yield in each
plot to a common level of weed incidence. With the covariance analysis, the variation in
yield due to weed incidence is quantified and effectively separated from that due to
varieties. Similarly, age or initial body weight of experimental animals can be used as a
covariate with weight gain due to rations as the character of interest.

Covariance analysis can be applied to any number of covariates and to any type of
functional relationship between variables. In this section, however, we deal with the
case of a single covariate whose relationship to the character of primary interest is
linear.

11.1 Uses of Covariance Analysis

1. To Control Experimental Error

One way to reduce experimental error is to use a proper type of blocking. However,
blocking cannot cope with certain types of variability, such as spotty soil heterogeneity
and unpredictable insect or disease incidence, where the heterogeneity between
experimental plots does not follow a definite pattern. Thus, covariance analysis
should be considered in experiments in which blocking cannot adequately reduce the
experimental error. By measuring an additional variable (i.e. a covariate) that is known
to be linearly related to the primary variable (y), the source of variation associated
with the covariate can be deducted from the experimental error.

The experimental error is thereby reduced and the precision for comparing treatments
increased. For example, in a cattle feeding experiment to compare the effects of several
rations on weight gain, the animals assigned to any one block may vary in initial weight.
If initial weight is correlated with weight gain, a portion of the experimental error for
gain can be the result of differences in initial weight. By covariance analysis, the
contribution that can be attributed to differences in initial weight can be computed and
eliminated from the experimental error.

2. Adjustment of Treatment Means

With the covariance analysis, the primary variable (y) can be adjusted linearly upwards
or downwards, depending on the relative size of its respective covariate (x). The
treatment means of the dependent variable are adjusted to the values they would have had,
had there been no differences in the values of the covariate (e.g. age of cows, initial
body weight of cows, etc.). In situations where real differences among treatments for the
independent variable do occur but are not the direct effect of the treatments, adjustment
is warranted. For example, in a variety trial where stands differ widely because seeds
differ in germination, and not because of inherent differences among the varieties,
adjustment can be done; but if the density is a treatment by itself, there is no need to
adjust the means.
3. Interpretation of Experimental Results
Covariance analysis aids the experimenter in understanding the principles underlying the
results of an investigation. By examining the primary character of interest (y) together
with other characters (x) whose functional relationships to y are known, the biological
processes governing the treatment effects on y can be characterized more clearly.

4. Estimation of Missing Data

The missing data formula technique biases the treatment sum of squares upwards. The
use of covariance analysis to estimate the missing value(s) results in a minimum residual
sum of squares and an unbiased treatment sum of squares.

11.2. Computation Procedure

Covariance analysis is an extension of the analysis of variance; it is a combination of
analysis of variance and linear regression. Covariance analysis can be used for CRD,
RCBD and split-plot designs, but the computational procedures vary somewhat.

11.2.2 Computation procedure for RCBD

The following data show the ascorbic acid content of ten varieties of common bean. From
previous experience it was known that an increase in maturity resulted in a decrease in
vitamin C content. Since all varieties did not reach the same level of maturity on the
same day, it was not possible to harvest all plots at the same stage of maturity. Hence,
the percentage of dry matter based on 100 g of freshly harvested beans was recorded as an
index of maturity and used as a covariate.

Ascorbic acid content (ASAC, mg/100 g of seed) and percentage of dry matter (% DM)
for common bean varieties:
___________________________________________________________
Block I Block 2 Block 3
_____________ _____________ ______________
Variety % DM ASAC % DM ASAC % DM ASAC
____________________________________________________________
1 34 93 33 95 35 92
2 40 47 40 51 51 33
3 32 81 30 100 34 72
4 38 67 38 74 40 65
5 25 119 24 128 25 125
6 30 106 29 111 32 99
7 33 106 34 107 35 97
8 34 61 31 83 31 94
9 31 80 30 106 35 77
10 21 149 25 151 23 170
_______________________________________________________________
Conduct the analysis of covariance & calculate standard error of mean difference.

             Block I      Block II     Block III    Treatment total
Treatment    X     Y      X     Y      X     Y      X     Y
(X = % DM, Y = ASAC)
1 34 93 33 95 35 92 102 280
2 40 47 40 51 51 33 131 131
3 32 81 30 100 34 72 96 253
4 38 67 38 74 40 65 116 206
5 25 119 24 128 25 125 74 372
6 30 106 29 111 32 99 91 316
7 33 106 34 107 35 97 102 310
8 34 61 31 83 31 94 96 238
9 31 80 30 106 35 77 96 263
10 21 149 25 151 23 170 69 470
Block total 318 909 314 1006 341 924 973 2839

Steps of Analysis

1. Conduct the analysis of variance for each of the variables (X and Y), compute the
sums of cross products, and obtain the (treatment + error) sums of squares and products:

________________________________________________________________________
Source D. F. SS of ASAC (Y) SS of % DM (X) SS of XY
________________________________________________________________________
Block 2 545.3 42.47 -75.23
Treatment 9 25689.0 972.70 -4633.23
Error 18 1608.7 86.20 -251.77
Treatment + Error 27 27297.7 1058.90 -4885.00
________________________________________________________________________
2. Analyse the covariance:

C.F. = (Gx × Gy)/(r × t) = (973 × 2839)/(3 × 10) = 92078.23

Total Sum of Products = Σ(xi × yi) − C.F. = (34 × 93) + (40 × 47) + ... + (23 × 170) −
92078.23 = 87118 − 92078.23 = −4960.23

Sum of Products due to Blocks = Σ(Bx × By)/t − C.F.
= [(318 × 909) + (314 × 1006) + (341 × 924)]/10 − 92078.23 = 92003 − 92078.23 = −75.23

Sum of Products for Treatments = Σ(Tx × Ty)/r − C.F.
= [(102 × 280) + (131 × 131) + ... + (69 × 470)]/3 − 92078.23 = −4633.23

Error Sum of Products = Total Sum of Products − Block Sum of Products − Treatment Sum of
Products = −4960.23 − (−75.23) − (−4633.23) = −251.77
3. Compute the adjusted error SS of Y as: Error SS of Y − (Error SP of XY)²/(Error SS of X)
= 1608.7 − (−251.77)²/86.2 = 873.34
4. Compute the (treatment + error) adjusted SS of Y as: (Treatment + Error SS of Y) −
(Treatment + Error SP of XY)²/(Treatment + Error SS of X)
= 27297.7 − (−4885)²/1058.9 = 4761.84
5. Compute the adjusted treatment SS of Y = (Treatment + Error) adjusted SS of Y − Error
adjusted SS of Y = 4761.84 − 873.34 = 3888.50
_____________________________________________________________________
Source                 D.F.                 SS        MS     F-cal   F-table (1%)
_____________________________________________________________________
Treatment (adjusted)   t-1 = 10-1 = 9       3888.50   432.0  8.41**  3.68
Error (adjusted)       (r-1)(t-1)-1 = 17    873.34    51.4
_____________________________________________________________________
6. Compute the regression coefficient (slope) of the data:
b = (Error Sum of Products)/(Error SS of X) = −251.77/86.2 = −2.92
7. Compute the adjusted treatment means as: adjusted Yi = unadjusted Yi − b × (mean Xi −
grand mean X). For example, for treatment 1: adjusted Y = 93.33 − (−2.92 × 1.57) = 97.92
_______________________________________________________________________
No   Average ASAC   Average   Deviation of % DM          Adjusted
     (unadjusted)   % DM      (x mean − x grand mean)    ASAC
_______________________________________________________________________
1 93.33 34.00 1.57 97.92
2 43.67 43.67 11.24 76.48
3 84.33 32.00 -0.43 83.08
4 68.67 38.67 6.24 86.88
5 124.00 24.67 -7.76 101.33
6 105.33 30.33 -2.10 99.21
7 103.33 34.00 1.57 107.92
8 79.33 32.00 -0.43 78.08
9 87.67 32.00 -0.43 86.41
10 156.67 23.00 -9.43 129.13
______________________________________________________________________

Compute the relative efficiency (R.E.) of the covariance analysis compared with the
standard analysis of variance:

R.E. = (Error unadjusted MS of Y)/[Error adjusted MS of Y × (1 + (Treatment MS of
X)/(Error SS of X))] × 100
= (1608.7/18)/[51.4 × (1 + (972.7/9)/86.2)] × 100
= (89.37/115.84) × 100 = 77.15%

Thus, the result indicates that the use of % dry matter as the covariate has not increased
the precision of the ascorbic acid comparisons over what would have been obtained had the
ANOVA been done without the covariate (R.E. < 100%).

CV = [√(Adjusted Error MS of Y)/Grand mean of Y] × 100 = (√51.4/94.63) × 100 = 7.6%

Mean comparison

The sd to compare two adjusted treatment means is:

sd = √{Adjusted Error MS of Y × [2/r + (x̄i − x̄j)²/Error SS of X]}

where x̄i and x̄j are the covariate means of the ith and jth treatments and r is the number
of replications common to both treatments. For instance, to compare the means of T1 and T2:

sd = √{51.4 × [2/3 + (43.67 − 34)²/86.2]} = √90.02 = 9.49

where 34 and 43.67 are the covariate means of the 1st and 2nd treatments and 3 is the
number of replications common to both treatments.
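The whole covariance computation above can be condensed into a short Python sketch
(illustrative; it assumes the 10 × 3 tables of x = % DM and y = ASAC with rows as
treatments and columns as blocks, and reproduces b, the adjusted error SS and the
adjusted means):

import numpy as np

x = np.array([[34, 33, 35], [40, 40, 51], [32, 30, 34], [38, 38, 40],
              [25, 24, 25], [30, 29, 32], [33, 34, 35], [34, 31, 31],
              [31, 30, 35], [21, 25, 23]], float)
y = np.array([[93, 95, 92], [47, 51, 33], [81, 100, 72], [67, 74, 65],
              [119, 128, 125], [106, 111, 99], [106, 107, 97], [61, 83, 94],
              [80, 106, 77], [149, 151, 170]], float)
nt, r = x.shape

def sums_of_products(u, v):
    # returns (total, block, treatment, error) sums of products of u and v
    cf = u.sum() * v.sum() / (r * nt)
    total = (u * v).sum() - cf
    block = (u.sum(axis=0) * v.sum(axis=0)).sum() / nt - cf
    treat = (u.sum(axis=1) * v.sum(axis=1)).sum() / r - cf
    return total, block, treat, total - block - treat

Exx = sums_of_products(x, x)[3]
Eyy = sums_of_products(y, y)[3]
Exy = sums_of_products(x, y)[3]

b = Exy / Exx                                  # about -2.92
err_adj = Eyy - Exy ** 2 / Exx                 # about 873.3, with (r-1)(t-1)-1 d.f.
y_adj = y.mean(axis=1) - b * (x.mean(axis=1) - x.mean())  # adjusted treatment means
print(round(b, 2), round(err_adj, 1), np.round(y_adj, 2))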

12. REGRESSION AND CORRELATION ANALYSIS

12.1 Types of Regression & Correlation

Regression analysis describes the effect of one or more variables (designated as
independent variables) on a single variable (designated as the dependent variable). It
expresses the dependent variable as a function of the independent variable(s).

For regression analysis, it is important to clearly distinguish between the dependent and
independent variables.
Examples:
- Weight gain in animals depends on feed
- Number of growth rings in a tree depends on age of the tree
- Grain yield of maize depends on a fertilizer rate

In the above cases, weight gain, number of growth rings and grain yield are dependent
variables, while feed, age and fertilizer rates are independent variables.

The independent variable is designated by x and the dependent variable by y.

Correlation analysis, on the other hand, provides a measure of the degree of association
between the variables, e.g. the association between height and weight of students; body
weight of cows and milk production; grain yield of maize and thousand kernel weight.

Regression and correlation analyses can be classified:

a. Based on the number of independent variables as:
- Simple: one independent variable and one dependent variable
- Multiple: more than one independent variable and one dependent variable

b. Based on the form of the functional relationship as:
- Linear: if the form of the underlying relationship is linear
- Non-linear: if the form of the relationship is non-linear

Thus, regression and correlation analyses can be classified into four types:
− Simple linear regression and correlation analysis
− Multiple linear regression and correlation analysis
− Simple non-linear regression and correlation analysis
− Multiple non-linear regression and correlation analysis

Linear Relationships
The relationship between any two variables (independent and dependent) is linear if the
rate of change in y is constant as x changes throughout the range of x under
consideration.

The functional form of the linear relationship between a dependent variable y and an
independent variable x is represented by the equation:

y = a + bx

where:
y = the dependent variable
a = the intercept of the line on the y-axis (the value of y when x is 0)
b = the linear regression coefficient, i.e. the slope of the line or the amount of
change in y for each unit change in x

When there is more than one independent variable, say k independent variables (x1, x2,
..., xk), the simple linear regression equation y = α + βx can be extended to the
multiple linear functional form:

y = α + β1x1 + β2x2 + ... + βkxk

where α is the y-intercept (the value of y when all x's are 0) and β1, β2, ..., βk are
the partial regression coefficients associated with the independent variables.

12.2 Simple Linear Regression and Correlation Analysis

12.2.1 Simple linear regression analysis

The simple linear regression analysis deals with the estimation and tests of significance
concerning the two parameters α and β in the equation:
Y = α + βx
The data required for the application of simple linear regression are n pairs (with
n > 2) of y and x values.

Steps to estimate α and β

Step 1: Compute the means (x̄ and ȳ), the deviations from the means (x − x̄, y − ȳ), the
squares of the deviates (x², y²) and the products of the deviates (xy).

Example: Determine the regression equation for the dependence of wing length on age for
13 sparrows of various ages.

Age (days),  Wing length   x − x̄    y − ȳ     x²    y²         xy
x            (cm), Y
3            1.4           -7       -2.015    49    4.060225   14.105
4            1.5           -6       -1.915    36    3.667225   11.49
5            2.2           -5       -1.215    25    1.476225   6.075
6            2.4           -4       -1.015    16    1.030225   4.06
8            3.1           -2       -0.315    4     0.099225   0.63
9            3.2           -1       -0.215    1     0.046225   0.215
10           3.2           0        -0.215    0     0.046225   0
11           3.9           1        0.485     1     0.235225   0.485
12           4.1           2        0.685     4     0.469225   1.37
14           4.7           4        1.285     16    1.651225   5.14
15           4.5           5        1.085     25    1.177225   5.425
16           5.2           6        1.785     36    3.186225   10.71
17           5.0           7        1.585     49    2.512225   11.095
Sum = 130    44.4                             262   19.6569    70.8
Mean = 10.0  3.415

Step 2: Compute the estimates of the regression parameters α and β from ŷ = a + bx, where
a is the estimate of α (the y-intercept) and b is the estimate of β (the linear
regression coefficient, or slope):

b = Σxy/Σx² = 70.8/262 = 0.27 cm/day

a = ȳ − bx̄ = 3.415 − 0.27 × 10 = 0.715 cm

Thus, the estimated linear regression equation is:

ŷ = a + bx = 0.715 + 0.27x for 3 ≤ x ≤ 17 (avoid extrapolating the regression line
beyond the range of observations)

This is the estimated linear functional relationship between age (days) and wing length
(cm). Thus, wing length increases by 0.27 cm every day.

Step 3: Test the significance of β (the linear regression coefficient).

To test β, compute the residual mean square as:

s²y·x = [Σy² − (Σxy)²/Σx²]/(n − 2) = [19.66 − (70.80)²/262.00]/(13 − 2) = 0.05

The residual mean square denotes the variance of y after taking into account the
dependence of y on x.

Compute the test statistic (tb) value as:

tb = b/√(s²y·x/Σx²) = 0.27/√(0.05/262) = 0.27/0.0138 = 19.5

Step 4: Compare the calculated tb value with the tabulated t-value at the α/2 level of
significance with n − 2 = 13 − 2 = 11 error d.f., where n is the number of pairs of
observations.

At the 5% level, t0.025(11) = 2.201; at the 1% level, t0.005(11) = 3.106.


Since the calculated |tb| (19.5) is greater than the tabulated t-value at the 1% level of
significance, the linear response of wing length to changes in age within the range of 3
to 17 days is highly significant.
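The regression estimates and the t-test above can be checked with a brief Python sketch
(assuming the 13 (age, wing length) pairs from the table):

import numpy as np
from scipy.stats import t as tdist

x = np.array([3, 4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 16, 17], float)
y = np.array([1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0])

dx, dy = x - x.mean(), y - y.mean()
b = (dx * dy).sum() / (dx ** 2).sum()     # about 0.27 cm/day
a = y.mean() - b * x.mean()               # about 0.715 cm

n = len(x)
s2yx = ((dy ** 2).sum() - (dx * dy).sum() ** 2 / (dx ** 2).sum()) / (n - 2)
tb = b / np.sqrt(s2yx / (dx ** 2).sum())  # about 19.5
p = 2 * tdist.sf(abs(tb), n - 2)          # two-sided p-value
print(round(a, 3), round(b, 3), round(tb, 1), p)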

12.2.2 Simple linear correlation analysis

The simple linear correlation analysis deals with the estimation and test of significance
of the simple linear correlation coefficient (r), which is a measure of the degree of
linear association between two variables x and y (there is no need to designate a
dependent and an independent variable).

The value of r lies within the range −1 to +1, with the extreme values indicating perfect
linear association and the mid-value of 0 indicating no linear association between the
two variables. The value of r is negative when a positive change in one variable is
associated with a negative change in the other, and positive when the values of the two
variables change in the same direction (both increase or both decrease).

Even though a zero r-value indicates the absence of linear association between two
variables, it does not indicate the absence of any association between them; it is
possible for the two variables to have a non-linear association, such as a quadratic
form. The procedures for the estimation and test of significance of a simple linear
correlation coefficient between two variables x and y are:

Step 1: Compute the means (x̄, ȳ), the sums of squares of the deviates (Σx², Σy²) and the
sum of the cross products of the deviates (Σxy) of the two variables.
Step 2: Compute the simple linear correlation coefficient; for the above example:

r = Σxy/√(Σx² × Σy²) = 70.8/√(262 × 19.66) = 70.80/71.77 = 0.98

Step 3: Test the significance of the simple linear correlation coefficient (r) by comparing
the computed r-value with the tabulated r-value at n-2 d.f. The simple linear correlation
coefficient (r) is declared significant at α level of significance if the absolute value of the
computed r-value > the corresponding tabulated r-value.

Computed r = 0.98 with n − 2 = 13 − 2 = 11 d.f.
Tabulated r at 5% (11 d.f.) = 0.55; tabulated r at 1% (11 d.f.) = 0.68
Thus, the simple linear correlation coefficient is significant at the 1% level of
significance, which indicates a highly significant, positive linear association
between age and wing length of sparrows.
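For completeness, a small Python sketch computing r for the same sparrow data:

import numpy as np

x = np.array([3, 4, 5, 6, 8, 9, 10, 11, 12, 14, 15, 16, 17], float)
y = np.array([1.4, 1.5, 2.2, 2.4, 3.1, 3.2, 3.2, 3.9, 4.1, 4.7, 4.5, 5.2, 5.0])

dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
print(round(r, 2))   # 0.98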

12.3 Multiple Linear Regression

The simple linear regression and correlation analysis is applicable only in cases with
one independent variable. However, in many situations Y may depend on more than one
independent variable. Linear regression analysis involving more than one independent
variable is called multiple linear regression. The relationship of the dependent
variable Y to the k independent variables X1, X2, ..., Xk can be expressed as:

Y = α + β1X1 + β2X2 +... + βkXk

The data required for the application of multiple linear regression analysis involving k
independent variables are n(k + 1) observations, where n is the number of cases (n > k + 1).

Linear regression involving two independent variables can be expressed as Y = α + β1X1 +
β2X2, where β1 and β2 are partial regression coefficients: β1 measures the change in Y
for a unit change in X1 when X2 is held constant, and β2 measures the rate of change in Y
for a unit change in X2 when X1 is held constant. α (sometimes designated β0) is the
value of Y when both X1 and X2 are zero.

For applying multiple linear regression analysis:
- the effect of each of the k independent variables on Y must be linear;
- the effect of each Xi on Y must be independent of the other X's, i.e. there is no
interaction.

Example: The following data show the weight gain, initial body weight & age of five
chicks fed with certain type of rations for a month.

Initial age (days) (X1)         5    5    5    4    6
Initial body weight (g) (X2)    10   15   12   15   20
Weight gain (g) (Y)             5    6    7    8    6

Fit the multiple linear regression model Y = α + β1X1 + β2 X2 to the data

Step 1. Calculate the means and the squares and products of the deviates:

       X1    X2     Y     y²     x1²   x2²     x1y    x2y     x1x2
       5     10     5     1.96   0     19.36   0      6.16    0
       5     15     6     0.16   0     0.36    0      -0.24   0
       5     12     7     0.36   0     5.76    0      -1.44   0
       4     15     8     2.56   1     0.36    -1.6   0.96    -0.6
       6     20     6     0.16   1     31.36   -0.4   -2.24   5.6
Sum    25    72     32    5.2    2     57.2    -2     3.2     5
Mean   5     14.4   6.4

Step 2. Solve for b1 and b2:

b1 = [Σx2² × Σx1y − Σx1x2 × Σx2y]/[Σx1² × Σx2² − (Σx1x2)²]
   = [(57.2 × −2) − (5 × 3.2)]/[(2 × 57.2) − (5)²] = −130.4/89.4 = −1.46

b2 = [Σx1² × Σx2y − Σx1x2 × Σx1y]/[Σx1² × Σx2² − (Σx1x2)²]
   = [(2 × 3.2) − (5 × −2)]/[(2 × 57.2) − (5)²] = 16.4/89.4 = 0.18

Step 3. Compute the estimate of the intercept as:

a = Ȳ − b1X̄1 − b2X̄2 = 6.4 − (−1.46 × 5) − (0.18 × 14.4) = 6.4 + 7.3 − 2.59 = 11.11

Thus, the estimated multiple linear regression equation for initial age (days) and initial
body weight (g) with weight gain (g) is: Ŷ= 11.1 - 1.46 X1 + 0.18 X2 for 4 ≤ X1 ≤ 6; and
10 ≤ X2 ≤20.

Step 4: Compute:

The sum of squares due to regression: SSR = Σbi(Σxiy) = b1Σx1y + b2Σx2y
= (−1.46 × −2) + (0.18 × 3.2) = 3.496

Residual (error) sum of squares: SSE = Σy² − SSR = 5.2 − 3.496 = 1.704

Coefficient of determination: R² = SSR/Σy² = 3.496/5.2 = 0.67

R² measures the amount of variation in Y explained by the independent variables. Thus, in
the above example, 67% of the total variation in weight gain (g) of chicks can be
accounted for by a linear function involving initial age (days) and initial body weight (g).

Step 5: Test the significance of R².

Compute the F-value as: F = (SSR/k)/[SSE/(n − k − 1)] = (3.496/2)/[1.704/(5 − 2 − 1)]
= 1.748/0.852 = 2.05, where k is the number of independent variables (2) and n is the
number of cases (5).

Read the tabulated F-value as F(k, n − k − 1): F(2, 2) at 5% = 19.00; F(2, 2) at 1% = 99.00.

Since the computed F-value (2.05) is less than the tabulated F-value at 5% (19.00), the
estimated multiple linear regression Ŷ = 11.1 − 1.46X1 + 0.18X2 is not significant at the
5% level. Thus, the combined linear effect of initial age (days) and initial body weight
(g) on weight gain (g) of chicks is not significant.
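A compact Python sketch of the two-variable fit, following the normal-equation formulas
of steps 2 to 5 (the five data rows are those of the chick example):

import numpy as np

x1 = np.array([5, 5, 5, 4, 6], float)
x2 = np.array([10, 15, 12, 15, 20], float)
y = np.array([5, 6, 7, 8, 6], float)

d1, d2, dy = x1 - x1.mean(), x2 - x2.mean(), y - y.mean()
s11, s22, s12 = (d1 ** 2).sum(), (d2 ** 2).sum(), (d1 * d2).sum()
s1y, s2y = (d1 * dy).sum(), (d2 * dy).sum()

den = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / den                # about -1.46
b2 = (s11 * s2y - s12 * s1y) / den                # about 0.18
a = y.mean() - b1 * x1.mean() - b2 * x2.mean()    # about 11.11

ssr = b1 * s1y + b2 * s2y                         # 3.496
sse = (dy ** 2).sum() - ssr                       # 1.704
r2 = ssr / (dy ** 2).sum()                        # 0.67
f = (ssr / 2) / (sse / (len(y) - 2 - 1))          # 2.05
print(round(b1, 2), round(b2, 2), round(a, 2), round(r2, 2), round(f, 2))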

The larger the R² value, the more important the regression equation is in characterizing
Y. On the other hand, if the value of R² is low, the estimated linear regression equation
may not be useful even if the F-test is significant. For example, an R² value of 0.26,
even if significant, indicates that only 26% of the total variation in the dependent
variable (Y) is explained by the linear function of the independent variables considered.
