Biotools
February 3, 2016
Type Package
Title Tools for Biometry and Applied Statistics in Agricultural
Science
Version 3.0
Date 2016-02-02
LazyLoad yes
LazyData yes
Author Anderson Rodrigo da Silva
Maintainer Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
Depends R (>= 2.15), rpanel, tkrplot, MASS, lattice, SpatialEpi
Imports utils, stats, graphics, boot, grDevices, datasets
Suggests soilphysics
Description Tools designed to perform and work with cluster analysis (including Tocher's algorithm),
discriminant analysis and path analysis (standard and under collinearity), as well as some
useful miscellaneous tools for dealing with sample size and optimum plot size calculations.
Mantel's permutation test can be found in this package. A new approach for calculating its
power is implemented. biotools also contains the new tests for genetic covariance components.
License GPL (>= 2)
NeedsCompilation no
Repository CRAN
Date/Publication 2016-02-03 22:15:39
R topics documented:
biotools-package
aer
boxM
brazil
confusionmatrix
cov2pcov
creategroups
D2.disc
D2.dist
distClust
findSubsample
fitplotsize
garlicdist
gencovtest
maize
mantelPower
mantelTest
moco
multcor.test
optimumplotsize
pathanalysis
peppercorr
raise.matrix
samplesize
sHe
singh
tocher
Index
Description
Tools designed to perform and work with cluster analysis (including Tocher’s algorithm), discriminant
analysis and path analysis (standard and under collinearity), as well as some useful miscellaneous
tools for dealing with sample size and optimum plot size calculations. Mantel’s permutation
test can be found in this package. A new approach for calculating its power is implemented. biotools
also contains the new tests for genetic covariance components.
Details
Package: biotools
Type: Package
Version: 3.0
Date: 2016-02-02
License: GPL (>= 2)
Note
biotools is an ongoing project. Criticism, comments and suggestions are all welcome.
Author(s)
Anderson Rodrigo da Silva
Maintainer: Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Carvalho, S.P. (1995) Metodos alternativos de estimacao de coeficientes de trilha e indices de
selecao, sob multicolinearidade. Ph.D. Thesis, Federal University of Vicosa (UFV), Vicosa, MG,
Brazil.
Cruz, C.D.; Ferreira, F.M.; Pessoni, L.A. (2011) Biometria aplicada ao estudo da diversidade
genetica. Visconde do Rio Branco: Suprema.
Lessman, K. J. & Atkins, R. E. (1963). Optimum plot size and relative efficiency of lattice designs
for grain sorghum yield tests. Crop Sci., 3:477-481.
Mahalanobis, P. C. (1936) On the generalized distance in statistics. Proceedings of The National
Institute of Sciences of India, 12:49-55.
Manly, B.F.J. (2004) Multivariate statistical methods: a primer. CRC Press.
Meier, V. D. & Lessman, K. J. (1971) Estimation of optimum field plot shape and size for testing
yield in Crambe abyssinica Hochst. Crop Sci., 11:648-650.
Morrison, D.F. (1976) Multivariate Statistical Methods.
Rao, C.R. (1952) Advanced statistical methods in biometric research. New York: John Wiley & Sons.
Sharma, J.R. (2006) Statistical and biometrical techniques in plant breeding. Delhi: New Age
International.
Silva, A.R. & Dias, C.T.S. (2013) A cophenetic correlation coefficient for Tocher’s method. Pesquisa
Agropecuaria Brasileira, 48:589-596.
Silva et al. (2013) Path analysis in multicollinearity for fruit traits of pepper. Idesia, 31:55-60.
Singh, D. (1981) The relative importance of characters affecting genetic divergence. Indian Journal
Genetics & Plant Breeding, 41:237-245.
Description
A function to calculate the apparent error rate of two classification vectors, i.e., the proportion of
observed cases incorrectly predicted. It can be useful for evaluating discriminant analysis or other
classification systems.
$$\mathrm{aer} = \frac{1}{n} \sum_{i=1}^{n} I(y_i \neq \hat{y}_i)$$
Usage
aer(obs, predict)
Arguments
obs a vector containing the observed classes.
predict a vector with the same length of obs containing the predicted classes.
Value
The apparent error rate, a number between 0 (complete agreement) and 1 (no agreement).
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
See Also
confusionmatrix, lda
Examples
data(iris)
da <- lda(Species ~ ., data = iris)
pred <- predict(da, dimen = 1)
aer(iris$Species, pred$class)
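The definition above reduces to a one-line average in base R. The following is a minimal sketch under that definition, not the package's own code; the helper name aer_sketch and the toy vectors are hypothetical.

```r
# Hypothetical helper: proportion of cases whose predicted class
# differs from the observed class (the average of the indicator above).
aer_sketch <- function(obs, predict) {
  stopifnot(length(obs) == length(predict))
  mean(as.character(obs) != as.character(predict))
}

obs  <- c("a", "a", "b", "b", "c")
pred <- c("a", "b", "b", "b", "a")
aer_sketch(obs, pred)  # 2 of the 5 cases disagree
```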
Description
Performs Box’s M-test for homogeneity of covariance matrices obtained from multivariate
normal data classified according to one factor. The test is based on the chi-square approximation.
Usage
boxM(data, grouping)
Arguments
data a numeric data.frame or matrix containing n observations of p variables; it is
expected that n > p.
grouping a vector of length n containing the class of each observation; it is usually a factor.
Value
A list with class "htest" containing the following components:
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Morrison, D.F. (1976) Multivariate Statistical Methods.
Examples
data(iris)
boxM(iris[, -5], iris[, 5])
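To make the chi-square approximation concrete, the sketch below computes the textbook form of Box's M statistic in base R. It assumes the usual formulation (log-determinants of the group and pooled covariance matrices, with Box's scaling factor); the function name box_m_sketch is hypothetical, and whether boxM() follows exactly these steps is an assumption.

```r
# Hypothetical sketch of Box's M with the chi-square approximation.
box_m_sketch <- function(data, grouping) {
  X  <- as.matrix(data)
  g  <- factor(grouping)
  p  <- ncol(X)
  k  <- nlevels(g)
  ni <- as.vector(table(g))
  N  <- sum(ni)
  # Covariance matrix of each group and the pooled matrix
  covs   <- lapply(levels(g), function(l) cov(X[g == l, , drop = FALSE]))
  pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, covs, ni)) / (N - k)
  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  M <- (N - k) * logdet(pooled) - sum((ni - 1) * sapply(covs, logdet))
  # Box's scaling factor for the chi-square approximation
  corr <- (2 * p^2 + 3 * p - 1) / (6 * (p + 1) * (k - 1)) *
    (sum(1 / (ni - 1)) - 1 / (N - k))
  X2 <- (1 - corr) * M
  df <- p * (p + 1) * (k - 1) / 2
  c(X2 = X2, df = df, p.value = pchisq(X2, df, lower.tail = FALSE))
}

res <- box_m_sketch(iris[, -5], iris[, 5])
res
```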
Description
Lat/Long coordinates within Brazil’s limits.
Usage
data("brazil")
Format
A data frame with 17141 observations on the following 2 variables.
Examples
data(brazil)
plot(brazil, cex = 0.1, col = "gray")
Description
A function to compute the confusion matrix of two classification vectors. It can be useful for
evaluating discriminant analysis or other classification systems.
Usage
confusionmatrix(obs, predict)
Arguments
obs a vector containing the observed classes.
predict a vector with the same length of obs containing the predicted classes.
Value
A square matrix containing the number of objects in each class, observed (rows) and predicted
(columns). Diagonal elements count the cases in which obs and predict agree.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
See Also
aer, lda
Examples
data(iris)
da <- lda(Species ~ ., data = iris)
pred <- predict(da, dimen = 1)
confusionmatrix(iris$Species, pred$class)
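A comparable cross-tabulation can be obtained with base R's table(). This is only a sketch with hypothetical toy vectors; it also shows that the proportion of off-diagonal cases is the apparent error rate.

```r
# Toy observed and predicted classes (hypothetical data)
lev  <- c("a", "b", "c")
obs  <- factor(c("a", "a", "b", "b", "c"), levels = lev)
pred <- factor(c("a", "b", "b", "b", "a"), levels = lev)

# Observed classes in rows, predicted classes in columns
cm <- table(observed = obs, predicted = pred)
cm

# Share of off-diagonal cases, i.e. the apparent error rate
err <- 1 - sum(diag(cm)) / sum(cm)
err
```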
Description
Compute a matrix of partial (co)variances for a group of variables with respect to another.
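Take Σ as the covariance matrix of dimension p. Now consider dividing Σ into two groups of
variables. The partial covariance matrices are calculated by:
$$\Sigma_{11.2} = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}, \qquad \Sigma_{22.1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$$
where $\Sigma_{11}$ and $\Sigma_{22}$ are the covariance blocks of the two groups and $\Sigma_{12} = \Sigma_{21}'$ holds their cross-covariances.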
Usage
cov2pcov(m, vars1, vars2 = seq(1, ncol(m))[-vars1])
Arguments
m a square numeric matrix.
vars1 a numeric vector indicating the position (rows or columns in m) of the set of
variables at which to compute the partial covariance matrix.
vars2 a numeric vector indicating the position (rows or columns in m) of the set of
variables at which to adjust the partial covariance matrix.
Value
A square numeric matrix.
Author(s)
Anderson Rodrigo da Silva <anderson.agro at hotmail.com>
See Also
cov
Examples
(Cl <- cov(longley))
cov2pcov(Cl, 1:2)
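The sketch below assumes the usual Schur-complement definition of a partial covariance matrix, S11 - S12 S22^{-1} S21, and checks it against the covariance of regression residuals. The function name partial_cov_sketch is hypothetical, and the iris data are used only because they are well conditioned.

```r
# Hypothetical sketch: partial covariance of vars1 adjusted for vars2.
partial_cov_sketch <- function(m, vars1, vars2 = seq_len(ncol(m))[-vars1]) {
  S11 <- m[vars1, vars1, drop = FALSE]
  S12 <- m[vars1, vars2, drop = FALSE]
  S22 <- m[vars2, vars2, drop = FALSE]
  S11 - S12 %*% solve(S22, t(S12))
}

S  <- cov(iris[, 1:4])
pc <- partial_cov_sketch(S, 1:2)
pc

# Same quantity from the residuals of vars1 regressed on vars2
res <- cov(resid(lm(as.matrix(iris[, 1:2]) ~ as.matrix(iris[, 3:4]))))
```

The agreement with the residual covariance is why partial covariances are read as "what remains after adjusting for the second group".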
Description
A function to create homogeneous groups of named objects according to an objective function
evaluated at a covariate. It can be useful to design experiments which contain a fixed covariate
factor.
Usage
creategroups(x, ngroups, sizes, fun = mean, tol = 0.01, maxit = 200)
Arguments
x a numeric vector of a covariate at which to evaluate the objective function.
ngroups the number of groups to create.
sizes a numeric vector of length equal to ngroups containing the group sizes.
fun the objective function, i.e., to create groups with similar fun; default is mean.
tol the tolerance level to define the groups as homogeneous; see details.
maxit the maximum number of iterations; default is 200.
Details
creategroups uses a tol value to evaluate the following statistic:
$$h = \sum_{j}^{ngroups} |t_{j+1} - t_j| / ngroups,$$
where $t_j = fun(group_j)$. If $h \le tol$, the groups are considered homogeneous.
Value
A list of
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
Examples
x <- rnorm(10, 1, 0.5)
names(x) <- letters[1:10]
creategroups(x, ngroups = 2, sizes = c(5, 5))
creategroups(x, ngroups = 3, sizes = c(3, 4, 3), tol = 0.05)
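The homogeneity statistic h described in Details can be sketched as follows. This assumes the sum runs over the differences between consecutive groups; the helper name h_stat and the toy groups are hypothetical, and the search creategroups() performs over candidate groupings is not reproduced.

```r
# Hypothetical sketch of h: summed absolute differences between the
# statistics of consecutive groups, divided by the number of groups.
h_stat <- function(groups, fun = mean) {
  t <- sapply(groups, fun)
  sum(abs(diff(t))) / length(groups)
}

groups <- list(g1 = c(1, 2, 3), g2 = c(2, 3, 4), g3 = c(2, 2, 2))
h <- h_stat(groups)
h
h <= 0.01  # would these groups pass the default tolerance?
```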
Description
A function to perform discriminant analysis based on the squared generalized Mahalanobis distance
($D^2$) of the observations to the center of the groups.
Usage
## Default S3 method:
D2.disc(data, grouping, pooled.cov = NULL)
## S3 method for class 'D2.disc'
print(x, ...)
## S3 method for class 'D2.disc'
predict(object, newdata = NULL, ...)
Arguments
data a numeric data.frame or matrix (n x p).
grouping a vector of length n containing the class of each observation (row) in data.
pooled.cov a grouping-pooled covariance matrix (p x p). If NULL (default), D2.disc will
automatically compute a pooled covariance matrix.
x, object an object of class "D2.disc".
newdata numeric data.frame or matrix of observations to be classified. If NULL (de-
fault), the input data used as argument in D2.disc will be used.
... further arguments.
Value
A list of
call the call which produced the result.
data numeric matrix; the input data.
D2 a matrix containing the Mahalanobis distances between each row of data and
the center of each class of grouping. In addition, the original and the predicted
(lowest distance) class are displayed, as well as a character vector indicating
where the misclassification has occurred.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Manly, B.F.J. (2004) Multivariate statistical methods: a primer. CRC Press. (p. 105-106).
Mahalanobis, P.C. (1936) On the generalized distance in statistics. Proceedings of The National
Institute of Sciences of India, 12:49-55.
See Also
D2.dist, confusionmatrix, lda
Examples
data(iris)
(disc <- D2.disc(iris[, -5], iris[, 5]))
first10 <- iris[1:10, -5]
predict(disc, first10)
predict(disc, iris[, -5])$class
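The classification rule (assign each row to the group whose center is nearest in squared Mahalanobis distance) can be sketched with stats::mahalanobis(). The code below is an illustration under the assumption of a pooled within-group covariance matrix; it is not the package's implementation.

```r
X <- as.matrix(iris[, -5])
g <- iris$Species
k <- nlevels(g)

# Pooled within-group covariance matrix (p x p)
Sp <- Reduce(`+`, lapply(levels(g), function(l) {
  Xi <- X[g == l, , drop = FALSE]
  (nrow(Xi) - 1) * cov(Xi)
})) / (nrow(X) - k)

# Squared distance from every row to the center of each group
D2 <- sapply(levels(g), function(l) {
  mahalanobis(X, center = colMeans(X[g == l, , drop = FALSE]), cov = Sp)
})

# Predicted class = group with the lowest distance
pred <- factor(levels(g)[apply(D2, 1, which.min)], levels = levels(g))
mean(pred != g)  # apparent error rate of the rule
```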
Description
Function to calculate the squared generalized Mahalanobis distance between all pairs of rows in a
data frame with respect to a covariance matrix. The element of the i-th row and j-th column of the
distance matrix is defined as
$$D^2_{ij} = (x_i - x_j)' \Sigma^{-1} (x_i - x_j)$$
Usage
D2.dist(data, cov, inverted = FALSE)
Arguments
data a data frame or matrix of data (n x p).
cov a variance-covariance matrix (p x p).
inverted logical. If FALSE (default), cov is assumed to be a variance-covariance matrix (not yet inverted).
Value
An object of class "dist".
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Mahalanobis, P. C. (1936) On the generalized distance in statistics. Proceedings of The National
Institute of Sciences of India, 12:49-55.
See Also
dist, singh
Examples
# Manly (2004, p.65-66)
x1 <- c(131.37, 132.37, 134.47, 135.50, 136.17)
x2 <- c(133.60, 132.70, 133.80, 132.30, 130.33)
x3 <- c(99.17, 99.07, 96.03, 94.53, 93.50)
x4 <- c(50.53, 50.23, 50.57, 51.97, 51.37)
x <- cbind(x1, x2, x3, x4)
Cov <- matrix(c(21.112,0.038,0.078,2.01, 0.038,23.486,5.2,2.844,
0.078,5.2,24.18,1.134, 2.01,2.844,1.134,10.154), 4, 4)
D2.dist(x, Cov)
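The pairwise definition can be sketched in base R by calling stats::mahalanobis() once per row. The helper name d2_pairs is hypothetical; the identity-covariance check simply confirms that the distance reduces to the squared Euclidean distance in that case.

```r
# Hypothetical sketch: squared Mahalanobis distance between all row pairs.
d2_pairs <- function(x, S) {
  x    <- as.matrix(x)
  Sinv <- solve(S)
  n    <- nrow(x)
  D    <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # distances from every row to row i, reusing the inverted matrix
    D[i, ] <- mahalanobis(x, center = x[i, ], cov = Sinv, inverted = TRUE)
  }
  as.dist(D)
}

x <- cbind(c(1, 2, 4, 7), c(3, 1, 5, 2))
d_id <- d2_pairs(x, diag(2))   # identity covariance
d_id
```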
Description
Function to compute a matrix of average distances within and between clusters.
Usage
distClust(d, nobj.cluster, id.cluster)
Arguments
d an object of class "dist" containing the distances between objects.
nobj.cluster a numeric vector containing the numbers of objects per cluster.
id.cluster a numeric vector for identification of the objects per cluster.
Value
A square matrix containing distances within (diagonal) and between (off-diagonal) clusters.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
See Also
tocher, dist
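No example is given for distClust, so the following sketch illustrates the idea of average within- and between-cluster distances in base R. It takes a plain membership vector rather than the nobj.cluster and id.cluster arguments, whose exact encoding is not documented here; the helper name avg_cluster_dist is hypothetical.

```r
# Hypothetical sketch: mean distance within (diagonal) and between
# (off-diagonal) clusters, given one cluster label per object.
avg_cluster_dist <- function(d, membership) {
  D   <- as.matrix(d)
  cl  <- sort(unique(membership))
  out <- matrix(0, length(cl), length(cl), dimnames = list(cl, cl))
  for (a in seq_along(cl)) {
    for (b in seq_along(cl)) {
      ia <- which(membership == cl[a])
      ib <- which(membership == cl[b])
      block <- D[ia, ib, drop = FALSE]
      out[a, b] <- if (a == b) {
        # within a cluster, average only the distinct pairs
        if (length(ia) > 1) mean(block[upper.tri(block)]) else 0
      } else {
        mean(block)
      }
    }
  }
  out
}

d  <- dist(c(0, 1, 10, 11))
ad <- avg_cluster_dist(d, membership = c(1, 1, 2, 2))
ad
```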
Description
It allows one to find an optimized (minimized or maximized) numeric subsample according to a
statistic of interest. For example, it might be of interest to determine a subsample whose standard
deviation is the lowest among all of those obtained from all possible subsamples of the same size.
Usage
findSubsample(x, size, fun = sd, minimize = TRUE, niter = 10000)
Arguments
x a numeric vector.
size an integer; the size of the subsample.
fun an object of class function; the statistic at which to evaluate the subsample.
minimize logical; if TRUE (default) findSubsample will find a subsample that minimizes
stat.
niter an integer indicating the number of iterations, i.e., the number of subsamples to
be selected (without replacement) from the original sample, x. The larger this
number, the better optimized the subsample found, at the cost of a longer
computing time.
Value
A list of
dataname a character.
niter the number of iterations.
fun the objective function.
stat the achieved statistic for the optimized subsample.
criterion a character indicating the type of optimization.
subsample a numeric vector; the optimized subsample.
labels a string containing the labels of the subsample values.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
See Also
sample, creategroups
Examples
# Example 1
y <- rnorm(40, 5, 2)
findSubsample(x = y, size = 6)
# Example 2
f <- function(x) diff(range(x)) # max(x) - min(x)
findSubsample(x = y, size = 6, fun = f, minimize = FALSE, niter = 20000)
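The random search described for niter can be sketched as a simple loop that keeps the best subsample seen so far. This is an illustration only; the helper name random_subsample is hypothetical and the package may organize the search differently.

```r
# Hypothetical sketch: draw niter subsamples without replacement and
# keep the one with the lowest (or highest) value of fun.
random_subsample <- function(x, size, fun = sd, minimize = TRUE, niter = 1000) {
  best_idx  <- NULL
  best_stat <- if (minimize) Inf else -Inf
  for (k in seq_len(niter)) {
    idx <- sample.int(length(x), size)
    s   <- fun(x[idx])
    better <- if (minimize) s < best_stat else s > best_stat
    if (better) {
      best_stat <- s
      best_idx  <- idx
    }
  }
  list(stat = best_stat, subsample = x[best_idx], labels = best_idx)
}

set.seed(1)
y   <- rnorm(40, 5, 2)
res <- random_subsample(y, size = 6)
res$stat
```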
Description
Function to estimate the parameters of the nonlinear Lessman & Atkins (1963) model for determining
the optimum plot size as a function of the experimental coefficient of variation (CV) or as a
function of the residual standard error.
$$CV = a \cdot plotsize^{-b}$$
It creates initial estimates of the parameters a and b by log-linearization and uses them to obtain
their least-squares estimates via nls.
Usage
fitplotsize(plotsize, CV)
Arguments
plotsize a numeric vector containing estimates of plot size.
CV a numeric vector of experimental coefficient of variation or residual standard
error.
Value
An object of class "nls".
Side Effects
A summary table (summary.nls), if convergence is achieved.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Lessman, K. J. & Atkins, R. E. (1963) Optimum plot size and relative efficiency of lattice designs
for grain sorghum yield tests. Crop Sci., 3:477-481.
See Also
optimumplotsize
Examples
ps <- c(1, 2, 3, 4, 6, 8, 12)
cv <- c(35.6, 29, 27.1, 25.6, 24.4, 23.3, 21.6)
out <- fitplotsize(plotsize = ps, CV = cv)
predict(out) # fitted.values
plot(cv ~ ps)
curve(coef(out)[1] * x^(-coef(out)[2]), add = TRUE)
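The two-step fit described above (log-linearization for starting values, then nls) can be sketched in base R as follows. It reuses the example data; the object names init, start and fit are hypothetical, and fitplotsize() may differ in detail.

```r
ps <- c(1, 2, 3, 4, 6, 8, 12)
cv <- c(35.6, 29, 27.1, 25.6, 24.4, 23.3, 21.6)

# Step 1: log(CV) = log(a) - b * log(plotsize) gives starting values
init  <- lm(log(cv) ~ log(ps))
start <- list(a = exp(coef(init)[[1]]), b = -coef(init)[[2]])

# Step 2: nonlinear least squares on the original scale
fit <- nls(cv ~ a * ps^(-b), start = start)
coef(fit)
```

Starting from the log-linear estimates usually places nls close to the least-squares solution, which helps convergence.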
Description
The data give the squared generalized Mahalanobis distances between 17 garlic cultivars. The data
are taken from the article published by Silva & Dias (2013).
Usage
data(garlicdist)
Format
An object of class "dist" based on 17 objects.
Source
Silva, A.R. & Dias, C.T.S. (2013) A cophenetic correlation coefficient for Tocher’s method. Pesquisa
Agropecuaria Brasileira, 48:589-596.
Examples
data(garlicdist)
tocher(garlicdist)
Description
gencovtest() tests genetic covariance components from a MANOVA model. Two different approaches
can be used: (I) a test statistic that takes into account the genetic and environmental
effects and (II) a test statistic that only considers the genetic information. The first type refers to
tests based on the mean cross-products ratio, whose distribution is obtained via Monte Carlo simulation
of Wishart matrices. The second way of testing genetic covariance refers to tests based upon
an adaptation of Wilks’ and Pillai’s statistics for evaluating independence of two sets of variables.
All these tests are described by Silva (2015).
Usage
## S3 method for class 'manova'
gencovtest(obj, geneticFactor, gcov = NULL,
residualFactor = NULL, adjNrep = 1,
test = c("MCPR", "Wilks", "Pillai"),
nsim = 9999,
alternative = c("two.sided", "less", "greater"))
## S3 method for class 'gencovtest'
print(x, digits = 4, ...)
## S3 method for class 'gencovtest'
plot(x, var1, var2, ...)
Arguments
obj an object of class "manova".
geneticFactor a character indicating the genetic factor from which to test covariance compo-
nents. It must be declared as a factor in the manova object.
gcov optional; a matrix containing estimates of genetic covariances to be tested. If
NULL (default), an estimate is obtained via method of moments.
residualFactor optional; a character indicating a source in the manova model to be used as error
term. If NULL (default), the usual term "Residuals" will be used.
adjNrep a correction index for dealing with unbalanced data. See details.
test a character indicating the test. It must be one of the following: "MCPR" - the
empirical type-I test based on Mean Cross-Products Ratios via Wishart simulation,
"Wilks" - a type-II test based on the partial Wilks’ Lambda, "Pillai" - a
type-II test based on the partial Pillai’s statistic.
nsim the number of Monte Carlo simulations. Used only if test = "MCPR".
alternative the type of alternative hypothesis. Used only if test = "MCPR". So far, only
the option "two.sided" is implemented.
x an object of class "gencovtest".
digits the number of digits to be displayed by the print method.
var1 a character or integer indicating one of the two response variables or its position.
var2 a character or integer indicating one of the two response variables or its position.
... further arguments.
Details
The genetic covariance matrix is currently estimated via method of moments, following the equation:
$$G = (M_g - M_e) / (nrep \cdot adjNrep)$$
where $M_g$ and $M_e$ are the matrices of mean cross-products associated with the genetic factor and
the residuals, respectively; nrep is the number of replications, calculated as the ratio between the
total number of observations and the number of levels of the genetic factor; adjNrep is supposed
to adjust nrep, especially when estimating G from unbalanced data.
Value
An object of class gencovtest, a list of
gcov a p-dimensional square matrix containing estimates of the genetic covariances.
gcor a p-dimensional square matrix containing estimates of the genetic correlations.
test the test (as input).
statistics a p-dimensional square matrix containing the test statistics. If test = "MCPR"
the mean cross-products ratios are computed; if test = "Wilks", Wilks’
Lambda; and if test = "Pillai", Pillai’s Tn.
p.values a p-dimensional square matrix containing the associated p-values.
alternative the type of alternative hypothesis (as input).
X2 a p-dimensional square matrix containing the Chi-square (D.f. = 1) approxima-
tion for Wilks’s and Pillai’s statistics. Stored only if one of these two tests is
chosen.
simRatio an array consisting of nsim p-dimensional matrices containing the simulated
mean cross-products ratios.
dfg the number of degrees of freedom associated with the genetic factor.
dfe the number of degrees of freedom associated with the residual term.
Warning
When using the MCPR test, be aware that dfg should be equal to or greater than the number of
variables (p). Otherwise the simulation of Wishart matrices may not be possible.
A collinearity diagnosis is carried out using the condition number (CN), for the inferences may be
affected by the quality of G. Thus, if CN > 100, a warning message is displayed.
Author(s)
References
Silva, A.R. (2015) On Testing Genetic Covariance. LAP Lambert Academic Publishing. ISBN
3659716553
See Also
manova
Examples
# MANOVA
data(maize)
M <- manova(cbind(NKPR, ED, CD, PH) ~ family + env, data = maize)
summary(M)
# Example 1 - MCPR
t1 <- gencovtest(obj = M, geneticFactor = "family")
print(t1)
plot(t1, "ED", "PH")
# Example 2 - Pillai
t2 <- gencovtest(obj = M, geneticFactor = "family", test = "Pillai")
print(t2)
plot(t2, "ED", "PH")
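The method-of-moments estimate of G given in Details can be sketched from a manova fit using only base R. Because the maize data ship with biotools, the sketch uses iris with Species standing in for the genetic factor; that substitution, the object names and adjNrep = 1 are assumptions made purely for illustration.

```r
fit <- manova(cbind(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ~ Species,
              data = iris)
s <- summary(fit)

# Degrees of freedom of the "genetic" factor and of the residuals
dfg <- s$stats["Species", "Df"]
dfe <- s$stats["Residuals", "Df"]

# Mean cross-products matrices: SSCP matrices divided by their df
Mg <- s$SS$Species / dfg
Me <- s$SS$Residuals / dfe

# Replications per level (balanced data), with adjNrep = 1
nrep <- nrow(iris) / nlevels(iris$Species)
G <- (Mg - Me) / (nrep * 1)
G
cov2cor(G)  # corresponding correlations
```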
Description
Data from an experiment with five maize families carried out in a randomized block design, with
four replications (environments).
Usage
data("maize")
Format
A data frame with 20 observations on the following 6 variables.
NKPR a numeric vector containing values of Number of Kernels Per cob Row.
ED a numeric vector containing values of Ear Diameter (in cm).
CD a numeric vector containing values of Cob Diameter (in cm).
PH a numeric vector containing values of Plant Height (in m).
family a factor with levels 1 2 3 4 5
env a factor with levels 1 2 3 4
Examples
data(maize)
str(maize)
summary(maize)
Description
Power calculation of Mantel’s permutation test.
Usage
mantelPower(obj, effect.size = seq(0, 1, length.out = 50), alpha = 0.05)
Arguments
obj an object of class "mantelTest". See mantelTest.
effect.size numeric; the effect size specifying the alternative hypothesis.
alpha numeric; the significance level at which to compute the power level.
Value
A data frame containing the effect size and its respective power level.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Silva, A.R.; Dias, C.T.S.; Cecon, P.R.; Rego, E.R. (2015). An alternative procedure for performing
a power analysis of Mantel’s test. Journal of Applied Statistics, doi = 10.1080/02664763.2015.1014894
See Also
mantelTest
Examples
# Mantel test
data(garlicdist)
garlic <- tocher(garlicdist)
coph <- cophenetic(garlic)
mt1 <- mantelTest(garlicdist, coph, xlim = c(-1, 1))
Description
Mantel’s permutation test based on Pearson’s correlation coefficient to evaluate the association
between two square distance matrices.
Usage
mantelTest(m1, m2, nperm = 999, alternative = "greater",
graph = TRUE, main = "Mantel's test", xlab = "Correlation", ...)
Arguments
m1 an object of class "matrix" or "dist", containing distances among n individuals.
m2 an object of class "matrix" or "dist", containing distances among n individuals.
nperm the number of matrix permutations.
alternative a character specifying the alternative hypothesis. It must be one of "greater"
(default), "two.sided" or "less".
graph logical; if TRUE (default), the empirical distribution is plotted.
Value
A list of
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Mantel, N. (1967). The detection of disease clustering and a generalized regression approach.
Cancer Research, 27:209–220.
See Also
mantelPower
Examples
# Distances between garlic cultivars
data(garlicdist)
garlicdist
# Tocher's clustering
garlic <- tocher(garlicdist)
garlic
# Cophenetic distances
coph <- cophenetic(garlic)
coph
# Mantel's test
mantelTest(garlicdist, coph,
xlim = c(-1, 1))
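The permutation scheme behind a Mantel test can be sketched in base R: rows and columns of one matrix are permuted jointly and the correlation of the lower triangles is recomputed. The helper name mantel_sketch and the simulated data are hypothetical; only the one-sided "greater" alternative is sketched.

```r
# Hypothetical sketch of a one-sided ("greater") Mantel permutation test.
mantel_sketch <- function(m1, m2, nperm = 999) {
  A <- as.matrix(m1)
  B <- as.matrix(m2)
  lower <- lower.tri(A)
  r_obs <- cor(A[lower], B[lower])
  n <- nrow(A)
  r_perm <- replicate(nperm, {
    p <- sample.int(n)               # permute rows and columns together
    cor(A[p, p][lower], B[lower])
  })
  list(correlation = r_obs,
       p.value = (sum(r_perm >= r_obs) + 1) / (nperm + 1),
       nullcor = r_perm)
}

set.seed(1)
x  <- matrix(rnorm(40), ncol = 2)
d1 <- dist(x)
d2 <- dist(x + matrix(rnorm(40, sd = 0.1), ncol = 2))  # noisy copy
mt <- mantel_sketch(d1, d2, nperm = 199)
mt$correlation
mt$p.value
```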
Description
Data set of...
Usage
data("moco")
Format
A data frame with 206 observations (sampling points) on the following 20 variables (coordinates
and markers).
Source
...
References
...
Examples
data(moco)
str(moco)
Description
It performs multiple correlation t-tests from a correlation matrix based on the statistic:
$$t = r \sqrt{df / (1 - r^2)}$$
where, in general, df = n − 2.
Usage
multcor.test(x, n = NULL, Df = NULL,
alternative = c("two.sided", "less", "greater"), adjust = "none")
Arguments
x a correlation matrix.
n the number of observations; if NULL (default), the argument Df must be passed.
Df the number of degrees of freedom of the t statistic; if NULL (default), the argu-
ment n must be passed and, in this case, multcor.test considers Df = n − 2.
alternative the alternative hypothesis. It must be one of "two.sided", "greater" or "less".
You can specify just the initial letter. "greater" corresponds to positive association,
"less" to negative association. The default is "two.sided".
adjust The adjustment method for multiple tests. It must be one of "holm", "hochberg",
"hommel", "bonferroni", "BH", "BY", "fdr", "none" (default). For more information,
see p.adjust.
Value
A list with class "multcor.test" containing the following components:
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
See Also
cor, cor.test, p.adjust
Examples
data(peppercorr)
multcor.test(peppercorr, n = 20)
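The t statistic above can be applied to a whole correlation matrix with vectorized base R. The sketch uses mtcars because peppercorr ships with biotools; the object names are hypothetical and only the two-sided alternative is shown.

```r
r  <- cor(mtcars[, c("mpg", "cyl", "disp", "hp")])
n  <- nrow(mtcars)
df <- n - 2

diag(r) <- NA                         # self-correlations are not tested
tstat <- r * sqrt(df / (1 - r^2))     # t = r * sqrt(df / (1 - r^2))
pval  <- 2 * pt(-abs(tstat), df)      # two-sided p-values

# Adjust each distinct pair once (upper triangle), e.g. by Holm's method
up  <- upper.tri(pval)
adj <- p.adjust(pval[up], method = "holm")
round(pval, 4)
```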
Description
The Meier & Lessman (1971) method to determine the maximum curvature point for optimum plot
size as a function of the experimental coefficient of variation.
Usage
optimumplotsize(a, b)
Arguments
a a parameter estimate of the plot size model; see fitplotsize.
b a parameter estimate of the plot size model; see fitplotsize.
Value
The (approximated) optimum plot size value.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Meier, V. D. & Lessman, K. J. (1971) Estimation of optimum field plot shape and size for testing
yield in Crambe abyssinica Hochst. Crop Sci., 11:648-650.
See Also
fitplotsize
Examples
ps <- c(1, 2, 3, 4, 6, 8, 12)
cv <- c(35.6, 29, 27.1, 25.6, 24.4, 23.3, 21.6)
out <- fitplotsize(plotsize = ps, CV = cv)
plot(cv ~ ps)
curve(coef(out)[1] * x^(-coef(out)[2]), add = TRUE)
optimumplotsize(a = coef(out)[1], b = coef(out)[2])
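For the model CV = a * x^(-b), setting the derivative of the curvature to zero gives the closed form x0 = [a^2 b^2 (2b + 1) / (b + 2)]^(1 / (2b + 2)). The sketch below checks that expression against a numerical maximization; whether optimumplotsize() uses exactly this expression is an assumption, and the function names and the values of a and b are hypothetical.

```r
# Closed-form point of maximum curvature of y = a * x^(-b)
max_curvature_point <- function(a, b) {
  (a^2 * b^2 * (2 * b + 1) / (b + 2))^(1 / (2 * b + 2))
}

# Curvature |y''| / (1 + y'^2)^(3/2) of the same curve
curvature <- function(x, a, b) {
  d1 <- -a * b * x^(-b - 1)
  d2 <- a * b * (b + 1) * x^(-b - 2)
  abs(d2) / (1 + d1^2)^(3 / 2)
}

a <- 35
b <- 0.2
x0  <- max_curvature_point(a, b)
num <- optimize(curvature, interval = c(0.1, 50), a = a, b = b,
                maximum = TRUE, tol = 1e-8)$maximum
c(closed_form = x0, numerical = num)
```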
Description
Function to perform the simple path analysis and the path analysis under collinearity (sometimes
called ridge path analysis). It computes the direct (diagonal) and indirect (off-diagonal) effects of
each explanatory variable over a response one.
Usage
Arguments
Value
A list of
coef a matrix containing the direct (diagonal) and indirect (off-diagonal) effects of
each variable.
Rsq the coefficient of determination.
ResidualEffect
the residual effect.
VIF a vector containing the variance inflation factors.
CN the condition number.
Side Effects
If collinearity = TRUE, an interactive graphic is displayed for dealing with collinearity.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Carvalho, S.P. (1995) Metodos alternativos de estimacao de coeficientes de trilha e indices de
selecao, sob multicolinearidade. Ph.D. Thesis, Federal University of Vicosa (UFV), Vicosa, MG,
Brazil.
Examples
data(peppercorr)
pathanalysis(peppercorr, 6, collinearity = FALSE)
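The decomposition into direct and indirect effects can be sketched from a correlation matrix in base R. Direct effects p solve Rxx p = rxy, and the indirect effect of variable i via variable j is r_ij * p_j. Under collinearity a constant k is added to the diagonal of Rxx (ridge). The function name path_effects, the choice of mtcars and the argument k are hypothetical; this is not the package's interface.

```r
# Hypothetical sketch of path coefficients from a correlation matrix.
path_effects <- function(Rxx, rxy, k = 0) {
  direct  <- solve(Rxx + diag(k, nrow(Rxx)), rxy)   # k > 0: ridge version
  effects <- sweep(Rxx, 2, direct, `*`)             # effects[i, j] = r_ij * p_j
  Rsq     <- sum(direct * rxy)                      # equals R^2 when k = 0
  list(coef = effects, Rsq = Rsq, ResidualEffect = sqrt(1 - Rsq))
}

R   <- cor(mtcars[, c("mpg", "disp", "hp", "wt")])
Rxx <- R[-1, -1]          # explanatory variables
rxy <- R[-1, 1]           # correlations with the response (mpg)
out <- path_effects(Rxx, rxy)
out$coef                  # direct effects on the diagonal
```

With k = 0, each row of coef sums to the correlation between that variable and the response, which is the usual check on a path decomposition.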
Description
The data give the correlations between 6 pepper variables. The data are taken from the article
published by Silva et al. (2013).
Usage
data(peppercorr)
Format
An object of class "matrix".
Source
Silva et al. (2013) Path analysis in multicollinearity for fruit traits of pepper. Idesia, 31:55-60.
Examples
data(peppercorr)
print(peppercorr)
Description
raise.matrix raises a square matrix to a power by using spectral decomposition.
Usage
raise.matrix(x, power = 1)
Arguments
x a square matrix.
power numeric; default is 1.
Value
An object of class "matrix".
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
See Also
eigen, svd
Examples
m <- matrix(c(1, -2, -2, 4), 2, 2)
raise.matrix(m)
raise.matrix(m, 2)
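Raising a matrix to a power by spectral decomposition means raising its eigenvalues to that power and recombining them with the eigenvectors. The sketch below assumes a symmetric matrix, so the eigenvectors are orthogonal; the function name raise_sketch is hypothetical.

```r
# Hypothetical sketch for symmetric matrices: x^power = V diag(lambda^power) V'
raise_sketch <- function(x, power = 1) {
  e <- eigen(x, symmetric = TRUE)
  e$vectors %*% diag(e$values^power, nrow = length(e$values)) %*% t(e$vectors)
}

m <- matrix(c(1, -2, -2, 4), 2, 2)
raise_sketch(m, 2)        # compare with m %*% m

S <- matrix(c(2, 1, 1, 2), 2, 2)
root <- raise_sketch(S, 0.5)   # matrix square root of a positive definite matrix
root %*% root
```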
Description
Function to determine the minimum sample size for calculating a statistic based on its confidence
interval.
Usage
samplesize(x, fun, sizes = NULL, lcl = NULL, ucl = NULL,
nboot = 200, conf.level = 0.95, nrep = 500, graph = TRUE, ...)
Arguments
x a numeric vector.
fun an objective function at which to evaluate the sample size; see details.
sizes a numeric vector containing sample sizes; if NULL (default), samplesize creates
a vector ranging from 2 to n-1.
lcl the lower confidence limit for the statistic defined in fun; if NULL (default),
samplesize estimates lcl based on bootstrap percentile interval.
ucl the upper confidence limit for the statistic defined in fun; if NULL (default),
samplesize estimates ucl based on bootstrap percentile interval.
nboot the number of bootstrap samples; it is used only if lcl or ucl is NULL.
conf.level the confidence level for calculating the lcl and ucl; it is used only if lcl or ucl
is NULL.
nrep the resampling (with replacement) number for each sample size in sizes; de-
fault is 500.
graph logical; default is TRUE.
... further graphical arguments.
Details
If ucl or lcl is NULL, fun must be defined as in boot, i.e., the first argument passed will always be
the original data and the second will be a vector of indices, frequencies or weights which define the
bootstrap sample. Currently, samplesize treats the second argument only as a vector of indices.
Value
A list of
CI a vector containing the lower and the upper confidence limit for the statistic
evaluated.
pointsOut a data frame containing the sample sizes (in sizes), the number of points outside
the CI (n.out) and the proportion of this number (prop).
Side Effects
If graph = TRUE, a graphic with the dispersion of the estimates for each sample size is displayed, as
well as a graphic containing the number of points outside the confidence interval for the reference sample.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
Examples
cv <- function(x, i) sd(x[i]) / mean(x[i]) # coefficient of variation
x <- rnorm(20, 15, 2)
cv(x)
samplesize(x, cv)
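The idea described in Details and Value can be sketched in base R: build a bootstrap percentile interval from the full sample, then resample at each candidate size and record how often the estimate falls outside that interval. This is a sketch under those assumptions, not the package's code; the object names are hypothetical.

```r
set.seed(42)
cv <- function(x, i) sd(x[i]) / mean(x[i])   # boot-style statistic
x  <- rnorm(20, 15, 2)

# Bootstrap percentile interval from the reference sample
boot_stats <- replicate(200, cv(x, sample.int(length(x), replace = TRUE)))
ci <- quantile(boot_stats, c(0.025, 0.975))

# Proportion of estimates outside the interval for each sample size
sizes <- 2:(length(x) - 1)
prop_out <- sapply(sizes, function(s) {
  est <- replicate(500, cv(x, sample.int(length(x), s, replace = TRUE)))
  mean(est < ci[1] | est > ci[2])
})
data.frame(size = sizes, prop = prop_out)
```

A sample size is then judged adequate once the proportion outside the interval is acceptably small.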
Description
Estimate spatial gene diversity (expected heterozygosity - He) through the individual-centred
approach by Manel et al. (2007). sHe() calculates the unbiased estimate of He based on the
information of allele frequency obtained from codominant or dominant markers in individuals within a
circular moving window of known radius over the sampling area.
Usage
sHe(x, coord.cols = 1:2, marker.cols = 3:4,
marker.type = c("codominant", "dominant"),
grid = NULL, latlong2km = TRUE, radius, nmin = NULL)
Arguments
x a data frame or numeric matrix containing columns with coordinates of individ-
uals and marker genotyping
coord.cols a vector of integers giving the columns of coordinates in x
marker.cols a vector of integers giving the columns of markers in x
marker.type a character; the type of molecular marker
grid optional; a two-column matrix containing coordinates over which to predict He
latlong2km logical; should coordinates be converted from lat/long format into kilometer-
grid based?
radius the radius of the moving window. It must be in the same format as sampling
coordinates
nmin optional; a numeric value indicating the minimum number of individuals used
to calculate He. If the number of individuals in a certain location is less than
nmin, sHe will consider He as zero.
Details
The unbiased estimate of expected heterozygosity (Nei, 1978) is given by:
$$He = \left(1 - \sum_{i=1}^{n} p_i^2\right) \frac{2n}{2n - 1}$$
where pi is the frequency of the i-th allele per locus considering the n individuals in a certain
location.
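As a worked illustration of the estimator above for a single locus, the following base R sketch may help. The function name unbiased_he and the genotype matrix are hypothetical and are not part of biotools. Allele frequencies are obtained by simple allele counting, which is an assumption appropriate for codominant diploid data only.

```r
# Unbiased expected heterozygosity for one locus (Nei, 1978).
# p: allele frequencies at the locus; n: number of individuals in the window.
unbiased_he <- function(p, n) {
  stopifnot(n >= 1, abs(sum(p) - 1) < 1e-8)
  (1 - sum(p^2)) * 2 * n / (2 * n - 1)
}

# Hypothetical codominant genotypes: 5 diploid individuals, one row each
genotypes <- matrix(c("A", "A",
                      "A", "B",
                      "B", "B",
                      "A", "B",
                      "A", "C"), ncol = 2, byrow = TRUE)

# Allele frequencies by direct counting over the 2n allele copies
p <- as.vector(table(genotypes)) / length(genotypes)

he <- unbiased_he(p, n = nrow(genotypes))
```

The factor 2n / (2n - 1) corrects the downward bias of 1 - sum(p^2) in small samples, so the correction matters most in windows containing few individuals.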
Value
A list of
diversity a data frame with the following columns: coord.x - the x-axis coordinates of
the prediction grid, coord.y - the y-axis coordinates of the prediction grid, n - the
number of individuals at a certain point in the grid, MaxDist - the maximum
observed distance among these individuals, uHe - the unbiased estimate of gene
diversity (as expressed above), and SE - the standard error of uHe.
mHe a matrix containing the estimates of He for every marker, on each point of the
grid.
locations a numeric matrix containing the sampling coordinates, as provided as input.
Warning
Depending on the dimension of x and/or grid, sHe() can be time demanding.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
Ivandilson Pessoa Pinto de Menezes <ivan.menezes@ifgoiano.edu.br>
References
Manel, S., Berthoud, F., Bellemain, E., Gaudeul, M., Luikart, G., Swenson, J.E., Waits, L.P.,
Taberlet, P.; Intrabiodiv Consortium. (2007) A new individual-based spatial approach for identifying
genetic discontinuities in natural populations. Molecular Ecology, 16:2031-2043.
Nei, M. (1978) Estimation of average heterozygosity and genetic distance from a small number of
individuals. Genetics, 89:583-590.
See Also
levelplot
Examples
data(moco)
data(brazil)
# check points
plot(brazil, cex = 0.1, col = "gray")
points(Lat ~ Lon, data = moco, col = "blue", pch = 20)
# A FANCIER PLOT...
# using Brazil's coordinates as prediction grid
# ex2 <- sHe(x = moco, coord.cols = 1:2,
# marker.cols = 3:20, marker.type = "codominant",
# grid = brazil, radius = 150)
# ex2
#
# library(maps)
# borders <- data.frame(x = map("world", "brazil")$x,
# y = map("world", "brazil")$y)
#
# library(latticeExtra)
# plot(ex2, xlab = "Lon", ylab = "Lat",
# xlim = c(-75, -30), ylim = c(-35, 10), aspect = "iso") +
# latticeExtra::as.layer(xyplot(y ~ x, data = borders, type = "l")) +
# latticeExtra::as.layer(xyplot(Lat ~ Lon, data = moco))
Description
A function to calculate the Singh (1981) criterion for importance of variables based on the squared
generalized Mahalanobis distance.
$$S_{.j} = \sum_{i=1}^{n-1} \sum_{i'>i}^{n} (x_{ij} - x_{i'j}) \, (\mathbf{x}_i - \mathbf{x}_{i'})' \, \Sigma^{-1}_{j}$$
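The criterion above can be sketched in base R as follows. This is an illustrative reading of the formula, not the code of singh(): the function name singh_sketch is hypothetical, and the last factor is assumed to be the j-th column of the inverse covariance matrix. The data are those of the Examples section below.

```r
# Sketch of the Singh (1981) criterion; NOT the biotools implementation.
# x: n x p data matrix; cov: p x p covariance matrix (assumed invertible).
singh_sketch <- function(x, cov) {
  x <- as.matrix(x)
  Sinv <- solve(cov)                # inverse covariance matrix
  n <- nrow(x)
  S <- numeric(ncol(x))
  for (i in seq_len(n - 1)) {
    for (k in (i + 1):n) {
      d <- x[i, ] - x[k, ]          # difference between rows i and i'
      # j-th term: (x_ij - x_i'j) * (x_i - x_i')' %*% (j-th column of Sinv)
      S <- S + d * as.vector(d %*% Sinv)
    }
  }
  names(S) <- colnames(x)
  S
}

x1 <- c(131.37, 132.37, 134.47, 135.50, 136.17)
x2 <- c(133.60, 132.70, 133.80, 132.30, 130.33)
x3 <- c(99.17, 99.07, 96.03, 94.53, 93.50)
x4 <- c(50.53, 50.23, 50.57, 51.97, 51.37)
x <- cbind(x1, x2, x3, x4)
Cov <- matrix(c(21.112, 0.038, 0.078, 2.01, 0.038, 23.486, 5.2, 2.844,
                0.078, 5.2, 24.18, 1.134, 2.01, 2.844, 1.134, 10.154), 4, 4)

s <- singh_sketch(x, Cov)
prop <- s / sum(s)                  # share of each variable in the total
```

Under this reading, the values of S.j add up to the sum of the squared Mahalanobis distances over all pairs of rows, which is why each S.j can be interpreted as the contribution of variable j to the total divergence.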
Usage
## Default S3 method:
singh(data, cov, inverted = FALSE)
Arguments
Value
singh returns a matrix containing the Singh statistic, the importance proportion and the
cumulative proportion of each variable (column) in data.
Author(s)
References
Singh, D. (1981) The relative importance of characters affecting genetic divergence. Indian Journal
of Genetics & Plant Breeding, 41:237-245.
See Also
D2.dist
Examples
# Manly (2004, p.65-66)
x1 <- c(131.37, 132.37, 134.47, 135.50, 136.17)
x2 <- c(133.60, 132.70, 133.80, 132.30, 130.33)
x3 <- c(99.17, 99.07, 96.03, 94.53, 93.50)
x4 <- c(50.53, 50.23, 50.57, 51.97, 51.37)
x <- cbind(x1, x2, x3, x4)
Cov <- matrix(c(21.112,0.038,0.078,2.01, 0.038,23.486,5.2,2.844,
0.078,5.2,24.18,1.134, 2.01,2.844,1.134,10.154), 4, 4)
(s <- singh(x, Cov))
plot(s)
Description
tocher performs the Tocher (Rao, 1952) optimization clustering from a distance matrix. The
cophenetic distance matrix for a Tocher's clustering can also be computed using the methodology
proposed by Silva & Dias (2013).
Usage
## S3 method for class 'dist'
tocher(d, algorithm = c("original", "sequential"))
## S3 method for class 'tocher'
print(x, ...)
## S3 method for class 'tocher'
cophenetic(x)
Arguments
d an object of class "dist".
algorithm a character indicating the algorithm to be used for clustering objects. It must be
one of the two: "original" (default) or "sequential". The latter is the method
proposed by Vasconcelos et al. (2007), and sometimes called "modified" Tocher.
x an object of class "tocher".
... optional further arguments from print.
Value
An object of class tocher. A list of
call the call which produced the result.
algorithm character; the algorithm that has been used as input.
clusters a list of length k (the number of clusters), containing the labels of the objects in
d for each cluster.
class a numeric vector indicating the class (the cluster) of each object in d.
criterion a numeric vector containing the clustering criteria - the greatest amongst the
smallest distances involving each object in d. If algorithm = "original",
this vector contains a single value, i.e., the same criterion is used for every
clustering step.
distClust a matrix of distances within (diagonal) and between (off-diagonal) clusters.
d the input object.
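The clustering criterion described above (the greatest amongst the smallest distances involving each object) can be reproduced by hand from any "dist" object. The sketch below uses only base R and hypothetical one-dimensional data. It follows the verbal definition given here and is not taken from the source code of tocher().

```r
# Hypothetical objects placed on a line, so distances are easy to check
pts <- c(a = 0, b = 1, c = 3, d = 7)
d <- dist(pts)

# Full symmetric matrix; the zero diagonal is ignored
m <- as.matrix(d)
diag(m) <- NA

# Smallest distance involving each object (distance to its nearest neighbour)
nearest <- apply(m, 1, min, na.rm = TRUE)

# Criterion: the greatest among those smallest distances
theta <- max(nearest)
```

With algorithm = "original" a single value of this kind is used at every clustering step, as stated in the description of criterion.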
Warning
Clustering a large number of objects (say 300 or more) can be time demanding.
Author(s)
Anderson Rodrigo da Silva <anderson.agro@hotmail.com>
References
Cruz, C.D.; Ferreira, F.M.; Pessoni, L.A. (2011) Biometria aplicada ao estudo da diversidade
genetica. Visconde do Rio Branco: Suprema.
Rao, C.R. (1952) Advanced statistical methods in biometric research. New York: John Wiley &
Sons.
Sharma, J.R. (2006) Statistical and biometrical techniques in plant breeding. Delhi: New Age
International.
Silva, A.R. & Dias, C.T.S. (2013) A cophenetic correlation coefficient for Tocher’s method. Pesquisa
Agropecuaria Brasileira, 48:589-596.
Vasconcelos, E.S.; Cruz, C.D.; Bhering, L.L.; Resende Junior, M.F.R. (2007) Alternative
methodology for the cluster analysis. Pesquisa Agropecuaria Brasileira, 42:1421-1428.
See Also
dist, D2.dist, cophenetic, distClust, hclust
Examples
# example 1
data(garlicdist)
(garlic <- tocher(garlicdist))
garlic$distClust # cluster distances
# example 2
data(USArrests)
(usa <- tocher(dist(USArrests)))
usa$distClust
# cophenetic correlation
cophUS <- cophenetic(usa)
cor(cophUS, dist(USArrests))
# example 3
data(eurodist)
(euro <- tocher(eurodist))
euro$distClust