ASSOCIATION MAPPING WITH GAPIT
Genome Association and Prediction Integrated Tool
We are going to use this package to carry out association mapping using the mixed model.
Even if you do not grasp all the detail at a first run through this tutorial, I hope there is
enough information here to help you run association mapping on your own data sets. For this
reason, I’ve given a lot of detail.
TriticeaeGenome http://www.triticeaegenome.eu was a European collaborative project which
developed a panel of 384 UK, French and German winter wheat varieties. We have used this
previously to illustrate the cross-locations analysis of phenotype data. Here we are going to
use the variety means from those analyses as phenotypes for association mapping.
GAPIT is R software from the Buckler Lab (http://www.maizegenetics.net/). It is an
alternative to TASSEL, which some of you may be familiar with. In my hands TASSEL does
not function well. In addition to association mapping using mixed effects modelling, GAPIT
will also carry out a form of genomic selection.
Unlike most packages in R, GAPIT doesn’t install or load simply. The first time you use it,
you must issue the following commands (cut and paste them into the R command window;
they are also given in the GAPIT manual,
http://zzlab.net/GAPIT/gapit_help_document.pdf):
source("http://www.bioconductor.org/biocLite.R")
biocLite("multtest")
install.packages("gplots")
install.packages("LDheatmap")
install.packages("genetics")
install.packages("EMMREML")
install.packages("scatterplot3d")
library(multtest)
library(gplots)
library(LDheatmap)
library(genetics)
library(EMMREML)
library(compiler) #this library is already installed in R
library(scatterplot3d)
source("http://zzlab.net/GAPIT/gapit_functions.txt")
source("http://zzlab.net/GAPIT/emma.txt")
In subsequent R sessions, you can omit the first seven lines unless you wish to update those
packages. If you cut-and-paste these commands from here and they do not work, it is worth
downloading the latest version of the GAPIT manual and checking that the location and
names of some of the files have not changed. This has happened before.
If you type:
ls()
you will see that you have now loaded an extensive number of functions into R with names
beginning with ‘GAPIT’ or ‘emma’. All of these functions are called by options within a
single R command “GAPIT().” All options are listed in the manual (though some appear to
be either not implemented or not working). The manual is not that clear, I find.
Before running GAPIT, we need to upload our data and create a kinship matrix. GAPIT will
create a kinship matrix on the fly, but this is not necessarily appropriate, as we shall see.
There is also some pre-processing required of the genotype data, but I have done most of this.
If you haven’t already done so, change directory to the “GAPIT” directory found in today’s
course folder. You will shortly be creating a large number of output files.
GAPIT only works with SNPs or binary markers. To analyse SSRs, each allele could be
tested for association independently, possibly with some amalgamation of the results at the
end, but this is an unnecessary complication for demonstration purposes.
I also deleted markers with a minor allele frequency (maf) <2%. Most association mapping
studies delete loci with low maf as they
are unlikely, on their own, to detect any association, even if they are functional. The
threshold is somewhat arbitrary. In the current case, <2% means less than 8 copies of the rare
allele. There is an option within GAPIT to set the maf threshold for each trait in turn: patterns
of missing data among the traits can alter the allele frequency for the phenotyped individuals.
Finally, I deleted markers with > 10% missing data (ie > 37 observations). These markers are
likely to be of poor quality.
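If you want to apply the same filters to your own data, a minimal sketch is given below. It
assumes a hypothetical data frame called geno.raw with variety names in column 1 and
markers coded 0/1, with NA for missing data; the object names and thresholds here are my
assumptions, not part of the course files.
scores <- geno.raw[, -1]
maf <- apply(scores, 2, function(x) {
  p <- mean(x, na.rm = TRUE)                 # frequency of the '1' allele
  min(p, 1 - p)                              # minor allele frequency
})
miss <- apply(scores, 2, function(x) mean(is.na(x)))   # proportion of missing data
keep <- maf >= 0.02 & miss <= 0.10
geno.filtered <- cbind(geno.raw[, 1, drop = FALSE], scores[, keep])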
Data are in the Excel file ‘TG data for course’ in the sheet ‘selected raw data’. The top two
rows are for information only, and give the map position, where known. Data have been
sorted on map position. The position of the mapped markers is given in the second sheet. All
the data required for this exercise have also been pasted into separate text files, which is less
confusing here than repeatedly copying data to the clipboard and reading it in from there.
There are two remaining stages of data preparation for the marker data.
Firstly, we need to select a subset of the markers to use for kinship calculation. Kinship
matrices, and any operations upon these matrices, such as PCO and tree construction, are
dominated by dense clusters of markers, if these exist. So a tree, for example, will not
necessarily reflect average relationships over the genome, but the relationships among
varieties in the genomic regions which have the highest number of markers. This is
particularly a problem if, as is often the case, the dense clusters of markers are also in high
LD. For this reason, it is good practice to thin the markers to give a more uniform
representation of relationships over the whole genome. One way of doing this is to calculate
the correlation coefficient among all markers and delete one marker of each pair in high LD
(say r>0.9). Another is to delete markers if they are too closely linked.
There is a fine balancing act in thinning markers. If markers are not uniformly distributed
over the genome, then marker-trait associations will be over-corrected in some parts of the
genome and under-corrected in others. If too few markers are left, then kinship will not be
accurately estimated and adjusted for. For our data, we shall use only the mapped markers
for kinship estimation, deleting all but one of any markers which map to the same position.
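The thinned set used below was prepared in advance, but if you need to thin markers for your
own data, here is a minimal sketch of both approaches. It assumes the map rows are in the
same order as the marker columns and that the map has columns called Chromosome and
Position; those column names, and the cut-off of r > 0.9, are my assumptions.
# (a) keep only one of any markers mapping to the same position
unique.pos <- !duplicated(geno.map[, c("Chromosome", "Position")])
kin.subset <- geno.imp.corrected[, c(TRUE, unique.pos)]   # TRUE keeps the name column
# (b) or drop one marker of each pair in high LD (r > 0.9)
r <- cor(geno.imp.corrected[, -1])
drop <- rep(FALSE, ncol(r))
for (i in 1:(ncol(r) - 1)) {
  if (drop[i]) next
  high <- which(abs(r[i, ]) > 0.9)
  drop[high[high > i]] <- TRUE
}
ld.thinned <- geno.imp.corrected[, c(TRUE, !drop)]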
The map for the 1,117 markers I selected as markers for kinship calculation is in the sheet
“kinship map.”
The second stage of data preparation for the markers is to impute the missing data: we fill in
the missing values with our best guess at what the alleles would have been if the genotyping
had been successful. There are two good reasons for imputing the missing data. Firstly
several operations on the markers, including kinship calculation, do not handle missing data
well or at all. We could delete any marker (or variety) with missing data, but we would usually
end up with very few varieties or markers left. Secondly, if we can make accurate estimations
of the missing values we gain power in our analyses by increasing the sample size.
There are packages specifically targeted at imputing marker data which incorporate LD
information among markers directly (“Beagle” is the standard). There are also general
packages for imputation of missing data, such as “impute” and “softImpute” in R. In practice,
especially for the kinship matrix, inserting missing data with the mean allele frequency (if
genotypes are coded 0 and 1) or two times the mean allele frequency (if genotypes are coded
0, 1 and 2) will work adequately.
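If you needed to do this yourself, mean imputation takes only a few lines. This is a minimal
sketch assuming genotypes coded 0/1 with NA for missing data, in a hypothetical data frame
geno.raw with variety names in column 1.
scores <- geno.raw[, -1]
for (j in seq_along(scores)) {
  m <- mean(scores[[j]], na.rm = TRUE)      # mean allele frequency for marker j
  scores[[j]][is.na(scores[[j]])] <- m      # fill missing values with that mean
}
geno.mean.imp <- cbind(geno.raw[, 1, drop = FALSE], scores)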
For this example, I have already imputed missing data, using the package Imputation (which
is no longer supported).
The raw genotype data are in “selected raw data.” However, we’ll use the imputed data in
the sheet “imputed data”. Note the data are coded as 0 and 1. Coding as 0, 1 and 2 would be
more common if any markers were coded as heterozygous, and is required for the calculation
of a kinship matrix by GAPIT. Data cannot be read in as A, C, G, T or other alphanumeric coding.
Read in the imputed data. If reading from the clipboard, do not copy the first two rows to the
clipboard; they are for information only. Easiest is to read in from the prepared file:
geno.imp.corrected<-read.table("all_markers.txt",header=T)
For the kinship calculation (but not for the association analysis itself) the data are expected to
be coded as 0, 1, 2 (or on a 0 to 2 scale for the imputed data).
length(names(geno.imp.corrected))   # check the number of columns: 2873
geno.imp.corrected[,2:2873] <- geno.imp.corrected[,2:2873]*2   # put the 0/1 scores on a 0-2 scale
The map for these markers is in “all markers map”. Note that the chromosomes must be
numbered rather than given names: don’t copy the column of chromosome names into R.
Because all markers must be given a map position, BioGemma SNP markers are positioned
on chromosome 22 (I am grateful to them for their permission to use them), unmapped DArT
markers on 23, and some specific gene markers on 24. Read in the map:
geno.map<-read.table("all_markers_map.txt",header=T)
Now we can finally use GAPIT. If provided with nothing but genotype data, it will create a
kinship matrix:
GAPIT(GD=geno.imp.corrected,GM=geno.map)
You may get an error line:
“some row.names duplicated: 2,3,5,6,8,9,10,11”
If you do, the fix is to ensure that the genotype and map data both have a first column of
names.
You should get a lot of output on the screen, and in your working directory you will find that
four files have been created, two with kinship information. Rename the kinship files: we want
to compare them with a kinship matrix constructed from the thinned markers, and that run
would otherwise overwrite these files.
The subset of markers selected for this purpose and the corresponding map are in “kinship
markers” and “kinship map”. Read them in:
kin.markers <-read.table("markers_for_kinship.txt",header=T)
kin.map<- read.table("kinship_markers_map.txt", header=T)
For the kinship calculation, the data are expected to be coded as 0, 1, 2 (or on a 0 to 2 scale
for the imputed data).
length(names(kin.markers))   # check the number of columns: 1118
kin.markers[,2:1118] <- kin.markers[,2:1118]*2   # put the 0/1 scores on a 0-2 scale
Now create a kinship matrix using just the thinned markers:
GAPIT(GD=kin.markers,GM=kin.map)
Compare the two pdf files that have been created. The effect isn’t large in this case, but the
kinship matrix from the thinned markers has lower average kinship and the heatmap is more
“washed out”. Sometimes the difference is much more extreme than seen here, and we shall
look at it in a more dramatic fashion shortly.
The kinship matrix itself will be exported to a csv file (opens in Excel) called
GAPIT.Kin.VanRaden.csv. You may note that the diagonals are not 2, as they should be for
fully inbred lines, but (a) something less for the all markers data set (b) something more for
the thinned (kinship) data set. (b) appears to be largely caused by imputation, with many
genotypes having values >0 and <2. We can discuss possible consequences and fixes for this.
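A quick way to look at the diagonals is to read the exported file back in and summarise them.
This sketch uses the default file name; substitute the renamed file to check the other matrix.
kin.check <- read.csv("GAPIT.Kin.VanRaden.csv", header = FALSE)
summary(diag(as.matrix(kin.check[, -1])))   # column 1 holds the variety names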
We can read the kinship matrix back into R directly from the created spreadsheet. This seems
perverse, but we want to use this matrix explicitly rather than the default (created from all the
markers) which GAPIT would otherwise use. First rename the kinship matrix created from the
thinned marker set to “GAPIT.Kin.VanRaden.kin_markers_only.csv”, then read it in:
kin.matrix<-read.csv("GAPIT.Kin.VanRaden.kin_markers_only.csv",header=F)
Note 1. The file was output as a csv file, so we’re using the special form of read.table() for
these files. We could also use read.table(....., sep = ",").
Note 2. We require header = F, though variety names are still expected in the first column.
Having gone to all this trouble, we can also compare relationships among lines by PCO,
using either the full data or the thinned marker set. Close down any open pdf and csv files
and rename them if you want to keep them.
GAPIT(GD=kin.markers,GM=kin.map,PCA.total=4)
As well as the four files from above, you will now get 6 more output files, all beginning with
GAPIT.PCA. Once more, before running the next line, rename the output files to prevent
them being overwritten, then run the PCA using all the markers.
GAPIT(GD=geno.imp.corrected,GM=geno.map,PCA.total=4)
The added option PCA.total=4 causes GAPIT to include the first four principal components
(those with the largest eigenvalues) in the association analysis. This follows the approach
used in ‘Structured Association’ and
‘Eigensoft’ aka ‘Eigenstrat’ for adjusting for population structure. Here, we aren’t carrying
out that analysis, but it does create a csv with all the principal components.
To compare the effect of the marker thinning, open the “GAPIT.PCA.pdf” file for each data
set and compare the PC1-PC2 plots against each other. You should see clear evidence of
population structure when using all markers, almost certainly originating from the 1B1R rye
translocation, but observe no such structure when using the thinned markers. The effect is
also just about visible in the 3D plot of the first 3 PCs. For this dataset, the PCA clustering
for all markers results from a clustering of markers in a single area of the genome and should
be removed from the kinship matrix. In other cases, if the clustering results from genome
wide differentiation, it will remain, correctly, after thinning of markers. An example would
be a spring versus winter split in European barley. These are differentiated breeding pools
and kinship differs genome-wide between the two, and this should be accounted for in any
association analysis.
Now we can do some association analysis. For this we need trait data: in the sheet of that
name.
pheno<-read.table("phens.txt",header=T)
To jump straight in, the following command will carry out a standard mixed model analysis
on the trait “awns”: presence or absence of awns on the ear. This is highly heritable and is
used for variety discrimination, so is a good place to start.
test<-GAPIT(Y=pheno[,c(1,13)],GD=geno.imp.corrected,GM=geno.map,KI=kin.matrix,
            group.from=376,group.to=376)
This may take a couple of minutes. It should exit saying "GAPIT accomplished
successfully...”. There will also be a notice about warnings which you can ignore.
Multiple output files will be created. It is worth moving these to a separate directory so the
current directory does not get cluttered, or subsequent runs overwrite files you want. Files
worth looking at are:
GAPIT.AWNS.Manhattan-Plot.Genomewise.pdf: a plot of the results on a –log10 scale. This
is generally the most interesting and informative plot.
GAPIT.AWNS.GWAS.Results.csv: results for each marker, ordered by p-value.
GAPIT.AWNS.Allelic_Effect_Estimates.csv: the allelic effects of each marker.
GAPIT.AWNS.QQ-Plot.pdf: a QQ plot of the p-values to assess if there is an excess of
(perhaps false) positive results.
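If you prefer to look at the results in R rather than Excel, a quick sketch using the results file
named above:
gwas <- read.csv("GAPIT.AWNS.GWAS.Results.csv")
head(gwas, 10)   # the file is already ordered by p-value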
To break the GAPIT command down:
Y=pheno[,c(1,13)]: analyse the 13th phenotype in the input file. You can analyse multiple
phenotypes (they will each be given different output file names) by selecting them in the
same way, for example Y=pheno[,c(1,13,15)]. “Y=pheno” will analyse all phenotypes. Note
you must have variety names in column 1.
GD=geno.imp.corrected: test the markers in geno.imp.corrected for association. You could
select a subset of markers in exactly the same way as for phenotypes.
GM=geno.map: the map. It must match the genotypes, so if you select specific markers, you
must select the appropriate rows here too.
KI=kin.matrix: the kinship matrix. Note, if KI is missing, then GAPIT will create its own
kinship matrix. If markers are clustered, this can cause problems. (Feel free to try it.)
group.from: in addition to the EMMA algorithm, GAPIT uses an implementation of the
mixed model in which individuals are clustered into more-or-less unrelated groups. The
kinship matrix can then set relationships between groups to zero, treat the groups as
independent and work with a much smaller number of relationships within groups. This
speeds up the analysis, which can be important if working with very large numbers of
individuals and markers, but it is not necessary in the crop datasets I have come across so
far. See the manual for details. By setting group.from = group.to = the number of individuals
to be analysed, GAPIT is obliged to use the basic EMMA mixed model, without adulteration,
which I think is best.
group.to: see above.
We can carry out an analysis unadjusted for kinship by setting group.from and group.to to 1:
test<-GAPIT(Y=pheno[,c(1,13)],GD=geno.imp.corrected,GM=geno.map,KI=kin.matrix,
            group.from=1,group.to=1)
We could also use lm() or lmer() for this analysis.
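For a single marker, the unadjusted test is essentially a regression of the phenotype on the
marker score. A minimal sketch, assuming the varieties appear in the same order in pheno and
geno.imp.corrected; "some_marker" is a placeholder for any marker column name.
d <- data.frame(awns = pheno[, 13],
                marker = geno.imp.corrected[, "some_marker"])
anova(lm(awns ~ marker, data = d))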
If you compare the QQ and Manhattan plots between the adjusted and unadjusted analyses,
you will see evidence of the increased frequency of small p-values, resulting from population
structure and kinship which have not been adjusted for.
Note that we have used the imputed genotype data in this analysis, rather than the raw data.
Which is best depends on the accuracy of the imputation for the marker under test.
Let’s look at height (trait 12). All you need to do is re-run the commands above, pointing at the
new phenotype.
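For example, the kinship-adjusted run for height would look like this (for the unadjusted run,
set group.from and group.to to 1 as before):
test<-GAPIT(Y=pheno[,c(1,12)],GD=geno.imp.corrected,GM=geno.map,KI=kin.matrix,
            group.from=376,group.to=376)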
Compare the QQ plots. You should see very many more significant results in the unadjusted
analysis. But if you look at the gene markers (chromosome 24), you will see one very
significant result with no adjustment, but two with adjustment. We can look in the GWAS
results file to see what is going on. I find it easier to do this if I first rename them – add ‘adj’
and ‘unadj’ to the names so it is clear which you are looking at.
In the unadjusted results, you should find Rht2 is the most significant marker. This accounts
for a very large proportion of the variation. It is the most important dwarfing gene employed
in Europe, so this is not surprising.
In the adjusted results, you will still find Rht2 on top, but its significance is much reduced, as is
the additional variation accounted for by Rht2. Height is a major contributor to population
structure in wheat: German varieties tend to rely less on dwarfing genes for example and so
are taller. Adjusting for kinship has the effect of leaving less variation to be accounted for by
genuine marker relationships. Note, however, that the second highest result is from ppdD1, a
gene for earliness which is used in some wheat varieties, particularly in Southern Europe,
where early maturation may allow escape from summer drought.
Perhaps Rht2 is dominating the results. If we include Rht2 in the analysis as a covariate, other
sources of reduced height may be revealed.
I’ve set up a file called ‘covs.txt’, which contains the covariates we shall use in this session.
The format is the same as for the phenotype data. Read all the data in.
covs<-read.table("covs.txt",header=T)
To add a covariate to the mixed model:
test<-GAPIT(Y=pheno[,c(1,12)],CV=covs[,c(1,5)],GD=geno.imp.corrected,
KI=kin.matrix,GM=geno.map,group.from=376,group.to=376)
Note the addition of the “CV=” option to include covariates. Rht2 is in the fifth column. You
can include more than one covariate: just select what you want as described for selecting
phenotypes. CV=covs would include all the covariates in the data frame.
Compare the Manhattan plots. If you look at the GWAS results file, you will notice that Rht2
is no longer the peak marker. This is correct: it has been included as a covariate so its effect
should disappear. PpdD1 is now promoted to top position. Rht1 has emerged too. Rht1 is
another dwarfing gene used in wheat breeding, though it is less prevalent in Europe than
Rht2. In the first analysis, its association with height was masked by Rht2; in the second, its
significance is increased.
It is possible to include multiple covariates in the model and select the subset which gives the
best-fitting model. For this, the option Model.selection = TRUE is added to the GAPIT
command. Goodness of fit is tested by BIC, the Bayesian Information Criterion. We won’t go
into this option here. We’ll have a look at yield next, which is in
column 2 of ‘pheno’.
If there is time:
We are going to adjust for population structure in four ways:
None
Kinship
Population membership
Kinship + population membership.
We have discussed adjustment by population membership already. In essence, covariates of
membership are included in the model first, so that the significance of any marker effect is
tested after adjustment by these covariates. Before the advent of the mixed model, this was
the approved method. Population membership could be estimated by STRUCTURE, or by
eigenvectors. STRUCTURE has the disadvantage that it takes a long time to run and for many
crops it is difficult to decide what value of k to use. Inclusion of eigenvectors is more popular
now. Many workers like to run a mixed model and include eigenvectors as covariates as well.
The merit of this approach is still debated. In part, at least, it can depend on the form of the
kinship matrix. For TriticeaeGenome, we are going to use country of origin as a surrogate for
population membership.
We can test whether country of origin is important for yield with a simple anova:
anova(lm(pheno$ALL~ covs$COUNTRY))
Unfortunately, GAPIT does not allow factors to be entered as covariates, they must be
numerical. We have three countries (one entry from the USA has been coded as unknown).
We can therefore have a column for each of the three countries, with 1 indicating origin in
that country, 0 for elsewhere. However, these three columns are not independent: they sum to
1 across rows, so if you know two columns, you also know the third. This is why COUNTRY
has only 2 df in the analysis of variance. Therefore we must drop one of these columns. I’ve
chosen to exclude GBR, leaving column 3 containing FRA and column 4 DEU. Repeat the
analysis of variance on these two columns:
anova(lm(pheno$ALL~ covs[,3]+covs[,4]))
This gives a very similar result to before: the partition of the sum of squares is very similar. It
is not identical because there are two varieties of unknown origin, coded NA originally. In
my recoding, I’ve given these values of 1/3 for each country.
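If you needed to build such columns yourself, a sketch is below. It assumes covs$COUNTRY
holds the codes GBR, FRA and DEU, with NA for the varieties of unknown origin; check your
own coding first.
fra <- ifelse(is.na(covs$COUNTRY), 1/3, as.numeric(covs$COUNTRY == "FRA"))
deu <- ifelse(is.na(covs$COUNTRY), 1/3, as.numeric(covs$COUNTRY == "DEU"))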
There is another covariate that has an important effect on yield: age of the variety. You can
see this either by plotting yield against age, or by adding age to the previous analysis of
variance. Three varieties had an unknown age of origin and I’ve given these the mean year of
origin, 1991.4.
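A quick look is sketched below. I have assumed the year of origin is the second column of
covs; check names(covs) to confirm, as the column position is a guess on my part.
plot(covs[, 2], pheno$ALL, xlab = "Year of origin", ylab = "Yield")
anova(lm(pheno$ALL ~ covs[, 3] + covs[, 4] + covs[, 2]))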
For the mixed model, with all covariates:
test<-GAPIT(Y=pheno[,c(1,2)],CV=covs[,1:4],GD=geno.imp.corrected,GM=geno.map,
            KI=kin.matrix,group.from=376,group.to=376)
This command can be edited to run the other models. Save the output of each run in a
separate directory. Compare the QQ plots, Manhattan plots and GWAS results. What do you
think? It is clear that some correction must be done, but which is best?
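As a sketch, the four runs might be set up as follows. As in the unadjusted awns run above,
the kinship matrix is supplied each time but is effectively ignored when the group number is
set to 1. I have used covs[,1:4] as the population membership covariates, as in the full model;
whether age belongs in “population membership” is a judgement call. Rename or move the
output files between runs.
# 1. no adjustment
GAPIT(Y=pheno[,c(1,2)],GD=geno.imp.corrected,GM=geno.map,KI=kin.matrix,
      group.from=1,group.to=1)
# 2. kinship only
GAPIT(Y=pheno[,c(1,2)],GD=geno.imp.corrected,GM=geno.map,KI=kin.matrix,
      group.from=376,group.to=376)
# 3. population membership only
GAPIT(Y=pheno[,c(1,2)],CV=covs[,1:4],GD=geno.imp.corrected,GM=geno.map,KI=kin.matrix,
      group.from=1,group.to=1)
# 4. kinship + population membership
GAPIT(Y=pheno[,c(1,2)],CV=covs[,1:4],GD=geno.imp.corrected,GM=geno.map,KI=kin.matrix,
      group.from=376,group.to=376)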
Things to consider:
Repeat using the raw genotype data
Repeat the mixed model with the major hit (Rht2 again) as a covariate.
Repeat the mixed model with automatic covariate selection
Repeat for all traits.
Try a different kinship matrix. Create one based on correlations across individuals on
marker scores standardised for each marker. This will give diagonal elements of 1
(two after doubling for inclusion). The code below will do this:
# standardise markers, then correlate across individuals
# scale() standardises the data by columns (ie markers)
# t() transposes the data
# cor() creates the correlation matrix across individuals
cor.kin<-cor(t(scale(geno.imp.corrected[,2:2873])))
write.table(cor.kin,file="cor.kin.txt")
rm(cor.kin)
This table could be edited in Excel (don’t forget to multiply by 2) and re-imported: this is
easier in Excel than in R, and you’ll have the matrix for future reference.
Conclusion & Recommendations:
There remains some art in carrying out association analysis: what kinship matrix to use,
whether and how to use imputed markers, and so on. The best approach is probably to simulate traits
from the marker data for your data set and test analysis methods on those. In simulations here
(on the AGOUEB barley data and on a much smaller set of UK wheat lines), we concluded
that the best all-round method is the mixed model, with a kinship matrix based on excess
allele sharing (ie the correlation matrix), and no additional covariates to account for
population structure. But this wasn’t always best for all traits and datasets. If one has time, it
is worth experimenting.