Annotation - How To Do
This FAQ presents in-depth explanations of the bioinformatics analyses necessary for the Annotathon. For explanations on how to create user accounts and on general sequence management issues, please refer to the user manual, also called the Rule book (http://biologie.univ-mrs.fr/annotathon/index.php?actionjs=rules). Finally, please note this is a Wiki, so everyone is invited to contribute to this documentation!
Contents
1 Translation, ORFs and coding/non-coding status
2 INTERPRO: Identifying conserved protein domains
3 BLAST: Finding sequence homologs
4 Using BLAST to compile a list of FASTA formatted sequence homologs
5 A microbial view of the Tree Of Life
6 Designing sequence ingroups and outgroups for phylogenetic tree inference
6.1 Strategy for defining ingroups and outgroups
6.2 List of complete microbial genomes at NCBI
6.3 Common pitfalls & difficulties in building trees
7 Phylogeny.fr: Inferring phylogenetic trees
1. If the genomic DNA does contain ORFs over 60aa in length, proceed with the rest of the analysis using the longest available ORF. Two outcomes are possible:
analysis of the longest ORF shows homologs and/or conserved protein domains => select coding STATUS and proceed with the rest of the analyses (multiple alignments, phylogeny etc.)
analysis of the longest ORF shows no homologs (even in the ENV_NR environmental sequence database) and no conserved protein domains => discuss whether the DNA is coding or not and select the appropriate STATUS. If the ORF is very long (say over 200aa), then it is likely that this ORF does indeed code for a protein: it is then called an ORFan, an ORF with no known homologs! If the ORFan is only just above the 60aa length threshold, you might want to classify it as non-coding. Also beware of low complexity DNA (e.g. repeated stretches of the same bases), as it often yields long false positive ORFs (in which case the translations usually also bear highly prominent AA repeats). In any case, discuss your choice and always carry out a BLASTx before concluding that the DNA is non-coding. Only proceed with the analysis of a shorter ORF if it largely overlaps a longer ORFan and shows BLAST homologs or conserved protein domains. This is not so common but has been seen a few times in the GOS data (the real ORF, with clear homology to known proteins, is contained in a larger false positive ORF with no matches, usually antisense).
As for which initiation codon parameter to select in the ORF finding software, start with the greedy approach, i.e. the one that produces the
longest possible ORFs (use "any codon" for ORF start in SMS/Orfinder). If later on your multiple alignments seem to suggest that all
annotathon.org/Metagenes/index.php?title=Frequently_Asked_Questions&printable=yes 1/35
10/26/13 Frequently Asked Questions - Metagenes
INTERPRO: Identifying conserved protein domains
Identifying conserved protein domains in a protein is a powerful method to predict its putative function.
Paste your protein sequence into the sequence field of the InterProScan (http://www.ebi.ac.uk/InterProScan/) tool, then click "Submit Job" to scan your protein against the large InterPro (http://www.ebi.ac.uk/interpro/) federation of protein domain databases.
ignore any InterPro domains that are tagged as "Unintegrated", unless you have absolutely nothing else to feed your "functional
role" line of investigation
click on the "Raw output" button to see the full results in the "text only" format, suitable for copying & pasting into the "Domains" raw results section of the Annotathon (copy the results in extenso, not just the domains you consider of interest)
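If you also want to read the InterProScan results programmatically, the sketch below lists only the integrated matches. This mirrors the advice to set "Unintegrated" signatures aside. It is a reading aid only: the Annotathon "Domains" section should still receive the raw output in extenso. The column layout is an assumption based on the InterProScan 5 tab-separated (TSV) output, which may differ from the "Raw output" offered by the web tool you use. The function name is hypothetical.

```python
def integrated_matches(tsv_text):
    """Keep matches that carry an InterPro accession (i.e. are not 'Unintegrated').

    Assumed InterProScan 5 TSV layout (1-based columns): 5 = signature accession,
    7-8 = match start/stop, 12 = InterPro accession, 13 = InterPro description.
    """
    matches = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 8:
            continue  # blank or malformed line
        interpro_acc = fields[11] if len(fields) > 11 else ""
        if interpro_acc in ("", "-"):
            continue  # unintegrated signature: no InterPro entry
        interpro_desc = fields[12] if len(fields) > 12 else ""
        matches.append((fields[4], int(fields[6]), int(fields[7]), interpro_acc, interpro_desc))
    return matches
```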
BLASTp versus SWISSPROT: to find homologs that are well annotated (e.g. molecular functions etc.)
BLASTp versus NR: to find all possible homologs (e.g. to carry out a phylogenetic analysis)
BLASTx versus NR: translates your genomic fragment directly into the 6 possible frames and then runs 6 BLASTp searches (if you are unsure of the ORF location, or if you suspect sequencing errors producing frameshifts)
copy/paste your query sequence (ORF protein sequence for BLASTp, full genomic DNA sequence for BLASTx)
select the databank you wish to search: usually SWISSPROT or NR (NR is a compilation of all protein databanks, and therefore
a header/protocol line which unambiguously describes what search was carried out (e.g. BLASTp versus SWISSPROT, NCBI default parameters other than "500 max target sequences")
the full, unabridged list of hits and E-values (Fig. B4)
the first dozen pairwise alignments only (Fig. B5)
the full, unabridged taxonomic report (the first section, entitled Lineage Report, Fig. B7) copied into the Annotathon "Taxonomic Report" section (Fig. B8)
Figure B3: BLAST results header
Figure B8: Annotathon section for BLAST "Taxonomic Report" (Lineage report)
Go to the pairwise alignments section of your NCBI BLAST report, and follow instructions in the following screenshots.
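Once the homologs are selected, they must be handed to the alignment tool as FASTA. The minimal sketch below has hypothetical function names. It writes (name, sequence) pairs as FASTA, wrapped at 60 residues per line. It also flags names that become identical once truncated. The sequence names in the alignments further down appear cut to 10 characters (CLUSTAL output) and 15 characters (Gblocks output). Short, unique labels such as "Spomeroyi" therefore avoid ambiguous names.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA, wrapping sequences at `width` residues."""
    lines = []
    for name, sequence in records:
        lines.append(">" + name)
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def truncated_name_clashes(names, max_len=10):
    """Return the truncated names shared by two or more different full names."""
    first_seen = {}
    clashes = set()
    for name in names:
        short = name[:max_len]
        if short in first_seen and first_seen[short] != name:
            clashes.add(short)
        first_seen.setdefault(short, name)
    return clashes
```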
Above all, remember that each and every sequence you wish to include in your phylogenetic tree should be a clear homolog of the others: each sequence should have a credible BLAST E-value when aligned to your query, and each sequence must fit snugly in the multiple sequence alignment! Any sequence that looks like it doesn't belong to the same family, or is too partial (truncated) compared to other members of the family, should be removed from the in or out groups!
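The "credible E-value" requirement can be applied to BLAST tabular output with a short filter. This sketch makes the following assumptions:
- the input is the standard 12-column tabular format produced by the BLAST+ command-line option -outfmt 6, rather than the web report described above;
- the best-scoring HSP is kept for each subject;
- the cutoff of 1e-5 is purely illustrative, since this FAQ does not fix a threshold.
The filter does not check alignment coverage, so partial (truncated) homologs still have to be spotted in the multiple alignment. The function name is hypothetical.

```python
def credible_hits(tabular_text, max_evalue=1e-5):
    """Best HSP per subject from BLAST tabular output, keeping E-values <= max_evalue.

    Assumes the default 12 columns of -outfmt 6: qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        subject, evalue, bitscore = fields[1], float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # not credible under the (assumed) cutoff
        if subject not in best or bitscore > best[subject][1]:
            best[subject] = (evalue, bitscore)
    # Highest bit score first
    return sorted(best.items(), key=lambda item: item[1][1], reverse=True)
```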
Cyanobacteria
Thermotogales
delta-Proteobacteria
Planctomycetales + Chlamydiales + Verrucomicrobiales (PVC)
Proteobacteria
beta-Proteobacteria + gamma-Proteobacteria
Bacteria
Archaea
Cellular organisms (Archaea + Bacteria + Eukaryotes)
Invalid ingroups:
Cyanobacteria + Firmicutes
alpha-Proteobacteria + beta-Proteobacteria
Example n°1:
Ingroup = Firmicutes
Outgroup = other lineages of same taxonomic level (i.e. all other bacterial phyla: Thermotogales, Aquificales, Cyanobacteria, Proteobacteria, PVC...)
Your groups should contain representatives of each of these phyla
Figure: Microbial Tree Of Life
Example n°2:
Ingroup = alpha-Proteobacteria
Outgroup = other lineages of same taxonomic level (i.e. all other Proteobacteria: beta, delta, epsilon, & gamma)
Your groups should contain representatives of each of these classes
Example n°3:
Ingroup = gamma-Proteobacteria
Outgroup = other lineage of same taxonomic level (i.e. beta-Proteobacteria)
Your groups should contain representatives of each of these two classes
Example n°4:
Ingroup = Bacteria
Outgroup = other lineages of same taxonomic level (i.e. Archaea & Eukaryotes)
Your groups should contain representatives of each of these domains
Example n°5:
Ingroup = Bacteria
If there are no archaeal or eukaryotic homologs, you will not use an outgroup (the resulting tree will be unrooted!)
Important: it is essential that you select the sequences to build the in and out groups in such a way that the full diversity of each group is well represented (i.e. that you have sequence representatives of each of the subgroups that make up the in and out groups). Use the above simplified Tree Of Life, the NCBI Taxonomy browser (http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi) and the BLAST Lineage Report to identify existing subgroups.
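The valid and invalid ingroups above follow one rule: an ingroup must be a whole clade, covering every lineage below its lowest common ancestor. The outgroups in the examples follow another: they are the sister lineages of the same level. Both rules can be sketched in Python on a hypothetical, heavily simplified topology built only from the groups named in this section. The "beta+gamma" node is made up so that beta- and gamma-Proteobacteria form a clade, as the list of valid ingroups implies. The real NCBI Taxonomy is far richer, and all names here are illustrative.

```python
# Hypothetical, heavily simplified topology (child -> parent) built only from the
# groups named in this FAQ; "beta+gamma" is a made-up node joining the two classes.
PARENT = {
    "Bacteria": "Cellular organisms",
    "Archaea": "Cellular organisms",
    "Eukaryotes": "Cellular organisms",
    "Cyanobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "Thermotogales": "Bacteria",
    "Aquificales": "Bacteria",
    "PVC": "Bacteria",
    "Proteobacteria": "Bacteria",
    "Planctomycetales": "PVC",
    "Chlamydiales": "PVC",
    "Verrucomicrobiales": "PVC",
    "alpha-Proteobacteria": "Proteobacteria",
    "delta-Proteobacteria": "Proteobacteria",
    "epsilon-Proteobacteria": "Proteobacteria",
    "beta+gamma": "Proteobacteria",
    "beta-Proteobacteria": "beta+gamma",
    "gamma-Proteobacteria": "beta+gamma",
}


def children(node):
    return [child for child, parent in PARENT.items() if parent == node]


def ancestors(node):
    """The node itself followed by its ancestors, up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lowest_common_ancestor(groups):
    paths = [ancestors(g) for g in groups]
    shared = set(paths[0]).intersection(*paths[1:])
    return next(node for node in paths[0] if node in shared)


def tips_under(node):
    kids = children(node)
    if not kids:
        return {node}
    return set().union(*(tips_under(kid) for kid in kids))


def is_valid_ingroup(groups):
    """Valid if the groups together cover every tip under their lowest common ancestor."""
    covered = set().union(*(tips_under(g) for g in groups))
    return covered == tips_under(lowest_common_ancestor(groups))


def outgroup_for(ingroup):
    """Sister lineages of the same level: the other children of the ingroup's parent."""
    return sorted(c for c in children(PARENT[ingroup]) if c != ingroup)
```

Some calls reproduce the examples:
- outgroup_for("alpha-Proteobacteria") returns the beta+gamma node together with delta- and epsilon-Proteobacteria, matching Example n°2.
- outgroup_for("Bacteria") returns Archaea and Eukaryotes, as in Example n°4.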
Example: In the tree opposite, the in group is made up of the pink and blue branches, the unknown query sequence is highlighted in
yellow. The out group is made of the green and red branches.
for the in group, pick 15-30 sequences in each one of the subgroups enclosed by a pink or blue bracket (e.g. those on a grey
background)
for the out group, pick 5-10 sequences in each one of the subgroups enclosed by a red or green bracket (e.g. those on a grey
background)
in the example across, the resulting phylogeny would successfully suggest that the query sequence belongs to the pink group,
probably even to the same subgroup as Nitrosomonas europaea
under no circumstance should you just pick the set of the 15 best-scoring BLAST hits for your in (or out) group! This will usually result in just representing a single subgroup...
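The per-subgroup sampling described above can be sketched as a stratified pick. Hits are grouped by subgroup, and at most a fixed number is kept in each: 15-30 per in group subgroup and 5-10 per out group subgroup, according to the text. Taking the best-scoring hits within each subgroup is only one simple choice, an assumption of this sketch; spreading the picks across each subgroup's own diversity may be preferable. The function name and the example accessions are hypothetical.

```python
from collections import defaultdict


def pick_per_subgroup(hits, per_subgroup):
    """Select up to `per_subgroup` hits in every subgroup, best bit score first.

    hits: iterable of (accession, subgroup, bitscore) tuples, e.g. built from the
    BLAST hit list and the subgroup read from each hit's lineage.
    """
    by_subgroup = defaultdict(list)
    for accession, subgroup, bitscore in hits:
        by_subgroup[subgroup].append((bitscore, accession))
    selection = {}
    for subgroup, scored in by_subgroup.items():
        scored.sort(reverse=True)  # highest bit score first
        selection[subgroup] = [accession for _, accession in scored[:per_subgroup]]
    return selection
```

Consider hits h1, h2 and h4 in Nitrosomonadales (bit scores 500, 450, 400) and h3 in Burkholderiales (300). With two hits per subgroup, the sketch keeps h1 and h2 plus h3. A plain top-2 by score would have kept Nitrosomonadales only.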
In the figure across, the in group is made of the pink + blue groups, the query sequence is highlighted in yellow, and the out group is made
of the red + green groups.
An incorrect selection of in group sequences is indicated by the light grey backgrounds. The resulting inferred phylogeny (right panel) will
show the query sequence emerging exactly between the in group and the out group. This usually indicates:
In the figure below, the in group is blue, the out group pink, and the query sequence yellow (inferred phylogeny n°1). Despite efforts to properly represent the in group diversity, the query sequence always emerges between the in and out groups. Left in this state, no conclusion can be drawn from inferred phylogeny n°1.
Solution: broaden the out group (add further green and red out groups) and rerun tree inference. Two outcomes are possible using the broadened out group:
1. the query sequence is specifically linked to the in group (without falling inside it, inferred phylogeny n°2): it is legitimate to conclude that the query sequence is a close relative of the in group, even if one cannot conclusively state that it is part of it. The query sequence represents either an unknown subgroup of the in group, or an unknown novel group that is a close relative of the in group
2. the query sequence is not specifically linked to the in group (inferred phylogeny n°3): it is legitimate to conclude that the query sequence represents an altogether novel, unknown group, not specifically related to the in group
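The reasoning behind these outcomes can be made explicit with a small sketch. It assumes the tree is already rooted, e.g. on the broadened out group. The tree is written as nested Python tuples, a hypothetical stand-in for a real Newick parser, and branch support values are ignored. The query is classified as follows:
- "inside" the in group if the smallest clade holding the whole in group also holds the query;
- "sister" (phylogeny n°2) if the smallest clade holding the in group plus the query contains nothing else;
- otherwise not specifically linked (phylogeny n°3).

```python
# Hypothetical tree representation: nested tuples for clades, strings for tips.
def tips(tree):
    """All tip labels below a (sub)tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def clades(tree):
    """Every subtree, including the whole tree and single tips."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def smallest_clade(tree, labels):
    """Tips of the smallest clade containing all `labels`."""
    return min((tips(c) for c in clades(tree) if labels <= tips(c)), key=len)


def place_query(tree, ingroup, query):
    ingroup = set(ingroup)
    if query in smallest_clade(tree, ingroup):
        return "inside the in group"
    if smallest_clade(tree, ingroup | {query}) == ingroup | {query}:
        return "sister to the in group (close relative)"
    return "not specifically linked to the in group"
```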
In the figure below, the conventional species phylogeny is shown on the left (True phylogeny). The phylogeny inferred from a set of
homologous sequences is shown in the center (Inferred phylogeny), and shows an additional red branch linked specifically to the blue
branch. This unexpected inferred phylogeny can be explained by either:
Resolving past duplication events is notoriously difficult; it usually involves restricting the analysis to species for which a complete genome
sequence is available, allowing the inference of trees containing all paralogs and orthologs involved. However, differential gene loss which
often follows gene duplications can make inferred trees rather cryptic...
Workflow setup
Data entry
Multiple alignment
The MUSCLE alignment in CLUSTAL format, obtained by clicking the "Alignment in CLUSTAL format" link (paste into the Annotathon multiple alignment field):
gi|8613437 ------------------MSNSRKRHEALLYHAKPKPGKIAVVPTKKYATQHDLALAYSP
GOS_26940 ------------------------------------------------------------
gi|8870682 --------------MDDDKSRQAARDAALRYHAYPKPGKLEIRATKPLANGQDLARAYSP
gi|2066869 -----------------MSDSQNLRQAALNYHEFPRPGKLEIRATKPMANGRDLARAYSP
Spomeroyi -----------------MSDQPSLRQAALDYHAFPKPGKLEIRATKPMANGRDLARAYSP
gi|1584252 ----------------MSNISEDLKSGALVYHRSPKPGKLEIQATKPLGNQRDLALAYSP
gi|1529713 -------------------MDEQLKQSALDFHEFPVPGKIQVSPTKPLATQRDLALAYSP
gi|7680888 ----------MSTSSSSSSSKEKLREAALDYHEFPTPGKVAIAPTKQMINQRDLALAYSP
gi|1879253 MPSNVYSNPPSEARLMSTPVNSKLREAALDYHEFPTPGKIAIAPTKQMINQRDLALAYSP
gi|8613437 GVAEPCLEIAKDKNNIYKYTSKGNLVAVISNGTAVLGLGDIGPEASKPVMEGKGLLFKIF
GOS_26940 ------------------------------------------------------------
gi|8870682 GVAEACLEIVKDPATAADYTARGNLVAVISNGSAVLGLGNIGGLAAKPVMEGKAVLFKNF
gi|2066869 GVAEACTEIQADAANAARYTSRGNLVAVVSNGSAVLGLGNIGALASKPVMEGKAVLFKNF
Spomeroyi GVAEACLEIKDNAAHAETYTARGNLVAVVSNGTAVLGLGNIGALASKPVMEGKAVLFKKF
gi|1584252 GVAAACEAIKADPLQAAELTTRANLVAVVSNGTAVLGLGNIGPLASKPVMEGKAVLFKKF
gi|1529713 GVAAPCLEIEKDPLAAYKYTARGNLVAVVSNGTAVLGLGNIGALAGKPVMEGKGVLFKKF
gi|7680888 GVAFACEEIVENPLNAARFTARSNLVGVVTNGTAVLGLGNIGPLASKPVMEGKAVLFKKF
gi|1879253 GVAFACEEIVENPLNAARFTARSNLVGVVTNGTAVLGLGNIGPLASKPVMEGKAVLFKKF
gi|8613437 AMKLAAVHALADLAKKSVPEQVNIVYDEVSLNFGKEYIIPKPFDPRLIYEIPPAVAKAAM
GOS_26940 -----------------------------------------PFDPRLSSVVSSAVAEAAM
gi|8870682 AMQLACIDGIAALSRATTSAEAAEAYRGEQLVFGVDYLIPKPFDPRLMGVVASAVASAAM
gi|2066869 EMQIACVDGIAELARATTSAEAAAAYKGEQLNFGADYLIPKPFDPRLVAVVSSAVAKAAM
Spomeroyi AMQIACVEGIAELARITTSAEAAAAYQGEQLTFGADYLIPKPFDPRLVGVVSSAVARAAM
gi|1584252 EMKMAAVEAIAALARETPSDVVARAYGGETRAFGADSIIPSPFDPRLILRIAPAVAKAAM
gi|1529713 EMKLAAVHAIAELAHAEQSEVVASAYGDQDLSFGPEYIIPKPFDPRLIVKIAPAVAKAAM
gi|7680888 EMEIAAVNAIAELAQQEQSDIVATAYGIQDLSFGPEYLIPKPFDPRLIVKIAPAVAQAAM
gi|1879253 EMEIAAVNAIAELARQEQSDIVATAYGIQDLSFGPEYLIPKPFDPRLIVKVAPAVAKAAM
****** :..*** ***
gi|8613437 ESGVALEPISDWDAYREELMERSGSGSKEIRQIHNRAK---RNKKRIVFAEADHLDVLKA
GOS_26940 QSGVATQPIKDIDAYRDALKQTVVKSAFLMRPVFEAAS---SSARRIVFAEGEDERVLRA
gi|8870682 ETGVATRPVEDLVAYRERLDASVFRSSMIMRPVFAAAA---LSQRRIVFAEGEDERVLRT
gi|2066869 ESGVATRPIEDITAYKQKLNQTVFKSALLMRPVFEAAR---AAARRIVFAEGEDERVLRA
Spomeroyi ESGVARRPITDLEAYRQKLNQSVFKSALLMRPVFEAAA---KAARRLVFAEGEDERVLRA
gi|1584252 DTGVATRPIADFDAYNEKLDEFVFRSGFIMRPLFQRAK---QDKKRVIYAEGEDERVLRA
gi|1529713 DSGVATRPIADFDAYIEKLSEFVYKTNLFMKPIFSQAR---KEPKRVVLAEGEETRVLHA
gi|7680888 DGGVATRPIEDMEAYKVHLQQFVYHSGTTMKPVFQIARGAPAEKKRVVFAEGEEERVLRA
gi|1879253 DSGVAERPIEDMEAYEQHLQQFVYHSGTTMKPIFQLARGVEPEKKRIVFAEGEEERVLRA
: *** *: * ** * :. :. * .*:: **.: **.:
gi|8613437 AQRVQEEKLGLPILLGRKEVILELKEEIGFT----EDVPIFDPKTDEEKERRDRFGIAYW
GOS_26940 AQAVLEETSEVPIVIGRPEVIQQRCERLGLDIRPDRDFNIVNPQQD---DRYRDYWTSYH
gi|8870682 AQVIVEEMTDRPILIGRPEIIARRCEKAGLTIKPGEDFEVVNPEDD---SRHRRYWEAYL
gi|2066869 AQAILEETTETPILIGRPEVIERRCEKLGLDVRPGRDFQLVNPEND---PRYYDYWNSYH
Spomeroyi AQAILEETTETPILIGRPEVIEARCEKMGLSVRPGQDFQIVNPEND---PRYYDYWTSYH
gi|1584252 AQAVIEEGIAHPILVARPSVLEARLQRFGLSIRPGKDFEVINPEDD---PRYRDFVRSYI
gi|1529713 TQELVSLGLAKPILVGRPSVIEMRIQKLGLQIKAGVDFEIVNNESD---PRFKEYWSEYY
gi|7680888 VQIVVDEKLAKPILIGRPAVIEHRIQRYGLRLTPGVDFTIVNTEHD---ERYRDFWQTYF
gi|1879253 MQIIVDEKLAKPILIGRPAVIEQRIARYGLRLIAGQDYTVVNTDHD---ERYRDFWQEYH
*:. **::.* :: *: * :.: . * * : *
gi|8613437 ESRQRKGRTLTEAKKLMRERN-YFAAMMVNVGEADALITGYSRPYPTVIRPILESIQKDS
GOS_26940 SLLARRGVSPDLAKSIMRTNTTAIGAVMVHRGEADSLICGAVGEFRWHLNYIEQILGSK-
gi|8870682 QLMSRRGVTPDLAKVIMRTNTTAIAAIMVYCGDADSMVCGSFGQYLWHLNYVRQILAYD-
gi|2066869 KVMQRRGVTPDLAKAIMRTNTTAIGAIMVHRGEADSLLCGTFGEYRWHLNYVQQVLGGG-
Spomeroyi QLMERRGVTPDIAKAIMRTNTTAIGAIMVHRGEADSLICGTFGEYRWHLNYVEQVLGSK-
gi|1584252 EIAGRRGVTPDAARTLVRTSSTVISALAVKKGEADAMLCGIEGRFSRHLRHVRDIIGLAP
gi|1529713 QLMKRRGITQEQAQRAVISNTTVIGAIMVHRGEADAMICGTIGEYHDHYRVVQPLFGYRD
gi|7680888 KMMARKGISEQLARVEMRRRTTLIGSMLVKKGEADGMICGTISTTHRHLHFIDQVIGKRA
gi|1879253 KMMSRKGISAQMAKLEMRRRTTLIGAMLVEKGEADGMICGTVSTTHRHLHFIDQVIGKKE
. *.* : *. : . :.:: * *:**.:: * .: :
gi|8613437 GISKVAACNLMLTKQGPMFLADTTINLNPTAKDLVKISQMTSNLVKMFGMKPNVAMLSFS
GOS_26940 TLSPSGALSLMILEDGPLFIADTHVWADPTPMQIAQTAKGAARHVRRFGIEPQVALCSQS
gi|8870682 GAHPRGALSLMITEDEPLFIADTHVHPEPTPEQIADTVMAAANHVRRFGMKPNIALCSHS
gi|2066869 TYSPHGALSMMILEDGPLFIADTHVHVEPTPEQIAETVIGAARHVRRFGLAPKIALCSQS
Spomeroyi DLRPHGALSLMILEDGPLFIADTHVRSRPSPEELAEITLGAARHVRRFGIEPQIALCSQS
gi|1584252 GVRELAALSLLITPKGNLFLCDTQVQTEPNAADLAEMTILAAAHVRRFGIEPKVALLSHS
gi|1529713 GVSTAGAMNALLLPSGNTFIADTYVNHDPSPEELAEITLMAAESVRRFGIEPRVALLSHS
gi|7680888 GCSVYGAMNALVLPGRQIFLVDTHVNVDPTPAQLAEITIMAAEEVRRFGIEPKVALLSHS
gi|1879253 GAKVYAAMNALVLPNRQIFLVDTHVNVDPTPEQLAEITIMAAEEVRRFGIEPKIALLSHS
.* . :: *: ** :. *.. ::.. :: *. **: *.:*: * *
gi|8613437 NFGSTKNESSQKIREAVSYIHRNFPNAVVDGEIQADFALNPEMLAKEFPFSKLNGKKVNV
GOS_26940 QFGNLNSETGKKMRQALDILDTEKVTFTYEGEMNIDTALDPELRARLLPENR--------
gi|8870682 QFGNLDIDSGRRVRQAMALLEAREPDFAYEGEMHIDSALDPDLRARIFPNSRLQG-PANV
gi|2066869 QFGNISCDTGSRLRAAIEILDDKRRDFVYEGEMNIDTALDPELRERIFPNSRLEG-AANV
Spomeroyi QFGNQAEGSGQRLRQAIEILDSRPRDFVYEGEMNLDSALDPELRQRIFPNSRLYG-AANV
gi|1584252 NFGSNDTVCARRVRAALDILKDRAPELEVDGEMQAELALLPDARERILPHSRLQG-VANV
gi|1529713 NFGSADCPSASKMRKTLELVKARAPELMIDGEMHGDAALVESIRNDRMPDSPLKG-AANI
gi|7680888 NFGTSNAPSAQKMRDTLAILQERAPDLHVDGEMHGDVALDAALRKEILPESTLEG-EANL
gi|1879253 NFGTSNAPTAQKMRDTLAILRERAPDLQVDGEMHGDIALDANLRREVMPDSTLEG-DANL
:**. . .:* :: : :**:: : ** :* .
gi|8613437 LIFPNLESANITYKLLKEMQG-AESIGPVILGLSKAVHIVQLGASVDEMVNMAALACVDA
GOS_26940 ------------------------------------------------------------
gi|8870682 LVFAYGDAASGVRNILKMRGG-ALEVGPILMGMGNRAHIVTPSITARGLLNISALAGTDV
gi|2066869 LIFAHADAASGVRNILKMRAG-GLEVGPILMGMGNRAHIVSPSITARGLLNMAAIAGTPV
Spomeroyi LIFAHADAASGVRNVLKMKAN-GIEVGPILMGMGNRAHIVTPSITARGLLNMAAIAGTPV
gi|1584252 LVMPDLDAADIAYNMIKVLGD-ALPVGPILMGTAKPAHILGPTVTARGIVNMTAVAVVEA
gi|1529713 LVMPNMEAARISYNLLRVSSSEGVTVGPVLMGVAKPVHILTPIASVRRIVNMVALAVVEA
gi|7680888 LVLPNIDAANIAYNLLKTAAGNNIAIGPILLGAAQPVHVLTESATVRRIVNMTALLVADV
gi|1879253 LVLPNIDAANISYNLLKTAAGNNIAIGPMLLGAAKPVHVLTASATVRRIVNMTALLVADV
gi|8613437 QQREKK
GOS_26940 ------
gi|8870682 THYS--
gi|2066869 AHYG--
Spomeroyi AHYG--
gi|1584252 QSEA--
gi|1529713 QTEPL-
gi|7680888 NAVR--
gi|1879253 IAAR--
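CLUSTAL format splits every sequence into consecutive blocks. To reuse the alignment (for instance to count positions or feed another tool), the blocks have to be joined back into one gapped sequence per name. The minimal parser below makes two assumptions:
- the header line contains the words "multiple sequence alignment";
- conservation lines are either indented (as in genuine CLUSTAL output) or made only of "*", ":" and "." symbols (as in the copy above, where the indentation was lost).

The function name is hypothetical.

```python
CONSERVATION_SYMBOLS = set("*:.")


def parse_clustal(text):
    """Join the blocks of a CLUSTAL-format alignment into one gapped sequence per name."""
    sequences = {}
    for line in text.splitlines():
        if not line.strip() or "multiple sequence alignment" in line:
            continue  # blank line or header line
        fields = line.split()
        if line[0].isspace() or set(fields[0]) <= CONSERVATION_SYMBOLS or len(fields) < 2:
            continue  # conservation line (normally indented) or malformed line
        name, chunk = fields[0], fields[1]  # an optional 3rd field is a residue count
        sequences[name] = sequences.get(name, "") + chunk
    return sequences
```

Once parsed, every name should map to a sequence of the same length. This is a quick check that nothing was lost while copying and pasting.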
The Gblocks curated multiple sequence alignment (paste into the Annotathon multiple alignment field):
10 20 30 40 50 60
=========+=========+=========+=========+=========+=========+
gi|86134375|ref ------------------MSNSRKRHEALLYHAKPKPGKIAVVPTKKYATQHDLALAYSP
GOS_26940_Trans ------------------------------------------------------------
gi|88706826|ref --------------MDDDKSRQAARDAALRYHAYPKPGKLEIRATKPLANGQDLARAYSP
gi|206686971|gb -----------------MSDSQNLRQAALNYHEFPRPGKLEIRATKPMANGRDLARAYSP
Spomeroyi_gi|56 -----------------MSDQPSLRQAALDYHAFPKPGKLEIRATKPMANGRDLARAYSP
gi|158425280|re ----------------MSNISEDLKSGALVYHRSPKPGKLEIQATKPLGNQRDLALAYSP
gi|152971328|re -------------------MDEQLKQSALDFHEFPVPGKIQVSPTKPLATQRDLALAYSP
gi|76808889|ref ----------MSTSSSSSSSKEKLREAALDYHEFPTPGKVAIAPTKQMINQRDLALAYSP
gi|187925371|re MPSNVYSNPPSEARLMSTPVNSKLREAALDYHEFPTPGKIAIAPTKQMINQRDLALAYSP
Spomeroyi_gi|56 DLRPHGALSLMILEDGPLFIADTHVRSRPSPEELAEITLGAARHVRRFGIEPQIALCSQS
gi|158425280|re GVRELAALSLLITPKGNLFLCDTQVQTEPNAADLAEMTILAAAHVRRFGIEPKVALLSHS
gi|152971328|re GVSTAGAMNALLLPSGNTFIADTYVNHDPSPEELAEITLMAAESVRRFGIEPRVALLSHS
gi|76808889|ref GCSVYGAMNALVLPGRQIFLVDTHVNVDPTPAQLAEITIMAAEEVRRFGIEPKVALLSHS
gi|187925371|re GAKVYAAMNALVLPNRQIFLVDTHVNVDPTPEQLAEITIMAAEEVRRFGIEPKIALLSHS
############################################################
======
gi|86134375|ref QQREKK
GOS_26940_Trans ------
gi|88706826|ref THYS--
gi|206686971|gb AHYG--
Spomeroyi_gi|56 AHYG--
gi|158425280|re QSEA--
gi|152971328|re QTEPL-
gi|76808889|ref NAVR--
gi|187925371|re IAAR--
Parameters used
Minimum Number Of Sequences For A Conserved Position: 5
Minimum Number Of Sequences For A Flanking Position: 8
Maximum Number Of Contiguous Nonconserved Positions: 8
Minimum Length Of A Block: 10
Allowed Gap Positions: None
Use Similarity Matrices: Yes
New number of positions in input.fasta-gb: 288 (36% of the original 786 positions)
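With "Allowed Gap Positions: None", no column containing a gap may be kept. The sketch below applies only that single criterion, and it is not Gblocks. Gblocks additionally enforces the conservation, flanking, contiguous non-conserved and minimum block length settings listed above, so this filter keeps at least as many positions as Gblocks would. The function name is hypothetical.

```python
def drop_gapped_columns(alignment):
    """Keep only the columns in which no sequence has a gap ('-').

    alignment: dict mapping names to equal-length aligned sequences.
    Returns the filtered alignment and the number of columns kept.
    """
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned sequences must all have the same length")
    kept = [i for i in range(length) if all(row[i] != "-" for row in rows)]
    filtered = {name: "".join(seq[i] for i in kept) for name, seq in alignment.items()}
    return filtered, len(kept)
```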
Inferred tree
The phylogenetic tree in "text" format, to be copied into the Annotathon "Tree" section (remember to add the taxonomic group definitions):
-------0.2-----