File Formats for Next-Generation Sequencing:
• SAM
• BAM
• CRAM
• SAM – Sequence Alignment/Map:
o It consists of:
▪ A header (optional; header lines start with @)
▪ An alignment section
o It is a text file format.
o It contains the alignment information of sequences (e.g. sequencing reads) that are mapped against a reference sequence.
o It can also contain unmapped sequences.
o It is readable by humans.
• BAM:
o It has the same information as SAM files.
o It is a binary file format.
o It is not human-readable.
o It is smaller and more efficient for software to work with, saving time and reducing the cost of computation and storage.
• CRAM:
o A newer format that is very similar to BAM.
o It holds the same information as SAM in compressed form. It compresses reads relative to a reference sequence, so files are usually smaller than BAM.
PAM: Point Accepted Mutation
• A PAM matrix is a matrix where each column and row represents
one of the twenty standard amino acids.
• PAM matrices are amino acid substitution matrices that encode
the expected evolutionary change at the amino acid level.
• Each PAM matrix is designed to compare two sequences that are a specific number of PAM units apart (one PAM unit corresponds to one accepted point mutation per 100 amino acids).
• The PAM120 scoring matrix is designed to compare sequences that are 120 PAM units apart.
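• The score it gives to each aligned pair of amino acids is a log-odds value: it compares the probability of that substitution occurring during 120 PAM units of evolution with the probability of the pair occurring by chance. The scores of the aligned pairs are summed to score the whole alignment.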
• Different PAM matrices correspond to different lengths of time in the evolution of protein sequences.
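The idea that each PAM matrix corresponds to a different evolutionary distance can be sketched numerically. Dayhoff's approach derived higher PAM matrices by multiplying the PAM1 mutation probability matrix by itself, then converting the probabilities to log-odds scores. The sketch below assumes a toy three-letter alphabet with invented frequencies and substitution rates (not real amino acid data) and uses one common log-odds convention, S[i][j] = 10·log10(M[i][j] / f_j); published tables differ in index conventions and scaling, and the helper names are hypothetical.

```python
import math

# Toy 3-letter alphabet with HYPOTHETICAL numbers (not Dayhoff's amino acid data).
ALPHABET = "ABC"
FREQ = [0.5, 0.3, 0.2]  # background frequencies f_i
# Symmetric pair rates q_ij = f_i * M[i][j]; they sum to 0.005, so about 1% of
# residues change per PAM1 step (1 accepted point mutation per 100 residues).
PAIR_RATES = {(0, 1): 0.0025, (0, 2): 0.0010, (1, 2): 0.0015}

Matrix = list[list[float]]

def pam1() -> Matrix:
    """M[i][j] = P(residue i is replaced by residue j) during 1 PAM unit."""
    n = len(FREQ)
    m = [[0.0] * n for _ in range(n)]
    for (i, j), q in PAIR_RATES.items():
        m[i][j] = q / FREQ[i]
        m[j][i] = q / FREQ[j]
    for i in range(n):
        # Diagonal = probability of no change, so each row sums to 1.
        m[i][i] = 1.0 - sum(m[i][j] for j in range(n) if j != i)
    return m

def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pam(n_units: int) -> Matrix:
    """Extrapolate PAM-n by multiplying PAM1 by itself n_units times."""
    m1 = pam1()
    result = [[1.0 if i == j else 0.0 for j in range(len(m1))] for i in range(len(m1))]
    for _ in range(n_units):
        result = matmul(result, m1)
    return result

def log_odds(m: Matrix) -> list[list[int]]:
    """S[i][j] = round(10 * log10(M[i][j] / f_j)); positive means more likely than chance."""
    n = len(m)
    return [[round(10 * math.log10(m[i][j] / FREQ[j])) for j in range(n)]
            for i in range(n)]

pam120 = pam(120)
scores = log_odds(pam120)
for letter, row in zip(ALPHABET, scores):
    print(letter, row)
```

Changing the argument to `pam()` (for example 40 or 250) gives matrices tuned to closer or more distant sequences, which is the sense in which different PAM matrices represent different evolutionary times.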
• LALIGN is not used for global multiple sequence alignment.
• LALIGN/PLALIGN find internal duplications by calculating non-intersecting local alignments of protein or DNA sequences.
• LALIGN shows the alignments and similarity scores.
• PLALIGN presents a dot-plot like graph.
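PLALIGN's dot-plot view can be approximated, for intuition only, by a plain exact-match dot plot of a sequence against itself: an internal duplication shows up as a run of dots on a diagonal parallel to the main one. This is not LALIGN's algorithm (LALIGN computes scored, non-intersecting local alignments); the window size, the helper names, and the example sequence with a repeated `MKVLAT` segment are hypothetical.

```python
def dot_plot(a: str, b: str, window: int = 3) -> list[list[bool]]:
    """dots[i][j] is True when a[i:i+window] and b[j:j+window] are identical."""
    return [[a[i:i + window] == b[j:j + window]
             for j in range(len(b) - window + 1)]
            for i in range(len(a) - window + 1)]

def repeat_diagonals(dots: list[list[bool]], min_run: int) -> dict[int, int]:
    """For a self-comparison, map each upper-diagonal offset d = j - i whose
    longest run of consecutive dots is >= min_run to that run length."""
    n = len(dots)  # the plot is square when a sequence is compared with itself
    hits: dict[int, int] = {}
    for d in range(1, n):
        run = best = 0
        for i in range(n - d):
            run = run + 1 if dots[i][i + d] else 0
            best = max(best, run)
        if best >= min_run:
            hits[d] = best
    return hits

def render(dots: list[list[bool]]) -> str:
    """Text rendering: '*' for a matching window, '.' otherwise."""
    return "\n".join("".join("*" if x else "." for x in row) for row in dots)

seq = "MKVLAT" + "PQR" + "MKVLAT"  # hypothetical fragment with an internal repeat
dots = dot_plot(seq, seq)
print(render(dots))
print(repeat_diagonals(dots, min_run=3))
```

The main diagonal is always filled in a self-comparison; the run on the diagonal at offset 9 is the signal of the duplicated segment.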
• Multiple sequence alignment (MSA) is generally the alignment of three or more biological sequences (protein or nucleic acid) of similar length.
• From an MSA, homology can be inferred and the evolutionary relationships between the sequences can be studied.
• The most widely used programs for global multiple sequence alignment are
from the CLUSTAL series of programs.
• PILEUP: creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments.
• T-COFFEE: compares all sequences two by two, producing a global alignment and a series of local alignments for each pair, and combines this information to build the multiple alignment.
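The programs above broadly start from pairwise comparisons of the sequences. The sketch below shows only that first step: it scores every pair with a Needleman-Wunsch global alignment and picks the most similar pair, which a progressive method would align first. The scoring values (match +1, mismatch -1, gap -2), the helper names, and the three DNA sequences are hypothetical; real programs use substitution matrices, affine gap penalties, and a guide tree (and T-COFFEE also uses local alignments), none of which are shown here.

```python
from itertools import combinations

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical toy scoring scheme

def nw_score(a: str, b: str) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty,
    computed with two rows of the dynamic-programming table."""
    prev = [j * GAP for j in range(len(b) + 1)]  # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * GAP] + [0] * len(b)          # aligning a[:i] against ""
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            curr[j] = max(diag, prev[j] + GAP, curr[j - 1] + GAP)
        prev = curr
    return prev[-1]

def all_pairs(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Compare every sequence with every other one, two by two."""
    return {(x, y): nw_score(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}

seqs = {"s1": "ACGTTGCA", "s2": "ACGTTGCT", "s3": "TCGATGAA"}
scores = all_pairs(seqs)
first_pair = max(scores, key=scores.get)  # most similar pair: aligned first
print(scores)
print(first_pair)
```

Here `s1` and `s2` differ at a single position, so they form the highest-scoring pair; a progressive aligner would then add `s3` to their alignment in a later step.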