Bio 3

The document provides an overview of protein structure, including primary, secondary, tertiary, and quaternary structures, as well as the concept of active sites and their importance in drug design. It discusses bioinformatics topics such as sequence alignment, protein-ligand docking, and the challenges faced in the interdisciplinary field. Additionally, it covers DNA sequencing techniques and the significance of sequence conservation in evolutionary biology.

Uploaded by

mahmoudweso2003


BIOINFORMATICS (BIOCOMPUTING)

(3)
ALIGNMENT AND MATCHING
DR. IBRAHIM ZAGHLOUL
PROTEIN STRUCTURE

https://www.rcsb.org/structure/3ERT
MACROMOLECULAR STRUCTURE
• Primary structure of proteins
– Linear polymers linked by peptide bonds
– Sense of direction

SECONDARY STRUCTURE
• Polypeptide chains fold into regular local structures
– alpha helix, beta sheet, turn, loop
– based on energy considerations

ALPHA HELIX

BETA SHEET

Figure: schematic of anti-parallel and parallel beta sheets
TERTIARY STRUCTURE
• 3-D structure of a polypeptide sequence
– interactions between non-local and foreign atoms
– often separated into domains

Figures: tertiary structure of domains of CD4; myoglobin
QUATERNARY STRUCTURE

• Arrangement of protein subunits

Figures: quaternary structure of Cro; human hemoglobin tetramer
ACTIVE SITE (BINDING SITE)
- Upon folding, the protein's active site is formed.
- It is the spot at which molecules fit and interact.
- It is the major point for protein activity.
- Usually it is a cleft, pocket, or cavity.
- Called "active" because interaction usually results in some chemical change or reaction.
- Basis of the lock-and-key model.
https://www.slideshare.net/MerlynH/protein-structure-function-46933802
http://www.chemeddl.org/collections/TSTS/Gellman/Gellmanpg5-8/Active%20Sites.html
ACTIVE SITE AND DRUG DESIGN: LOCK AND KEY

• Structure: the molecules must interact chemically.
• Shape: the shapes should match.
• Interacting molecules are called ligands.
• Lock: protein (receptor, target).
• Key: ligand (compound).
PROTEIN-LIGAND DOCKING
• Computational method that mimics the binding of a ligand to a protein
• Given: target (protein), binding site, ligand (or set of ligands)
• Predicts:
– The pose of the molecule in the binding site
– The binding affinity, or a score representing the strength of binding
• Docking can be used for:
– Virtual screening: virtual testing of compounds
– Lead optimization: investigate specific compounds
– De novo design of ligands: synthesize new compounds
Image credit: Charaka Goonatilake, Glen Group, University of Cambridge.
http://www-ucc.ch.cam.ac.uk/research/cg369-research.html
VIDEO: MOLECULAR DOCKING USING GLIDE
Bioinformatics: A simple view

Biological data + Computational methods
Challenges of working in bioinformatics

• Need to feel comfortable in interdisciplinary area

• Depend on others for primary data
• Need to address important biological and computer
science problems

Skill set

• Artificial intelligence
• Machine learning
• Statistics & probability
• Algorithms
• Databases
• Programming

Bioinformatics Topics: Genome Sequence

• Finding Genes in Genomic DNA
– introns
– exons
– promoters
• Characterizing Repeats in Genomic DNA
– Statistics
– Patterns
• Duplications in the Genome
– Large scale genomic alignment

Bioinformatics Topics: Protein Sequence
• Sequence Alignment
– non-exact string matching, gaps
– How to align two strings optimally via Dynamic Programming
– Local vs Global Alignment
– Amino acid substitution scoring
• Patterns
– Motifs
• Scoring schemes and Matching statistics
– How to tell if a given alignment or match is statistically significant
– A P-value (or an e-value)?
– Score Distributions (extreme val. dist.)
– Low Complexity Sequences
• Evolutionary Issues
– Rates of mutation and change
• Secondary Structure “Prediction”
– TM-helix finding
– Assessing Secondary Structure Prediction
• Tertiary Structure Prediction
– Fold Recognition
– Threading
– Ab initio
• Function Prediction
– Active site identification
• Relation of Sequence Similarity to Structural Similarity
Evolution

DNA Sequencers
DNA Sequencing
• DNA sequencing refers to the general laboratory technique for determining the exact sequence of nucleotides, or bases, in a DNA molecule.
• A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This is then reported as a text string, called a read.
Sequencing Reads
Alignment
Assembly
Evolution

Sequence conservation implies function

Alignment is the key to

• Finding important regions
• Determining function
• Uncovering the evolutionary forces
Sequence alignment
• Comparing DNA/protein sequences for
– Similarity
– Homology
• Prediction of function
• Construction of phylogeny: the history of the evolution of a species or group.
• Shotgun assembly
– End-space-free alignment / overlap alignment
• Finding motifs

Homology

• Homology: homology among DNA sequences or proteins is inferred from their sequence similarity. Significant similarity is strong evidence that two sequences are related by evolutionary changes from a common ancestral sequence.
• Orthologs (Different Species)
– Divergence follows speciation
– Similarity can be used to construct phylogeny between species
• Paralogs (Same Species)
– Divergence follows duplication
Sequence Alignment

• Procedure of comparing two (pairwise) or more (multiple) sequences by searching for a series of individual characters that are in the same order in the sequences

GCTAGTCAGATCTGACGCTA
  | |||| ||||| |||
  TGGTCACATCTGCCGC
Sequence Alignment

• Procedure of comparing two (pairwise) or more (multiple) sequences by searching for a series of individual characters that are in the same order in the sequences

VLSPADKTNVKAAWGKVGAHAGYEG
|||  |   |   | || |    ||
VLSEGDWQLVLHVWAKVEADVAGEG
Sequence Alignment
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Definition
Given two strings x = x_1 x_2 … x_M and y = y_1 y_2 … y_N, an alignment is an assignment of gaps to positions 0, …, M in x, and 0, …, N in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.
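The definition above can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `is_valid_alignment` and the use of "-" as the gap symbol are assumptions made for illustration. It tests that removing the gaps from each row gives back the original string and that no column pairs a gap with a gap.

```python
def is_valid_alignment(x: str, y: str, row_x: str, row_y: str, gap: str = "-") -> bool:
    """Return True if (row_x, row_y) is an alignment of x and y.

    Hypothetical helper: the name and the "-" gap symbol are illustrative choices.
    """
    # Both rows must have the same number of columns.
    if len(row_x) != len(row_y):
        return False
    # Removing the gaps must give back the original strings.
    if row_x.replace(gap, "") != x or row_y.replace(gap, "") != y:
        return False
    # Each letter lines up with a letter or a gap, never a gap with a gap.
    return all(not (a == gap and b == gap) for a, b in zip(row_x, row_y))


# The two sequences and the gapped rows from the slide.
x = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
y = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
row_x = "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---"
row_y = "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"
print(is_valid_alignment(x, y, row_x, row_y))
```

Note that the check says nothing about how good the alignment is; that is the job of a scoring scheme.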

Sources of variation
• Nucleotide substitution
– Replication error
– Chemical reaction
• Insertions or deletions (indels)
– Unequal crossing over
– Replication slippage
• Duplication
– a single gene (complete gene duplication)
– part of a gene (internal or partial gene duplication)
  • Domain duplication
  • Exon shuffling
– part of a chromosome (partial polysomy)
– an entire chromosome (aneuploidy or polysomy)
– the whole genome (polyploidy)
A simple alignment

• Let us try to align two short nucleotide sequences:
– AATCTATA and AAGATA
• Without considering any gaps (insertions/deletions), there are 3 possible ways to align these sequences:

AATCTATA    AATCTATA    AATCTATA
AAGATA       AAGATA       AAGATA

• Which one is better?
Scoring the alignments
• We need to have a scoring mechanism to evaluate alignments
– match score
– mismatch score
• We can have the total score as:

$$\text{total score} = \sum_{i=1}^{n} (\text{match or mismatch score at position } i)$$

• For the simple example, assume a match score of 1 and a mismatch score of 0:

AATCTATA    AATCTATA    AATCTATA
AAGATA       AAGATA       AAGATA
4            1            3
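The three ungapped scores can be reproduced with a short script. This is a sketch under the slide's assumptions (match = 1, mismatch = 0); the function name `ungapped_scores` is hypothetical. It slides the shorter sequence along the longer one and sums the per-position scores for each offset.

```python
def ungapped_scores(long_seq: str, short_seq: str, match: int = 1, mismatch: int = 0) -> list:
    """Score every gap-free placement of short_seq inside long_seq."""
    scores = []
    # One placement per offset: 0 .. len(long_seq) - len(short_seq).
    for offset in range(len(long_seq) - len(short_seq) + 1):
        window = long_seq[offset:offset + len(short_seq)]
        # Total score is the sum of match/mismatch scores over all positions.
        score = sum(match if a == b else mismatch for a, b in zip(window, short_seq))
        scores.append(score)
    return scores


print(ungapped_scores("AATCTATA", "AAGATA"))
```

With these scores the first placement (offset 0) is the best of the three.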
Simple alignment with gaps
• Considering gapped alignments vastly increases the number of possible alignments.
• If the gap penalty is -1, what will be the new scores?

AATCTATA    AATCTATA    AATCTATA
AAG-AT-A    AA-G-ATA    AA--GATA
1           3           3
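The gapped scores can be verified the same way. The sketch below assumes the values used on the slides (match = 1, mismatch = 0, gap penalty = -1) and "-" as the gap symbol; `score_alignment` is a hypothetical name, not a library function.

```python
def score_alignment(row_x: str, row_y: str, match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Sum column scores of a gapped alignment given as two equal-length rows."""
    if len(row_x) != len(row_y):
        raise ValueError("alignment rows must have the same length")
    total = 0
    for a, b in zip(row_x, row_y):
        if a == "-" or b == "-":
            total += gap        # a letter opposite a gap
        elif a == b:
            total += match      # identical letters
        else:
            total += mismatch   # different letters
    return total


for row_y in ("AAG-AT-A", "AA-G-ATA", "AA--GATA"):
    print(row_y, score_alignment("AATCTATA", row_y))
```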

BLOSUM 62 matrix
String Definitions

A string S is a finite ordered list of characters.

Characters are drawn from an alphabet Σ.

Nucleic acid alphabet: { A, C, G, T }

Amino acid alphabet: { A, R, N, D, C, E, Q, G, H, I, L, K, M, F, P, S, T, W, Y, V }

Length of S, |S|, is the number of characters in S.

ϵ is the empty string. |ϵ| = 0
String Definitions

• For strings S and T over Σ, their concatenation consists of the characters of S followed by the characters of T, denoted ST.
• S is a substring of T if there exist (possibly empty) strings u and v such that T = uSv.
• S is a prefix of T if there exists a string u such that T = Su. If neither S nor u is ϵ, S is a proper prefix of T.
• Definitions of suffix and proper suffix are similar.
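These definitions map directly onto Python string operations. The sketch below is illustrative only; the helper names are hypothetical, and the "proper" variants follow the definition above (neither S nor the remaining string u may be empty).

```python
def is_substring(s: str, t: str) -> bool:
    # T = uSv for some (possibly empty) u and v.
    return s in t


def is_prefix(s: str, t: str) -> bool:
    # T = Su for some (possibly empty) u.
    return t.startswith(s)


def is_proper_prefix(s: str, t: str) -> bool:
    # As above, but neither S nor u may be the empty string.
    return s != "" and t.startswith(s) and len(s) < len(t)


def is_suffix(s: str, t: str) -> bool:
    # T = uS for some (possibly empty) u.
    return t.endswith(s)


def is_proper_suffix(s: str, t: str) -> bool:
    return s != "" and t.endswith(s) and len(s) < len(t)


print(is_substring("cat", "concatenate"))
print(is_proper_prefix("con", "concatenate"))
print(is_proper_suffix("concatenate", "concatenate"))
```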

String Definitions

• We defined substring. A subsequence is similar, except that the characters need not be consecutive.
• “cat” is a substring and a subsequence of “concatenate”
• “cant” is a subsequence of “concatenate”, but not a substring
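The difference between the two notions is easy to test. A minimal sketch, with the hypothetical helper name `is_subsequence`: it scans T once from left to right and consumes characters until each character of S has been found in order.

```python
def is_subsequence(s: str, t: str) -> bool:
    """True if the characters of s appear in t in the same order, not necessarily adjacent."""
    remaining = iter(t)
    # "ch in remaining" advances the iterator until ch is found (or t is exhausted),
    # so later characters of s can only match later positions of t.
    return all(ch in remaining for ch in s)


for s in ("cat", "cant"):
    print(s, "substring:", s in "concatenate", "subsequence:", is_subsequence(s, "concatenate"))
```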

Exact matching

• Looking for places where a pattern P occurs as a substring of a text T. Each such place is an occurrence or match.
• An alignment is a way of putting P’s characters opposite T’s characters. It may or may not correspond to an occurrence.

Exact matching: naïve algorithm
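The slide figures for the naïve algorithm are not preserved here, so the following is a sketch of the standard textbook version rather than the slides' own code; the function name `naive_exact_match` is an assumption. It tries every alignment of P against T and compares characters left to right until a mismatch or a full match.

```python
def naive_exact_match(p: str, t: str) -> list:
    """Return the start offsets of all occurrences of pattern p in text t."""
    occurrences = []
    # Try every alignment of p against t.
    for i in range(len(t) - len(p) + 1):
        j = 0
        # Compare characters left to right until a mismatch or the end of p.
        while j < len(p) and t[i + j] == p[j]:
            j += 1
        if j == len(p):
            occurrences.append(i)  # every character matched: an occurrence
    return occurrences


print(naive_exact_match("word", "There would have been a time for such a word"))
```

In the worst case this performs on the order of |P| × |T| character comparisons, which motivates the question of whether some alignments can be skipped.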

Can we improve on the naïve algorithm?

P: word
T: There would have been a time for such a word
         word

u doesn’t occur in P, so skip next two alignments

P: word
T: There would have been a time for such a word
         word
          word skip!
           word skip!
            word
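The skipping idea in this example can be sketched as a small change to the naïve algorithm. This is an illustrative simplification, not the method the slides go on to present: the function name `match_with_skips` is hypothetical, and it applies only the rule shown above (if the mismatched text character does not occur anywhere in P, every alignment that would cover that character is skipped).

```python
def match_with_skips(p: str, t: str) -> list:
    """Naive matching, plus skipping alignments that overlap a text character absent from p."""
    chars_in_p = set(p)
    occurrences = []
    i = 0
    while i <= len(t) - len(p):
        j = 0
        # Compare left to right, as in the naive algorithm.
        while j < len(p) and t[i + j] == p[j]:
            j += 1
        if j == len(p):
            occurrences.append(i)
            i += 1
        elif t[i + j] not in chars_in_p:
            # No alignment covering position i + j can match, so jump past it.
            i += j + 1
        else:
            i += 1
    return occurrences


T = "There would have been a time for such a word"
print(match_with_skips("word", T))
```

For the alignment at offset 6, the mismatch is on "u" at j = 2, so the next alignment tried is offset 9, which skips exactly the two alignments marked "skip!" above.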
