bioinformatics 1 -- lecture 7
Probability and conditional probability
 Random sequences and significance (real sequences are
 not random)
 Erdős & Rényi: theoretical basis for the significance of an
 alignment given its length and score
 Extreme value distribution, better than a Gaussian for
 estimating significance.
 E-values
How significant is that?
"Please give me a number for how likely the data would not have been the result of chance, as opposed to a specific inference. Thanks."
Dayhoff's randomization experiment
[Photo: Margaret Dayhoff]
Aligned scrambled Protein A versus scrambled Protein B
100 times (re-scrambling each time).
NOTE: scrambling does not change the AA composition!
Results: A Normal Distribution
The significance of a score is measured as the probability of getting this score in a random alignment.
Why do we need a “model” for significance?
Because random scores end at some point, non-random scores start at some point, and we care about the significance of scores in between. So we want to be able to calculate the significance.
Easy right? Just fit to a Gaussian (i.e. normal) distribution...?
   Lipman's randomization experiment
 Aligned Protein A to 100 natural sequences, not scrambled.
 Results: a wider normal distribution (std dev ≈ 3 times larger)
[Photo: David Lipman]
WHY? Because natural sequences are different from random ones.
Even unrelated sequences share local patterns and have uneven
amino acid composition.
Was the significance over-estimated using Dayhoff's method?
Lipman got a similar result if he randomized the sequences by
words instead of letters.
Why was Lipman’s distribution wider?
              low complexity
   complexity = sequence heterogeneity
  A low complexity sequence is homogeneous in its
  composition. For example:
  AAAAAAAAHAAAAAAAAKAAAAAEAA
  is a low-complexity sequence.
  Compared to other sequences, there are relatively
  few ways to make a 26-residue sequence that has 23
  A's, 1 H, 1 K and 1 E.
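A small worked check of that claim (a sketch, not from the lecture): the number of distinct arrangements of a sequence with a given composition is the multinomial coefficient, 26!/(23!·1!·1!·1!) for the example above.

```python
# Count distinct arrangements of a sequence with a fixed composition
# via the multinomial coefficient len! / (count_1! * count_2! * ...).
from math import factorial
from collections import Counter

def arrangements(seq):
    """Number of distinct sequences with the same composition as seq."""
    total = factorial(len(seq))
    for count in Counter(seq).values():
        total //= factorial(count)
    return total

print(arrangements("AAAAAAAAHAAAAAAAAKAAAAAEAA"))   # 15600: low complexity
print(arrangements("ACDEFGHIKLMNPQRSTVWYACDEFG"))   # ~6.3e24: high complexity
```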
  Why was Lipman’s distribution wider? (part 2)
                   Local patterns (words).
The three-letter sequence PGD occurs more often than
expected by chance because PGD occurs in beta-turns.
The amphipathic helix pattern pnnppnn (p = polar, n = non-polar) occurs
more often than expected because it folds into an alpha helix.
•Even sequences that are non-homologous (Lipman’s experiment)
have word matches. But scrambled sequences (Dayhoff’s experiment)
do not have recurrent words.
  Dayhoff & Lipman assumed a
  normal distribution for random
        alignment scores.
   Should it really be a normal
          distribution?
I guess we need a theoretical foundation
Probability
Three coins: S (Susan B. Anthony), Q (The Queen), I (Ireland).
"P(H)" means the probability of H, heads.      0 ≤ P ≤ 1
     Unconditional probabilities
Joint probability of a sequence of 3 flips, given any one (un)fair coin,
is the product.
             P(HHT) = P(H)*P(H)*P(T)
       Conditional probabilities
  If the coins are "unfair" (not 50:50), then P(H) depends on the
  coin you choose (S, Q, or I). P(H) is "conditional" on the choice
  of coin, which may have its own odds.
            P(S,H) = P(S)*P(H|S)
        Conditional probabilities
"P(A|B)" means the probability of A given B, where A is the
result or observation, B is the condition. (The condition
may be a result/observation of a previous condition.)
P(H|S) is the probability of H (heads) given that the coin
is S.
In general, the probability of two things together, A and B, is
P(A,B) = P(A|B)P(B) = P(B|A)P(A)
Divide by P(B) and you get Bayes' rule:
P(A|B) = P(B|A)P(A) / P(B)
To reverse the order of conditional probabilities, multiply by the ratio of the probabilities of the conditions.
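A minimal sketch of Bayes' rule with the three coins; the priors and the P(H|coin) values below are made up for illustration, not from the lecture.

```python
# Bayes' rule: P(coin|H) = P(H|coin) * P(coin) / P(H)
priors = {"S": 1/3, "Q": 1/3, "I": 1/3}     # P(coin): pick a coin at random
p_heads = {"S": 0.5, "Q": 0.7, "I": 0.2}    # P(H|coin): unfair coins

# Total probability of heads: P(H) = sum over coins of P(H|coin) * P(coin)
p_h = sum(p_heads[c] * priors[c] for c in priors)

for c in priors:
    print(c, round(p_heads[c] * priors[c] / p_h, 3))   # posterior P(coin|H)
```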
        Scoring alignments using P
For each aligned position (match), we get P(A|B), which is the
substitution probability. Ignoring all but the first letters, the
probability of these two sequences being homologs is:
P(s1[1]|s2[1])             (substitution of s2[1] for s1[1])
Ignoring all but the first two letters, it is:
P(s1[1]|s2[1]) × P(s1[2]|s2[2])
Counting all aligned positions:
Π_i P(s1[i]|s2[i])          Each position is treated as a different coin
                            (an independent stochastic process).
 Log space is more convenient
    log Π_i [P(s1[i]|s2[i]) / P(s1[i])] = Σ_i S(s1[i]|s2[i])
where S(A|B) = 2·log2( P(A|B)/P(A) ) = the BLOSUM score.
       This is the form of the substitution score:
       log-likelihood ratios (alias LLRs, log-odds, lods),
       usually “2 times log2 of the probability ratio” (i.e. “half-bits”).
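A sketch of that arithmetic (the probabilities below are hypothetical, chosen only to show how a BLOSUM-style entry is computed, not real BLOSUM frequencies):

```python
# Half-bit log-odds substitution score: S(A|B) = 2 * log2( P(A|B) / P(A) ).
from math import log2

def half_bit_score(p_a_given_b, p_a):
    """Substitution score in half-bits: 2 * log2 of the likelihood ratio."""
    return 2 * log2(p_a_given_b / p_a)

# A favorable pairing: substitution more likely than chance -> positive score
print(round(half_bit_score(p_a_given_b=0.15, p_a=0.08), 2))   #  1.81
# An unfavorable pairing: substitution less likely than chance -> negative
print(round(half_bit_score(p_a_given_b=0.02, p_a=0.08), 2))   # -4.0
```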
                    The point is...
• Alignment scores are like coin tosses.
• ...with 20 “unfair” coins...
• (...and each coin actually has 20 sides...)
  To establish the theoretical foundation, we first go to
                     the simple case.
              Theoretical distribution of coin
                     toss alignments.
      Consider a fair coin, tossed n=25 times. The sequence is, let’s say:
      HTHTHTTTHHHTHTTHHTHHHHHTH
      The longest row of H’s is 5 in this case.
      What is the expected length of the longest row of H's given n?
Erdős & Rényi equation: E(n) = log_1/p(n), where p is the P(H).
[Histogram: number of times each value occurred versus length of the longest run of H. Note that the score never goes below zero.]
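A simulation sketch of this slide (mine, not the lecture's): compare the average longest run of heads to the leading Erdős & Rényi term.

```python
# Longest run of heads in n fair-coin tosses, versus log_1/p(n).
# (The full formula adds constant correction terms, so the simulated
# mean sits a little below the leading-order value.)
import random
from math import log

def longest_run(flips, symbol="H"):
    best = run = 0
    for f in flips:
        run = run + 1 if f == symbol else 0
        best = max(best, run)
    return best

n, p = 25, 0.5
runs = [longest_run(random.choice("HT") for _ in range(n))
        for _ in range(10000)]
print("leading-order theory:", round(log(n) / log(1 / p), 2))   # ~4.64
print("simulated mean:      ", round(sum(runs) / len(runs), 2))
```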
           Heads = match, tails = mismatch
Similarly, we can define an expectation value, E(M), for the
longest row of matches in an alignment of length n. E(M) is
calculated the same way as for heads/tails, using the Erdős &
Rényi equation (p is the probability of a match, 1−p the
probability of a mismatch):
E(M) = log_1/p(n)           expectation given a single alignment of length n
But over all possible alignments of two sequences of length
n, the number is
E(M) = log_1/p(n·n) = 2·log_1/p(n)
If the two sequences are length n and length m, it is
E(M) = log_1/p(mn) [+ some constant terms that don’t depend on m and n]
   Heads/tails = match/mismatch
Theoretically derived equation for the expectation value for
M, the longest block of Matches:
E(M) = log_1/p(mn) + log_1/p(1−p) + 0.577·log_1/p(e) − 1/2
Note that we can define a number K such that log_1/p(K) equals the constant terms. Then
E(M) = ln(Kmn)/λ
...where λ = ln(1/p)
       Theoretical expectation value
        E(M) = log_1/p(mn) + log_1/p(1−p) + 0.577·log_1/p(e) − 1/2
        E(M) = ln(Kmn)/λ
       Solving, using p = 0.25, K = 0.6237, λ = ln(4) = 1.386, and m = n = 140, we get
        E(M) = ln(Kmn)/λ ≈ 6.8
This would be the most likely length of the longest run of matches in an
alignment of two random DNA sequences of length 140 and 140. You can
try this! Align randomly selected 140 bp DNA sequences using local DP
alignment (e.g. the LALIGN server: DNA, local, gap penalty = 0). Count the
longest string of matches. Plot the distribution of scores.
google: “LALIGN server” or http://www.ch.embnet.org/software/LALIGN_form.html
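A Monte Carlo sketch of this experiment (mine, not the lecture's): the longest run of matches over all ungapped alignments of two sequences is their longest common substring, so we can estimate E(M) directly.

```python
# Estimate the longest run of matches between random DNA sequences and
# compare to the theory E(M) = ln(K*m*n)/lambda with K and lambda from
# the lecture's DNA example.
import random
from math import log

def longest_common_substring(a, b):
    """Length of the longest exact match between a and b (DP)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

n = m = 140
p, K = 0.25, 0.6237           # match probability and K from the lecture
lam = log(1 / p)              # lambda = ln(1/p) = ln(4)
trials = [longest_common_substring(
              "".join(random.choice("ACGT") for _ in range(n)),
              "".join(random.choice("ACGT") for _ in range(m)))
          for _ in range(200)]
print("theory E(M) =", round(log(K * m * n) / lam, 2))      # ~6.8
print("simulated mean =", round(sum(trials) / len(trials), 2))
```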
                        P(S > x)
E(M) gives us the expected length of the longest run of matches.
But what we really want is the answer to this question:
How good is the score x? (i.e. how significant is it?)
So we need to model the whole distribution of chance scores,
then ask how likely it is that a score at least as good as ours
comes from that model.
[Plot: frequency versus score]
        Distribution Definitions
       Mean = average value.
       Mode = most probable value.
       Extremes = minimum and maximum values.
       Standard deviation = one type of decay function.
For a variable that follows an extreme value distribution, such
as random sequence alignment scores, a score must be greater
than a normal distribution would predict to achieve the same
level of significance.
           A Normal Distribution
Usually, we suppose that the likelihood of deviating from the mean
by x in the positive direction is the same as the likelihood of
deviating by x in the negative direction, and that the likelihood of
deviating by x falls off exponentially with x².
Why? Because multiplying many independent probabilities gives this
type of curve.
This is called a Normal, or Gaussian distribution.
Extreme value distribution: a distribution derived from extreme values
Get statistics from lots of optimal (best) scores.
[Figure: all possible scores for two sequences form a roughly normal distribution; keeping only the best score from each comparison produces a skewed distribution.]
   Extreme value distribution
   y = exp(−x − e^(−x))
The EVD has this shape, but the mode and decay parameters depend on the data.
  Empirical proof of the EVD
Suppose you had a Gaussian-distribution “dart board”. You
throw 1000 darts randomly. Score each dart according to the
number on the X-axis where it lands. What is the probability
distribution of the scores?
Answer: the same Gaussian distribution! (duh)
  Empirical proof of the EVD
What if we throw 10 darts at a time and remove all but the
highest-scoring dart? Do that 1000 times. What is the
distribution of the scores?
Empirical proof of the EVD
[Figure: the EVD gets sharper as we increase the number of darts thrown per round.]
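A simulation sketch of the dart experiment (mine, not the lecture's): keep the best of N Gaussian draws per round and watch the distribution of maxima shift right and sharpen.

```python
# Maxima of N Gaussian draws are not Gaussian; they follow a skewed
# extreme value (Gumbel-shaped) distribution that sharpens as N grows.
import random

def best_of(n_darts):
    return max(random.gauss(0.0, 1.0) for _ in range(n_darts))

for n_darts in (1, 10, 100):
    maxima = [best_of(n_darts) for _ in range(1000)]
    mu = sum(maxima) / len(maxima)
    sd = (sum((s - mu) ** 2 for s in maxima) / len(maxima)) ** 0.5
    print(f"{n_darts:>3} darts/round: mean {mu:.2f}, sd {sd:.2f}")
```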
    Extreme value distribution for sequence scores
The EVD with mode u and decay parameter λ:
   y = exp(−λ(x−u) − e^(−λ(x−u)))
The mode, from the Erdős & Rényi equation:
   u = ln(Kmn)/λ
Substituting gives:
   P(x) = exp(−λ(x − ln(Kmn)/λ) − e^(−λ(x − ln(Kmn)/λ)))
Integrating from x to infinity gives:
   P(S≥x) = 1 − exp(−Kmn·e^(−λx))
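A minimal sketch of this tail formula in Python, using the lecture's DNA parameters (K = 0.6237, λ = ln 4); here x is a match-run length as in the coin model.

```python
# EVD tail probability: P(S >= x) = 1 - exp(-K*m*n*e^(-lambda*x)).
from math import exp, log

def evd_pvalue(x, K, lam, m, n):
    """Probability of a chance local-alignment score of at least x."""
    return 1.0 - exp(-K * m * n * exp(-lam * x))

K, lam, m, n = 0.6237, log(4), 140, 140
for x in (5, 7, 9, 11):
    print(x, f"P(S>=x) = {evd_pvalue(x, K, lam, m, n):.4f}")
```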
       Theoretical value for the decay
                 function λ
     λ is calculated as the value of x that satisfies:
                Σ_ij p_i p_j e^(S_ij·x) = 1
     where the S_ij are the substitution matrix values.
S_ij is a scaled log-likelihood ratio, ln[P(i→j)/(p_i p_j)]/λ, so e^(S_ij·λ) is the
likelihood ratio, P(i→j)/(p_i p_j). Then Σ_ij p_i p_j e^(S_ij·λ) = Σ_ij P(i→j) = 1 by
definition, so x = λ solves the equation. In other words, λ is the scale factor
that converts the matrix scores back into natural log-odds.
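A numerical sketch (not from the lecture): find λ by bisection for a toy DNA model with uniform background frequencies and hypothetical +5/−4 match/mismatch scores.

```python
# Solve sum_ij p_i p_j e^(S_ij * x) = 1 for the nontrivial root x = lambda.
# (The equation also has a trivial root at x = 0; we bracket the positive one.)
from math import exp

p = {b: 0.25 for b in "ACGT"}                              # backgrounds
S = {(a, b): (5 if a == b else -4) for a in "ACGT" for b in "ACGT"}

def f(x):
    return sum(p[a] * p[b] * exp(S[a, b] * x)
               for a in "ACGT" for b in "ACGT") - 1.0

lo, hi = 1e-6, 2.0        # f(lo) < 0 and f(hi) > 0 for this matrix
for _ in range(60):       # bisection
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
print(f"lambda = {lo:.4f}")     # ~0.19 for +5/-4 with uniform backgrounds
```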
          voodoo mathematics
 For values of x greater than 1, we can make this approximation:
                  1 − exp(−e^(−x)) ≈ e^(−x)
So the integral of the EVD,
   P(S≥x) = 1 − exp(−Kmn·e^(−λx))
approximates to
   P(S≥x) ≈ Kmn·e^(−λx)
Taking the log of both sides, we get
   log(P(S≥x)) ≈ log(Kmn) − λx
which is linear.
   The linearized EVD may be used to find the parameters K and
               λ empirically (instead of theoretically).
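A quick numeric check of that approximation (not from the lecture):

```python
# 1 - exp(-e^-x) converges to e^-x as soon as e^-x is small.
from math import exp
for x in (0.5, 1.0, 2.0, 4.0):
    exact, approx = 1 - exp(-exp(-x)), exp(-x)
    print(f"x={x}: exact={exact:.5f} approx={approx:.5f}")
```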
              Finding the EVD parameters
The Linearized EVD                 log(P(S≥x)) ≈ log(Kmn) - λx
  To determine K and λ, we can plot log(P(S≥x)) versus x
      (x = false alignment score), and then fit it. (least squares)
  1. Generate a large number of known false alignment scores S.
     (all with the same query length m and database size n).
  2. Calculate P(S≥x) for all x.
  3. Plot log(P(S≥x)) versus x.
  4. Fit the data to a line. (Least squares)
  5. The slope is −λ.
  6. Intercept is log(Kmn). Solve for K.
  7. Use K and λ to calculate the e-value = exp( log(Kmn) - λx )
[Plot: log P(S≥x) versus false-alignment score x; the points fall on a line with slope −λ and intercept log(Kmn).]
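A sketch of steps 1-6 (my own simulation, not the lecture's materials): the false scores here are synthesized from a Gumbel distribution with known parameters so the fit can be checked; in practice they would come from aligning unrelated sequences.

```python
# Fit lambda and K by least squares on the linearized EVD tail.
import random
from math import exp, log

m = n = 140
K_true, lam_true = 0.6237, log(4)
u = log(K_true * m * n) / lam_true                     # the mode
scores = [u - log(-log(random.random())) / lam_true    # Gumbel samples
          for _ in range(5000)]

def tail_prob(x):
    return sum(s >= x for s in scores) / len(scores)   # empirical P(S>=x)

xs = sorted(set(round(s, 1) for s in scores))
pts = [(x, log(tail_prob(x))) for x in xs
       if 0 < tail_prob(x) <= 0.1]    # keep only the tail, where log P is linear

# Least-squares line fit: log P = a + b*x, so lambda = -b and K = e^a / (m*n).
N = len(pts)
sx = sum(x for x, _ in pts); sy = sum(y for _, y in pts)
sxx = sum(x * x for x, _ in pts); sxy = sum(x * y for x, y in pts)
b = (N * sxy - sx * sy) / (N * sxx - sx * sx)
a = (sy - b * sx) / N
print(f"lambda = {-b:.3f} (true {lam_true:.3f})")
print(f"K = {exp(a) / (m * n):.3f} (true {K_true})")
```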
                 e-values in BLAST
•Every BLAST “hit” has a bit score, x, derived from the
substitution matrix.
•Parameters for the EVD have been calculated in advance for
m and n, the lengths of the database and query.
•Applying the EVD to x, we get P(S≥x), which is our “p-value”.
•To get the “e-value” (the expected number of times this score
would occur by chance over the whole database), we multiply
by the size of the database, m:
                    e-value(x) = P(S≥x)·m
  where x is the alignment score, m is the size of the database, and P is calculated from
  false alignment scores.
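A toy end-to-end sketch following this recipe (not BLAST's actual code; all parameter values below are illustrative assumptions):

```python
# p-value from the EVD, then e-value = p-value * database size.
from math import exp, log

def e_value(x, K, lam, m_query, n_subject, db_size):
    p = 1.0 - exp(-K * m_query * n_subject * exp(-lam * x))  # P(S >= x)
    return p * db_size        # expected number of chance hits this good

ev = e_value(12, K=0.6237, lam=log(4),
             m_query=140, n_subject=140, db_size=10**6)
print(f"e-value = {ev:.3g}")
```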
    Matrix bias in local alignment
In local alignment we take a MAX over zero (0) and three other
scores (diagonal, across, down). A matrix bias is added to all
match scores, so the average match score, and the extremes, can be
adjusted.
What happens if match scores are...?
all negative? : Best alignment is always no alignment.
all positive? : Best alignment is gapless, global-local.
average positive? : Best alignment is local (longer). Typical random alignment is local.
average negative? : Best alignment is local (shorter). Typical random alignment is no alignment.
              Altschul's Principle
For local DP alignment, the match (substitution) scores should be
       > zero for a match, and
       < zero for a mismatch,
on average! (Some mismatches may have a > 0 score)
Why?
  Matrix bias to control alignment length
Matrix bias (aka “bonus”) = a constant added to each match score.
If we add a constant to each value in the substitution matrix, it
favors matches over gaps. As we increase the matrix bias...
• Longer alignments are more common in random sets.
• Longer alignments are less significant.
[Figure panels: negative matrix bias, no matrix bias, positive matrix bias]
          History of significance
•Significance of a score is measured by the probability of
getting that score by chance.
•History of modeling “chance” in alignments
   •1970’s Dayhoff: Gaussian fit to scrambled alignments
   •1980’s Lipman: Gaussian fit to false alignments
   •1990’s Altschul: EVD fit to false alignments
         summary of significance
•The expected value for the maximum length of a match
between two sequences, lengths n and m, given the probability
of a match p, has a theoretical solution, log_1/p(nm): the
Erdős & Rényi equation.
•The score of a DP alignment is the maximum score, which is
roughly proportional to the length (for local alignments only!).
Therefore, the expected value of alignment scores follows the
same theoretical distribution.
                        Review
•What does the Extreme Value Distribution model?
•How are the parameters (λ,K) of the EVD determined?
•Once λ and K have been found, how do we calculate the e-
value?
•What is the meaning of the e-value?
•For a given score x and a given database, which method gives
the highest estimate and which gives the lowest estimate of the
significance of x?
   --Dayhoff’s method of scrambled sequences.
   --Lipman’s method of false alignments, using a Normal
   distribution.
   --False alignments using the EVD.
                    Pop-quiz
You did a BLAST search using a sequence that has
absolutely no homologs in the database. Absolutely
none.
The BLAST search gave you false “hits” with the top e-
values ranging from 0 to 20. You look at them and you
notice a pattern in the e-values.
How many of your hits have e-value ≤ 10?