The Knuth-Morris-Pratt (KMP) algorithm
It is an efficient string-matching algorithm that avoids backtracking, improving upon
the basic string matching approach.
Introduced in 1970 by Knuth, Morris, and Pratt, this algorithm efficiently identifies
patterns within text by employing a 'Prefix Table,' alternatively known as the LPS
(Longest Proper Prefix which is also Suffix) Table.
KMP algorithm is used to find a "Pattern" in a "Text". This algorithm compares
character by character from left to right. But whenever a mismatch occurs, it uses a
preprocessed table called "Prefix Table" to skip characters comparison while matching.
Some times prefix table is also known as LPS Table.
Steps for Creating LPS Table (Prefix Table)
Step 1 - Define a one dimensional array with the size equal to the length of the Pattern.
(LPS[size])
Step 2 - Define variables i & j. Set i = 0, j = 1 and LPS[0] = 0.
Step 3 - Compare the characters at Pattern[i] and Pattern[j].
Step 4 - If both are matched then set LPS[j] = i+1 and increment both i & j values by
one. Goto to
Step 3.
Step 5 - If both are not matched then check the value of variable 'i'. If it is '0' then set
LPS[j] = 0
and increment 'j' value by one, if it is not '0' then set i = LPS[i-1]. Goto Step 3.
Step 6- Repeat above steps until all the values of LPS[] are filled.
How to use LPS Table
LPS table is used to decide how many characters are to be skipped for comparison when
a mismatch has occurred.
When a mismatch occurs, check the LPS value of the previous character of the
mismatched character in the pattern.
o If it is '0' then start comparing the first character of the pattern with the next
character to the mismatched character in the text.
o If it is not '0' then start comparing the character which is at an index value equal
to the LPS value of the previous character to the mismatched character in pattern
with the mismatched character in the Text.
How the KMP Algorithm Works
Consider the following Text and pattern
Develop a Program in C for the following operations on Strings. a) Read a main String (STR),
a Pattern String (PAT) and a Replace String (REP) b) Perform Pattern Matching Operation:
Find and Replace all occurrences of PAT in STR with REP if PAT exists in STR. Report
suitable messages in case PAT does not exist in STR Support the program with functions for
each of the above operations. Don't use Built-in functions. Use KMP Algorithm
#include <stdio.h>
#include <string.h>
// Function to compute the longest prefix suffix (LPS) array
void computeLPSArray(char* pat, int M, int* lps) {
int length = 0;
lps[0] = 0; // lps[0] is always 0
int i = 1;
while (i < M) {
if (pat[i] == pat[length]) {
length++;
lps[i] = length;
i++;
} else {
if (length != 0) {
length = lps[length - 1];
} else {
lps[i] = 0;
i++;
}
}
}
}
// KMP search algorithm
void KMPSearch(char* pat, char* txt) {
int M = strlen(pat);
int N = strlen(txt);
int lps[M];
computeLPSArray(pat, M, lps);
int i = 0; // index for txt[]
int j = 0; // index for pat[]
int found = 0; // Flag to check if pattern is found
while (i < N) {
if (pat[j] == txt[i]) {
j++;
i++;
}
if (j == M) {
printf("Found pattern at index %d\n", i – j+1);
found = 1; // Pattern found
j = lps[j - 1];
} else if (i < N && pat[j] != txt[i]) {
if (j != 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
if (!found) {
printf("Pattern not found\n");
}
}
int main() {
char txt[100], pat[100];
printf("Enter the text: ");
gets(txt);
printf("Enter the pattern: ");
gets(pat);
KMPSearch(pat, txt);
return 0;
}
Output
Enter the text: ababcabcabababd
Enter the pattern: ababd
Found pattern at index 11
Enter the text: ababcabcabababd
Enter the pattern: bcaba
Found pattern at index 7