KMP Algorithm
Knuth-Morris-Pratt Algorithm for Pattern Matching
KMP Algorithm
Knuth-Morris-Pratt (KMP) is an algorithm for finding patterns in text with time complexity
Main Concept
KMP avoids redundant comparisons by using a Failure Function (or Prefix Function) that tells how much to shift the pattern when a match fails.
Failure Function (Prefix Function)
For pattern of length :
- = length of the longest proper prefix of that is also a suffix
Example
Pattern: "ABABAC"
| i | P[0..i] | pi[i] | Explanation |
|---|---|---|---|
| 0 | A | 0 | No proper prefix |
| 1 | AB | 0 | A != B |
| 2 | ABA | 1 | A = A |
| 3 | ABAB | 2 | AB = AB |
| 4 | ABABA | 3 | ABA = ABA |
| 5 | ABABAC | 0 | No match |
Implementation
Compute Prefix Function
vector<int> computePrefix(const string& pattern) {
int m = pattern.size();
vector<int> pi(m, 0);
int k = 0; // length of current matching prefix
for (int i = 1; i < m; i++) {
// If mismatch, fallback to shorter prefix
while (k > 0 && pattern[k] != pattern[i]) {
k = pi[k - 1];
}
// If match, extend prefix
if (pattern[k] == pattern[i]) {
k++;
}
pi[i] = k;
}
return pi;
}
KMP Search
vector<int> kmpSearch(const string& text, const string& pattern) {
int n = text.size();
int m = pattern.size();
vector<int> pi = computePrefix(pattern);
vector<int> matches;
int j = 0; // number of characters matched
for (int i = 0; i < n; i++) {
// If mismatch, use failure function
while (j > 0 && pattern[j] != text[i]) {
j = pi[j - 1];
}
// If match, increment j
if (pattern[j] == text[i]) {
j++;
}
// If entire pattern matched
if (j == m) {
matches.push_back(i - m + 1);
j = pi[j - 1]; // prepare for next match
}
}
return matches;
}
Visualization
Text: A B A B A B A C A B A
Pattern: A B A B A C
Step 1: ABABA matches, C != B
Use pi[4] = 3, shift pattern
Text: A B A B A B A C A B A
Pattern: A B A B A C
Step 2: ABAB matches, A != B
Use pi[3] = 2, shift pattern
Text: A B A B A B A C A B A
Pattern: A B A B A C
Step 3: ABABAC matches! Found at index 2
Applications
1. Count Number of Occurrences
int countOccurrences(const string& text, const string& pattern) {
return kmpSearch(text, pattern).size();
}
2. Find Period of String
Period is the shortest string such that pattern = (repeated times)
int findPeriod(const string& s) {
vector<int> pi = computePrefix(s);
int n = s.size();
int period = n - pi[n - 1];
// Verify if it's actually a period
if (n % period == 0) {
return period;
}
return n; // No period shorter than itself
}
3. Find Shortest Palindrome (Prepend)
string shortestPalindrome(string s) {
string rev = s;
reverse(rev.begin(), rev.end());
string combined = s + "#" + rev;
vector<int> pi = computePrefix(combined);
int toAdd = s.size() - pi.back();
return rev.substr(0, toAdd) + s;
}
Complexity
| Operation | Time | Space |
|---|---|---|
| Build prefix function | ||
| Search | ||
| Total |
Comparison with Other Algorithms
| Algorithm | Preprocessing | Search | Best For |
|---|---|---|---|
| Naive | Short patterns | ||
| KMP | General purpose | ||
| Rabin-Karp | average | Multiple patterns | |
| Z-function | Similar to KMP |
Tips
- Failure function is very useful - can solve many other problems
- Use sentinel character - when concatenating strings, add separator like
# - Watch out for 0-indexed vs 1-indexed - code above uses 0-indexed