ICPC Competitive Programming

Z-Function

Z-Function for Pattern Matching and String Problems

Z-Function

Z-Function (or Z-Array) is an algorithm that computes array zz where z[i]z[i] = length of the longest substring starting from s[i]s[i] that is a prefix of ss.

Definition

For string ss of length nn:

z[i]=max{k:s[0..k1]=s[i..i+k1]}z[i] = \max\{k : s[0..k-1] = s[i..i+k-1]\}

Where z[0]=0z[0] = 0 (or some define it as nn)

Example

String: "aabxaab"

is[i..]z[i]Explanation
0aabxaab0(undefined/0)
1abxaab1”a” = s[0]
2bxaab0”b” != “a”
3xaab0”x” != “a”
4aab3”aab” = s[0..2]
5ab2”ab” = s[0..1]
6b0”b” != “a”

Algorithm

Uses the concept of Z-Box: interval [l,r][l, r] where s[l..r]s[l..r] = prefix of ss

vector<int> zFunction(const string& s) {
    int n = s.size();
    vector<int> z(n, 0);
    
    int l = 0, r = 0;  // Z-box [l, r]
    
    for (int i = 1; i < n; i++) {
        if (i < r) {
            // Use previously computed value
            z[i] = min(r - i, z[i - l]);
        }
        
        // Extend z[i] with naive comparison
        while (i + z[i] < n && s[z[i]] == s[i + z[i]]) {
            z[i]++;
        }
        
        // Update Z-box
        if (i + z[i] > r) {
            l = i;
            r = i + z[i];
        }
    }
    
    return z;
}

Visualization

s = "aabxaab"
    0123456

i=1: s[1..] = "abxaab"
     Compare with s[0..] = "aabxaab"
     "a" matches, "b" != "a"
     z[1] = 1

i=4: s[4..] = "aab"
     Compare with s[0..] = "aab..."
     "aab" matches
     z[4] = 3

Applications

1. Pattern Matching

Concatenate pattern with text: pattern + "$" + text

vector<int> zMatch(const string& text, const string& pattern) {
    string combined = pattern + "$" + text;
    vector<int> z = zFunction(combined);
    
    vector<int> matches;
    int m = pattern.size();
    
    for (int i = m + 1; i < combined.size(); i++) {
        if (z[i] == m) {
            matches.push_back(i - m - 1);
        }
    }
    
    return matches;
}

2. Find Period of String

int findSmallestPeriod(const string& s) {
    vector<int> z = zFunction(s);
    int n = s.size();
    
    for (int p = 1; p < n; p++) {
        // p is period if z[p] = n - p
        if (z[p] == n - p && n % p == 0) {
            return p;
        }
    }
    
    return n;
}

3. Count Distinct Substrings

long long countDistinctSubstrings(const string& s) {
    string current = "";
    long long total = 0;
    
    for (char c : s) {
        current = c + current;  // prepend
        vector<int> z = zFunction(current);
        
        int maxZ = 0;
        for (int i = 1; i < z.size(); i++) {
            maxZ = max(maxZ, z[i]);
        }
        
        total += current.size() - maxZ;
    }
    
    return total;
}

4. String Compression

Find shortest representation of string that is a repetition of a pattern.

string compress(const string& s) {
    vector<int> z = zFunction(s);
    int n = s.size();
    
    for (int len = 1; len <= n; len++) {
        if (n % len == 0) {
            bool valid = true;
            for (int i = len; i < n; i += len) {
                if (z[i] < len && z[i] != n - i) {
                    valid = false;
                    break;
                }
            }
            if (valid) {
                return s.substr(0, len) + " x " + to_string(n / len);
            }
        }
    }
    
    return s;
}

Z-Function vs KMP Comparison

AspectZ-FunctionKMP (Prefix Function)
DefinitionMatch with prefix from position iLongest proper prefix = suffix
Value at iLength of match starting at iLength of matching prefix-suffix
DirectionForward matchingBackward reference
Use casePattern at startPattern at end of current

Complexity

OperationTimeSpace
Build Z-arrayO(n)O(n)O(n)O(n)
Pattern searchO(n+m)O(n + m)O(n+m)O(n + m)

Tips

  1. Choose Z or KMP - both can do similar things, but some problems are easier with one
  2. Sentinel character - use a separator not in the alphabet
  3. Z-box optimization - makes the algorithm linear time