ICPC Competitive Programming

String Hashing

String Hashing technique for fast string comparison

String Hashing

String Hashing is a technique that converts strings to numbers (hash values) enabling fast string comparison in O(1)O(1).

Main Concept

Convert string ss to a number using the formula:

H(s)=(s0pn1+s1pn2++sn1p0)modmH(s) = (s_0 \cdot p^{n-1} + s_1 \cdot p^{n-2} + \ldots + s_{n-1} \cdot p^0) \mod m

Where:

  • pp = base (commonly 31 or 37)
  • mm = modulo (commonly a large prime)
  • sis_i = ASCII value of character at position ii

Polynomial Rolling Hash

To compute hash of any substring in O(1)O(1), we precompute prefix hash:

prefix[i]=s0pi1+s1pi2++si1prefix[i] = s_0 \cdot p^{i-1} + s_1 \cdot p^{i-2} + \ldots + s_{i-1}

Hash of substring s[l..r]s[l..r]:

H(s[l..r])=prefix[r+1]prefix[l]prl+1H(s[l..r]) = prefix[r+1] - prefix[l] \cdot p^{r-l+1}

Implementation

const long long MOD = 1e9 + 7;
const long long BASE = 31;

struct StringHash {
    vector<long long> h, p;
    int n;
    
    StringHash(const string& s) {
        n = s.size();
        h.resize(n + 1);
        p.resize(n + 1);
        
        p[0] = 1;
        h[0] = 0;
        
        for (int i = 0; i < n; i++) {
            p[i + 1] = (p[i] * BASE) % MOD;
            h[i + 1] = (h[i] * BASE + (s[i] - 'a' + 1)) % MOD;
        }
    }
    
    // Get hash of s[l..r] (0-indexed, inclusive)
    long long getHash(int l, int r) {
        long long res = (h[r + 1] - h[l] * p[r - l + 1] % MOD + MOD) % MOD;
        return res;
    }
    
    // Compare s[l1..r1] with s[l2..r2]
    bool compare(int l1, int r1, int l2, int r2) {
        return getHash(l1, r1) == getHash(l2, r2);
    }
};

Double Hashing

To reduce collision probability, use 2 hash functions simultaneously:

struct DoubleHash {
    StringHash h1, h2;
    
    DoubleHash(const string& s) : h1(s, 31, 1e9+7), h2(s, 37, 1e9+9) {}
    
    pair<long long, long long> getHash(int l, int r) {
        return {h1.getHash(l, r), h2.getHash(l, r)};
    }
};

Usage Examples

1. Count Distinct Substrings of Length K

int countDistinct(const string& s, int k) {
    StringHash sh(s);
    set<long long> seen;
    
    for (int i = 0; i + k <= s.size(); i++) {
        seen.insert(sh.getHash(i, i + k - 1));
    }
    
    return seen.size();
}

2. Pattern Matching (Rabin-Karp)

vector<int> rabinKarp(const string& text, const string& pattern) {
    StringHash textHash(text);
    StringHash patternHash(pattern);
    
    vector<int> matches;
    long long patHash = patternHash.getHash(0, pattern.size() - 1);
    
    for (int i = 0; i + pattern.size() <= text.size(); i++) {
        if (textHash.getHash(i, i + pattern.size() - 1) == patHash) {
            matches.push_back(i);
        }
    }
    
    return matches;
}

Complexity

OperationTimeSpace
PreprocessingO(n)O(n)O(n)O(n)
Query hashO(1)O(1)-
Compare substringsO(1)O(1)-

Tips

  1. Use Double Hashing to reduce collisions
  2. Choose appropriate base and mod - base should be larger than alphabet size, mod should be a large prime
  3. Watch out for overflow - use unsigned long long or __int128
  4. Birthday Paradox - collision probability increases with more hash values