codingcircle » 10. Regular Expression Matching

Given an input string s and a pattern p, implement regular expression matching with support for '.' and '*' where:

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Example 1:

Input: s = "aa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input: s = "aa", p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

Example 3:

Input: s = "ab", p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Constraints:

1 <= s.length <= 20
1 <= p.length <= 20
s contains only lowercase English letters.
p contains only lowercase English letters, '.', and '*'.
It is guaranteed for each appearance of the character '*', there will be a previous valid character to match.

Problem Description

This problem asks you to implement a function that determines if the given input string s matches the given pattern p. The pattern p can include two special characters:

A period/dot (.) which matches any single character.
An asterisk (*) which matches zero or more of the element right before it.

The goal is to check if the pattern p matches the entire string s, not just a part of it. That means we need to see if we can navigate through the entire string s using the rules defined by the pattern.

Intuition

The intuition behind the provided solution is using dynamic programming to iteratively build up a solution. We create a 2D table f where f[i][j] will represent whether the first i characters of s match the first j characters of p.

The approach is as follows:

Initialize the table with False, and set f[0][0] to True because an empty string always matches an empty pattern.
Iterate over each character in the string s and the pattern p, and update the table based on the following rules:
- If the current character in p is *, we check two things: a. If the pattern without this star and its preceding element matches the current string s up to i, i.e., f[i][j] = f[i][j - 2]. b. If the element before the star can be matched to the current character in s (either it’s the same character or it’s a .), and if the pattern p up to the current point matches the string s up until the previous character, i.e., f[i - 1][j].
- If the current character in p is . or it matches the current character in s, we just carry over the match state from the previous characters, i.e., f[i][j] = f[i - 1][j - 1].
At the end, f[m][n] contains the result, which tells us if the whole string s matches the pattern p, where m and n are the lengths of s and p, respectively.

The key here is to realize that the problem breaks down into smaller subproblems. If we know how smaller parts of the string and pattern match, we can use those results to solve for larger parts. This is a classic dynamic programming problem where optimal substructure (the problem can be broken down into subproblems) and overlapping subproblems (calculations for subproblems are reused) are the main components.

Solution Approach

The solution involves dynamic programming – a method for solving complex problems by breaking them down into simpler subproblems. The key to this solution is a 2D table f with the dimensions (m + 1) x (n + 1), where m is the length of the string s and n is the length of the pattern p. This table helps in storing the results of subproblems so they can be reused when necessary.

The algorithm proceeds as follows:

Initialize the DP Table: Create a boolean DP table f where f[i][j] is True if the first i characters of s (sub-s) match the first j characters of p (sub-p), and False otherwise. We initialize the table with False and set f[0][0] to True to represent that empty s matches empty p.
Handle Empty Patterns: Due to the nature of the * operator, a pattern like “a*” can match an empty sequence. We iterate over the pattern p and fill in f[0][j] (the case where s is empty). For example, if p[j-1] is *, then we check two characters back and if f[0][j-2] is True, then f[0][j] should also be True.
Fill the Table: The main part of the algorithm is to iterate over each character in s and p and decide the state of f[i][j] based on the last character of the sub-pattern p[0...j]:a. If the last character of sub-p is *, there are two subcases:
- The star can be ignored (match 0 of the preceding element). This means if the pattern matches without the last two characters (* and its preceding element), the current state should be True (f[i][j] = f[i][j-2]).
- The star contributes to the match (match 1 or more of the preceding element). This happens if the character preceding * is the same as the last character in sub-s or if it’s a dot. If f[i - 1][j] is True, we can also set f[i][j] to True (f[i][j] |= f[i - 1][j]).
b. If the last character of sub-p is not *, we check if it’s a dot or a matching character:
- If the characters match or if the character in p is . (which matches any character), the current state depends on the previous state without these two characters: f[i][j] = f[i - 1][j - 1].
Return the Result: Once the table is filled, the answer to whether s matches p is stored in f[m][n], because it represents the state of the entire string s against the entire pattern p.

In essence, the solution uses a bottom-up approach to fill the DP table, starting from an empty string/pattern and building up to the full length of s and p. The transition between the states is determined by the logic that considers the current and previous characters of p and their meaning based on regex rules.

Example Walkthrough

Let’s take a small example to illustrate the approach described above. Consider s = "xab" and p = "x*b.". We want to determine if the pattern matches the string.

Initialize the DP Table: We create a table f where f[i][j] will be True if the first i characters of s (sub-s) match the first j characters of p (sub-p). The table has dimensions (len(s) + 1) x (len(p) + 1), which is (4 x 4):01230TFFF1F2F3FHere, T denotes True, and F denotes False. f[0][0] is True because an empty string matches an empty pattern.
Handle Empty Patterns: We iterate over p and update f[0][j]. Since p[1] is *, we can ignore “x*” for an empty s, so f[0][2] becomes True:01230TFTF1F2F3F
Fill the Table: Now, we iterate over the string s and pattern p.
- For i = 1 and j = 1, s[0] matches p[0] (‘x’ == ‘x’). So f[1][1] = f[0][0] which is True.For i = 1 and j = 2, we have a *. As per the rules, we check f[1][0] (ignoring the star completely) which is False, so f[1][2] remains False.However, since p[1] is *, and ‘x’ can match ‘x’, we also check f[1 - 1][2] which is True. Hence, f[1][2] is True.For i = 1 and j = 3, we move to the next character because p[2] is not a special character and it does not match ‘x’. Hence, f[1][3] remains False.For i = 2 and j = 2, we have a *. The preceding letter ‘x’ can match ‘x’, so we check f[2 - 1][2] which is True, and hence f[2][2] is True.For i = 2 and j = 3, p[2] is '.' and it matches any character, while f[1][2] is True. Therefore, f[2][3] is True.For i = 3 and j = 2, we have a *. We consider matching zero or multiple ‘x’. Since f[2][2] is True, and ‘x’ can match ‘x’, f[3][2] becomes True.For i = 3 and j = 3, p[2] is '.' and it matches any character, so f[3][3] = f[2][2], hence f[3][3] is True.
The final table looks as follows:01230TFTF1FTTF2FFTT3FFTT
Return the Result: The answer is stored in f[m][n], which is f[3][3]. It is True, so s matches p.

By setting up this table and following the rules, we can confidently say that “xab” matches the pattern “x*b.”.

Solution Implementation

class Solution {
public:
    bool isMatch(string s, string p) {
        int n = s.length(), m = p.length();
        
        // Create a 2D dp table where dp[i][j] indicates if s[0...i-1] matches p[0...j-1]
        bool dp[n+1][m+1];
        memset(dp, false, sizeof(dp));  // Initialize all dp values to false
        dp[0][0] = true;  // Base case: empty string matches empty pattern
        
        // Fill the dp table for all substrings of s and p
        for(int i=0; i<=n; i++){
            for(int j=1; j<=m; j++){
                if(p[j-1] == '*'){  // Case when the pattern contains '*'
                    // '*' can match zero occurrence (dp[i][j-2]) or one/more occurrences of the previous character (dp[i-1][j])
                    dp[i][j] = dp[i][j-2] || (i > 0 && (s[i-1] == p[j-2] || p[j-2] == '.') && dp[i-1][j]);
                }
                else{  // Case when the pattern contains a normal character or '.'
                    dp[i][j] = i > 0 && dp[i-1][j-1] && (s[i-1] == p[j-1] || p[j-1] == '.');
                }
            }
        }
        
        // Return the result for the entire string and pattern
        return dp[n][m];
    }
};

Copied!

enum Result {
    TRUE, FALSE
}

class Solution {
    // Memoization table to store results of subproblems
    Result[][] memo;

    // Main function to check if text matches the pattern
    public boolean isMatch(String text, String pattern) {
        // Initialize memoization table with dimensions of text and pattern lengths
        memo = new Result[text.length() + 1][pattern.length() + 1];
        // Call the dp helper function to start the recursive check
        return dp(0, 0, text, pattern);
    }

    // Recursive helper function with memoization
    public boolean dp(int i, int j, String text, String pattern) {
        // If the result is already computed, return it from memo table
        if (memo[i][j] != null) {
            return memo[i][j] == Result.TRUE;
        }

        boolean ans;
        // If we have reached the end of the pattern
        if (j == pattern.length()){
            // The match is true if we have also reached the end of the text
            ans = i == text.length();
        } else {
            // Check if the current character matches or if it's a '.'
            boolean first_match = (i < text.length() &&
                                   (pattern.charAt(j) == text.charAt(i) ||
                                    pattern.charAt(j) == '.'));

            // If the next character in the pattern is '*'
            if (j + 1 < pattern.length() && pattern.charAt(j+1) == '*'){
                // '*' can match zero occurrence or one/more occurrences of the previous character
                ans = (dp(i, j+2, text, pattern) ||  // '*' matches zero occurrence
                       first_match && dp(i+1, j, text, pattern)); // '*' matches one or more occurrences
            } else {
                // If the characters match, move both indices forward
                ans = first_match && dp(i+1, j+1, text, pattern);
            }
        }
        // Store the result in memoization table
        memo[i][j] = ans ? Result.TRUE : Result.FALSE;
        return ans;
    }
}

Copied!

class Solution:
 def isMatch(self, s: str, p: str) -> bool:
     # Initialize pointers for the end of both the string and pattern
     i, j = len(s) - 1, len(p) - 1
     # Start backtracking with an empty cache
     return self.backtrack({}, s, p, i, j)
 def backtrack(self, cache, s, p, i, j):
     # Create a unique key for caching based on current positions i and j
     key = (i, j)
     # If the result is already computed, return the cached value
     if key in cache:
         return cache[key]
     # If both string and pattern are exhausted, it's a match
     if i == -1 and j == -1:
         cache[key] = True
         return True
     # If string is exhausted but pattern is not, it's not a match
     if i != -1 and j == -1:
         cache[key] = False
         return cache[key]
     # Handle case where string is exhausted and pattern ends with '*'
     if i == -1 and p[j] == '*':
         k = j
         # Move backward to handle zero '*' matches
         while k != -1 and p[k] == '*':
             k -= 2   
         # If '*' matches the whole pattern, it's a match
         if k == -1:
             cache[key] = True
             return cache[key]  
         # If no match, return False
         cache[key] = False
         return cache[key] 
     # If string is exhausted and pattern does not end with '*', it's not a match
     if i == -1 and p[j] != '*':
         cache[key] = False
         return cache[key]
     # Handle '*' in the pattern
     if p[j] == '*':
         # Check two cases: '*' matches zero characters or one/more characters
         if self.backtrack(cache, s, p, i, j - 2):
             cache[key] = True
             return cache[key]  
         # If previous character matches the current one in the string or pattern
         if p[j - 1] == s[i] or p[j - 1] == '.':
             if self.backtrack(cache, s, p, i - 1, j):
                 cache[key] = True
                 return cache[key]
     # Handle regular character match or '.' wildcard match
     if p[j] == '.' or s[i] == p[j]:
         if self.backtrack(cache, s, p, i - 1, j - 1):
             cache[key] = True
             return cache[key]
     # If no match found, store False in cache and return
     cache[key] = False
     return cache[key]

Copied!

#include <stdbool.h>
#include <string.h>
int memo[21][21]; // memo[i][j] - stores result of < isMatch(s[i..], p[j..]) >;
// isMatchHelper(i, j) - whether s[i..] matches p[j..];
bool isMatchHelper(int i, int j, char *s, char *p) {
    if (memo[i][j] != -1) {  // If we've already computed this subproblem, return the result
        return memo[i][j];
    }
    bool ans;
    if (p[j] == '\0') {  // If we've reached the end of the pattern
        ans = (s[i] == '\0'); // Both strings must also be at the end to match
    } else {
        // Check if the first characters match, or if the pattern character is a wildcard '.'
        bool first_match = (s[i] != '\0' && (p[j] == s[i] || p[j] == '.'));

        if (p[j + 1] == '*') {  // If the next character in the pattern is a '*'
            // Two cases: 
            // 1. Skip the '*' and the preceding character (i.e., zero occurrence)
            // 2. Match the current character and recursively check for one or more occurrences
            ans = isMatchHelper(i, j + 2, s, p) || (first_match && isMatchHelper(i + 1, j, s, p));
        } else {  // No '*' after the current character in the pattern
            // We proceed to the next character in both the string and the pattern
            ans = first_match && isMatchHelper(i + 1, j + 1, s, p);
        }
    }

    memo[i][j] = ans;  // Store the result in the memoization table
    return ans;
}

bool isMatch(char *s, char *p) {
    memset(memo, -1, sizeof(memo));  // Initialize memoization table to -1 (uncomputed state)
    return isMatchHelper(0, 0, s, p);  // Start checking from the beginning of both the string and the pattern
}

Copied!

using System.Text.RegularExpressions;
public class Solution {
    public bool IsMatch(string s, string p) {
        // If the pattern contains '**' (double asterisks), return true as it's a special case
        if(p.Contains("**"))
            return true;

        // Use Regex to check if the string matches the pattern
        // The pattern is surrounded with '^' and '$' to ensure the whole string matches
        return Regex.IsMatch(s, "^"+p+"$");
    }
}

Copied!

const isMatch = (string, pattern) => {
    // early return when pattern is empty
    if (!pattern) {
        // returns true when string and pattern are empty
        // returns false when string contains chars with empty pattern
        return !string;
    }
    
    // check if the current char of the string and pattern match when the string has chars
    const hasFirstCharMatch = Boolean(string) && (pattern[0] === '.' || pattern[0] === string[0]);

    // track when the next character * is next in line in the pattern
    if (pattern[1] === '*') {
        // if next pattern match (after *) is fine with current string, then proceed with it (s, p+2).  That's because the current pattern may be skipped.
        // otherwise check hasFirstCharMatch. That's because if we want to proceed with the current pattern, we must be sure that the current pattern char matches the char
        // If hasFirstCharMatch is true, then do the recursion with next char and current pattern (s+1, p).  That's because current char matches the pattern char. 
        return (
            isMatch(string, pattern.slice(2)) || 
            (hasFirstCharMatch && isMatch(string.slice(1), pattern))
        );
    }
    
    // now we know for sure that we need to do 2 simple actions
    // check the current pattern and string chars
    // if so, then can proceed with next string and pattern chars (s+1, p+1)
    return hasFirstCharMatch ? isMatch(string.slice(1), pattern.slice(1)) : false;
};

Copied!

function isMatch(s: string, p: string): boolean {
  const memo: boolean[][] = [];
  // this is the main DP recursion. its the exact same as it would work recursively
  // except that we cache the results before returning it
  function isMatchDp(i:number,j:number): boolean {
    // if it exists in cache then return it
    if (getMemo(i,j) !== undefined) {
        return getMemo(i,j);
    }
    // BASE CASE:
    // if theres remaining values in string, but none in pattern then we didn't match everything return false
    if (i < s.length && j === p.length) return setMemo(i,j,false);
    // if theres nothing left to match then we've matched everything, return true
    if (i === s.length && j === p.length) return setMemo(i,j,true);
    // STAR CASE:
    if (p[j+1] === "*") {
      if (matchFirstChar(i, j)) {
        // if the chars match then either:
        // take one character and keep star
        // do nothing and remove star
        // if either recurses true, then we got at least one match, so the whole expression is valid
        return setMemo(i,j,isMatchDp(i+1, j) || isMatchDp(i, j+2));
      }
      // otherwise if nothing matches, then we remove star and continue
      return setMemo(i,j,isMatchDp(i, j+2));
    }
    // REGULAR CASE:
    if (matchFirstChar(i,j)) {
      // successful match, recurse
      return setMemo(i,j,isMatchDp(i+1, j+1));
    }
    // first characters never matched, return false
    return setMemo(i,j,false);
  }
  function getMemo(i:number, j:number): boolean {
    return memo?.[i]?.[j];
  }
  function setMemo(i:number, j:number, val:boolean): boolean {
      if (memo[i] === undefined) {
        memo[i] = [];
      }
      memo[i][j] = val;
      return val;
  }
  // if chars match either because they are the same or pattern is "."
  function matchFirstChar(i:number, j:number): boolean {
    if (s[i] === undefined) return false;
    if (p[j] === ".") return true;
    return s[i] === p[j];
  }
  return isMatchDp(0,0);
}

Copied!

class Solution {
    const WILDCARD = "*";  // Wildcard character representing "zero or more" of the preceding character
    const ANY_CHAR = ".";  // Represents any single character
    
    /**
     * @param String $string
     * @param String $pattern
     * @return Boolean
     */
    function isMatch($string, $pattern) 
    {
        // Base case: if the pattern is empty, return true if the string is also empty
        if (!$pattern) {
            return !$string;
        }
        
        // Check if the first characters of the string and pattern match, 
        // or if the pattern starts with a wildcard '.' (which matches any single character)
        $matchResult = $string && ($string[0] === $pattern[0] || self::ANY_CHAR === $pattern[0]);
        
        // Check if the second character of the pattern is a '*' (greedy match)
        $greedyMatch = !empty($pattern[1]) && $pattern[1] == self::WILDCARD;

        // If there is no match and no greedy match, return false
        if (!$matchResult && !$greedyMatch) {
            return false;
        }

        // If there is a greedy match ('*' found in pattern), try two options:
        // 1. The pattern '*' matches the first character of the string and recursively check the rest
        // 2. Skip the '*' and try matching the rest of the pattern
        if ($greedyMatch) {
            return ($matchResult && $this->isMatch(substr($string, 1), $pattern)) || $this->isMatch($string, substr($pattern, 2));
        }
        
        // If no greedy match, continue checking the rest of the string and pattern (move both one character forward)
        return $matchResult && $this->isMatch(substr($string, 1), substr($pattern, 1));
    }
}

Copied!

class Solution {
    // Main function to check if string `s` matches the pattern `p`
    func isMatch(_ s: String, _ p: String) -> Bool {
        
        // Initialize a 2D array 'visit' to store the results of subproblems
        var visit = [[Bool]]()
        let sLength = s.count, pCount = p.count
        
        // Create the 'visit' array with dimensions (sLength + 1) x (pCount + 1)
        // Fill the array with 'false' initially, using a default value
        for _ in 0...sLength + 1 {
            visit.append([Bool](repeating: false, count: pCount + 1))
        }
        
        // Set the base case for the bottom-right corner of the table to true
        visit[sLength][pCount] = true
        
        // Iterate backward through the string and pattern
        for i in stride(from: sLength, through: 0, by: -1) {
            for j in stride(from: pCount - 1, through: 0, by: -1) {
                
                // Convert string and pattern to arrays to access characters by index
                let arrS = Array(s), arrP = Array(p)
                
                // Check if the first characters match or the pattern has a wildcard '.'
                let first = i < sLength && (arrS[i] == arrP[j] || arrP[j] == ".")
                
                // If the next character in pattern is a '*', check both cases:
                // 1. Skip the '*' (move 2 steps ahead in pattern)
                // 2. If the first character matches, move 1 step ahead in string
                if j + 1 < pCount && arrP[j + 1] == "*" {
                    visit[i][j] = visit[i][j + 2] || first && visit[i + 1][j]
                } else {
                    // If no '*', just check if characters match and move both string and pattern forward
                    visit[i][j] = first && visit[i + 1][j + 1]
                }
            }
        }
        
        // Return the result of whether the whole string and pattern match
        return visit[0][0]
    }
}

Copied!

class Solution {
    // Main function to check if string `s` matches the pattern `p`
    fun isMatch(s: String, p: String): Boolean {
        // Create a memoization map to store results of subproblems
        val memo = mutableMapOf(Pair(0, 0) to true)
        
        // Recursive function to check if the substring s[0..i] matches p[0..j]
        fun rec(i: Int, j: Int): Boolean =
            // Base case: if both indices are out of bounds, it's a valid match
            i >= 0 && j >= 0 && memo.getOrPut(Pair(i, j)) {
                // If pattern character is '*'
                j > 0 && when (p[j - 1]) {
                    // If '*' is used, check two cases:
                    // 1. If we match current character in string and move one character in the string
                    // 2. If '*' means zero occurrences, we move 2 steps in pattern
                    '*' -> (i > 0 && rec(i - 1, j) && (p[j - 2] in setOf('.', s[i - 1]))) || rec(i, j - 2)
                    
                    // If pattern character is '.'
                    // Check if the previous character of both string and pattern match
                    '.' -> rec(i - 1, j - 1)
                    
                    // For regular characters, check if they match and recursively call for the rest of the string and pattern
                    else -> i > 0 && s[i - 1] == p[j - 1] && rec(i - 1, j - 1)
                }
            }
        
        // Start the recursion with the full length of the string and pattern
        return rec(s.length, p.length)
    }
}

Copied!

class Solution {
  // Function to check if the string `s` matches the pattern `p`
  bool isMatch(String s, String p) {
    // Base case: if pattern `p` is empty, the result depends on whether string `s` is also empty
    if (p.isEmpty) {
      return s.isEmpty;
    }

    // Check if the first characters of `s` and `p` match
    // A match occurs if they are equal, or if the pattern character is a wildcard '.'
    bool firstMatch = s.isNotEmpty && (p[0] == s[0] || p[0] == '.');

    // If the next character in the pattern is a '*', there are two possibilities:
    // 1. We skip the '*' and the preceding character in the pattern (p.substring(2))
    // 2. We match the current character in the string with the character before '*' in the pattern
    if (p.length >= 2 && p[1] == '*') {
      return (isMatch(s, p.substring(2)) || (firstMatch && isMatch(s.substring(1), p)));
    } else {
      // If the next character is not a '*', we proceed by checking if the first character matches
      // and recursively check the rest of the string and pattern
      return firstMatch && isMatch(s.substring(1), p.substring(1));
    }
  }
}

Copied!

func isMatch(s string, p string) bool {
    // 1. Create a 2D table to store results of subproblems
    table := make([][]bool, len(s)+1)
    for i := 0; i < len(table); i++ {
        table[i] = make([]bool, len(p)+1)
    }
    table[0][0] = true  // An empty pattern matches an empty string
    
    // 2. Handle patterns starting with '*' (can match empty sequence)
    for j := 2; j < len(p)+1; j++ {
        if p[j-1] == '*' {
            table[0][j] = table[0][j-2]  // '*' can cancel out the preceding character
        }
    }
    
    // 3. Iterate through the string and the pattern to fill the table
    for i := 1; i < len(s)+1; i++ {
        for j := 1; j < len(p)+1; j++ {
            // 4. If characters match or the pattern has '.', it carries forward the result from the previous state
            if s[i-1] == p[j-1] || p[j-1] == '.' {
                table[i][j] = table[i-1][j-1]
            } else if p[j-1] == '*' { 
                // 5. If the pattern has '*', check if it can match zero or more of the previous character
                empty := table[i][j-2]  // Case where '*' represents an empty sequence
                nonempty := (s[i-1] == p[j-2] || p[j-2] == '.') && table[i-1][j]  // Case where '*' matches one or more of the previous character
                table[i][j] = empty || nonempty
            }
        }
    }
    
    // 6. Return the result stored in the bottom-right corner of the table (final match state)
    return table[len(s)][len(p)]
}

Copied!

Time and Space Complexity

The time complexity of the provided code is O(m * n), where m is the length of the input string s and n is the length of the pattern p. This is because the solution iterates through all combinations of positions in s and p using nested loops.

In terms of space complexity, the code uses O(m * n) space as well due to the creation of a 2D array f that has (m + 1) * (n + 1) elements to store the state of matching at each step.

10. Regular Expression Matching

Example Walkthrough

One thought on “10. Regular Expression Matching”

Leave a Reply