Regular expressions are widely used to match strings of text. For example, the POSIX grep utility supports regular expressions for finding patterns in the specified text. For introductory information on regular expressions, see the Java Tutorials [[Tutorials 08]]. The java.util.regex package provides the Pattern class, which encapsulates a compiled representation of a regular expression, and the Matcher class, an engine that uses a Pattern to perform matching operations on a CharSequence.
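As a minimal sketch of how the two classes cooperate, the following compiles a Pattern once and creates a Matcher for each input. The class name PatternBasics and the method firstNumber() are hypothetical and exist only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PatternBasics {
  // Compile once and reuse: Pattern instances are immutable and thread-safe.
  private static final Pattern DIGITS = Pattern.compile("[0-9]+");

  // Returns the first run of ASCII digits in the input, or null if none exists.
  static String firstNumber(CharSequence input) {
    Matcher matcher = DIGITS.matcher(input); // a new Matcher is created per input
    return matcher.find() ? matcher.group() : null;
  }
}
```

Because matcher() accepts any CharSequence, the same method works for a String, a StringBuilder, or a CharBuffer.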

Java's powerful regular expression (regex) facilities must be protected from misuse. An attacker may supply malicious input that modifies the original regular expression so that the regex no longer complies with the program's specification. This attack vector, referred to as regex injection, might affect control flow, cause information leaks, or result in denial-of-service (DoS) vulnerabilities.

Certain constructs and properties of Java regular expressions are susceptible to exploitation:

  • Matching flags: Untrusted inputs may override matching options that may or may not have been passed to the Pattern.compile() method.
  • Greediness: An untrusted input may attempt to inject a regex that changes the original regex to match as much of the string as possible, exposing sensitive information.
  • Grouping: The programmer can enclose parts of a regular expression in parentheses to perform some common action on the group. An attacker may be able to change the groupings by supplying untrusted input, leading to the security weaknesses described earlier.
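The first item in the list above can be illustrated with a small sketch. It assumes a hypothetical check, matchesExactly(), that is meant to be a case-sensitive, whole-string comparison and therefore passes no flags to Pattern.compile(). An embedded flag expression such as (?i) inside the untrusted input turns on case-insensitive matching anyway.

```java
import java.util.regex.Pattern;

final class FlagInjectionDemo {
  // Hypothetical check: intended as a case-sensitive, whole-string comparison.
  static boolean matchesExactly(String untrusted, String candidate) {
    // No flags are passed, so the programmer expects case-sensitive matching.
    // The untrusted string is concatenated unmodified, which is the flaw.
    return Pattern.compile("^" + untrusted + "$").matcher(candidate).matches();
  }

  public static void main(String[] args) {
    System.out.println(matchesExactly("admin", "ADMIN"));     // prints false
    System.out.println(matchesExactly("(?i)admin", "ADMIN")); // prints true
  }
}
```

The same concatenation also lets an attacker inject greedy constructs such as .* or add groups of their own.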

Untrusted input should be sanitized before use to prevent regex injection. When the user must specify a regex as input, care must be taken to ensure that the original regex cannot be modified without restriction. White-listing characters (such as letters and digits) before delivering the user-supplied string to the regex parser is a good input validation strategy. However, when the user is allowed to enter regexes, the white-list may need to permit certain dangerous characters. Such inputs should not be used to build a security-sensitive dynamic regex. A programmer must provide only a very limited subset of regular expression functionality to the user to minimize any chance of misuse.
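When the user only needs to supply literal text rather than a regex, one alternative to white-listing (not the approach this guideline's compliant solution takes) is the standard Pattern.quote() method, which makes the parser treat the whole string as a literal. The following is a minimal sketch; the class LiteralPrefixSearch and its method are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralPrefixSearch {
  // Returns every line of text that starts with the untrusted string,
  // with the untrusted string interpreted literally.
  static List<String> linesStartingWith(String text, String untrusted) {
    // Pattern.quote() returns a literal pattern String for the input,
    // so metacharacters such as '.', ')' and '|' lose their special meaning.
    String regex = "^" + Pattern.quote(untrusted) + ".*$";
    Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(text);
    List<String> lines = new ArrayList<>();
    while (matcher.find()) {
      lines.add(matcher.group());
    }
    return lines;
  }
}
```

With the text "a.c one\nabc two\n" and the input "a.c", only the line "a.c one" is returned, because the dot no longer matches an arbitrary character.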

Noncompliant Code Example

This noncompliant code example searches a log of previous searches for keywords that match a regular expression in order to present search suggestions to the user. The suggestSearches() method is called repeatedly to suggest completions of the search text. The full log of previous searches is stored in the logBuffer object. The contents of logBuffer are copied to the log string on each append for use in suggestSearches().

import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class ExploitableLog {
  private static final StringBuilder logBuffer = new StringBuilder();
  private static String log = logBuffer.toString();

  static {
    // This is supposed to come from a file, but it's here as a string for
    // illustrative purposes. Each line's format is: name,id,timestamp
    append("Alice,1267773881,2147651408\n");
    append("Bono,1267774881,2147351708\n");
    append("Charles,1267775881,1175523058\n");
    append("Cecilia,1267773222,291232332\n");
  }
      
  private static void append(CharSequence str) {
    logBuffer.append(str);
    log = logBuffer.toString(); // update log string on append
  }

  public static Set<String> suggestSearches(String search) {
    Set<String> searches = new HashSet<String>();

    // Construct regex from user string
    String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
    int flags = Pattern.MULTILINE;
    Pattern keywordPattern = Pattern.compile(regex, flags);

    // Match regex
    Matcher logMatcher = keywordPattern.matcher(log);
    while (logMatcher.find()) {
      String found = logMatcher.group(1);
      searches.add(found);
    }

    return searches;
  }
}

The regex used to search the log is:

"^(" + search + ".*),[0-9]+?,[0-9]+?$"

This regex matches an entire line of the log and finds old searches that begin with the entered keyword. The anchoring operators and the reluctant quantifiers mitigate some greediness concerns. The grouping parentheses allow the program to capture only the keyword while still matching the IP and timestamp. Because the log String contains multiple lines, the MULTILINE flag must be active so that the anchoring operators match at the start and end of each line rather than only at the start and end of the entire input. By all appearances, this is a strong regex.
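The effect of the MULTILINE flag on the anchors can be seen in a small sketch. The class MultilineDemo and the fixed pattern "^B.*$" are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MultilineDemo {
  // Returns the first match of "^B.*$" in text under the given flags, or null.
  static String firstLineStartingWithB(String text, int flags) {
    Matcher matcher = Pattern.compile("^B.*$", flags).matcher(text);
    return matcher.find() ? matcher.group() : null;
  }

  public static void main(String[] args) {
    String text = "Alice\nBono\n";
    // Without MULTILINE, ^ matches only at the start of the whole input.
    System.out.println(firstLineStartingWithB(text, 0));                 // prints null
    // With MULTILINE, ^ and $ also match at line boundaries.
    System.out.println(firstLineStartingWithB(text, Pattern.MULTILINE)); // prints Bono
  }
}
```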

However, this class does not sanitize the incoming regular expression, and as a result, exposes too much information from the log file to the user.

A non-malicious use of the suggestSearches() method would be to enter "C" to match "Charles" and "Cecilia". However, a malicious user could enter

 ?:)(^.*,[0-9]+?,[0-9]+?$)|(?:

which captures the entire log line rather than just the old keywords. The first closing parenthesis of the malicious search string defeats the grouping protection. The OR operator (|) then allows injection of an arbitrary regex. As a result, the regex reveals all IPs and timestamps of past searches.
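The attack can be reproduced with a short, self-contained sketch. The class InjectionDemo is a hypothetical stand-in that copies the sample log and the unsanitized regex construction from the noncompliant example into one place.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InjectionDemo {
  // Copy of the sample log from the noncompliant example.
  static final String LOG =
      "Alice,1267773881,2147651408\n"
      + "Bono,1267774881,2147351708\n"
      + "Charles,1267775881,1175523058\n"
      + "Cecilia,1267773222,291232332\n";

  // Same unsanitized regex construction as the noncompliant suggestSearches().
  static Set<String> suggestSearches(String search) {
    String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
    Matcher matcher = Pattern.compile(regex, Pattern.MULTILINE).matcher(LOG);
    Set<String> searches = new HashSet<>();
    while (matcher.find()) {
      searches.add(matcher.group(1));
    }
    return searches;
  }

  public static void main(String[] args) {
    // Benign input: only the matching keywords are returned.
    System.out.println(suggestSearches("C"));
    // Malicious input: group 1 is now the attacker's own group.
    System.out.println(suggestSearches("?:)(^.*,[0-9]+?,[0-9]+?$)|(?:"));
  }
}
```

With the benign input the returned set holds "Charles" and "Cecilia". With the malicious input the assembled regex becomes ^(?:)(^.*,[0-9]+?,[0-9]+?$)|(?:.*),[0-9]+?,[0-9]+?$, so group 1 captures each complete line and the set holds all four log lines, including the second and third fields.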

Compliant Solution

One method of preventing this vulnerability is to filter out the sensitive information prior to matching and then run the user-supplied regex against the remaining non-sensitive information. However, if the log format changes without a corresponding change in the class, sensitive information may be exposed. Furthermore, depending on how encapsulated the search keywords are, a malicious user may be able to obtain a list of all the keywords. (If there are many keywords, this may cause a denial of service.)
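The filter-first approach described above might look like the following sketch. It assumes that each log line has the form keyword,field,field and that keywords never contain a comma; the class KeywordOnlySearch and its methods are hypothetical. The user-supplied string is still used as a regex, so a malformed input can raise PatternSyntaxException and an expensive pattern can still consume excessive time.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordOnlySearch {
  // Assumption: each log line is "keyword,field,field" and the keyword has no comma.
  static List<String> extractKeywords(String log) {
    List<String> keywords = new ArrayList<>();
    for (String line : log.split("\n")) {
      int comma = line.indexOf(',');
      if (comma > 0) {
        keywords.add(line.substring(0, comma)); // drop everything after the keyword
      }
    }
    return keywords;
  }

  // Runs the user-supplied regex against the extracted keywords only.
  static Set<String> suggestSearches(String log, String search) {
    Pattern pattern = Pattern.compile("^(" + search + ".*)$");
    Set<String> searches = new LinkedHashSet<>();
    for (String keyword : extractKeywords(log)) {
      Matcher matcher = pattern.matcher(keyword);
      if (matcher.find()) {
        searches.add(matcher.group()); // whole match; it can only contain keyword text
      }
    }
    return searches;
  }
}
```

Given the sample log from the noncompliant example, the malicious string shown earlier now yields only the four keywords. This matches the caveat above: the sensitive fields stay hidden, but the attacker can still enumerate every keyword.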

This compliant solution filters the search string, keeping only letters and digits (as determined by Java's Character.isLetterOrDigit()), spaces, and apostrophes. This removes the grouping parentheses and the OR operator that enable the injection.

import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class FilteredLog {
  // ...

  public static Set<String> suggestSearches(String search) {
    Set<String> searches = new HashSet<String>();

    // Filter bad chars from user input
    StringBuilder sb = new StringBuilder(search.length());
    for (int i = 0; i < search.length(); ++i) {
      char ch = search.charAt(i);
      if (Character.isLetterOrDigit(ch) ||
          ch == ' ' ||
          ch == '\'') {
        sb.append(ch);
      }
    }
    search = sb.toString();

    // Construct regex from user string
    String regex = "^(" + search + ".*),[0-9]+?,[0-9]+?$";
    int flags = Pattern.MULTILINE;
    Pattern keywordPattern = Pattern.compile(regex, flags);

    // Match regex
    Matcher logMatcher = keywordPattern.matcher(log);
    while (logMatcher.find()) {
      String found = logMatcher.group(1);
      searches.add(found);
    }

    return searches;
  }
}
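To see what the filter does to the malicious string from the noncompliant example, the filtering loop can be isolated in a helper. The class SearchSanitizer and the method sanitize() are hypothetical names; the loop body is the same as in the compliant solution.

```java
final class SearchSanitizer {
  // Keeps letters, digits, spaces, and apostrophes; drops everything else.
  static String sanitize(String search) {
    StringBuilder sb = new StringBuilder(search.length());
    for (int i = 0; i < search.length(); ++i) {
      char ch = search.charAt(i);
      if (Character.isLetterOrDigit(ch) || ch == ' ' || ch == '\'') {
        sb.append(ch);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Only the digits inside the two [0-9] character classes survive.
    System.out.println(sanitize("?:)(^.*,[0-9]+?,[0-9]+?$)|(?:")); // prints 0909
    // Ordinary search text passes through unchanged.
    System.out.println(sanitize("O'Brien"));                       // prints O'Brien
  }
}
```

The sanitized string "0909" is then an ordinary literal prefix, which matches no keyword in the sample log.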

Risk Assessment

Violating this guideline may result in sensitive information disclosure.

Rule     Severity  Likelihood  Remediation Cost  Priority  Level
IDS18-J  medium    probable    high              P8        L2

References

[[Tutorials 08]] Regular Expressions
[[MITRE 09]] CWE ID 625 "Permissive Regular Expression"
[[CVE 05]] CVE-2005-1949
