IDS02-J. Normalize strings before validating them

Many applications that accept untrusted input strings employ input filtering and validation mechanisms based on the strings' character data.

For example, an application's strategy for avoiding cross-site scripting (XSS) vulnerabilities may include forbidding <script> tags in inputs. Such blacklisting mechanisms are a useful part of a security strategy, even though they are insufficient for complete input validation and sanitization. When implemented, this form of validation must be performed only after normalizing the input.

According to the Unicode Standard [Davis 2008], Annex #15, Unicode Normalization Forms:

When implementations keep strings in a normalized form, they can be assured that equivalent strings have a unique binary representation.
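As a minimal sketch of this property, the following example compares two canonically equivalent spellings of the letter "é": the precomposed code point U+00E9 and the letter "e" followed by U+0301 (combining acute accent). The class name CanonicalEquivalence and the helper equalAfterNfc are hypothetical names chosen for illustration; NFC is used here only because it is sufficient for canonical equivalence.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

// Hypothetical demo class; the names are illustrative only.
class CanonicalEquivalence {
  // Precomposed form: U+00E9 LATIN SMALL LETTER E WITH ACUTE
  static final String PRECOMPOSED = "\u00E9";
  // Decomposed form: "e" followed by U+0301 COMBINING ACUTE ACCENT
  static final String DECOMPOSED = "e\u0301";

  // Compares two strings after bringing both to NFC.
  static boolean equalAfterNfc(String a, String b) {
    return Normalizer.normalize(a, Form.NFC)
        .equals(Normalizer.normalize(b, Form.NFC));
  }

  public static void main(String[] args) {
    // The raw strings differ at the code-unit level.
    System.out.println(PRECOMPOSED.equals(DECOMPOSED));        // false
    // After normalization they have the same representation.
    System.out.println(equalAfterNfc(PRECOMPOSED, DECOMPOSED)); // true
  }
}
```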

Normalization Forms KC and KD must not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets, and unless supplanted by formatting markup, they may remove distinctions that are important to the semantics of the text. It is best to think of these Normalization Forms as being like uppercase or lowercase mappings: useful in certain contexts for identifying core meanings, but also performing modifications to the text that may not always be appropriate. They can be applied more freely to domains with restricted character sets ...
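The loss of formatting distinctions can be seen in a small sketch. The inputs below (a superscript two, U+00B2, and the "fi" ligature, U+FB01) are examples chosen for illustration, and the class name CompatibilityLoss is hypothetical. NFC leaves both characters alone, whereas NFKC replaces them with plain ASCII, so the original text cannot be recovered afterward.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

// Hypothetical demo class; the names are illustrative only.
class CompatibilityLoss {
  static String nfc(String s) {
    return Normalizer.normalize(s, Form.NFC);
  }

  static String nfkc(String s) {
    return Normalizer.normalize(s, Form.NFKC);
  }

  public static void main(String[] args) {
    String squared = "x\u00B2";   // "x" followed by SUPERSCRIPT TWO
    String ligature = "\uFB01le"; // LATIN SMALL LIGATURE FI, then "le"

    // NFC preserves the compatibility characters.
    System.out.println(nfc(squared).equals(squared));   // true
    System.out.println(nfc(ligature).equals(ligature)); // true

    // NFKC folds them to their plain equivalents.
    System.out.println(nfkc(squared));  // x2
    System.out.println(nfkc(ligature)); // file
  }
}
```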

The most suitable normalization form for performing input validation on arbitrarily encoded strings is KC (NFKC), because normalizing to KC transforms the input into an equivalent canonical form that can be safely compared with the required input form.
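To illustrate why a canonical-only form is not enough for this purpose, the sketch below normalizes two compatibility variants of the less-than sign (U+FE64, small less-than sign, and U+FF1C, fullwidth less-than sign) under NFC and NFKC. The class and method names are hypothetical. Only NFKC maps the variants to the ASCII character that a filter would look for.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

// Hypothetical demo class; the names are illustrative only.
class AngleBracketForms {
  static String nfc(String s) {
    return Normalizer.normalize(s, Form.NFC);
  }

  static String nfkc(String s) {
    return Normalizer.normalize(s, Form.NFKC);
  }

  public static void main(String[] args) {
    String small = "\uFE64";     // SMALL LESS-THAN SIGN
    String fullwidth = "\uFF1C"; // FULLWIDTH LESS-THAN SIGN

    // NFC leaves the variants untouched, so a check for "<" would miss them.
    System.out.println(nfc(small).equals("<"));     // false
    System.out.println(nfc(fullwidth).equals("<")); // false

    // NFKC maps both to the ASCII less-than sign.
    System.out.println(nfkc(small).equals("<"));     // true
    System.out.println(nfkc(fullwidth).equals("<")); // true
  }
}
```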

Another domain where normalization is required before validation is in sanitizing untrusted path names in a file system. This is addressed by rule IDS21-J. Canonicalize path names before validating them.

Noncompliant Code Example

This noncompliant code example attempts to validate the String before performing normalization. Consequently, the validation logic fails to detect inputs that should be rejected, because the check for angle brackets fails to detect alternative Unicode representations.

// String s may be user controllable
// \uFE64 is normalized to < and \uFE65 is normalized to > using NFKC
String s = "\uFE64" + "script" + "\uFE65"; 

// Validate
Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  // Found black listed tag
  throw new IllegalStateException();
} else {
  // ... 
}

// Normalize
s = Normalizer.normalize(s, Form.NFKC);

The normalize method transforms Unicode text into an equivalent composed or decomposed form, allowing for easier searching of text. The normalize method supports the standard normalization forms described in Unicode Standard Annex #15, Unicode Normalization Forms.

Compliant Solution

This compliant solution normalizes the string before validating it. Alternative representations of the string are normalized to the canonical angle brackets. Consequently, input validation correctly detects the malicious input and throws an IllegalStateException.

String s = "\uFE64" + "script" + "\uFE65";

// Normalize
s = Normalizer.normalize(s, Form.NFKC);

// Validate
Pattern pattern = Pattern.compile("[<>]"); // Check for angle brackets
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
  // Found black listed tag
  throw new IllegalStateException();
} else {
  // ... 
}
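The same normalize-then-validate order can be packaged as a reusable method. The following is a sketch under stated assumptions rather than a definitive implementation: the class name NormalizedValidator, the method name normalizeThenValidate, and the exception message are hypothetical, and the angle-bracket check stands in for whatever validation an application actually requires.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class NormalizedValidator {
  // Same check as in the examples above: any angle bracket is rejected.
  private static final Pattern ANGLE_BRACKETS = Pattern.compile("[<>]");

  // Normalizes the untrusted input to NFKC first, then validates the
  // normalized text. Returns the normalized string so that callers use
  // the same value that was validated.
  static String normalizeThenValidate(String untrusted) {
    String s = Normalizer.normalize(untrusted, Form.NFKC);
    if (ANGLE_BRACKETS.matcher(s).find()) {
      throw new IllegalStateException("Angle bracket found after normalization");
    }
    return s;
  }

  public static void main(String[] args) {
    System.out.println(normalizeThenValidate("hello")); // hello
    try {
      normalizeThenValidate("\uFE64" + "script" + "\uFE65");
    } catch (IllegalStateException e) {
      System.out.println("rejected");
    }
  }
}
```

Returning the normalized string matters: if the caller continued to use the original input, later processing could normalize it differently from what was validated.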

Risk Assessment

Validating input before normalization affords attackers the opportunity to bypass filters and other security mechanisms. This can result in the execution of arbitrary code.

Rule	Severity	Likelihood	Remediation Cost	Priority	Level
IDS02-J	high	probable	medium	P12	L1

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Related Guidelines

[MITRE 2009]	CWE ID 289 "Authentication Bypass by Alternate Name" (http://cwe.mitre.org/data/definitions/289.html)
	CWE ID 180 "Incorrect Behavior Order: Validate Before Canonicalize"

Bibliography

[API 2006]
[Davis 2008]
[Weber 2009]
