CERT

Heap pollution occurs when a variable of a parameterized type references an object that is not of that parameterized type. (For more information on heap pollution, see The Java Language Specification (JLS), §4.12.2, "Variables of Reference Type" [JLS 2015].)

Mixing generically typed code with raw-typed code is one common source of heap pollution. Generic types were unavailable before Java 5, so popular APIs such as the Java Collections Framework relied on raw types. Mixing the two allowed developers to preserve compatibility between nongeneric legacy code and newer generic code, but it also gave rise to heap pollution. Heap pollution can occur if the program performs an operation involving a raw type that would give rise to a compile-time unchecked warning.

When generic and nongeneric types are used together correctly, these warnings can be ignored; at other times, these warnings can denote potentially unsafe operations. Mixing generic and raw types is allowed provided that heap pollution does not occur. For example, consider the following code snippet.
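As a minimal sketch of the kind of mixing described here (the class and method names are hypothetical and not taken from the original listing), the following assigns a raw list to a parameterized variable. The compiler reports an unchecked warning at the assignment, but no heap pollution occurs as long as only strings ever reach the list.

```java
import java.util.ArrayList;
import java.util.List;

class RawAssignmentSketch {
    // Returns the number of elements visible through the raw reference.
    static int mixedSize() {
        List raw = new ArrayList();   // raw type, as in pre-Java 5 code
        List<String> strings = raw;   // unchecked conversion: compile-time warning
        strings.add("one");           // type-checked through the parameterized view
        return raw.size();            // both variables refer to the same list
    }

    public static void main(String[] args) {
        System.out.println(mixedSize()); // prints 1
    }
}
```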

In some cases, it is possible that a compile-time unchecked warning will not be generated. According to the JLS, §4.12.2, "Variables of Reference Type" [JLS 2015]:

Note that this does not imply that heap pollution only occurs if an unchecked warning actually occurred. It is possible to run a program where some of the binaries were compiled by a compiler for an older version of the Java programming language, or by a compiler that allows the unchecked warnings to [be] suppressed. This practice is unhealthy at best.

Heap pollution can also occur if the program aliases an array variable of non-reifiable element type through an array variable of a supertype that is either raw or nongeneric. 

Noncompliant Code Example

This noncompliant code example compiles but results in heap pollution. The compiler produces an unchecked warning because a raw argument (the obj parameter in the addToList() method) is passed to the List.add() method. 
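A sketch consistent with this description follows. The class name and the readFirst() helper are assumptions made for illustration; only addToList(), its obj parameter, and the call addToList(list, 42) come from the text.

```java
import java.util.ArrayList;
import java.util.List;

class RawAddSketch {
    // Legacy-style method: both the list and the element are untyped.
    static void addToList(List list, Object obj) {
        list.add(obj); // unchecked warning: raw call to List.add()
    }

    // Hypothetical helper: reports what happens when the element is read back.
    static String readFirst() {
        List<String> list = new ArrayList<>();
        addToList(list, 42);            // pollutes the heap; nothing fails here
        try {
            String s = list.get(0);     // compiler-inserted cast to String fails here
            return s;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(readFirst()); // prints ClassCastException
    }
}
```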

Heap pollution is possible in this case because the parameterized type information is discarded before execution. The call to addToList(list, 42) succeeds in adding an integer to list, although it is of type List<String>. The Java runtime does not throw a ClassCastException until the value is read and found to have an invalid type (an Integer rather than a String). In other words, the code throws an exception some time after the execution of the operation that actually caused the error, complicating debugging.

Even when heap pollution occurs, the variable is still guaranteed to refer to a subclass or subinterface of the declared type but is not guaranteed to always refer to a subtype of its declared type. In this example, list does not refer to a subtype of its declared type (List<String>) but only to the subinterface of the declared type (List).

Compliant Solution (Parameterized Collection)

This compliant solution enforces type safety by changing the addToList() method signature to enforce proper type checking:

The compiler prevents insertion of an object to the parameterized list because addToList() cannot be called with an argument whose type produces a mismatch. This code has consequently been changed to add a String instead of an int to the list.
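The signature change described above might look like the following sketch. The class name and the readFirst() helper are hypothetical; the parameter names are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class ParameterizedAddSketch {
    // Both parameters are now fully typed, so the compiler checks every call.
    static void addToList(List<String> list, String str) {
        list.add(str); // no warning
    }

    // Hypothetical helper: adds one element and reads it back.
    static String readFirst() {
        List<String> list = new ArrayList<>();
        addToList(list, "42");      // a String is accepted
        // addToList(list, 42);     // would not compile: int is not a String
        return list.get(0);
    }

    public static void main(String[] args) {
        System.out.println(readFirst()); // prints 42
    }
}
```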

Compliant Solution (Legacy Code)

The previous compliant solution eliminates use of raw collections, but implementing this solution when interoperating with legacy code may be infeasible.

Suppose that the addToList() method is legacy code that cannot be changed. The following compliant solution creates a checked view of the list by using the Collections.checkedList() method. This method returns a wrapper collection that performs runtime type checking in its implementation of the add() method before delegating to the back-end List<String>. The wrapper collection can be safely passed to the legacy addToList() method.

The compiler still issues the unchecked warning, which may still be ignored. However, the code now fails when it attempts to add the integer to the list, consequently preventing the program from proceeding with invalid data.
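The checked-view approach can be sketched as follows, assuming the same legacy addToList() as before. The class name and the tryLegacyAdd() helper are hypothetical; Collections.checkedList() is the standard java.util method named in the text.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckedListSketch {
    // Legacy method that cannot be changed.
    static void addToList(List list, Object obj) {
        list.add(obj); // unchecked warning remains
    }

    // Hypothetical helper: reports whether the bad element was rejected.
    static String tryLegacyAdd() {
        List<String> list = new ArrayList<>();
        // The wrapper checks each element against String.class at run time.
        List<String> checked = Collections.checkedList(list, String.class);
        try {
            addToList(checked, 42);  // fails immediately, at the point of insertion
            return "added";
        } catch (ClassCastException e) {
            return "rejected, backing list size " + list.size();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryLegacyAdd()); // prints rejected, backing list size 0
    }
}
```

The exception now surfaces at the faulty add() call rather than at some later read, and the backing list is left unpolluted.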

Noncompliant Code Example

This noncompliant code example compiles and runs cleanly because it suppresses the unchecked warning produced by the raw List.add() method. The printNum() method intends to print the value 42, either as an int or as a double depending on the type of the variable type.

However, despite list being correctly parameterized, this method prints 42 and never 42.0 because the int value 42 is always added to list without being type checked. This code produces the following output:
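A minimal sketch consistent with this description is shown here. The class name, the numAsText() helper, and the main() driver are assumptions; the original listing and its exact output may differ.

```java
import java.util.ArrayList;
import java.util.List;

class ListAdderSketch {
    @SuppressWarnings("unchecked")
    static void addToList(List list, Object obj) {
        list.add(obj); // unchecked warning suppressed
    }

    // Hypothetical helper: returns the text that printNum() prints.
    static <T> String numAsText(T type) {
        if (!(type instanceof Integer || type instanceof Double)) {
            return "Cannot print in the supplied type";
        }
        List<T> list = new ArrayList<>();
        addToList(list, 42);                 // always stores an Integer, whatever T is
        return String.valueOf(list.get(0));  // T erases to Object, so no cast fails
    }

    static <T> void printNum(T type) {
        System.out.println(numAsText(type));
    }

    public static void main(String[] args) {
        printNum(42.0); // prints 42, not 42.0
        printNum(42);   // prints 42
    }
}
```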

Compliant Solution (Parameterized Collection)

This compliant solution generifies the addToList() method, eliminating any possible type violations:

This code compiles cleanly and produces the correct output:
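A sketch of such a generified version follows, under the same assumptions as before (hypothetical class name, numAsText() helper, and main() driver).

```java
import java.util.ArrayList;
import java.util.List;

class GenericListAdderSketch {
    static <T> void addToList(List<T> list, T t) {
        list.add(t); // no warning: the element type is checked at compile time
    }

    // Hypothetical helper: returns the text that printNum() prints.
    static <T> String numAsText(T type) {
        if (type instanceof Integer) {
            List<Integer> list = new ArrayList<>();
            addToList(list, 42);
            return String.valueOf(list.get(0));
        } else if (type instanceof Double) {
            List<Double> list = new ArrayList<>();
            addToList(list, 42.0); // addToList(list, 42) would not compile here
            return String.valueOf(list.get(0));
        }
        return "Cannot print in the supplied type";
    }

    static <T> void printNum(T type) {
        System.out.println(numAsText(type));
    }

    public static void main(String[] args) {
        printNum(42.0); // prints 42.0
        printNum(42);   // prints 42
    }
}
```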

If the method addToList() is externally defined (such as in a library or as an upcall method) and cannot be changed, the same compliant method printNum() can be used, but no warnings result if addToList(list, 42) is used instead of addToList(list, 42.0). Great care must be taken to ensure type safety when generics are mixed with nongeneric code.

Noncompliant Code Example (Variadic Arguments)

Heap pollution can occur without using raw types such as java.util.List. This noncompliant code example builds a list of lists of strings before passing it to a modify() method. Because this method is variadic, its arguments are packaged into an array of lists of strings. But Java cannot represent the element type of an array of a parameterized type at run time. This limitation allows the modify() method to sneak a single integer into the list. Although the Java compiler emits several warnings, this program compiles and runs until it tries to extract the integer 42 from a List<String>.

This program produces the following output:
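The following sketch illustrates the mechanism under stated assumptions: the class name, the run() helper, and the sample strings are hypothetical, and the elements are collected into a list (with the exception caught) rather than printed, so the result can be inspected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VarargsPollutionSketch {
    // The compiler warns about possible heap pollution from this parameter.
    static List<String> modify(List<String>... lists) {
        List<String> seen = new ArrayList<>();
        Object[] objectArray = lists;         // legal: arrays are covariant
        objectArray[1] = Arrays.asList(42);   // no warning, no ArrayStoreException
        try {
            for (List<String> ls : lists) {
                for (String s : ls) {         // ClassCastException when 42 is reached
                    seen.add(s);
                }
            }
        } catch (ClassCastException e) {
            seen.add("ClassCastException");
        }
        return seen;
    }

    // Hypothetical driver: passes two lists of strings as variadic arguments.
    static String run() {
        List<String> s = Arrays.asList("foo", "bar");
        List<String> s2 = Arrays.asList("baz", "quux");
        return modify(s, s2).toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [foo, bar, ClassCastException]
    }
}
```

The store into objectArray[1] succeeds because the run-time type of the array is simply List[], so the array store check cannot tell a List<Integer> from a List<String>.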

Noncompliant Code Example (Array of Lists of Strings)

This noncompliant code example is similar, but it uses an explicit array of lists of strings as the single parameter to modify(). The program again dies with a ClassCastException from the integer 42 injected into a list of strings.
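A sketch of the explicit-array variant, again with a hypothetical class name, run() helper, and sample strings. Here the caller builds the array itself, which requires an unchecked cast from a raw List[].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayPollutionSketch {
    static List<String> modify(List<String>[] lists) {
        List<String> seen = new ArrayList<>();
        Object[] objectArray = lists;         // valid aliasing through a supertype
        objectArray[1] = Arrays.asList(42);   // pollutes the array, no warning
        try {
            for (List<String> ls : lists) {
                for (String s : ls) {         // ClassCastException when 42 is reached
                    seen.add(s);
                }
            }
        } catch (ClassCastException e) {
            seen.add("ClassCastException");
        }
        return seen;
    }

    // Hypothetical driver: builds the array explicitly.
    static String run() {
        List<String> s = Arrays.asList("foo", "bar");
        List<String> s2 = Arrays.asList("baz", "quux");
        // Generic array creation is not allowed, so a raw array is cast (unchecked).
        List<String>[] lists = (List<String>[]) new List[] {s, s2};
        return modify(lists).toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [foo, bar, ClassCastException]
    }
}
```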

Compliant Solution (List of Lists of Strings)

This compliant solution uses a list of lists of strings as the argument to modify(). This type safety enables the compiler to prevent the modify() method from injecting an integer into the list. In order to compile, the modify() method instead inserts a string, preventing heap pollution.

Note that to avoid warnings, we cannot use Arrays.asList() to build a list of lists of strings because that method is also variadic and would produce a warning about variadic arguments being parameterized class objects.
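The approach can be sketched as follows. The class name, the run() helper, and the replacement string "forty-two" are assumptions made for illustration. The outer list is built with add() rather than Arrays.asList(), in line with the note above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListOfListsSketch {
    static List<String> modify(List<List<String>> lists) {
        // Arrays.asList(42) would not compile here: a List<String> is required.
        lists.set(1, Arrays.asList("forty-two"));
        List<String> seen = new ArrayList<>();
        for (List<String> ls : lists) {
            for (String s : ls) {
                seen.add(s);
            }
        }
        return seen;
    }

    // Hypothetical driver: builds the outer list element by element.
    static String run() {
        List<List<String>> lists = new ArrayList<>();
        lists.add(Arrays.asList("foo", "bar"));
        lists.add(Arrays.asList("baz", "quux"));
        return modify(lists).toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [foo, bar, forty-two]
    }
}
```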

Risk Assessment

Mixing generic and nongeneric code can produce unexpected results and exceptional conditions.

Rule      Severity   Likelihood   Remediation Cost   Priority   Level
OBJ03-J   Low        Probable     Medium             P4         L3

Automated Detection

Tool             Version   Checker            Description
Parasoft Jtest   9.5       CODSTA.EPC.AGBPT   Implemented

Bibliography

[Bloch 2008] Item 23, "Don't Use Raw Types in New Code"

[Bloch 2007]

[Bloch 2005] Puzzle 88, "Raw Deal"

[Darwin 2004] Section 8.3, "Avoid Casting by Using Generics"

[JavaGenerics 2004]

[Java Tutorials] "Heap Pollution"

[JLS 2015] §4.8, "Raw Types"; §4.12.2, "Variables of Reference Type"; Chapter 5, "Conversions and Promotions"; §5.1.9, "Unchecked Conversion"

[Langer 2008] Topic 3, "Coping with Legacy"

[Naftalin 2006] Chapter 8, "Effective Generics"

[Naftalin 2006b] "Principle of Indecent Exposure"

[Schildt 2007] "Create a Checked Collection"


7 Comments

  1. Tim,

    • Bounded wildcards do increase flexibility, but can the 'clumsy workarounds' be security threats or result in unexpected behavior?
    • Can there be any security ramifications of using "? extends T" instead of "? super T"? or vice versa? Are these detected by the compiler?
  2. This sounds like a bit better footing, as mixing generic and non-generic code usually indicates a bad design. But can it be insecure? That is, can you provide a code sample that compiles, yet produces surprising output? That will make or break this rule.

    1. Good, the NCCE does illustrate a potential security flaw (bad data can sneak onto a 'good' container). So this is a valid rule. My only comment is to fill in the TODO sections (and provide a reference or two).

  3. The title has definitely changed since my last visit and I see a concrete recommendation proposal here. (smile)

    A quick glance at the NCE reveals that you're doing an "addToList(list, 1);" (passing '1') in printOne(). Did you intend to pass the variable 'type'? If yes, what is the unexpected output? (It prints 1.0,1.0,1,1.) If not, do explain in detail what the code is doing in the NCE.

    I would try inserting objects of type 'a' in a list that accepts the type 'b'. That should pass compiler checks, but when you try to retrieve the type 'b' by casting to its type while iterating through the list, an exception would occur since it didn't expect the type 'a' to be present in the list. That's one way to look at it and may require a static field List in this example. There are other hazards too that you could explore if you wish, but I suspect one solid one will see you home. Good luck!

  4. I'm a little unsure about the code examples because they suppress warnings. I know we don't have a rule forbidding that; maybe we should. One could argue that the NCCE's problem is that it suppresses those warnings, not that it violates them.

    So I'm not certain this rule has a future. But I do think it is complete for this assignment.

    1. According to JLS -

      Note that this does not imply that heap pollution only occurs if an unchecked warning actually occurred. It is possible to run a program where some of the binaries were compiled by a compiler for an older version of the Java programming language, or by a compiler that allows the unchecked warnings to [be] suppressed. This practice is unhealthy at best.

      This implies that you cannot bank upon the warnings for this.

  5. This rule used to be titled: "Do not mix generic with nongeneric raw types in new code"
    The rule forbade using generic and raw types in the same package (it was allowed in different packages).
    Allowing in different packages was an attempt to cater to legacy code, but in our experience,
    legacy code mixes raw & generic types, so our catering attempt failed.
    Besides, mixing raw & generic types is not a problem unless you get heap pollution.