The abstract InputStream.read() and Reader.read() methods are used to read a byte or character, respectively, from a stream. The  InputStream.read() method reads a single byte from an input source and returns its value as an int in the range 0 to 255 (0x00-0xff). The Reader.read() method reads a single character and returns its value as an int in the range 0 to 65,535 (0x0000-0xffff). Both methods return the 32-bit value -1 (0xffffffff) to indicate that the end of the stream has been reached and no data is available. The larger int size is used by both methods to differentiate between the end-of-stream indicator and the maximum byte (0xff) or character (0xffff) value. The end-of-stream indicator is an example of an in-band error indicator. In-band error indicators are problematic to work with, and the creation of new in-band-error indicators is discouraged.

Prematurely converting the resulting int to a byte or char before testing for the value −1 makes it impossible to distinguish between characters read and the end of stream indicator. Programs must check for the end of stream before narrowing the return value to a byte or char.

This rule applies to any method that returns the value −1 to indicate the end of a stream. It includes any InputStream or Reader subclass that provides an implementation of the read() method. This rule is a specific instance of NUM12-J. Ensure conversions of numeric types to narrower types do not result in lost or misinterpreted data.

Noncompliant Code Example (byte)

FileInputStream is a subclass of InputStream. It will return −1 only when the end of the input stream has been reached. This noncompliant code example casts the value returned by the read() method directly to a value of type byte and then compares this value with −1 in an attempt to detect the end of the stream.

FileInputStream in;
// Initialize stream 
byte data;
while ((data = (byte) in.read()) != -1) { 
  // ... 
}

If the read() method encounters a 0xFF byte in the file, this value becomes indistinguishable from the −1 value used to indicate the end of stream, because the byte value is promoted and sign-extended to an int before being compared with −1. Consequently, the loop will halt prematurely if a 0xFF byte is read.

Compliant Solution (byte)

Use a variable of type int to capture the return value of the byte input method. When the value returned by read() is not −1, it can be safely cast to type byte. When read() returns 0x000000FF, the comparison will test against 0xFFFFFFFF, which evaluates to false.

FileInputStream in;
// Initialize stream 
int inbuff;
byte data;
while ((inbuff = in.read()) != -1) { 
  data = (byte) inbuff;
  // ...  
}

Noncompliant Code Example (char)

FileReader is a subclass of InputStreamReader, which is in turn a subclass of Reader. It also returns −1 only when the end of the stream is reached. This noncompliant code example casts the value of type int returned by the read() method directly to a value of type char, which is then compared with −1 in an attempt to detect the end of stream. This conversion leaves the value of data as 0xFFFF (e.g., Character.MAX_VALUE) instead of −1. Consequently, the test for the end of file never evaluates to true.

FileReader in;
// Initialize stream 
char data;
while ((data = (char) in.read()) != -1) { 
  // ... 
}

Compliant Solution (char)

Use a variable of type int to capture the return value of the character input method. When the value returned by read() is not −1, it can be safely cast to type char.

FileReader in;
// Initialize stream 
int inbuff;
char data;
while ((inbuff = in.read()) != -1) { 
  data = (char) inbuff;
  // ...  
}

Risk Assessment

Historically, using a narrow type to capture the return value of a byte input method has resulted in significant vulnerabilities, including command injection attacks; see CA-1996-22 advisory. Consequently, the severity of this error is high.

Rule

Severity

Likelihood

Remediation Cost

Priority

Level

FIO08-J

High

Probable

Medium

P12

L1

Automated Detection

Some static analysis tools can detect violations of this rule.

ToolVersionCheckerDescription
Parasoft Jtest
2023.1
CERT.FIO08.CRRVCheck the return value of methods which read or skip input
SpotBugs

4.6.0

EOS_BAD_END_OF_STREAM_CHECKImplemented (since 4.4.0)

Related Guidelines

Bibliography



6 Comments

  1. here is the code from the The Daily WTF CodeSOD, 2006-11-02
    (abridged)

    while (!_validConnection) {
      StringBuffer _stringBuffer = new StringBuffer();
    try {
      while (true) {
        char _char;
        _stringBuffer.append(
           _char = (char)_inputStream.read());
        if (_char == -1) break;
        else if (_char == '\r') {
          …break on "\r\n\r\n"… }
        }
      } catch (OutOfMemoryError error) {
        // received a bad response, try it again!
        continue;
    }
    
  2. The risk assessment of this rule doesn't look right. For the case highlighted by the code examples, the risk is denial of service (via infinite loop or exception). For the case mentioned in the CA-1996-22 advisory, the severity is high, but the probability seems really low. Either way, "high" and "probable" doesn't seem right.

    1. I think the Risk Assessment is correct, or at least reasonable. If a violation of this rule results in a vulnerability akin to CA-1996-22, then its very likely that command injection is possible and will be done. The probability that a violation of this rule results in command injection is debatable...no idea how you could state what the probability is in that case and defend it (w/o doing whole program analysis).

      Put another way, there are several valid 'probability' values depending on the context of the violation. The 'probable' case is the most severe, and is possible in the right context; that is why we list a violation as 'probable' in the Risk Assessment.

      1. Put another way, there are several valid 'probability' values depending on the context of the violation. The 'probable' case is the most severe, and is possible in the right context; that is why we list a violation as 'probable' in the Risk Assessment.

        True, but our likelihood rating is context-free: i.e., it ranges over all contexts in which a violation of this rule occurs. To quote the preface:

        How likely is it that a flaw introduced by violating the rule could lead to an exploitable vulnerability.

        The command-injection attack is quite a corner case for this rule. The scenario from the code examples seems much more likely.

        I don't mean to nitpick this particular rule – my point is broader to the ratings throughout the book. If likelihood is context dependent, I think it provides far less value to developers/project-managers as a high-level indicator of risk.

        That said, given that we don't claim to have any hard data to back up any of these Likelihood ratings, perhaps it doesn't really matter?

        1. Yes, the command-injection attack is a corner case. It is well-understood, more than the other cases we have here. I suspect your last statement is on the mark. That is, you need some hard data to convince me that the likelihood should be changed.

  3. Changed the title and description and moved back to FIO where it pretty obviously belongs.  Please review.