According to the [API 06] documentation for the read method of class InputStream:
[read()] Reads some number of bytes from the input stream and stores them into the buffer array b. The number of bytes actually read is returned as an integer. The number of bytes read is, at most, equal to the length of b.
Note that the read methods return as soon as some input data is available. By default, none of them guarantees that all the requested bytes will be read. It is left to the programmer to check the number of bytes read and to call the read method again as required. Ignoring the value returned by the read methods is a direct violation of EXP02-J. Do not ignore values returned by methods.
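
The short-read behavior described above can be reproduced with a small sketch. The class ShortReadDemo and its TrickleStream helper are hypothetical names introduced only for illustration; TrickleStream caps every read at a few bytes, which a real file or socket stream is also permitted to do.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class; not part of any library.
final class ShortReadDemo {

  // Wraps a stream and hands back at most maxChunk bytes per read call,
  // mimicking a source that delivers its data in small pieces.
  static final class TrickleStream extends FilterInputStream {
    private final int maxChunk;

    TrickleStream(InputStream in, int maxChunk) {
      super(in);
      this.maxChunk = maxChunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      return super.read(b, off, Math.min(len, maxChunk));
    }
  }

  // A single call to read: may fill only part of the buffer.
  static int readOnce(InputStream in, byte[] buf) throws IOException {
    return in.read(buf);
  }

  // Calls read repeatedly until buf is full or the stream ends;
  // returns the number of bytes actually stored in buf.
  static int readUntilFull(InputStream in, byte[] buf) throws IOException {
    int total = 0;
    while (total < buf.length) {
      int n = in.read(buf, total, buf.length - total);
      if (n == -1) {
        break; // end of stream
      }
      total += n;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] source = new byte[10];
    byte[] buf = new byte[10];
    // One call returns only the first chunk.
    System.out.println(readOnce(new TrickleStream(new ByteArrayInputStream(source), 4), buf));
    // Looping on the return value collects everything.
    System.out.println(readUntilFull(new TrickleStream(new ByteArrayInputStream(source), 4), buf));
  }
}
```

With a 10-byte source and a 4-byte cap, the single call reports 4 bytes while the loop reports 10, which is why the return value must be checked.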
Multibyte encodings such as UTF-8 are used for character sets that require more than one byte to uniquely identify each character. For example, the Japanese encoding Shift-JIS (shown below) is a multibyte encoding in which the maximum character length is 2 bytes (one lead byte and one trailing byte).
Byte Type | Range |
---|---|
single-byte | 0x00 through 0x7F and 0xA0 through 0xDF |
lead-byte | 0x81 through 0x9F and 0xE0 through 0xFC |
trailing-byte | 0x40 through 0x7E and 0x80 through 0xFC |
The trailing-byte ranges overlap the ranges of both the single-byte and the lead-byte characters. As a result, if a multibyte character is split across a buffer boundary, each fragment is decoded on its own and its bytes are misinterpreted: a stray trailing byte can be read as a single-byte character or as the lead byte of a different character. [Phillips 05]
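
The effect of splitting a character across two buffers can be sketched as follows. This example uses UTF-8 rather than Shift-JIS, on the assumption that UTF-8 is the safer choice for a runnable demonstration because every Java platform is required to support it; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
final class SplitCharacterDemo {

  // Decodes the two parts of data separately, as if each part had
  // arrived in its own buffer, and concatenates the results.
  static String decodeInTwoPieces(byte[] data, int splitAt) {
    String first = new String(data, 0, splitAt, StandardCharsets.UTF_8);
    String second =
        new String(data, splitAt, data.length - splitAt, StandardCharsets.UTF_8);
    return first + second;
  }

  public static void main(String[] args) {
    String original = "\u65E5\u672C"; // two characters, three UTF-8 bytes each
    byte[] data = original.getBytes(StandardCharsets.UTF_8);

    // Splitting on a character boundary preserves the text.
    System.out.println(original.equals(decodeInTwoPieces(data, 3)));
    // Splitting inside the first character corrupts it.
    System.out.println(original.equals(decodeInTwoPieces(data, 1)));
  }
}
```

When the split falls inside a character, the String constructor that takes a Charset substitutes replacement characters for the fragments, so the original text cannot be recovered from the concatenated result.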
A third issue arises from the behavior of the String class constructor. According to the [API 06] String class documentation:
The length of the new String is a function of the charset, and hence may not be equal to the length of the byte array. The behavior of this constructor when the given bytes are not valid in the given charset is unspecified.
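
As a brief illustration of the quoted statement, the sketch below decodes the same byte array with two different charsets and compares the resulting lengths. The class and method names are hypothetical, and the two charsets are chosen only because both are guaranteed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
final class DecodedLengthDemo {

  // Returns the length, in char values, of bytes decoded with cs.
  static int decodedLength(byte[] bytes, Charset cs) {
    return new String(bytes, cs).length();
  }

  public static void main(String[] args) {
    // Three characters that each take three bytes in UTF-8.
    byte[] bytes = "\u65E5\u672C\u8A9E".getBytes(StandardCharsets.UTF_8);

    System.out.println(bytes.length);                                       // 9
    System.out.println(decodedLength(bytes, StandardCharsets.UTF_8));       // 3
    System.out.println(decodedLength(bytes, StandardCharsets.ISO_8859_1));  // 9
  }
}
```

The same nine bytes produce a three-character String under UTF-8 and a nine-character String under ISO-8859-1, so the byte count alone says nothing about the length of the decoded text.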
Noncompliant Code Example
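This noncompliant code example intends to read 1024 bytes from a FileInputStream and return them as a String, but it suffers from several pitfalls. First, the general contract of the read methods means that a single call is not guaranteed to read 1024 bytes. The code also ignores the count returned by read and converts the entire buffer on every iteration, including any stale bytes left over from an earlier read.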
The second issue involves multibyte character encoding. A call to read can fill the buffer so that it ends with the lead byte of a character, with the trailing bytes arriving in the next iteration. Because each buffer is converted to a String separately before being appended to str, the multibyte sequence is decoded in two unrelated pieces and the character is corrupted.
Finally, str is decoded using the platform's default encoding because no encoding is specified in the call to the String class constructor.

    public static String readBytes(FileInputStream in) throws IOException {
      String str = "";
      byte[] data = new byte[1024];
      while (in.read(data) > -1) {
        str += new String(data);
      }
      return str;
    }

Compliant Solution (1)
This compliant solution takes into account the total number of bytes read (and adjusts the remaining bytes' offset) so that the required data is fully read.
The data byte buffer should be sized according to the maximum number of bytes needed to encode a character. For example, UTF-8 requires at most 3 bytes for any character up to U+FFFF. Characters above U+FFFF require 4 bytes in UTF-8; Java represents each of them as a pair of char values of 2 bytes each (a surrogate pair) because it uses UTF-16 internally for char. The buffer should therefore allow up to four bytes for each character expected.
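
The byte counts discussed above can be checked with a short sketch. The class and method names are hypothetical, and the sample code points are arbitrary choices from each UTF-8 length class.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of any library.
final class Utf8WidthDemo {

  // Number of bytes UTF-8 uses to encode the given code point.
  static int utf8Length(int codePoint) {
    String s = new String(Character.toChars(codePoint));
    return s.getBytes(StandardCharsets.UTF_8).length;
  }

  // Number of Java char values (UTF-16 code units) for the code point.
  static int charCount(int codePoint) {
    return Character.charCount(codePoint);
  }

  public static void main(String[] args) {
    int[] samples = {0x41, 0xE9, 0x65E5, 0x1F600};
    for (int cp : samples) {
      System.out.printf("U+%04X: %d UTF-8 byte(s), %d char(s)%n",
          cp, utf8Length(cp), charCount(cp));
    }
  }
}
```

The last sample, U+1F600, lies above U+FFFF: it occupies four bytes in UTF-8 and two char values in a Java String, while the other samples occupy one char each.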
This compliant solution also specifies the encoding of str explicitly to improve portability.
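
    import java.io.FileInputStream;
    import java.io.IOException;

    public static String readBytes(FileInputStream in) throws IOException {
      int offset = 0;
      int bytesRead = 0;
      byte[] data = new byte[1024];
      // Keep reading until the buffer is full or the end of the stream is reached
      while (offset < data.length
          && (bytesRead = in.read(data, offset, data.length - offset)) != -1) {
        offset += bytesRead;
      }
      // Decode only the bytes actually read, using an explicit encoding
      return new String(data, 0, offset, "UTF-8");
    }
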
Compliant Solution (2)
The one-argument and three-argument readFully() methods of the DataInputStream class can be used to read all the requested data. They block until the requested number of bytes has been read. An EOFException is thrown if the stream ends before all the bytes have been read, and an IOException is thrown if another I/O error occurs. How to proceed is left to the exception handler to decide.
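
    import java.io.DataInputStream;
    import java.io.IOException;

    public static String readBytes(DataInputStream dis) throws IOException {
      byte[] data = new byte[1024];
      // Blocks until data is completely filled; throws EOFException if the stream ends early
      dis.readFully(data);
      return new String(data, "UTF-8");
    }
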
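
One possible way for a caller to react to a stream that ends early is sketched below. The class ReadFullyDemo and its canFill method are hypothetical, and returning false on a short stream is only one of several reasonable policies.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical demo class; not part of any library.
final class ReadFullyDemo {

  // Returns true if exactly size bytes could be read from source,
  // false if the stream ended before the buffer was filled.
  static boolean canFill(byte[] source, int size) throws IOException {
    try (DataInputStream dis =
             new DataInputStream(new ByteArrayInputStream(source))) {
      dis.readFully(new byte[size]);
      return true;
    } catch (EOFException e) {
      // readFully signals a premature end of stream with EOFException
      return false;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(canFill(new byte[1024], 1024)); // enough data
    System.out.println(canFill(new byte[100], 1024));  // stream too short
  }
}
```

Catching EOFException separately from IOException lets the handler distinguish a stream that is merely too short from a genuine I/O failure.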
Risk Assessment
Noncompliance can lead to the wrong number of bytes being read or to character sequences being interpreted incorrectly.
Rule | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
FIO03-J | low | unlikely | medium | P2 | L3 |
Automated Detection
TODO
Related Vulnerabilities
Search for vulnerabilities resulting from the violation of this rule on the CERT website.
References
[API 06] Class InputStream, DataInputStream
[Phillips 05]
[Harold 99] Chapter 7: Data Streams, Reading Byte Arrays
[Chess 07] 8.1 Handling Errors with Return Codes
[MITRE 09] CWE ID 135 "Incorrect Calculation of Multi-Byte String Length"