Skip to end of metadata
Go to start of metadata

String data passed to complex subsystems may contain special characters that can trigger commands or actions, resulting in a software vulnerability. As a result, it is necessary to sanitize all string data passed to complex subsystems so that the resulting string is innocuous in the context in which it will be interpreted.

These are some examples of complex subsystems:

Noncompliant Code Example

Data sanitization requires an understanding of the data being passed and the capabilities of the subsystem. John Viega and Matt Messier provide an example of an application that inputs an email address to a buffer and then uses this string as an argument in a call to system() [Viega 2003]:

The risk, of course, is that the user enters the following string as an email address:

For more information on the system() call, see ENV03-C. Sanitize the environment when invoking external programs and ENV33-C. Do not call system().

Compliant Solution

It is necessary to ensure that all valid data is accepted, while potentially dangerous data is rejected or sanitized. Doing so can be difficult when valid characters or sequences of characters also have special meaning to the subsystem and may involve validating the data against a grammar. In cases where there is no overlap, whitelisting can be used to eliminate dangerous characters from the data.

The whitelisting approach to data sanitization is to define a list of acceptable characters and remove any character that is not acceptable. The list of valid input values is typically a predictable, well-defined set of manageable size. This compliant solution, based on the tcp_wrappers package written by Wietse Venema, shows the whitelisting approach:

The benefit of whitelisting is that a programmer can be certain that a string contains only characters that are considered safe by the programmer. Whitelisting is recommended over blacklisting, which traps all unacceptable characters, because the programmer needs only to ensure that acceptable characters are identified. As a result, the programmer can be less concerned about which characters an attacker may try in an attempt to bypass security checks.

Noncompliant Code Example

This noncompliant code example is taken from [VU#881872], a vulnerability in the Sun Solaris TELNET daemon (in.telnetd) that allows a remote attacker to log on to the system with elevated privileges.

The vulnerability in in.telnetd invokes the login program by calling execl(). This call passes unsanitized data from an untrusted source (the USER environment variable) as an argument to the login program:

An attacker, in this case, can gain unauthenticated access to a system by setting the USER environment variable to a string, which is interpreted as an additional command-line option by the login program. This kind of attack is called argument injection.

Compliant Solution

This compliant solution inserts the "--" (double dash) argument before the call to getenv("USER") in the call to execl():

Because the login program uses the POSIX getopt() function to parse command-line arguments, and because the "--" option causes getopt() to stop interpreting options in the argument list, the USER variable cannot be used by an attacker to inject an additional command-line option. This is a valid means of sanitizing the untrusted user data in this context because the behavior of the interpretation of the resulting string is rendered innocuous.

The call to execl() is not susceptible to command injection because the shell command interpreter is not invoked. (See ENV33-C. Do not call system().)

Risk Assessment

Failure to sanitize data passed to a complex subsystem can lead to an injection attack, data integrity issues, and a loss of sensitive data.




Remediation Cost









Automated Detection







Command injection
Format string injection
LDAP injection
Library injection
SQL injection

Coverity6.5TAINTED_STRINGFully implemented



LDRA tool suite9.5.6108 D, 109 DPartially implemented
Polyspace Bug FinderR2016a

Execution of externally controlled command

Command executed from externally controlled path

Library loaded from externally controlled path

Command argument from an unsecure source vulnerable to operating system command injection

Path argument from an unsecure source

Using a library argument from an externally controlled path

Related Vulnerabilities

Search for vulnerabilities resulting from the violation of this rule on the CERT website.

Related Guidelines




  1. Historical Context

    The history of this bug is as follows. The expreserve command was set-uid root. When the editor crashed it would save the vi editor buffer and send mail to the user that their session was saved.

    To abuse this:

    • create an executable shell script called "bin" in the current directory
    • execute "PATH=.:$PATH IFS=/ vi"
    • type some text
    • type ":preserve"

    The commands in the bin directory are executed with root privileges. This worked fine on my System V Release 2 box in 1987. See also the text around "preserve" in http://securitydigest.org/unix/archive/032

    --Wietse Venema

  2. This brings up the point that safe shell scripts need to set IFS and PATH.

  3. This rule in its most general form is unenforceable, because there is no rigorous definition of a 'complex subsystem'...both system() and execle() are used as examples.

    We could try to catch violations on individual cases, like the system() call in the first NCCE. These violations are caught by dynamic analysis much more easily.

    How do Fortify and Klocwork enforce this rule???

  4. I presume this is the best rule to handle usage of _mbs functions, but I suspect we could use a more specific rule: https://www.fortify.com/vulncat/en/vulncat/cpp/often_misused_strings__mbs.html

    1. The _mbsxxx Multibyte/wide string conversion functions referenced on the Fortify page are Windows-specific and the problems mentioned there are simply bugs. I'd like to see rules that are generally applicable and independent of implementation details (unless the details are sufficiently common across implementations). Other than that, if we had a section of localization practices I think it would be the best home for rules on the appropriate use of these function.

  5. It seems that there is an error in the provided example. The first letter of "user_data" is always replaced by "_".
    I would recommend something like:

      1. I think that you have forgotten the "cp = user_data;" line.

        For information, there is the same issue with STR02-CPP example.

        1. Fixed (both here are in STR02-CPP)