Use visually distinct identifiers that are unlikely to be misread during development and review of code. Depending on the fonts used, certain characters are visually similar or even identical and can be misinterpreted. Consider the examples in the following table.

Misleading characters

| Intended Character | Could Be Mistaken for This Character, and Vice Versa |
|---|---|
| 0 (zero) | O (capital o), D (capital d) |
| 1 (one) | I (capital i), l (lowercase L) |
| 2 (two) | Z (capital z) |
| 5 (five) | S (capital s) |
| 8 (eight) | B (capital b) |
| n (lowercase N) | h (lowercase H) |
| rn (lowercase R, lowercase N) | m (lowercase M) |

The Java Language Specification (JLS) mandates that program source code be written using the Unicode character set [Unicode 2013]. Some distinct Unicode characters share identical glyphs when displayed in many common fonts. For example, the Greek and Coptic characters (Unicode Range 0370–03FF) are frequently indistinguishable from the Greek-character subset of the Mathematical Alphanumeric Symbols (Unicode Range 1D400–1D7FF).

Avoid defining identifiers that include Unicode characters with overloaded glyphs. One straightforward approach is to use only ASCII or Latin-1 characters in identifiers. Note that the ASCII character set is a subset of Unicode.
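As a rough illustration of the ASCII-only approach, the following sketch checks whether a name contains any non-ASCII character. The class name `AsciiIdentifierCheck` and the method `isAsciiOnly` are hypothetical, and the Latin A versus Greek capital alpha pair is chosen here only as a familiar look-alike; it is not one of the ranges named above.

```java
final class AsciiIdentifierCheck {
    // Hypothetical helper: true if every char of the name is in the ASCII range 0x00-0x7F.
    static boolean isAsciiOnly(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (name.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String latin = "A";       // U+0041 LATIN CAPITAL LETTER A
        String greek = "\u0391";  // U+0391 GREEK CAPITAL LETTER ALPHA
        System.out.println(latin.equals(greek));  // false: different characters
        System.out.println(isAsciiOnly(latin));   // true
        System.out.println(isAsciiOnly(greek));   // false
    }
}
```

A check like this could run over identifier names extracted by a linter or build step; how those names are obtained is outside the scope of the sketch.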

Do not use multiple identifiers that differ only in one or more visually similar characters. Also, make the initial portions of long identifiers distinct to aid recognition.
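One possible way to spot such identifier pairs is to collapse each look-alike character from the table of misleading characters onto a single representative and then compare the results. The sketch below assumes that approach; the class `ConfusableNames` and its methods `skeleton` and `looksAlike` are hypothetical names, and the mapping covers only the characters listed in the table.

```java
final class ConfusableNames {
    // Reduces a name to a "skeleton" in which look-alike characters
    // collapse to one representative (for example, O and D both become 0).
    static String skeleton(String name) {
        // Handle the two-character case first: "rn" can be read as "m".
        String collapsed = name.replace("rn", "m");
        StringBuilder sb = new StringBuilder(collapsed.length());
        for (char c : collapsed.toCharArray()) {
            switch (c) {
                case 'O': case 'D': sb.append('0'); break;
                case 'I': case 'l': sb.append('1'); break;
                case 'Z': sb.append('2'); break;
                case 'S': sb.append('5'); break;
                case 'B': sb.append('8'); break;
                case 'h': sb.append('n'); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Two different names look alike if their skeletons are equal.
    static boolean looksAlike(String a, String b) {
        return !a.equals(b) && skeleton(a).equals(skeleton(b));
    }

    public static void main(String[] args) {
        System.out.println(looksAlike("stem", "stern")); // true
        System.out.println(looksAlike("bow", "stern"));  // false
    }
}
```

This is a heuristic only: it can flag pairs that are easy to tell apart in a given font and miss confusions that are not in the table.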

Noncompliant Code Example

This noncompliant code example has two variables, stem and stern, within the same scope that can be easily confused and accidentally interchanged:

int stem;  // Position near the front of the boat
/* ... */
int stern; // Position near the back of the boat

Compliant Solution

This compliant solution eliminates the confusion by assigning visually distinct identifiers to the variables:

int bow;   // Position near the front of the boat
/* ... */
int stern; // Position near the back of the boat

Noncompliant Code Example

This noncompliant code example prints the sum of an int and a long value (11111 and 1111), although at a glance it appears to add the integer 11111 to itself. According to the JLS, §3.10.1, "Integer Literals" [JLS 2013],

An integer literal is of type long if it is suffixed with an ASCII letter L or l (ell); otherwise, it is of type int. The suffix L is preferred because the letter l (ell) is often hard to distinguish from the digit 1 (one).

Consequently, use L, not l, to clarify programmer intent when indicating that an integer literal is of type long.

public class Visual {
  public static void main(String[] args) {
    System.out.println(11111 + 1111l);
  }
}


Compliant Solution

This compliant solution uses an uppercase L (long) instead of lowercase l to disambiguate the visual appearance of the second integer. Its behavior is the same as that of the noncompliant code example, but the programmer's intent is clear:

public class Visual {
  public static void main(String[] args) {
    System.out.println(11111 + 1111L);
  }
}
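To make the operand types visible, the following sketch boxes each sum and reports its runtime class. The class `LongSuffixDemo` and the helper `describe` are hypothetical names introduced for illustration.

```java
final class LongSuffixDemo {
    // Hypothetical helper: reports the boxed type and value of an expression result.
    static String describe(Object value) {
        return value.getClass().getSimpleName() + " " + value;
    }

    public static void main(String[] args) {
        // int + long: the int operand is promoted, so the result is a long.
        System.out.println(describe(11111 + 1111L));  // Long 12222
        // int + int: what a reader may believe the noncompliant code computes.
        System.out.println(describe(11111 + 11111));  // Integer 22222
    }
}
```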

Noncompliant Code Example

This noncompliant code example mixes decimal and octal values while storing them in an array. Integer literals with leading zeros denote octal values, not decimal values. According to §3.10.1, "Integer Literals" of the JLS [JLS 2013],

An octal numeral consists of an ASCII digit 0 followed by one or more of the ASCII digits 0 through 7 interspersed with underscores, and can represent a positive, zero, or negative integer.

Misreading an octal literal as a decimal one may result in programming errors. The mistake is most likely when a programmer declares several constants and pads them with zeros to align the digits.

int[] array = new int[3];

void exampleFunction() {
  array[0] = 2719;
  array[1] = 4435;
  array[2] = 0042;
  // ...
}

The third element in array was likely intended to hold the decimal value 42. However, the decimal value 34 (corresponding to the octal value 42) is assigned.
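The arithmetic is 4 × 8 + 2 = 34. The short sketch below, using the hypothetical class name `OctalLiteralDemo`, contrasts the literal with `Integer.parseInt`, which reads the same digits as decimal because it uses radix 10.

```java
final class OctalLiteralDemo {
    // Returns the value of the zero-padded literal from the example above.
    static int paddedLiteral() {
        return 0042; // octal: 4 * 8 + 2
    }

    public static void main(String[] args) {
        System.out.println(paddedLiteral());            // 34
        System.out.println(Integer.parseInt("0042"));   // 42 (radix 10 ignores the padding)
        System.out.println(Integer.toOctalString(34));  // 42
    }
}
```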

Compliant Solution

When integer literals are intended to represent a decimal value, avoid padding with leading zeros. Use another technique instead, such as padding with whitespace, to preserve digit alignment.

int[] array = new int[3];

void exampleFunction() {
  array[0] = 2719;
  array[1] = 4435;
  array[2] =   42;
  // ...
}

Applicability

Failing to use visually distinct identifiers could result in the use of the wrong identifier and lead to unexpected program behavior.

Heuristic detection of identifiers with visually similar names is straightforward.

Confusing a lowercase letter l with a digit 1 when indicating that an integer literal is a long value can result in incorrect computations. Automated detection of the lowercase l suffix is trivial.

Mixing decimal and octal values can result in improper initialization or assignment.

Detection of integer literals that have a leading zero is trivial. However, determining whether the programmer intended to use an octal literal or a decimal literal is infeasible. Accordingly, sound automated detection is also infeasible. Heuristic checks may be useful.
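A heuristic check of this kind might be sketched with a regular expression that flags tokens starting with 0 and continuing with octal digits. The class `LeadingZeroFinder`, the method `find`, and the pattern itself are assumptions made for illustration. The pattern works on raw text, so it also matches inside string literals and comments, and it cannot tell an intended octal literal from an accidental one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingZeroFinder {
    // A 0 followed by octal digits or underscores, ending in an octal digit,
    // with an optional long suffix. The lookarounds skip tokens such as
    // 0x1F, 0.5, and the zeros inside 1004.
    private static final Pattern OCTAL_LIKE =
        Pattern.compile("(?<![\\w.])0[0-7_]*[0-7][lL]?(?![\\w.])");

    // Returns every octal-looking literal found in one line of source text.
    static List<String> find(String sourceLine) {
        List<String> hits = new ArrayList<>();
        Matcher m = OCTAL_LIKE.matcher(sourceLine);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find("array[2] = 0042;")); // [0042]
        System.out.println(find("array[0] = 2719;")); // []
        System.out.println(find("mask = 0x1F;"));     // []
    }
}
```

Each hit still needs human review to decide whether the octal value was intended.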

Automated Detection

| Tool | Version | Checker | Description |
|---|---|---|---|
| PVS-Studio | 7.30 | V6061, V6097 | |
| SonarQube | 9.9 | S1314, S818 | |


Bibliography


