Perl's capture variables ($1, $2, etc.) are assigned the values of capture expressions after a regular expression (regex) match succeeds. If a regex fails to match, the capture variables are not reset: they keep whatever values they held before, which may be undefined or may come from an earlier successful match. The perlre manpage [Wall 2011] contains this note:
NOTE: Failed matches in Perl do not reset the match variables, which makes it easier to write code that tests for a series of more specific cases and remembers the best match.
Consequently, after a failed match a capture variable may hold a stale value left over from an earlier successful match. The value can also be overwritten by subsequent regex matches. Always ensure that a regex was successful before reading its capture variables.
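One way to follow this advice without touching $1 at all is to assign the captures in list context. The following is a minimal sketch, not part of the original rule; the helper name extract_device and the \S+ pattern are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original text): returns the device
# name, or undef when the pattern does not match.
sub extract_device {
    my ($line) = @_;

    # In list context a successful match returns the captured groups;
    # a failed match returns the empty list, so $device stays undefined.
    my ($device) = $line =~ /Attached scsi CD-ROM (\S+)/;
    return $device;
}

my $data   = "[ 4.693540] sr 1:0:0:0: Attached scsi CD-ROM sr0";
my $device = extract_device($data);
print defined $device ? "cd is $device\n" : "no CD-ROM found\n";
```

Because the capture lands directly in a lexical variable, a failed match yields undef rather than a value left over from an earlier match.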
Noncompliant Code Example
This noncompliant code example demonstrates the hazards of relying on capture variables without testing the success of a regex.
    my $data = "[ 4.693540] sr 1:0:0:0: Attached scsi CD-ROM sr0";
    my $cd;
    my $time;

    $data =~ /Attached scsi CD-ROM (.*)/;
    $cd = $1;
    print "cd is $cd\n";

    $data =~ /\[(\d*)\].*/;    # this regex will fail
    $time = $1;
    print "time is $time\n";
This code produces the following output:
    cd is sr0
    time is sr0
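The surprising value for the $time variable arises because the second regex fails: the bracketed text contains a space and a decimal point, which \d* cannot match. The failed match leaves the capture variable $1 still holding its previously assigned value, sr0.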
Compliant Solution
In this compliant solution, both regular expressions are checked for success before the capture variables are accessed.
    my $data = "[ 4.693540] sr 1:0:0:0: Attached scsi CD-ROM sr0";
    my $cd;
    my $time;

    if ($data =~ /Attached scsi CD-ROM (.*)/) {
        $cd = $1;
        print "cd is $cd\n";
    }

    if ($data =~ /\[(\d*)\].*/) {    # this regex will fail
        $time = $1;
        print "time is $time\n";
    }
This code produces the following output:
    cd is sr0
This output might not be what the developer expected, but it clearly reveals that the latter regex failed to find a match.
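The failure can also be reported explicitly instead of passing silently. The sketch below is an illustration only: the helper name parse_time is hypothetical, and its pattern assumes timestamps of the form "[ 4.693540]" (optional padding, digits, a dot, digits), which the original text does not specify.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original text): returns the bracketed
# timestamp, or undef if none is found. The timestamp format is an assumption.
sub parse_time {
    my ($line) = @_;
    if ($line =~ /\[\s*(\d+\.\d+)\]/) {
        return $1;    # $1 is read only after a successful match
    }
    return;           # explicit failure: undef in scalar context
}

my $data = "[ 4.693540] sr 1:0:0:0: Attached scsi CD-ROM sr0";
my $time = parse_time($data);
if (defined $time) {
    print "time is $time\n";
}
else {
    warn "no timestamp found in: $data\n";
}
```

Returning undef on failure forces the caller to decide what a missing timestamp means, rather than inheriting a value from an unrelated match.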
Risk Assessment
Recommendation | Severity | Likelihood | Remediation Cost | Priority | Level |
---|---|---|---|---|---|
STR30-PL | Medium | Probable | Medium | P8 | L2 |
Automated Detection
Tool | Diagnostic |
---|---|
Perl::Critic | RegularExpressions::ProhibitCaptureWithoutTest |
Bibliography
Citation | Source |
---|---|
[Conway 05] | "Captured Values," p. 253 |
[CPAN] | Elliot Shank, Perl-Critic-1.116 RegularExpressions::ProhibitCaptureWithoutTest |
[Wall 2011] | perlre |
2 Comments
Edward Avis
I would go further and say that you should not use the capture variables for anything other than assignment to your own variable. There are many gotchas. For example, calling a subroutine as foo($1) will have bizarre effects if the implementation of foo() does a regexp match before unpacking its argument list.
David Svoboda
Agreed. Reworded intro & title.
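The subroutine hazard raised in the first comment can be sketched as follows. The names shout and shout_device are hypothetical, chosen only for illustration. Because the elements of @_ are aliases to the caller's arguments, passing $1 directly hands the subroutine the capture variable itself rather than a copy of its value.

```perl
use strict;
use warnings;

# Hypothetical subroutine that runs its own match before unpacking @_.
sub shout {
    'internal' =~ /(int)/;    # a successful match inside the sub sets its own $1
    my ($text) = @_;          # $_[0] aliases whatever the caller passed
    return uc $text;
}

# Safer caller: copy the capture into a lexical before making the call.
sub shout_device {
    my ($line) = @_;
    if ($line =~ /CD-ROM (\S+)/) {
        my $device = $1;      # take a private copy immediately
        return shout($device);
    }
    return;
}

my $data = "[ 4.693540] sr 1:0:0:0: Attached scsi CD-ROM sr0";
print "safe: ", shout_device($data), "\n";

if ($data =~ /CD-ROM (\S+)/) {
    # Passing $1 directly: the result may reflect the match made inside
    # shout() rather than the caller's capture.
    print "unsafe: ", shout($1), "\n";
}
```

Copying the capture into a lexical variable right after the successful match keeps the value independent of any regex work done by the callee.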