Java provides the java.util.zip package for zip-compatible data compression. It provides classes that enable you to read, create, and modify ZIP and GZIP file formats. 

A number of security concerns must be considered when extracting file entries from a ZIP file using java.util.zip.ZipInputStream. File names may contain path traversal information that may cause them to be extracted outside of the intended directory, frequently with the purpose of overwriting existing system files. Directory traversal or path equivalence vulnerabilities can be eliminated by canonicalizing the path name, in accordance with FIO16-J. Canonicalize path names before validating them, and then validating the location before extraction.

A second issue is that the extraction process can cause excessive consumption of system resources, possibly resulting in a denial-of-service attack when resource usage is disproportionately large compared to the input data. The zip algorithm can produce very large compression ratios [Mahmoud 2002]. For example, a file consisting of alternating lines of a characters and b characters can achieve a compression ratio of more than 200 to 1. Even higher compression ratios can be obtained using input data that is targeted to the compression algorithm. This permits the existence of zip bombs in which a small ZIP or GZIP file consumes excessive resources when uncompressed. An example of a zip bomb is the file 42.zip, which is a zip file consisting of 42 kilobytes of compressed data, containing five layers of nested zip files in sets of 16, each bottom layer archive containing a 4.3 gigabyte (4 294 967 295 bytes; ~ 3.99 GiB) file for a total of 4.5 petabytes (4 503 599 626 321 920 bytes; ~ 3.99 PiB) of uncompressed data. Zip bombs often rely on repetition of identical files to achieve their extreme compression ratios. Programs must either limit the traversal of such files or refuse to extract data beyond a certain limit. The actual limit depends on the capabilities of the platform and expected usage.

Noncompliant Code Example

This noncompliant code fails to validate the name of the file that is being unzipped. It passes the name directly to the constructor of FileOutputStream. It also fails to check the resource consumption of the file that is being unzipped. It permits the operation to run to completion or until local resources are exhausted.

static final int BUFFER = 512;
// ...

public final void unzip(String filename) throws java.io.IOException{
  FileInputStream fis = new FileInputStream(filename);
  ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
  ZipEntry entry;
  try {
    while ((entry = zis.getNextEntry()) != null) {
      System.out.println("Extracting: " + entry);
      int count;
      byte data[] = new byte[BUFFER];
      // Write the files to the disk
      FileOutputStream fos = new FileOutputStream(entry.getName());
      BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
      while ((count = zis.read(data, 0, BUFFER)) != -1) {
        dest.write(data, 0, count);
      }
      dest.flush();
      dest.close();
      zis.closeEntry();
    }
  } finally {
    zis.close();
  }
}

Noncompliant Code Example (getSize())

This noncompliant code attempts to overcome the problem by calling the method ZipEntry.getSize() to check the uncompressed file size before uncompressing it. Unfortunately, a malicious attacker can forge the field in the ZIP file that purports to show the uncompressed size of the file, so the value returned by getSize() is unreliable, and local resources may still be exhausted.

static final int BUFFER = 512;
static final int TOOBIG = 0x6400000; // 100MB
// ...

public final void unzip(String filename) throws java.io.IOException{
  FileInputStream fis = new FileInputStream(filename);
  ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
  ZipEntry entry;
  try {
    while ((entry = zis.getNextEntry()) != null) {
      System.out.println("Extracting: " + entry);
      int count;
      byte data[] = new byte[BUFFER];
      // Write the files to the disk, but only if the file is not insanely big
      if (entry.getSize() > TOOBIG ) {
         throw new IllegalStateException("File to be unzipped is huge.");
      }
      if (entry.getSize() == -1) {
         throw new IllegalStateException("File to be unzipped might be huge.");
      }
      FileOutputStream fos = new FileOutputStream(entry.getName());
      BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
      while ((count = zis.read(data, 0, BUFFER)) != -1) {
        dest.write(data, 0, count);
      }
      dest.flush();
      dest.close();
      zis.closeEntry();
    }
  } finally {
    zis.close();
  }
}

Acknowledgement: The vulnerability in this code was pointed out by Giancarlo Pellegrino, researcher at the Technical University of Darmstadt in Germany, and Davide Balzarotti, faculty member of EURECOM in France.

Compliant Solution

In this compliant solution, the code validates the name of each entry before extracting the entry. If the name is invalid, the entire extraction is aborted. However, a compliant solution could also skip only that entry and continue the extraction process, or it could even extract the entry to some safe location.

Furthermore, the code inside the while loop tracks the uncompressed file size of each entry in a zip archive while extracting the entry. It throws an exception if the entry being extracted is too large—about 100MB in this case. We do not use the ZipEntry.getSize() method because the value it reports is not reliable. Finally, the code also counts the number of file entries in the archive and throws an exception if there are more than 1024 entries.

static final int BUFFER = 512;
static final long TOOBIG = 0x6400000; // Max size of unzipped data, 100MB
static final int TOOMANY = 1024;      // Max number of files
// ...

private String validateFilename(String filename, String intendedDir)
      throws java.io.IOException {
  File f = new File(filename);
  String canonicalPath = f.getCanonicalPath(); 

  File iD = new File(intendedDir);
  String canonicalID = iD.getCanonicalPath();
  
  if (canonicalPath.startsWith(canonicalID)) {
    return canonicalPath;
  } else {
    throw new IllegalStateException("File is outside extraction target directory.");
  }
}

public final void unzip(String filename) throws java.io.IOException {
  FileInputStream fis = new FileInputStream(filename);
  ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
  ZipEntry entry;
  int entries = 0;
  long total = 0;
  try {
    while ((entry = zis.getNextEntry()) != null) {
      System.out.println("Extracting: " + entry);
      int count;
      byte data[] = new byte[BUFFER];
      // Write the files to the disk, but ensure that the filename is valid,
      // and that the file is not insanely big
      String name = validateFilename(entry.getName(), ".");
      if (entry.isDirectory()) {
        System.out.println("Creating directory " + name);
        new File(name).mkdir();
        continue;
      }
      FileOutputStream fos = new FileOutputStream(name);
      BufferedOutputStream dest = new BufferedOutputStream(fos, BUFFER);
      while (total + BUFFER <= TOOBIG && (count = zis.read(data, 0, BUFFER)) != -1) {
        dest.write(data, 0, count);
        total += count;
      }
      dest.flush();
      dest.close();
      zis.closeEntry();
      entries++;
      if (entries > TOOMANY) {
        throw new IllegalStateException("Too many files to unzip.");
      }
      if (total + BUFFER > TOOBIG) {
        throw new IllegalStateException("File being unzipped is too big.");
      }
    }
  } finally {
    zis.close();
  }
}

Risk Assessment

Rule

Severity

Likelihood

Remediation Cost

Priority

Level

IDS04-J

Low

Probable

High

P2

L3

Automated Detection

ToolVersionCheckerDescription
The Checker Framework

2.1.3

Tainting CheckerTrust and security errors (see Chapter 8)
SonarQube
9.9

S5042

Expanding archive files is security-sensitive

Related Guidelines

MITRE CWE

CWE-409, Improper Handling of Highly Compressed Data (Data Amplification)

Secure Coding Guidelines for Java SE, Version 5.0

Guideline 1-1 / DOS-1: Beware of activities that may use disproportionate resources

Related Vulnerabilities

VulnerabilityDescription
Zip Slip

Zip Slip is a form of directory traversal that can be exploited by extracting files from an archive. It is caused by a failure to validate path names of the files within an archive which can lead to files being extracted outside of the intended directory and overwriting existing system files. An attacker can exploit this vulnerability to overwrite executable files to achieve remote command execution on a victim’s machine. Snyk responsibly disclosed the vulnerability before public disclosure on June 5th 2018. Their blog post and technical paper detailing the vulnerability can be found at https://snyk.io/blog/zip-slip-vulnerability/.


Android Implementation Details

Although not directly a violation of this rule, the Android Master Key vulnerability (insecure use of ZipEntry) is related to this rule. Another attack vector, found by a researcher in China, is also related to this rule.

Bibliography



20 Comments

  1. There are a few problems with the compliant solution.

    Firstly, can we trust ZipEntry.getSize? No, not really. Currently ZipInputStream checks decompressed size (and others) at the end of reading the data, which is a bit late. Who knows what ZipFile does. I strongly suggest metering the amount of data actually read. Probably don't bother with the lying getSize. Probably also worth metering the size of the raw zip file itself.

    Example is also subject to directory traversal and similar attacks (suggest run away from using untrusted data in file paths). And it doesn't close the output stream.

    (100 MB is probably better written as 100*1024*1024 than 0x6400000.)

    1. Fair enough. I think we could generalize your point to say that this rule should be just an instance of IDS00-J, where the "untrusted data" includes any metadata coming from the zip file.  I guess the only wrinkle is that, in this case, the flow of untrusted input (entry size) to trusted output (uncompressed file size) is implicit, whereas IDS00-J seems to be more about explicit data.

      Do we need a rule (or extension of IDS00-J) for implicit flows?

      1. I don't think we need a new rule...we can fix this one.

        The getSize() API doc also allows it to return -1 if the size is 'not known', so I wouldn't trust it in Java 1.8 even if it worked now. Still, I would trust getSize() more than the size of the raw ZIP file itself. Best suggestion is to amend the CS to terminate the while loop after reading TOOBIG bytes, if it gets that far.

        We do discuss directory traversal attacks in this rule:
        FIO00-J. Do not operate on files in shared directories
        but that deals with checking the path for a preexisting file, not for creating a new file. Probably could be solved by making sure that none of the directory names in the path are . or .. The other hazards only apply to preexisting directories, which is not an issue when doing uncompression.

        The compliant solution didn't close anything because it is intended to replace a subset of the noncompliant solution. Prob we should explicitly do the closing just to avoid confusion.

        1. I was only suggesting metering the raw file as an addition to metering the output. As far as I can see, an entry can only get as far as allocating 16-bit sized chunks of memory, which it will then attempt to read. At least in ZipInputStream. ZipFile is altogether a dodgier area.

          There's all sorts of issues with user supplied file name: directory traversal, file extensions, nulls, translation of odd characters.

          The non-compliant solution doesn't release resource in the event of an exception.

          1. Do we need to say anything about ZipFile?  Currently we do not, and the first article I read on the topic suggests that it is better to use ZipFile  http://javarevisited.blogspot.com/2014/06/2-examples-to-read-zip-files-in-java-zipFile-vs-zipInputStream.html

        2. Two other items not taken into account by this sample:

          1. Total disk space used is not tracked. Should add a int grandTotal and also quit if that exceeds TOOBIG.

          2. Total number of entries. You could also exhaust inodes by zipping up a file with thousands of tiny entries. Should add an int entryCount and quit if that exceeds TOOMANY.

          Of course, in a real implementation, these would be methods that dynamically set limits rather than hard-coded values.  It's always a bug to allow a user to even accidentally exhaust the machine's resources (memory, disk, inodes, etc.).

          In a real implementation, too, you should log these errors as a security exception.

          1. I've fixed the NCCE to use a finally clause to ensure that the ZIP file is closed even if an exception is thrown. (The CS already did this.)

            I've modified the CS to also count the number of files extracted and throw an exception if this exceeds a constant...1024 in the example.

            I've also modified the CS to not extract more than total bytes. This should address issues of running out of disk space, assuming it has space for total bytes. Actually the maximum amount of space required will be something like total + BUFFER + TOOMANY * BLOCK_SIZE. Exhausting disk space is a nasty problem, partially because many systems give no indication when it happens, and many other programs can crash when their attempts to write to disk fail.

            BTW see rule ERR00-J. Do not suppress or ignore checked exceptions for more details on how to handle exceptions securely.

  2. Probably total should be a long. Consider changing

     while (total <= TOOBIG ..

    to

     while (total < TOOBIG ..

    combined with an (overflow aware) check if total+count is smaller than TOOBIG before the write would prevent the total allocated size to grow ever beyond TOOBIG. Otherwise a large chunk exceeding the total limit might be written in the last iteration.


    1. Agreed, fixed on both counts.

      1. I amended the code wrt Marc's last paragraph. At this point TOOBIG is a hard limit, and the code will not exceed it. It may, however, spuriously fail if the total number of bytes is TOOBIG - BUFFER. Given the real-world scenario, I consider this an acceptable tradeoff.

  3. For the compliant solution:

    validateFileName needs to throw IOException

    methods should be static

    Adding the CS to a test based on 42.zip from http://en.wikipedia.org/wiki/Zip_bomb failed because ZipInputStream doesn't handle encrypted entries.  Does this rule out the exploits mentioned?

    Stub follows:

     

        @Test

        public void checkZipBomb() {

            try {

                unzip("42.zip");

                fail();

            } catch(FileNotFoundException x) {

                fail(x.getMessage());

            } catch(IOException e) {

                System.out.println(e);

                assertEquals("Too many files to unzip.", e.getMessage());

            }

        }

     

    result:

    org.junit.ComparisonFailure:

    Expected :Too many files to unzip.

    Actual   :encrypted ZIP entry not supported

    1. Modifying unzip like so:


          public static void unzip(String filename, String password) throws java.io.IOException{
              FileInputStream fis = new FileInputStream(filename);
              ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
              if(password != null) {
                  ZipDecryptInputStream zdis = new ZipDecryptInputStream(fis, password);
                  zis = new ZipInputStream(new BufferedInputStream(zdis));
              }

      and using info from : http://blog.alutam.com/2009/10/31/reading-password-protected-zip-files-in-java/

       

      Test now fails with "invalid compression method"

      1. OK compression method 99 is for AES encrypted entries.  To read this zip it looks like a library like TrueZip or zip4j would be required.

        Are there some publicly available malicious zip files so that I can test the CS works with just standard java?

        1. A few options I can see:

          1. unzip the 42.zip and then zip it up yourself using windows (so it isn't using AES).
          2. create your own smaller encrypted test zips and change the CS constants so you don't need so many entries or such large files to prove it is working -- don't forget to create some files that go into places you don't want them to go (doesn't have to be anything sensistive, of course).
          1. Unfortunately I can't do (1) as our AV software correctly detects this as a malicious file (smile)

  4. Just to add a couple other things this doesn't currently handle.

    1. subdirectories.  Currently it will take a subdirectory entry and incorrectly create a file rather than a directory. You'll then get IOExceptions when it tries to put files into the 'subdirectory'.
      It's fixed simply with

                    name = validateFilename(entry.getName(), ".");

                    if (entry.isDirectory())  {

                      System.out.println("Creating directory " + name);

                      new File(name).mkdir();

                      continue;

                    }

      right above where you currently create a FileOutputStream fos.
    2. incomplete files.  Currently it will leave the final incomplete file on disk, assuming the total data exceeds the limit. I have a meeting at the moment, but I'll post an example that solves these.

       

       

       

    1. I've incorporated A. Bishop's fixes to the compliant solution.

      I suspect there are more features we could add to enhance our 'secure unzip', but the code is intended to be illustrative rather than feature-complete. So I'm not going to worry about incomplete files.

      I suspect the exploit would work on an unencrypted 42.zip file, or any zip file that java.util.zip could handle.

      1. This is sort of a odd rule.  We normally focus on one problem, such as "resource exhaustion".  A rule like that might include extracting the contents of a zip file or even just reading a single file that is too large.  Another rule may focus on the path traversal problem, which IDS02-J. Canonicalize path names before validating them sort of does.  The title of this rule is very sweeping: IDS04-J. Safely extract files from ZipInputStream.  It sort of suggests that the compliant solution will address every security problem.  Given this scope, concerns like the ones expressed by Ron Craig  are completely valid. On the other hand, it is probably a reasonable idea to have ZipInputStream in the title of a rule/guideline, so that folks will discover this rule when attempting to use the java.util.zip package.

         

  5. This implementation will never throw "File being unzipped is too big." exception as we break the while loop in case

    total + BUFFER > TOOBIG

    but we throw exception in case

    total > TOOBIG

    1. You're right. Tweaked the exception to only throw if total + BUFFER > TOOBIG, which is the same condition in the while loop. Also made TOOBIG a long (although it is currently within the range of int, but it is close to the max value)