Skip to content

Compression Streams

Overview

Compression Streams offer a built in way to compress and decompress data based on several common file formats.

Supported Formats

Format Specification
brotli RFC7932
gzip RFC1952
deflate (also known as ZLIB) RFC1950
deflate-raw RFC1951

Specification

Design

Compression Streams work by passing a chunk of data (uint8 array) in and then returning the compression/decompression output of that chunk. There can be many chunks of data passed in during a compression/decompression stream.

Speed is of paramount importance when trying to compress/decompress data.

In Compression Streams, we try to allocate enough memory up front, so we only have to perform the compression once. For Decompression Streams, we employ a 2x increase in memory allocated each pass we attempt to decompress a chunk (with a cap of 1GB). This design was done to ensure minimal amounts of allocations being required, and to limit the number of times we needed to call inflate/deflate.

Of course with the increase in memory usage, on smaller memory devices we may hit a cap. To handle this, we implemented a back off algorithm that will quickly scale back memory allocation size if they start to fail.

We currently buffer all the compression/decompression output per chunk, and return them to the user all at once. We could also consider the possibility of returning the chunk output to the user after every call to inflate/deflate.

Libraries

On Windows and Linux we exclusively use zlib to perform the heavy lifting for all compression and decompression algorithms. Brotli is currently not available on these platforms.

On Apple platforms we leverage the Compression Framework for the brotli format. The other formats use the zlib library directly.

Notes

Compression Streams does contain one minor flaw for the gzip and deflate algorithms. There is a checksum located at the end of both of these file formats. This means that you can decompress a large file, but if any issue occurs during verifying the checksum step you may not be able to trust any of the output. This is ultimately due to the file formats design, but it does not allow for any real salvaging of any of the decompressed data if something goes wrong.