38.6 LZW Compression

Key Idea

The LWZ approach is based on the idea of exploiting redundancy and patterns in the input data to achieve compression. Basically, each codeword can represent multiple symbols.

For example, imagine a sequence of symbols ABCABCA. In traditional compression, each symbol would be mapped to a fixed-length codeword, resulting in a compressed sequence like 010001001000100100. With the LWZ approach, the codewords can be based on patterns in the input data. In this case, the algorithm might start with a codeword table. (A --> 0, B --> 1, C --> 2). This could result in a compressed sequence looking something like 01201201. By allowing for codewords that can represent multiple symbols, the LWZ approach can achieve more efficient compression than traditional approaches.

Algorithm

  • The algorithm starts with a simple codeword table where each codeword corresponds to a single symbol.

  • Whenever a codeword is used, a new codeword is created by concatenating the previous codeword with the next symbol.

  • The algorithm does not specify what happens when the codeword table becomes full, but there are many variants of the algorithm that handle this differently.

  • A neat fact about the LWZ approach is that it is possible to reconstruct the codeword table from the compressed bitstream alone, without needing to send the table along with the compressed data.

  • LWZ decompression demo.

Fun Facts

  • The algorithm is named after its inventors, Lempel, Ziv, and Welch.

  • The LWZ algorithm is used as a component in many compression tools, including .gif files, .zip files, and more.

  • The LWZ algorithm was once controversial due to attempts to enforce licensing fees, but the patent expired in 2003.

Last updated