38.5 Compression Theory

Compression Ratios

  • The goal of data compression is to reduce the size of a sequence of data while retaining as much information as possible. For example, the letter e appears more frequently in the English dictionary than z, so we would want to represent e with smaller bits.

  • Compression ratio is a measure of how much the size of the compressed data differs from the original data.

    • Huffman Coding is a compression technique that represents common symbols with smaller numbers of bits, resulting in a more efficient encoding.

    • Run-length encoding is another compression technique that replaces repeated characters with the character itself and the number of times it occurs.

    • LZW is a compression technique that searches for common repeated patterns in the input and replaces them with a shorter code.

  • The general idea behind most compression techniques is to exploit any existing redundancy or order within the sequence to reduce the size of the data. However, if a sequence has no existing redundancy or order, compression may not be possible.

Self-Extracting Bits

  • Self-extracting bits is a compression technique that wraps the compressed bits and the decompression algorithm into a single sequence of bits.

  • The goal is to simplify the compression and decompression process by combining the two steps into one. Self-extracting bits can be used to create executable files that can be run on any system with an interpreter (e.g. Java interpreter).

HugPlant Example

  • To compress an image file like hugplant.bmp, we can break it into 8-bit chunks and Huffman encode each chunk.

  • We package the compressed data plus decoder into a single self-extracting .java file, represented as a byte[] array.

  • When the byte[] array is passed to an interpreter, the interpreter executes the Huffman decoding algorithm and produces the original hugplant.bmp image file.

  • The size of the compressed bitstream and the Huffman decoding algorithm combined is smaller than the original image file, making it more efficient to store and transmit.

  • However, the receiver of the compressed file must have access to the appropriate interpreter to decode and reconstruct the original image.

Last updated