# 38.5 Compression Theory

## Compression Ratios

• The goal of data compression is to reduce the size of a sequence of data while retaining as much information as possible. For example, the letter `e` appears more frequently in the English dictionary than `z`, so we would want to represent `e` with smaller bits.
• Compression ratio is a measure of how much the size of the compressed data differs from the original data.
• Huffman Coding is a compression technique that represents common symbols with smaller numbers of bits, resulting in a more efficient encoding.
• Run-length encoding is another compression technique that replaces repeated characters with the character itself and the number of times it occurs.
• LZW is a compression technique that searches for common repeated patterns in the input and replaces them with a shorter code.
• The general idea behind most compression techniques is to exploit any existing redundancy or order within the sequence to reduce the size of the data. However, if a sequence has no existing redundancy or order, compression may not be possible.

## Self-Extracting Bits

• Self-extracting bits is a compression technique that wraps the compressed bits and the decompression algorithm into a single sequence of bits.
• The goal is to simplify the compression and decompression process by combining the two steps into one. Self-extracting bits can be used to create executable files that can be run on any system with an interpreter (e.g. Java interpreter).

## HugPlant Example

• To compress an image file like `hugplant.bmp`, we can break it into 8-bit chunks and Huffman encode each chunk.
• We package the compressed data plus decoder into a single self-extracting `.java` file, represented as a `byte[]` array.
• When the `byte[]` array is passed to an interpreter, the interpreter executes the Huffman decoding algorithm and produces the original `hugplant.bmp` image file.
• The size of the compressed bitstream and the Huffman decoding algorithm combined is smaller than the original image file, making it more efficient to store and transmit.
• However, the receiver of the compressed file must have access to the appropriate interpreter to decode and reconstruct the original image.