# 39.5 Compression Theory

## Compression Ratios

* The goal of data compression is to reduce the size of a sequence of data while retaining as much information as possible. For example, the letter `e` appears more frequently in the English dictionary than `z`, so we would want to represent `e` with smaller bits.
* **Compression ratio** is a measure of how much the size of the compressed data differs from the original data.
  * **Huffman Coding** is a compression technique that represents common symbols with smaller numbers of bits, resulting in a more efficient encoding.
  * **Run-length encoding** is another compression technique that replaces repeated characters with the character itself and the number of times it occurs.
  * **LZW** is a compression technique that searches for common repeated patterns in the input and replaces them with a shorter code.
* The general idea behind most compression techniques is to exploit any existing redundancy or order within the sequence to reduce the size of the data. However, if a sequence has no existing redundancy or order, compression may not be possible.

## Self-Extracting Bits

* Self-extracting bits is a compression technique that wraps the compressed bits and the decompression algorithm into a **single sequence of bits**.
* The goal is to simplify the compression and decompression process by combining the two steps into one. Self-extracting bits can be used to create executable files that can be run on any system with an interpreter (e.g. Java interpreter).

<figure><img src="https://3031105543-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F2aYwKpK43ima4gxrLWSB%2Fuploads%2Fgit-blob-e29324e57be81d05441b6318142422038d4ea431%2FScreenshot%202023-04-24%20at%209.34.44%20PM.png?alt=media" alt=""><figcaption></figcaption></figure>

## HugPlant Example

* To compress an image file like `hugplant.bmp`, we can break it into 8-bit chunks and Huffman encode each chunk.
* We package the compressed data plus decoder into a single self-extracting `.java` file, represented as a `byte[]` array.
* When the `byte[]` array is passed to an interpreter, the interpreter executes the Huffman decoding algorithm and produces the original `hugplant.bmp` image file.
* The size of the compressed bitstream and the Huffman decoding algorithm combined is **smaller** than the original image file, making it more efficient to store and transmit.
* However, the receiver of the compressed file must have access to the appropriate interpreter to decode and reconstruct the original image.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://cs61b-2.gitbook.io/cs61b-textbook-fall-2025/39.-compression-and-complexity/39.5-compression-theory.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.