Comment on page
39.1 Models of Compression
In the last chapter, we saw many approaches to compression. This raises an interesting question: for a given bitstream, what is the best algorithm for compression?
For example, consider compression the text of Moby Dick using different compression formats. In this case,
bzip2yields the best compression.
Different compression formats applied to
One might argue, however, that the best possible compression algorithm for
mobydick.txtis simply as follows:
if input == 0:
return "Call me Ishamel. ...."
return the text as is
Using this as our decompression function, we can condense all of Moby Dick into a single bit!
However, there is a problem with this approach. If we include the code for the decompression algorithm as part of the compressed model (recall compression model 2 from the previous chapter), we see that Moby Dick is not compressed to one bit, but actually requires more bits than the original text!
Ultimately, the goal of a compresion algorithm is to find short sequences of bits that generate desired longer sequences of bits. Formally stated, our problem is as follows:
- Given a sequence of bits
B, find a shorter sequence
DA + C(B)that produces
Bwhen fed into an interpreter.
DArepresents the bits for the decompression algorithm, while
C(B)represents the compressed version of
Our compression model applied to
HugPlantexample from the previous chapter. Using Huffman encoding, we can achieve a compression ratio of 25%.
However, using another algorithm we'll call
MysteryXfor now, we can compress
HugPlant.bmpdown to 29,432 bits! This achieves a 0.35% compression ratio. Out of the
possible bistreams of length
, only one in
can achieve such a compression ratio.
A mystery compression algorithm that outperforms Huffman encoding by a large margin.
MysteryX? Well, it's simply the Java code
HugPlant.javawritten to generate the
.bmpfile! Going back to the model of self-extracting bits, we see the power of code and interpreters in compression. This leads us to two interesting questions:
- comprehensible compression: given a target bitstream
B, can we create an algorithm that outputs useful, readable Java code?
- optimal compression: given a target bitstream
B, can we find the shortest possible program that outputs this bitstream?