39.1 Models of Compression
Comparing Compression Algorithms
In the last chapter, we saw many approaches to compression. This raises an interesting question: for a given bitstream, what is the best algorithm for compression?
For example, consider compression the text of Moby Dick using different compression formats. In this case, bzip2 yields the best compression.

mobydick.txtOne might argue, however, that the best possible compression algorithm for mobydick.txt is simply as follows:
D(B):
if input == 0:
return "Call me Ishamel. ...."
else:
return the text as isUsing this as our decompression function, we can condense all of Moby Dick into a single bit!
However, there is a problem with this approach. If we include the code for the decompression algorithm as part of the compressed model (recall compression model 2 from the previous chapter), we see that Moby Dick is not compressed to one bit, but actually requires more bits than the original text!
Decompression Algorithms
Ultimately, the goal of a compresion algorithm is to find short sequences of bits that generate desired longer sequences of bits. Formally stated, our problem is as follows:
Given a sequence of bits
B, find a shorter sequenceDA + C(B)that producesBwhen fed into an interpreter.DArepresents the bits for the decompression algorithm, whileC(B)represents the compressed version ofB.

mobydick.txt.Improving Compression
Recall the HugPlant example from the previous chapter. Using Huffman encoding, we can achieve a compression ratio of 25%.
However, using another algorithm we'll call MysteryX for now, we can compress HugPlant.bmp down to 29,432 bits! This achieves a 0.35% compression ratio. Out of the possible bistreams of length , only one in can achieve such a compression ratio.

What is MysteryX? Well, it's simply the Java code HugPlant.java written to generate the .bmp file! Going back to the model of self-extracting bits, we see the power of code and interpreters in compression. This leads us to two interesting questions:
comprehensible compression: given a target bitstream
B, can we create an algorithm that outputs useful, readable Java code?optimal compression: given a target bitstream
B, can we find the shortest possible program that outputs this bitstream?
Last updated