# 38.2 Prefix-free Codes

Consider the representation of English text in Java. We represent text as a sequence of characters, each taking 8 bits of memory.

One easy way to compress, then, is to simply use less than 8 bits per character. To do this, we have to decide which **codewords** (bit sequences) go with each **symbol** (character).

## Mapping Alphanumeric Symbols

### Morse Code

As an introductory example, consider the Morse code alphabet. Looking at the alphabet below, what does the sequence – – • – – • represent? It’s ambiguous! The same sequence of symbols can represent either MEME, or GG, depending on what you choose – – • to represent

<figure><img src="https://2316889115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCLYj7ccqvV6l4Pt9R0w5%2Fuploads%2F4s5DkbcGzBoxpRc8Bb3y%2FScreen%20Shot%202023-04-24%20at%205.44.37%20PM.png?alt=media&#x26;token=bf521f7d-02f5-4c7e-95bb-a5159e9231c7" alt=""><figcaption><p>Ambiguity in morse code</p></figcaption></figure>

In real usage, operators must pause between codewords to indicate a break. The pause acts as an implicit third symbol, but we can't encode this real-time information into our code.

### Prefix-free Codes

An alternate strategy to avoid the need for real-time is to use **prefix-free codes**. In a prefix-free code, no codeword is a prefix of any other. In the Morse Code example, there would be no confusion whether the – – in the pattern – – • – – • is supposed to represent M, or the start of G.

Let's represent Morse code as a tree of codewords leading to symbols. As we can see from the tree, several symbols have representations that are prefixes of other symbols.

<figure><img src="https://2316889115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCLYj7ccqvV6l4Pt9R0w5%2Fuploads%2FjFQrGPOHJtKo7EvY3GaL%2Fimage.png?alt=media&#x26;token=d25e6cd8-3b6f-4868-adfb-8b976526d363" alt=""><figcaption><p>Morse code is not prefix-free.</p></figcaption></figure>

As an example of an (arbitrary) prefix-free code, consider the following encoding:

<figure><img src="https://2316889115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCLYj7ccqvV6l4Pt9R0w5%2Fuploads%2FPaawkPuLbznL6uT9XN6j%2Fimage.png?alt=media&#x26;token=1e5ba4ff-a85f-477b-8b38-2a8f63536a9f" alt=""><figcaption><p>One prefix-free code.</p></figcaption></figure>

The following code is also prefix-free:

<figure><img src="https://2316889115-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FCLYj7ccqvV6l4Pt9R0w5%2Fuploads%2F9XersVXbKTaZbKbgjsFK%2Fimage.png?alt=media&#x26;token=dd5dd3a3-cb16-40c3-ad25-754a3948c8f7" alt=""><figcaption><p>Another prefix-free code.</p></figcaption></figure>

Note that some codes are more efficient for certain strings than others: in the first representation, `I ATE` uses less bits than the second code. However, this is highly dependent on what string we're trying to encode.
