38.3 Shannon-Fano Codes

Shannon-Fano codes are an approach to create prefix-free codes based on a set of symbols/characters and their probabilities. The main idea is that we want shorter prefix-free codes for more popular characters, and longer codes for lesser used characters.
The algorithm is:
  • Count relative frequencies of all characters in a text.
  • Split into ‘left’ and ‘right halves’ of roughly equal frequency.
    • Left half gets a leading zero. Right half gets a leading one.
    • Repeat.
At the end, you will get a tree as shown below, with shorter paths for characters with a higher frequency, and longer paths for characters with a lower frequency.
However, Shannon-Fano coding is NOT optimal, so it is not used very often.