October 2016

In 1952, David Huffman proposed a statistical method allowing a binary code word to be assigned to the various symbols to be compressed (pixels or characters for example). The length of each code word is not identical for all the symbols: the most frequent symbols (those which appear most often) are coded with short code words, while the most uncommon symbols receive longer binary codes. The expression **Variable Length Code** (*VLC*) is used to indicate this type of coding because no code is the prefix of another. Thus, the final succession of coded words with variable lengths will be on average smaller than that obtained with a constant length coding.

The Huffman coder creates an ordered tree from all the symbols and their frequency of appearance. The branches are built recursively starting with the least frequent symbols.

The construction of the tree is done by initially ordering the symbols by frequency of appearance. The two symbols with the lowest appearance frequency are Successively removed from the list and are attached to a node whose weight is equal to the sum of the frequencies of the two symbols. The symbol with the least weight is assigned to branch 1, the other to branch 0 and so on, by considering each node formed as a new symbol, until only one parent node called the *root* is obtained.

The code for each symbol corresponds to the succession of codes along the way starting from this character to the root. Thus, the “deeper” inside the tree the symbol is, the longer the code word will be.

Consider the following sentence: "COMMENT_CA_MARCHE". The following are the appearance frequencies for the letters:

M | A | C | E | _ | H | O | N | T | R |

3 | 2 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 |

This is the corresponding tree:

The corresponding codes for each character are such, that the codes for the most frequent characters are short and those corresponding to the least frequent symbols are long:

M | A | C | E | _ | H | O | N | T | R |

00 | 100 | 110 | 010 | 011 | 1110 | 1111 | 1010 | 10110 | 10111 |

Compressions based on this type of coding yield good compression ratios, particularly for monochrome images (faxes for example). It is particularly used in the T4 and T5 recommendations used in ITU-T

Código Huffman

Huffman-Kodierung

Codage de Huffman

Codifica di Huffman

Codificação de Huffman