Huffman coding

December 2016

Huffman coding

In 1952, David Huffman proposed a statistical method allowing a binary code word to be assigned to the various symbols to be compressed (pixels or characters for example). The length of each code word is not identical for all the symbols: the most frequent symbols (those which appear most often) are coded with short code words, while the most uncommon symbols receive longer binary codes. The expression Variable Length Code (VLC) is used to indicate this type of coding because no code is the prefix of another. Thus, the final succession of coded words with variable lengths will be on average smaller than that obtained with a constant length coding.

The Huffman coder creates an ordered tree from all the symbols and their frequency of appearance. The branches are built recursively starting with the least frequent symbols.

The construction of the tree is done by initially ordering the symbols by frequency of appearance. The two symbols with the lowest appearance frequency are Successively removed from the list and are attached to a node whose weight is equal to the sum of the frequencies of the two symbols. The symbol with the least weight is assigned to branch 1, the other to branch 0 and so on, by considering each node formed as a new symbol, until only one parent node called the root is obtained.
The code for each symbol corresponds to the succession of codes along the way starting from this character to the root. Thus, the “deeper” inside the tree the symbol is, the longer the code word will be.

Consider the following sentence: "COMMENT_CA_MARCHE". The following are the appearance frequencies for the letters:

MACE_HONTR
3222211111

This is the corresponding tree:

Huffman tree

The corresponding codes for each character are such, that the codes for the most frequent characters are short and those corresponding to the least frequent symbols are long:

MACE_HONTR
001001100100111110111110101011010111

Compressions based on this type of coding yield good compression ratios, particularly for monochrome images (faxes for example). It is particularly used in the T4 and T5 recommendations used in ITU-T


Related :


Código Huffman
Código Huffman
Huffman-Kodierung
Huffman-Kodierung
Codage de Huffman
Codage de Huffman
Codifica di Huffman
Codifica di Huffman
Codificação de Huffman
Codificação de Huffman
This document entitled « Huffman coding » from CCM (ccm.net) is made available under the Creative Commons license. You can copy, modify copies of this page, under the conditions stipulated by the license, as this note appears clearly.