Method:
The Huffman coding which implies the greedy algorithm has its theory:
If you have a set of characters , and called it $C$ , has several characters which may or may not overlap each other.
Then lets say $C$ contains these characters: f,e,d,c,b,a
each character has different frequency which we will called it $F$ . Now we have a sequence like: f(5) ,e(9), d(16), c(12),b(13),a(45)
Now if we want to code these characters in order to get a better compression of the space we stored it , we have two choices
- we just simply code it into same length code
- we use the greedy method, which means we do our best to put the higher frequency character into short codes and lower frequency characters into the longer codes As these kind of coding can be describe as a tree , and denote as follows:
$B(T)=\sum_{c\in C }{f(c)d_t(c)}$
$B(T)$: the final coding length
$f(c)$: the frequency of the character
$d_T(c)$: the depth of character in the tree, which also means the code length
For case 1 and 2, we will show in graph a and b
apparently case 1 has larger $B(T)$ than case 2.