Monday, August 22, 2005

Paper Review: A Mathematical Theory of Communication

A Mathematical Theory of Communication
Shannon's classic paper on Information Theory.

The following example gives a flavor of the objective:
We mathematically describe the information, in bits per second, produced by an information source; this tells us the channel capacity required to transmit it. We then try to reduce that required capacity by using the "statistical knowledge about the source". In telegraphy, for example, the messages to be transmitted are sequences of letters, which are not completely random: they form sentences and have the statistical structure of a language such as English. The letter E is more frequent than the letter Z, and the sequence TH is more frequent than XP. We harness this structure to reduce the required channel capacity (or the time required to send messages): E is encoded as a single dot, while infrequent letters such as Q, X and Z are represented by longer sequences of dots and dashes.
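To make the idea concrete, here is a minimal Python sketch (my own illustration, not Shannon's telegraph code) that builds a Huffman-style prefix code from a small, made-up frequency table, so that frequent letters get short codewords and rare ones get long ones:

import heapq

def huffman_code(freqs):
    # Repeatedly merge the two least frequent groups; every merge prepends
    # one more bit to the codewords of the symbols inside the merged groups.
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

# Hypothetical relative frequencies: E and T common, Q, X, Z rare.
freqs = {"E": 0.40, "T": 0.28, "A": 0.18, "Q": 0.06, "X": 0.05, "Z": 0.03}
code = huffman_code(freqs)
avg = sum(freqs[s] * len(code[s]) for s in freqs)
print(code)
print("average codeword length: %.2f bits (vs 3 bits for a fixed-length code)" % avg)

Run it and E ends up with a one-bit codeword while Z needs five, which is exactly the trick the dot/dash assignment in telegraphy exploits.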

Some background:
Information sources are discrete sources that generate messages symbol by symbol. A discrete source chooses successive symbols according to certain probabilities that depend on the preceding choices as well as on the particular symbols in question. Such an information source is represented by a stochastic process; natural languages such as English are an example.
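A toy sketch of such a source - a first-order Markov chain over four letters, with transition probabilities that are entirely made up (not measured from English) - where each symbol is chosen according to the one before it:

import random

# Hypothetical transition table: probability of the next symbol given the previous one.
transitions = {
    "T": {"H": 0.5, "E": 0.3, "A": 0.2},
    "H": {"E": 0.6, "A": 0.3, "T": 0.1},
    "E": {"T": 0.4, "A": 0.4, "H": 0.2},
    "A": {"T": 0.5, "H": 0.3, "E": 0.2},
}

def generate(n, start="T", seed=0):
    # Emit n symbols, each drawn from the table row for the previous symbol.
    rng = random.Random(seed)
    sym, out = start, []
    for _ in range(n):
        nxt_syms = list(transitions[sym])
        weights = [transitions[sym][s] for s in nxt_syms]
        sym = rng.choices(nxt_syms, weights=weights)[0]
        out.append(sym)
    return "".join(out)

print(generate(40))

With this table the pair TH appears far more often than HT in the output, the same kind of structure Shannon exploits in English text.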

Interesting properties of the entropy function:
  1. H = 0 if and only if all of the probabilities p(i) but one are zero (the remaining one being 1). When we are certain of an outcome, the uncertainty (H) is 0
  2. For a given n (the number of possible events), H is maximum and equal to log n when all the p(i) are equal. So if n is 2 and p(1) = p(2) = 1/2, then H is maximum and equal to log 2 = 1 bit
  3. If there are 2 events x and y, then H(x,y) <= H(x) + H(y). If x and y are independent then H(x,y) = H(x) + H(y)
  4. Given x and y, H(y) >= Hx(y). The uncertainty of y is never increased by knowledge of x. It will be decreased unless x and y are independent, in which case it remains unchanged. (A numerical check of properties 3 and 4 follows this list.)
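A quick numerical check of properties 3 and 4, using a small made-up joint distribution and the identity Hx(y) = H(x,y) - H(x) from the paper:

from math import log2

# Hypothetical joint distribution p(x, y) over two binary variables, chosen so
# that x and y are correlated (not independent).
p_xy = {(0, 0): 0.40, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.35}

def H(dist):
    # Entropy in bits of a distribution given as {outcome: probability}.
    return -sum(p * log2(p) for p in dist.values() if p > 0)

p_x = {x: sum(p for (a, b), p in p_xy.items() if a == x) for x in (0, 1)}
p_y = {y: sum(p for (a, b), p in p_xy.items() if b == y) for y in (0, 1)}

H_x, H_y, H_joint = H(p_x), H(p_y), H(p_xy)
H_y_given_x = H_joint - H_x   # conditional entropy Hx(y)

print("H(x)=%.3f  H(y)=%.3f  H(x,y)=%.3f  Hx(y)=%.3f" % (H_x, H_y, H_joint, H_y_given_x))
print("H(x,y) <= H(x) + H(y):", H_joint <= H_x + H_y + 1e-12)   # property 3
print("H(y)   >= Hx(y):      ", H_y >= H_y_given_x - 1e-12)     # property 4

Both checks print True; setting all four joint probabilities to 0.25 (which makes x and y independent) turns the inequalities into equalities.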
Relative Entropy
The ratio of the entropy of a source to the maximum value it could have while still restricted to the same symbols is called its relative entropy. This is the maximum compression possible when we encode into the same alphabet.
Redundancy
This is one minus the relative entropy. The redundancy of English - not considering statistical structure over distances greater than eight letters - is about 50%. This means that when we write English, half of what we write is determined by the structure of the language and half is chosen freely.
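Both quantities are one-liners once the entropy is in hand. A short sketch, using the same four-symbol probabilities as the example in the next section:

from math import log2

def relative_entropy_and_redundancy(probs):
    # Relative entropy = H / log2(n); redundancy = 1 - relative entropy.
    H = -sum(p * log2(p) for p in probs if p > 0)
    H_max = log2(len(probs))   # entropy if every symbol were equally likely
    rel = H / H_max
    return rel, 1 - rel

rel, red = relative_entropy_and_redundancy([0.5, 0.25, 0.125, 0.125])
print("relative entropy = %.3f, redundancy = %.3f" % (rel, red))
# relative entropy = 0.875, redundancy = 0.125

So this particular source could be compressed to 87.5% of the length a maximum-entropy source would need when encoded into the same alphabet.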

Examples
Given 4 symbols - A, B, C and D with probabilities of 1/2, 1/4, 1/8 and 1/8.
H = -(1/2 log 1/2 + 1/4 log 1/4 + 1/8 log 1/8 + 1/8 log 1/8) = 7/4 bits per symbol (logarithms taken to base 2)
The symbols can be encoded as A => 0, B => 10, C => 110 and D => 111. The average number of binary digits used in encoding a sequence of N symbols will be N (1/2 * 1 + 1/4 * 2 + 2/8 * 3) = 7/4 N.
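The same arithmetic in a few lines of Python, confirming that the entropy and the average length of this prefix code both come out to 7/4:

from math import log2

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

H = -sum(p * log2(p) for p in probs.values())
avg_bits = sum(probs[s] * len(code[s]) for s in probs)
print("H =", H, "bits per symbol")         # 1.75
print("average code length =", avg_bits)   # 1.75

# Because no codeword is a prefix of another, an encoded string decodes
# unambiguously from left to right.
def encode(msg):
    return "".join(code[s] for s in msg)

print(encode("ABACAD"))   # 11 binary digits for 6 symbols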
