- The study of efficient data compression and of robust, reliable data transmission.
Important Quantities
- Let $p$ be a probability distribution. The entropy is defined as
  $$H(p) = -\sum_x p(x) \log p(x)$$
  - It can be interpreted as the expected amount of surprise we may have about a given event. That is, it is the expected amount of information we can gain from the distribution (illustrated numerically in the sketch after this list).
  - This also corresponds to the degree to which the distribution is Uniform.
  - This also corresponds to the amount of uncertainty we may have about the distribution.
- Let $p$ and $q$ be probability distributions. The cross entropy is defined as
  $$H(p, q) = -\sum_x p(x) \log q(x)$$
  - It is a measure of the average number of bits needed to identify an event drawn from the set if the coding scheme used for the set is optimized for an estimated probability distribution $q$ rather than the true distribution $p$.
- The conditional entropy is defined as
  $$H(Y \mid X) = -\sum_{x, y} p(x, y) \log p(y \mid x)$$
  - It measures how much entropy $Y$ has remaining given we have learnt the value of $X$.
- The Pointwise Mutual Information between two events $x$ and $y$ is defined as
  $$\operatorname{PMI}(x, y) = \log \frac{p(x, y)}{p(x)\, p(y)}$$
  - It measures the discrepancy between the probability of the two events occurring together and what would be expected by chance (i.e. if they were independent).
  - It is also the amount we learn from updating a prior into a posterior.
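The sketch below illustrates the quantities defined above on a small, made-up discrete joint distribution. The distribution values, the estimated distribution `q`, and the function names are assumptions chosen purely for illustration, not taken from any particular source.

```python
import numpy as np

# Made-up joint distribution p(x, y) over two binary variables (rows: x, columns: y).
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

def entropy(p):
    """H(p) = -sum_x p(x) log p(x), in nats."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x)."""
    mask = p > 0
    return -np.sum(p[mask] * np.log(q[mask]))

def conditional_entropy(p_xy):
    """H(Y | X) = -sum_{x, y} p(x, y) log p(y | x)."""
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y_given_x = p_xy / p_x
    mask = p_xy > 0
    return -np.sum(p_xy[mask] * np.log(p_y_given_x[mask]))

def pmi(p_xy, p_x, p_y, x, y):
    """PMI(x, y) = log p(x, y) / (p(x) p(y))."""
    return np.log(p_xy[x, y] / (p_x[x] * p_y[y]))

q = np.array([0.9, 0.1])  # a deliberately poor estimate of p(x)

print("H(X)      =", entropy(p_x))               # log 2 ≈ 0.693 nats: X is uniform
print("H(p_x, q) =", cross_entropy(p_x, q))      # larger than H(X), since q != p_x
print("H(Y | X)  =", conditional_entropy(p_xy))  # smaller than H(Y): knowing X reduces uncertainty
print("PMI(0, 0) =", pmi(p_xy, p_x, p_y, 0, 0))  # positive: x=0 and y=0 co-occur more than by chance
```

Natural logarithms give values in nats; replacing `np.log` with `np.log2` gives bits. The printed values also illustrate that the cross entropy is never smaller than the entropy of the true distribution.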
Topics
Miscellaneous
- The Kozachenko-Leonenko Estimate[^1] for Entropy works as follows (a code sketch is given after this list). Let $X$ be a continuous random variable with values in some metric space and let $\mu$ be its density. The entropy is defined as
  $$H(X) = -\int \mu(x) \log \mu(x)\, dx$$
  and estimated using the digamma function $\psi$. Let $\epsilon(i)$ be twice the distance from $x_i$ to its $k$-th nearest neighbor among the $N$ samples. Then
  $$\hat{H}(X) = -\psi(k) + \psi(N) + \log c_d + \frac{d}{N} \sum_{i=1}^{N} \log \epsilon(i),$$
  where $d$ is the dimension of $x$ and $c_d$ is the volume of the $d$-dimensional unit ball.
  - The idea is to estimate $\log \mu(x_i)$ using the probability distribution $P_k(\epsilon)$ of the distance between $x_i$ and its $k$-th nearest neighbor — specifically, $P_k(\epsilon)\, d\epsilon$ is the probability that one point is within distance $[\epsilon/2, \epsilon/2 + d\epsilon/2]$ from $x_i$, that there are $k - 1$ other points at smaller distances, and $N - k - 1$ points at larger distances. Let $p_i(\epsilon)$ be the mass of the $\epsilon$-ball centered at $x_i$. It can be shown that
    $$\mathbb{E}[\log p_i] = \psi(k) - \psi(N).$$
    If we assume that $\mu(x)$ is constant in the entire $\epsilon$-ball, we have
    $$p_i(\epsilon) \approx c_d\, \epsilon^d\, \mu(x_i)$$
    and therefore
    $$\log \mu(x_i) \approx \psi(k) - \psi(N) - \log c_d - d \log \epsilon(i),$$
    which, averaged over all $i$, yields the estimator above.
  - For the maximum norm, set $c_d = 1$.
  - For the Euclidean norm, set $c_d = \frac{\pi^{d/2}}{2^d\, \Gamma(1 + d/2)}$.
  - The estimator is unbiased if $\mu(x)$ is strictly constant.
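Below is a minimal sketch of the Kozachenko-Leonenko estimator described above, assuming the maximum norm (so $\log c_d = 0$) and using SciPy's KD-tree and digamma function. The function name `kl_entropy` and the Gaussian test data are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko entropy estimate (in nats) under the maximum norm,
    where the volume term log c_d vanishes."""
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    tree = cKDTree(x)
    # Distances to the k+1 nearest neighbors in the max (Chebyshev) norm;
    # the first column is the point itself at distance 0.
    dist, _ = tree.query(x, k=k + 1, p=np.inf)
    eps = 2.0 * dist[:, -1]  # epsilon(i): twice the distance to the k-th neighbor
    return -digamma(k) + digamma(n) + d * np.mean(np.log(eps))

# Sanity check on made-up data: a 2D standard normal has differential entropy
# (d/2) * log(2 * pi * e) ≈ 2.838 nats.
rng = np.random.default_rng(0)
samples = rng.standard_normal((5000, 2))
print(kl_entropy(samples, k=3))
```

For the Euclidean norm one would add the $\log c_d$ term with $c_d = \frac{\pi^{d/2}}{2^d\, \Gamma(1 + d/2)}$, and duplicate samples (which give $\epsilon(i) = 0$) would need special handling before taking logarithms.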
Links
- Probability Theory - more on probability, which is the basis of Information Theory.
- Murphy Ch. 2.8
Footnotes
[^1]: A specification is given in Kraskov, Stoegbauer, and Grassberger (2003), Estimating Mutual Information. The original paper is in Russian.