- An $f$-Divergence is a function which measures the distance between two probability distributions $P$ and $Q$:
$$D_f(P \,\|\, Q) = \sum_x q(x)\, f\!\left(\frac{p(x)}{q(x)}\right),$$
where $f$ is a convex function with $f(1) = 0$.
- All $f$-divergences with differentiable $f$ look like the KL divergence up to second order when $P_\theta$ is close to $P_{\theta_0}$. Specifically,
$$D_f(P_{\theta_0} \,\|\, P_\theta) = \frac{f''(1)}{2}\,(\theta - \theta_0)^\top F(\theta_0)\,(\theta - \theta_0) + O\!\left(\|\theta - \theta_0\|^3\right),$$
where $F(\theta_0)$ is the Fisher Information matrix for $P_\theta$ calculated at $\theta_0$.
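As a quick numeric illustration of both points, the sketch below evaluates $D_f$ for a few standard choices of $f$ and checks the second-order expansion on a Bernoulli family, whose Fisher information is $1/(\theta(1-\theta))$. The particular distributions, step size, and choices of $f$ are assumptions made for illustration, not part of the notes above.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(q * f(p / q))

# Standard convex choices of f with f(1) = 0, and their f''(1):
kl   = lambda t: t * np.log(t)      # forward KL(P || Q),  f''(1) = 1
rkl  = lambda t: -np.log(t)         # reverse KL(Q || P),  f''(1) = 1
chi2 = lambda t: (t - 1.0) ** 2     # chi-squared,         f''(1) = 2

# Second-order check on a Bernoulli family: F(theta) = 1 / (theta * (1 - theta)).
theta0, dtheta = 0.3, 1e-3
P = np.array([theta0, 1.0 - theta0])
Q = np.array([theta0 + dtheta, 1.0 - theta0 - dtheta])
F = 1.0 / (theta0 * (1.0 - theta0))

for name, f, fpp1 in [("KL", kl, 1.0), ("reverse KL", rkl, 1.0), ("chi^2", chi2, 2.0)]:
    approx = 0.5 * fpp1 * F * dtheta**2          # f''(1)/2 * dtheta^T F dtheta
    print(name, f_divergence(P, Q, f), approx)   # the two numbers nearly agree
```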
Specific Types
KL-Divergence
- The Kullback-Leibler Divergence is defined as
$$D_{KL}(P \,\|\, Q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$
- It measures the coding inefficiency from using a model $Q$ to compress the data, when the true distribution is $P$.
- It also measures the dissimilarity between $P$ and $Q$.
- It can also be formulated as:
$$D_{KL}(P \,\|\, Q) = \sum_x p(x) \log p(x) - \sum_x p(x) \log q(x) = -H(P) + H(P, Q)$$
Formulated this way, the KL divergence is the average number of extra bits needed to encode the data, due to the fact that we used distribution $Q$ rather than the true distribution $P$ (see the first sketch after this list).
- This formulation informally motivates the following inequality, called Gibbs' Inequality:
$$H(P, Q) \geq H(P), \quad \text{equivalently} \quad D_{KL}(P \,\|\, Q) \geq 0, \text{ with equality iff } P = Q$$
- Schulman's Approximation: The KL divergence can be estimated as follows. Assume we have access to sample points $x_1, \ldots, x_N \sim P$ but we cannot analytically compute the sum. Let $r_i = \frac{q(x_i)}{p(x_i)}$. Then
$$D_{KL}(P \,\|\, Q) \approx \frac{1}{N} \sum_{i=1}^N \big[(r_i - 1) - \log r_i\big]$$
(see the second sketch after this list).
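The first sketch makes the "extra bits" reading concrete: it computes the KL divergence of two small discrete distributions directly and as cross-entropy minus entropy, using base-2 logarithms so the result is in bits. The distributions themselves are made up for illustration.

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_x p(x) log2 p(x), in bits."""
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(P, Q) = -sum_x p(x) log2 q(x): average code length when coding P-data with a Q-code."""
    return -np.sum(p * np.log2(q))

def kl(p, q):
    """D_KL(P || Q) = sum_x p(x) log2(p(x) / q(x))."""
    return np.sum(p * np.log2(p / q))

P = np.array([0.5, 0.25, 0.25])   # true distribution (illustrative)
Q = np.array([0.4, 0.4, 0.2])     # model used for coding (illustrative)

# The two formulations agree, and D_KL = H(P, Q) - H(P) >= 0 (Gibbs' inequality).
print(kl(P, Q))                          # direct definition
print(cross_entropy(P, Q) - entropy(P))  # "extra bits" formulation
```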
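The second sketch checks the sample-based estimator against the exact sum on the same kind of toy distributions; the estimator $(r - 1) - \log r$ is the one stated above, while the specific sampling setup is an illustrative assumption rather than Schulman's original code.

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([0.5, 0.25, 0.25])
Q = np.array([0.4, 0.4, 0.2])
exact = np.sum(P * np.log(P / Q))      # exact D_KL(P || Q), in nats

# Draw samples from P and form r_i = q(x_i) / p(x_i).
x = rng.choice(len(P), size=100_000, p=P)
r = Q[x] / P[x]

k1 = np.mean(-np.log(r))               # naive estimator: unbiased, higher variance
k3 = np.mean((r - 1.0) - np.log(r))    # estimator above: unbiased, lower variance

print(exact, k1, k3)
```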
JS-Divergence
- The Jensen-Shannon Divergence is a symmetric and smoothed version of the KL divergence, defined as
$$D_{JS}(P \,\|\, Q) = \frac{1}{2} D_{KL}(P \,\|\, M) + \frac{1}{2} D_{KL}(Q \,\|\, M),$$
where $M = \frac{1}{2}(P + Q)$ is a mixture distribution.
- It can also be written as follows:
$$D_{JS}(P \,\|\, Q) = H(M) - \frac{1}{2}\big(H(P) + H(Q)\big)$$
- In the general case, we may use weighting parameters $\pi_1, \ldots, \pi_n$ with $\pi_i \geq 0$ and $\sum_{i=1}^n \pi_i = 1$, such that
$$D_{JS,\pi}(P_1, \ldots, P_n) = \sum_{i=1}^n \pi_i\, D_{KL}(P_i \,\|\, M), \quad \text{where} \quad M = \sum_{i=1}^n \pi_i P_i$$
- Alternatively,
$$D_{JS,\pi}(P_1, \ldots, P_n) = H\!\left(\sum_{i=1}^n \pi_i P_i\right) - \sum_{i=1}^n \pi_i H(P_i)$$
- The JSD is bounded, $0 \leq D_{JS}(P \,\|\, Q) \leq 1$, assuming the use of base $2$ for logarithms. In general, $0 \leq D_{JS}(P \,\|\, Q) \leq \log_b 2$ when base-$b$ logarithms are used (see the sketch below).
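A short sketch, again on made-up discrete distributions, that computes the JSD from the mixture definition, confirms the entropy formulation, and stays inside the base-2 bound of 1.

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_x p(x) log2 p(x), in bits."""
    return -np.sum(p * np.log2(p))

def kl(p, q):
    """D_KL(P || Q) = sum_x p(x) log2(p(x) / q(x)), in bits."""
    return np.sum(p * np.log2(p / q))

def jsd(p, q):
    """D_JS(P || Q) = 1/2 KL(P || M) + 1/2 KL(Q || M), with M = (P + Q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

P = np.array([0.5, 0.25, 0.25])   # illustrative distributions
Q = np.array([0.1, 0.1, 0.8])
M = 0.5 * (P + Q)

print(jsd(P, Q))                                     # definition via KL to the mixture
print(entropy(M) - 0.5 * (entropy(P) + entropy(Q)))  # entropy formulation, same value
# With base-2 logarithms the value always lies in [0, 1]; it reaches 1 only
# when P and Q have disjoint supports.
```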