- An $f$-Divergence is a function which measures the distance between two probability distributions $P$ and $Q$:
$$D_f(P \,\|\, Q) = \sum_x q(x)\, f\!\left(\frac{p(x)}{q(x)}\right),$$
where $f$ is a convex function with $f(1) = 0$.
- All $f$-divergences with differentiable $f$ look like the KL divergence up to second order when $P_\theta$ is close to $P_{\theta_0}$. Specifically,
$$D_f(P_{\theta_0} \,\|\, P_\theta) = \frac{f''(1)}{2}\,(\theta - \theta_0)^\top F(\theta_0)\,(\theta - \theta_0) + O\!\left(\|\theta - \theta_0\|^3\right),$$
where $F(\theta_0)$ is the Fisher Information matrix for $P_\theta$ calculated at $\theta_0$.
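As a quick numeric illustration of both points, the sketch below evaluates $D_f$ for a few standard choices of $f$ and checks the second-order expansion on a Bernoulli family, whose Fisher information is $1/(\theta(1-\theta))$. The particular distributions, step size, and choices of $f$ are assumptions made for illustration, not part of the notes above.

```python
import numpy as np

def f_divergence(p, q, f):
    """D_f(P || Q) = sum_x q(x) * f(p(x) / q(x)) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(q * f(p / q))

# Standard convex choices of f with f(1) = 0, and their f''(1):
kl   = lambda t: t * np.log(t)      # forward KL(P || Q),  f''(1) = 1
rkl  = lambda t: -np.log(t)         # reverse KL(Q || P),  f''(1) = 1
chi2 = lambda t: (t - 1.0) ** 2     # chi-squared,         f''(1) = 2

# Second-order check on a Bernoulli family: F(theta) = 1 / (theta * (1 - theta)).
theta0, dtheta = 0.3, 1e-3
P = np.array([theta0, 1.0 - theta0])
Q = np.array([theta0 + dtheta, 1.0 - theta0 - dtheta])
F = 1.0 / (theta0 * (1.0 - theta0))

for name, f, fpp1 in [("KL", kl, 1.0), ("reverse KL", rkl, 1.0), ("chi^2", chi2, 2.0)]:
    approx = 0.5 * fpp1 * F * dtheta**2          # f''(1)/2 * dtheta^T F dtheta
    print(name, f_divergence(P, Q, f), approx)   # the two numbers nearly agree
```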
Specific Types
KL-Divergence
- The Kullback-Leibler Divergence is defined as
$$D_{KL}(P \,\|\, Q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$
- It measures the coding inefficiency from using a model $Q$ to compress the data, when the true distribution is $P$.
- It also measures the dissimilarity between $P$ and $Q$.
- It can also be formulated as:
$$D_{KL}(P \,\|\, Q) = \sum_x p(x) \log p(x) - \sum_x p(x) \log q(x) = -H(P) + H(P, Q)$$
Formulated this way, the KL divergence is the average number of extra bits needed to encode the data, due to the fact that we used distribution $Q$ rather than the true distribution $P$ (see the first sketch after this list).
- This formulation informally motivates the following inequality, called Gibbs' Inequality:
$$H(P, Q) \geq H(P), \quad \text{equivalently} \quad D_{KL}(P \,\|\, Q) \geq 0, \text{ with equality iff } P = Q$$
- Schulman's Approximation: The KL divergence can be estimated as follows. Assume we have access to sample points $x_1, \ldots, x_N \sim P$ but we cannot analytically compute the sum. Let $r_i = \frac{q(x_i)}{p(x_i)}$. Then
$$D_{KL}(P \,\|\, Q) \approx \frac{1}{N} \sum_{i=1}^N \big[(r_i - 1) - \log r_i\big]$$
(see the second sketch after this list).
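The first sketch makes the "extra bits" reading concrete: it computes the KL divergence of two small discrete distributions directly and as cross-entropy minus entropy, using base-2 logarithms so the result is in bits. The distributions themselves are made up for illustration.

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_x p(x) log2 p(x), in bits."""
    return -np.sum(p * np.log2(p))

def cross_entropy(p, q):
    """H(P, Q) = -sum_x p(x) log2 q(x): average code length when coding P-data with a Q-code."""
    return -np.sum(p * np.log2(q))

def kl(p, q):
    """D_KL(P || Q) = sum_x p(x) log2(p(x) / q(x))."""
    return np.sum(p * np.log2(p / q))

P = np.array([0.5, 0.25, 0.25])   # true distribution (illustrative)
Q = np.array([0.4, 0.4, 0.2])     # model used for coding (illustrative)

# The two formulations agree, and D_KL = H(P, Q) - H(P) >= 0 (Gibbs' inequality).
print(kl(P, Q))                          # direct definition
print(cross_entropy(P, Q) - entropy(P))  # "extra bits" formulation
```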
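The second sketch checks the sample-based estimator against the exact sum on the same kind of toy distributions; the estimator $(r - 1) - \log r$ is the one stated above, while the specific sampling setup is an illustrative assumption rather than Schulman's original code.

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([0.5, 0.25, 0.25])
Q = np.array([0.4, 0.4, 0.2])
exact = np.sum(P * np.log(P / Q))      # exact D_KL(P || Q), in nats

# Draw samples from P and form r_i = q(x_i) / p(x_i).
x = rng.choice(len(P), size=100_000, p=P)
r = Q[x] / P[x]

k1 = np.mean(-np.log(r))               # naive estimator: unbiased, higher variance
k3 = np.mean((r - 1.0) - np.log(r))    # estimator above: unbiased, lower variance

print(exact, k1, k3)
```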
JS-Divergence
- The Jensen-Shannon Divergence is a symmetric and smoothed version of the KL divergence, defined as
$$D_{JS}(P \,\|\, Q) = \frac{1}{2} D_{KL}(P \,\|\, M) + \frac{1}{2} D_{KL}(Q \,\|\, M),$$
where $M = \frac{1}{2}(P + Q)$ is a mixture distribution.
- It can also be written as follows:
$$D_{JS}(P \,\|\, Q) = H(M) - \frac{1}{2}\big(H(P) + H(Q)\big)$$
- In the general case, we may use weighting parameters $\pi_1, \ldots, \pi_n$ with $\pi_i \geq 0$ and $\sum_{i=1}^n \pi_i = 1$, such that
$$D_{JS,\pi}(P_1, \ldots, P_n) = \sum_{i=1}^n \pi_i\, D_{KL}(P_i \,\|\, M), \quad \text{where} \quad M = \sum_{i=1}^n \pi_i P_i$$
- Alternatively,
$$D_{JS,\pi}(P_1, \ldots, P_n) = H\!\left(\sum_{i=1}^n \pi_i P_i\right) - \sum_{i=1}^n \pi_i H(P_i)$$
- The JSD is bounded, $0 \leq D_{JS}(P \,\|\, Q) \leq 1$, assuming the use of base $2$ for logarithms. In general, $0 \leq D_{JS}(P \,\|\, Q) \leq \log_b 2$ when base-$b$ logarithms are used (see the sketch below).
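A short sketch, again on made-up discrete distributions, that computes the JSD from the mixture definition, confirms the entropy formulation, and stays inside the base-2 bound of 1.

```python
import numpy as np

def entropy(p):
    """H(P) = -sum_x p(x) log2 p(x), in bits."""
    return -np.sum(p * np.log2(p))

def kl(p, q):
    """D_KL(P || Q) = sum_x p(x) log2(p(x) / q(x)), in bits."""
    return np.sum(p * np.log2(p / q))

def jsd(p, q):
    """D_JS(P || Q) = 1/2 KL(P || M) + 1/2 KL(Q || M), with M = (P + Q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

P = np.array([0.5, 0.25, 0.25])   # illustrative distributions
Q = np.array([0.1, 0.1, 0.8])
M = 0.5 * (P + Q)

print(jsd(P, Q))                                     # definition via KL to the mixture
print(entropy(M) - 0.5 * (entropy(P) + entropy(Q)))  # entropy formulation, same value
# With base-2 logarithms the value always lies in [0, 1]; it reaches 1 only
# when P and Q have disjoint supports.
```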