  • An $f$-Divergence is a function which measures the distance between two probability distributions $p$ and $q$:

    $$D_f(p \,\|\, q) = \sum_x q(x)\, f\!\left(\frac{p(x)}{q(x)}\right)$$

    where $f$ is a convex function with $f(1) = 0$.

  • All $f$-divergences with differentiable $f$ look like the KL divergence up to second order when $\theta$ is close to $\theta_0$. Specifically,

    $$D_f(p_{\theta_0} \,\|\, p_\theta) = \frac{f''(1)}{2}\, (\theta - \theta_0)^\top F(\theta_0)\, (\theta - \theta_0) + O\!\left(\|\theta - \theta_0\|^3\right)$$

    Where $F(\theta_0)$ is the Fisher Information matrix for $p_\theta$ calculated at $\theta_0$.
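
A quick numerical check of this expansion (a minimal sketch, not part of the original notes): KL is an $f$-divergence with $f(t) = t \log t$, so $f''(1) = 1$, and for a Bernoulli model the Fisher information at $\theta_0$ is $1/(\theta_0(1 - \theta_0))$.

```python
import numpy as np

def kl_bernoulli(a, b):
    """KL(Bern(a) || Bern(b)) for scalar parameters a, b in (0, 1)."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

theta0, theta = 0.3, 0.31                   # theta close to theta0

# Second-order approximation: f''(1)/2 * (theta - theta0)^2 * F(theta0),
# with f''(1) = 1 for KL and F(theta0) = 1 / (theta0 * (1 - theta0)).
fisher = 1.0 / (theta0 * (1 - theta0))
quadratic = 0.5 * fisher * (theta - theta0) ** 2

print(kl_bernoulli(theta0, theta))          # ~2.4e-4
print(quadratic)                            # ~2.4e-4, agreeing up to O(|theta - theta0|^3)
```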

Specific Types

KL-Divergence

  • The Kullback-Leibler Divergence is defined as

    $$D_{KL}(p \,\|\, q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$

  • It measures the coding inefficiency from using a model $q$ to compress the data, when the true distribution is $p$.
  • It also measures the dissimilarity between $p$ and $q$.
  • It can also be formulated as:

    $$D_{KL}(p \,\|\, q) = H(p, q) - H(p)$$

    where $H(p, q) = -\sum_x p(x) \log q(x)$ is the cross-entropy and $H(p) = -\sum_x p(x) \log p(x)$ is the entropy of $p$.
    Formulated this way, the KL divergence is the average number of extra bits needed to encode the data due to the fact we used distribution $q$ rather than the true distribution $p$ (see the first sketch after this list).
    • This formulation informally motivates the following inequality, called Gibbs' Inequality:

      $$D_{KL}(p \,\|\, q) \geq 0, \qquad \text{with equality if and only if } p = q$$
  • Schulman's Approximation: The KL divergence can be estimated as follows. Assume we have access to sample points $x_1, \dots, x_N \sim q$ but we cannot analytically compute the sum. Let $r_i = \frac{p(x_i)}{q(x_i)}$. Then

    $$D_{KL}(q \,\|\, p) \approx \frac{1}{N} \sum_{i=1}^{N} \big[(r_i - 1) - \log r_i\big]$$

    This estimator is unbiased and has far lower variance than the naive estimator $-\frac{1}{N}\sum_i \log r_i$ (see the second sketch after this list).
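
A minimal sketch (not part of the original notes) checking the two formulations of the KL divergence above on a small discrete example; the distributions p and q below are arbitrary illustrative choices.

```python
import numpy as np

# Two arbitrary discrete distributions over the same three-element support.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

# Definition: D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)).
kl_direct = np.sum(p * np.log(p / q))

# Cross-entropy formulation: D_KL(p || q) = H(p, q) - H(p).
cross_entropy = -np.sum(p * np.log(q))
entropy = -np.sum(p * np.log(p))
kl_via_entropies = cross_entropy - entropy

print(kl_direct, kl_via_entropies)   # identical up to floating-point error
```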
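
A sketch of Schulman's sample-based estimator, following the convention in his note ($x \sim q$, $r = p(x)/q(x)$, estimating $D_{KL}(q \,\|\, p)$); the Gaussian choice of $p$ and $q$ is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: q = N(0, 1), p = N(0.1, 1), so KL(q || p) = 0.1**2 / 2 = 0.005.
mu_p = 0.1
x = rng.standard_normal(500_000)        # samples x ~ q
log_r = x * mu_p - mu_p ** 2 / 2        # log r = log p(x) - log q(x)
r = np.exp(log_r)

k1 = -log_r                             # naive estimator: unbiased but high variance
k3 = (r - 1) - np.log(r)                # Schulman's estimator: unbiased, low variance

print(np.mean(k1), np.std(k1))          # mean ~0.005, std ~0.1
print(np.mean(k3), np.std(k3))          # mean ~0.005, std roughly an order of magnitude smaller
```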

JS-Divergence

  • The Jensen-Shannon Divergence is a symmetric and smoothed version of the KL divergence, defined as

    $$JSD(p \,\|\, q) = \frac{1}{2} D_{KL}(p \,\|\, m) + \frac{1}{2} D_{KL}(q \,\|\, m)$$

    Where

    $$m = \frac{1}{2}(p + q)$$

    is a mixture distribution.

  • It can also be written as follows

    $$JSD(p \,\|\, q) = H(m) - \frac{1}{2}\big(H(p) + H(q)\big)$$

    where $H$ denotes the Shannon entropy.

  • In the general case, we may use weighting parameters $\pi_1, \dots, \pi_n$, with $\pi_i \geq 0$ and $\sum_i \pi_i = 1$, such that

    $$JSD_{\pi_1, \dots, \pi_n}(p_1, \dots, p_n) = \sum_{i=1}^{n} \pi_i\, D_{KL}(p_i \,\|\, m)$$

    where

    $$m = \sum_{i=1}^{n} \pi_i\, p_i$$

    • Alternatively

      $$JSD_{\pi_1, \dots, \pi_n}(p_1, \dots, p_n) = H\!\left(\sum_{i=1}^{n} \pi_i\, p_i\right) - \sum_{i=1}^{n} \pi_i\, H(p_i)$$
  • The JSD is bounded by $1$, assuming the use of base $2$ for logarithms:

    $$0 \leq JSD(p \,\|\, q) \leq 1$$

    In general, for $n$ distributions,

    $$0 \leq JSD_{\pi_1, \dots, \pi_n}(p_1, \dots, p_n) \leq \log_2 n$$

    (A sketch illustrating both bounds follows below.)
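
A minimal sketch (using base-2 logarithms and hypothetical example distributions) of both the two-distribution JSD and the weighted general form, illustrating the bounds above.

```python
import numpy as np

def kl(p, q):
    """KL(p || q) in bits, with the convention 0 * log(0) = 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def entropy(p):
    """Shannon entropy in bits."""
    mask = p > 0
    return -np.sum(p[mask] * np.log2(p[mask]))

def jsd(dists, weights):
    """Generalised Jensen-Shannon divergence: sum_i pi_i * KL(p_i || m)."""
    dists = np.asarray(dists, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = weights @ dists                          # mixture distribution
    return sum(w * kl(p, m) for w, p in zip(weights, dists))

p = np.array([0.9, 0.05, 0.05])
q = np.array([0.05, 0.05, 0.9])

# Two-distribution case with equal weights: bounded by 1 with base-2 logs.
print(jsd([p, q], [0.5, 0.5]))

# The entropy formulation gives the same value.
m = 0.5 * (p + q)
print(entropy(m) - 0.5 * (entropy(p) + entropy(q)))

# General case with n distributions is bounded by log2(n): four disjoint
# point masses with equal weights achieve the maximum, log2(4) = 2.
print(jsd(np.eye(4), np.full(4, 0.25)))
```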