
  • We say a random variable $X$ is distributed according to a distribution $D$ with the notation $X \sim D$.
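  • Alternatively, we may notate this in terms of the density (or mass) function as $p(x) = D(x \mid \theta)$, where $\theta$ denotes the parameters of the distribution.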


  • The Bernoulli $\mathrm{Ber}(x \mid \theta)$ denotes the probability of a single experiment succeeding ($x = 1$) or failing ($x = 0$), given that $\theta$ is the probability of success.

  • The Binomial $\mathrm{Bin}(k \mid n, \theta)$ denotes the probability of an experiment succeeding $k$ times given that $n$ trials were conducted and $\theta$ is the probability of success.

  • The Multinomial $\mathrm{Mu}(x \mid n, \theta)$. $x$ is a vector such that $x_j$ denotes the number of times the $j$-th outcome occurs and $\theta_j$ denotes the probability of it occurring. The multinomial denotes the probability of $x$ being observed in $n$ trials.

  • The Categorical $\mathrm{Cat}(x \mid \theta)$ is a special case of the Multinomial where $n = 1$ and $x$ is a one-hot encoded vector.

  • The Poisson $\mathrm{Poi}(k \mid \lambda)$ is the binomial distribution taken in the limit of a large number of trials $n \to \infty$ and a small probability of success $\theta = \lambda / n$. It denotes the probability of $k$ successes being observed over a long period of time for very unlikely events.
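
The limiting relationship between the Binomial and the Poisson can be checked numerically. The sketch below is only an illustration: it assumes the success probability is set to `lam / n` (with `lam` a hypothetical name for the Poisson rate), and the helper names are made up for this example.

```python
import math


def binomial_pmf(k: int, n: int, theta: float) -> float:
    """P(k successes in n trials) when each trial succeeds with probability theta."""
    return math.comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(k events) under a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def max_pmf_gap(n: int, lam: float, k_max: int = 20) -> float:
    """Largest absolute gap between Bin(n, lam / n) and Poi(lam) over k = 0..k_max."""
    theta = lam / n  # assumption: success probability shrinks as n grows
    return max(
        abs(binomial_pmf(k, n, theta) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, max_pmf_gap(n, lam=3.0))
```

The gap should shrink as `n` grows with the rate held fixed, which is the sense in which the Poisson approximates a Binomial with many trials and rare successes.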

  • The Empirical / Sample Distribution is a distribution where, given a set of data $\mathcal{D} = \{x_1, \dots, x_N\}$, each $x_i$ has probability $1/N$. It is the empirical measure associated with the sample.

  • The Geometric gives the probability that the $k$-th trial is the first success observed, given that the experiment succeeds with probability $\theta$: $P(k) = (1 - \theta)^{k - 1} \theta$.

  • The Hypergeometric describes the probability of $k$ successes (i.e., an object is drawn with a desired feature) in $n$ draws without replacement from a population of size $N$ with $K$ objects with that feature.
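
As a small worked example of the Hypergeometric, the sketch below computes its probability mass function by counting combinations. The argument names (`k` successes, `n` draws, population size `N`, `K` objects with the feature) are assumptions chosen for this illustration.

```python
import math


def hypergeom_pmf(k: int, n: int, N: int, K: int) -> float:
    """P(k featured objects in n draws without replacement).

    N is the population size and K the number of objects with the feature.
    """
    # Outside the support the probability is zero.
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    # Ways to pick k featured and n - k unfeatured objects, over all ways to pick n.
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


if __name__ == "__main__":
    # Example: probability of exactly 2 hearts in a 5-card hand from a 52-card deck.
    print(hypergeom_pmf(2, n=5, N=52, K=13))
```

Summing `k * hypergeom_pmf(k, n, N, K)` over the support recovers the mean `n * K / N`, which is a quick sanity check on the counting argument.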

  • The Negative Hypergeometric describes the probability distribution on the number of draws needed before $r$ failures are observed when drawing without replacement (see Hypergeometric).

  • The Uniform $\mathrm{Unif}(a, b)$ is a distribution where all events from $a$ to $b$ are equally likely and all other events never occur.

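  • The Normal / Gaussian $\mathcal{N}(x \mid \mu, \sigma^2)$ is a distribution following a bell curve whose mean, median and mode are $\mu$ and whose variance is $\sigma^2$. It is of the form

    $$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2 \sigma^2}\right)$$

    One interpretation of the above is that it is a limiting distribution of the (suitably centered and scaled) Binomial Distribution over many trials.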

  • The Dirac Function $\delta(x)$ is a function that is infinite at $x = 0$ and $0$ everywhere else, while still integrating to $1$. It can be viewed as the limit of a Gaussian as its variance tends to $0$.

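  • The Student $t$ is a variation of the Gaussian which is more robust to outliers (i.e., it has heavier tails). It is denoted $\mathcal{T}(x \mid \mu, \sigma^2, \nu)$ with mean $\mu$, scale parameter $\sigma^2$ and degrees of freedom $\nu$. The variance is $\frac{\nu \sigma^2}{\nu - 2}$ for $\nu > 2$. As $\nu \to \infty$, it becomes the Gaussian.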

  • The Cauchy is a variant of the Student $t$ distribution with $\nu = 1$ degree of freedom. It has such heavy tails that it does not have a mean. It can be interpreted as the distribution of the quotient of two independent zero-mean normal random variables.

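  • The Laplace $\mathrm{Lap}(x \mid \mu, b)$ is defined as

    $$\mathrm{Lap}(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$$

    It is robust to outliers and puts more probability density at the center than the Gaussian.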

  • The Gamma $\mathrm{Ga}(x \mid a, b)$ is a distribution parameterized with shape $a$ and rate $b$. It is the limiting distribution of

  • The Exponential is denoted $\mathrm{Expon}(x \mid \lambda)$, where $\lambda$ is the rate parameter. It describes the distribution of the times between events in a Poisson process.

  • The Erlang Distribution is denoted $\mathrm{Erlang}(x \mid k, \lambda)$ and is defined as the Gamma Distribution $\mathrm{Ga}(x \mid k, \lambda)$ with integer shape $k$. It is the distribution of the time until the $k$-th event of a Poisson Process.
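
The link between the Exponential and the Erlang can be illustrated by simulation: the waiting time until the k-th event is a sum of k independent exponential gaps. The sketch below assumes a rate named `lam` and a fixed seed for reproducibility; both the names and the sample size are illustrative choices.

```python
import random


def erlang_sample(k: int, lam: float, rng: random.Random) -> float:
    """Time until the k-th event of a Poisson process with rate lam.

    Drawn as the sum of k independent exponential inter-arrival times.
    """
    return sum(rng.expovariate(lam) for _ in range(k))


def erlang_sample_mean(k: int, lam: float, n_samples: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean waiting time until the k-th event."""
    rng = random.Random(seed)
    return sum(erlang_sample(k, lam, rng) for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    # The estimate should be close to the theoretical mean k / lam.
    print(erlang_sample_mean(k=3, lam=2.0), 3 / 2.0)
```

With `k = 1` the same function samples the Exponential itself, so the Exponential is the Erlang with a single event.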

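  • The Chi-Squared $\chi^2(x \mid \nu)$ is the distribution of the sum of $\nu$ squared independent standard Gaussian Random Variables: if $Z_i \sim \mathcal{N}(0, 1)$, then $\sum_{i=1}^{\nu} Z_i^2 \sim \chi^2(\nu)$.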

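  • The Inverse Gamma $\mathrm{IG}(x \mid a, b)$ is the distribution of the reciprocal of a random variable following the Gamma distribution: if $X \sim \mathrm{Ga}(a, b)$, then $1 / X \sim \mathrm{IG}(a, b)$.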

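  • The (scaled) inverse chi-squared distribution is defined as a reparameterization of the Inverse Gamma, $\chi^{-2}(x \mid \nu, \sigma^2) = \mathrm{IG}\left(x \mid \frac{\nu}{2}, \frac{\nu \sigma^2}{2}\right)$.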

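  • The Beta is a family of distributions of the form

    $$\mathrm{Beta}(x \mid a, b) = \frac{1}{B(a, b)} x^{a - 1} (1 - x)^{b - 1}, \qquad B(a, b) = \frac{\Gamma(a) \Gamma(b)}{\Gamma(a + b)}$$

    Where $a$ denotes the successes, and $b$ the failures.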

  • The Pareto $\mathrm{Pareto}(x \mid k, m) = k m^k x^{-(k + 1)} \mathbb{I}(x \ge m)$ is used to model long-tailed distributions where most items do not occur often. These distributions exhibit power laws. $m$ determines a threshold that the input $x$ must exceed, but not by much (how much is determined by $k$).

  • The Multivariate Gaussian / Multivariate Normal is a generalization of the Gaussian over $D$ random variables. It is defined as

    $$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2 \pi)^{D/2} |\Sigma|^{1/2}} \exp\left[-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right]$$

    • The expression $(x - \mu)^T \Sigma^{-1} (x - \mu)$ in the exponent is the squared Mahalanobis distance between vector $x$ and mean $\mu$.
    • This means that the contours of the probability distribution lie in ellipsoids (see Murphy Ch. 4.1.2). Informally, this means the distribution involves translating by $\mu$ and rotating by $U$, the matrix of eigenvectors of $\Sigma$.
    • The eigenvalues of $\Sigma$ determine how stretched the ellipsoid contours are. The eigenvectors determine the axes of the ellipsoid.
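
To make the role of the Mahalanobis distance concrete, the sketch below evaluates it and the resulting density for the two-dimensional case only, inverting the 2x2 covariance by hand. The function names are hypothetical, and a general implementation would use a linear algebra library instead.

```python
import math


def mahalanobis_sq_2d(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu) for a 2x2 covariance."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance must be positive definite")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Sigma^{-1} = (1 / det) * [[d, -b], [-c, a]]
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


def mvn_pdf_2d(x, mu, sigma):
    """Density of a 2-D Gaussian: exp(-0.5 * distance^2) / (2 * pi * sqrt(det Sigma))."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    return math.exp(-0.5 * mahalanobis_sq_2d(x, mu, sigma)) / (2 * math.pi * math.sqrt(det))


if __name__ == "__main__":
    stretched = ((4.0, 0.0), (0.0, 1.0))
    # Both points lie on the same elliptical contour (squared distance 1).
    print(mahalanobis_sq_2d((2.0, 0.0), (0.0, 0.0), stretched))
    print(mahalanobis_sq_2d((0.0, 1.0), (0.0, 0.0), stretched))
```

Points with equal Mahalanobis distance have equal density, which is why the contours are ellipsoids stretched along the directions of larger variance.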
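  • The Multivariate Student $t$ is a generalization of the Student $t$ distribution over $D$ random variables. It is defined as

    $$\mathcal{T}(x \mid \mu, \Sigma, \nu) = \frac{\Gamma(\nu/2 + D/2)}{\Gamma(\nu/2)} \frac{|\Sigma|^{-1/2}}{\nu^{D/2} \pi^{D/2}} \left[1 + \frac{1}{\nu} (x - \mu)^T \Sigma^{-1} (x - \mu)\right]^{-\frac{\nu + D}{2}} = \frac{\Gamma(\nu/2 + D/2)}{\Gamma(\nu/2)} |\pi V|^{-1/2} \left[1 + (x - \mu)^T V^{-1} (x - \mu)\right]^{-\frac{\nu + D}{2}}$$

    where $\Sigma$ is the scale matrix rather than the covariance matrix and $V = \nu \Sigma$.

    As with the Student $t$ distribution, as $\nu \to \infty$, the distribution tends to the Multivariate Gaussian.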

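  • The Dirichlet $\mathrm{Dir}(x \mid \alpha)$ is a generalization of the Beta Distribution to $K$ dimensions. It has support over the simplex (a generalization of a triangular surface) given by

    $$S_K = \left\{x : 0 \le x_k \le 1, \ \sum_{k=1}^K x_k = 1\right\}$$

    The distribution is defined as

    $$\mathrm{Dir}(x \mid \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^K x_k^{\alpha_k - 1} \, \mathbb{I}(x \in S_K), \qquad B(\alpha) = \frac{\prod_{k=1}^K \Gamma(\alpha_k)}{\Gamma\left(\sum_{k=1}^K \alpha_k\right)}$$

    And if $\alpha_0 = \sum_{k=1}^K \alpha_k$, then

    $$\mathbb{E}[x_k] = \frac{\alpha_k}{\alpha_0}, \qquad \mathrm{mode}[x_k] = \frac{\alpha_k - 1}{\alpha_0 - K} \ \text{(for } \alpha_k > 1\text{)}, \qquad \mathrm{var}[x_k] = \frac{\alpha_k (\alpha_0 - \alpha_k)}{\alpha_0^2 (\alpha_0 + 1)}$$

    $\alpha_0$ controls the strength of the distribution (i.e., how peaked it is) and the $\alpha_k$ control where the peak occurs.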
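
One way to build intuition for the Dirichlet is to sample from it. The sketch below uses the standard construction of normalizing independent Gamma draws; the parameter vector is called `alpha` here as an assumption for illustration, and the helper names are hypothetical.

```python
import random


def dirichlet_sample(alpha, rng: random.Random):
    """Draw one point on the simplex by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def dirichlet_empirical_mean(alpha, n_samples: int = 20_000, seed: int = 0):
    """Monte Carlo estimate of the mean of each coordinate."""
    rng = random.Random(seed)
    sums = [0.0] * len(alpha)
    for _ in range(n_samples):
        for i, v in enumerate(dirichlet_sample(alpha, rng)):
            sums[i] += v
    return [s / n_samples for s in sums]


if __name__ == "__main__":
    alpha = [2.0, 3.0, 5.0]
    # Each coordinate's mean should be near alpha_k divided by the sum of alpha.
    print(dirichlet_empirical_mean(alpha))
```

Scaling every entry of `alpha` by the same factor keeps the mean fixed but concentrates the samples more tightly around it, which matches the description of the sum as a strength parameter.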

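  • The Wishart is a generalization of the Gamma distribution to positive definite matrices. Its pdf is defined as

    $$\mathrm{Wi}(\Lambda \mid S, \nu) = \frac{1}{Z_{\mathrm{Wi}}} |\Lambda|^{(\nu - D - 1)/2} \exp\left(-\frac{1}{2} \mathrm{tr}(\Lambda S^{-1})\right)$$

    Where $Z_{\mathrm{Wi}}$ is simply a normalization constant.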

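  • The Inverse Wishart is the generalization of the inverse gamma to positive definite matrices: it is the distribution of the inverse of a Wishart-distributed matrix. It is defined for degrees of freedom $\nu > D - 1$ and a positive definite scale matrix $S \succ 0$.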

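  • The Normal Inverse Wishart is of the following form

    $$\mathrm{NIW}(\mu, \Sigma \mid m_0, \kappa_0, \nu_0, S_0) = \mathcal{N}\left(\mu \mid m_0, \frac{1}{\kappa_0} \Sigma\right) \times \mathrm{IW}(\Sigma \mid S_0, \nu_0)$$

    • $m_0$ is the prior mean for $\mu$
    • $\kappa_0$ is the belief in $m_0$.
    • $S_0$ is proportional to the prior mean for $\Sigma$
    • $\nu_0$ is the belief in $S_0$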
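  • The Normal Inverse Chi-Squared is defined as

    $$N\chi^{-2}(\mu, \sigma^2 \mid m_0, \kappa_0, \nu_0, \sigma_0^2) = \mathcal{N}\left(\mu \mid m_0, \frac{\sigma^2}{\kappa_0}\right) \times \chi^{-2}(\sigma^2 \mid \nu_0, \sigma_0^2)$$

    • $m_0$ is the prior mean for $\mu$
    • $\kappa_0$ is the belief in $m_0$.
    • $\sigma_0^2$ is proportional to the prior mean for $\sigma^2$
    • $\nu_0$ is the belief in $\sigma_0^2$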
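  • The Inverse Gaussian (or Wald) is given by

    $$f(x \mid \mu, \lambda) = \sqrt{\frac{\lambda}{2 \pi x^3}} \exp\left(-\frac{\lambda (x - \mu)^2}{2 \mu^2 x}\right)$$

    • $\mu$ is the mean
    • $\lambda$ is the shape parameter
    • Input $x > 0$. If the Gaussian describes Brownian motion at a fixed time, the Inverse Gaussian describes the time it takes for a Brownian motion with positive drift to reach a fixed positive level.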
  • The Normal Inverse Gaussian combines a Normal distribution with an Inverse Gaussian (Wald) distribution. It has four parameters:

    • the mean of the Normal distribution
    • a scaling factor on the variance of the Normal distribution
    • the mean of the Wald distribution
    • the shape parameter of the Wald distribution.
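  • The Boltzmann is a probability distribution that gives the probability of a certain state as a function of energy and temperature

    $$p_i = \frac{1}{Q} \exp\left(-\frac{\varepsilon_i}{k T}\right) = \frac{\exp(-\varepsilon_i / k T)}{\sum_{j=1}^{M} \exp(-\varepsilon_j / k T)}$$

    Where $p_i$ is the probability of state $i$, $\varepsilon_i$ is the energy of state $i$, $k$ is the Boltzmann constant, $T$ is the absolute temperature of the system, $M$ is the number of states accessible to the system, and $Q$ is the normalization constant.

    The Boltzmann distribution is the distribution that maximizes entropy subject to a constraint on the mean energy.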
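
The Boltzmann probabilities can be computed directly from a list of state energies. The sketch below is a minimal illustration that treats the product of the Boltzmann constant and the temperature as a single hypothetical argument `kT`, and subtracts the minimum energy before exponentiating purely for numerical stability (this does not change the resulting probabilities).

```python
import math


def boltzmann_probabilities(energies, kT: float):
    """Probability of each state given its energy and the thermal energy kT."""
    if kT <= 0:
        raise ValueError("kT must be positive")
    e_min = min(energies)
    # Unnormalized weights exp(-(E_i - E_min) / kT); the shift cancels after normalizing.
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    q = sum(weights)  # normalization constant for the shifted weights
    return [w / q for w in weights]


if __name__ == "__main__":
    energies = [0.0, 1.0, 2.0]
    print(boltzmann_probabilities(energies, kT=1.0))    # low-energy states dominate
    print(boltzmann_probabilities(energies, kT=100.0))  # close to uniform
```

Lower-energy states are always more probable, and raising the temperature flattens the distribution toward uniform over the accessible states.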
