- The study of efficient data compression, and of robust and reliable data transmission.
Important Quantities
- Let $p$ be a probability distribution. The entropy is defined as
    $$H(p) = -\sum_x p(x) \log_2 p(x)$$
    - It can be interpreted as the expected amount of surprise we have about a given event. That is, it is the expected amount of information we gain from the distribution.
    - This also corresponds to the degree to which the distribution is uniform.
    - This also corresponds to the amount of uncertainty we have about the distribution (see the sketch below).
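A minimal numerical sketch of the definition, using numpy (the helper name `entropy` is my own, not from the notes):

```python
import numpy as np

def entropy(p):
    """Entropy of a discrete distribution p, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                    # convention: 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))          # 1.0 bit: uniform, maximal uncertainty
print(entropy([0.9, 0.1]))          # ~0.47 bits: less uniform, less surprise on average
print(entropy([1.0, 0.0]))          # 0.0 bits: no uncertainty at all
```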
- Let $p$ and $q$ be probability distributions. The cross entropy is defined as
    $$H(p, q) = -\sum_x p(x) \log_2 q(x)$$
    - It is a measure of the average number of bits needed to identify an event drawn from the set if the coding scheme is optimized for an estimated probability distribution $q$ rather than the true distribution $p$ (sketched below).
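A minimal sketch of the definition (the helper name `cross_entropy` is my own), showing that the cross entropy equals $H(p)$ only when the code is built from the true distribution:

```python
import numpy as np

def cross_entropy(p, q):
    """Average bits to code samples from p using a code optimized for q."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                    # terms with p(x) = 0 contribute nothing
    return -np.sum(p[mask] * np.log2(q[mask]))

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(cross_entropy(p, p))          # 1.5 bits = H(p): code matches the true distribution
print(cross_entropy(p, q))          # ~1.58 bits: extra cost from coding with q instead
```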
- The Kullback-Leibler divergence is defined as
    $$\mathrm{KL}(p \| q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$$
    - It measures the coding inefficiency from using a model $q$ to compress the data when the true distribution is $p$.
    - It also measures the dissimilarity between $p$ and $q$.
    - It can also be formulated as:
        $$\mathrm{KL}(p \| q) = H(p, q) - H(p)$$
        Formulated this way, the KL divergence is the average number of extra bits needed to encode the data due to the fact that we used distribution $q$ rather than the true distribution $p$.
        - This formulation informally motivates the inequality $H(p, q) \geq H(p)$, i.e. $\mathrm{KL}(p \| q) \geq 0$, with equality iff $p = q$ (see the sketch below).
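A small self-contained check of the two formulations above, on a hypothetical pair of distributions:

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])     # true distribution
q = np.array([1/3, 1/3, 1/3])       # model used for coding

kl = np.sum(p * np.log2(p / q))                                # direct definition of KL(p || q)
extra_bits = -np.sum(p * np.log2(q)) + np.sum(p * np.log2(p))  # H(p, q) - H(p)

print(kl, extra_bits)               # both ~0.085 bits; KL >= 0, so H(p, q) >= H(p)
```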
- The conditional entropy is defined as
    $$H(Y \mid X) = -\sum_{x, y} p(x, y) \log_2 p(y \mid x)$$
    - It measures how much entropy $Y$ has remaining once we have learnt the value of $X$ (sketched below).
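A minimal sketch on a hypothetical 2x2 joint table (the values are chosen only for illustration):

```python
import numpy as np

# Hypothetical joint distribution p(x, y); rows index x, columns index y.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

p_x = p_xy.sum(axis=1, keepdims=True)       # marginal p(x)
p_y_given_x = p_xy / p_x                    # conditional p(y | x)

H_Y_given_X = -np.sum(p_xy * np.log2(p_y_given_x))
p_y = p_xy.sum(axis=0)
H_Y = -np.sum(p_y * np.log2(p_y))
print(H_Y_given_X, H_Y)    # ~0.57 vs ~0.93 bits: observing X reduces uncertainty about Y
```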
- The Pointwise Mutual Information between two events $x$ and $y$ is defined as
    $$\mathrm{PMI}(x, y) = \log_2 \frac{p(x, y)}{p(x)\,p(y)} = \log_2 \frac{p(x \mid y)}{p(x)} = \log_2 \frac{p(y \mid x)}{p(y)}$$
    - It measures the discrepancy between how often the events occur together and how often they would be expected to occur together by chance.
    - It is also the amount we learn from updating the prior $p(x)$ into the posterior $p(x \mid y)$ (see the sketch below).
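A sketch of the pointwise version for a single pair of events, reusing the same hypothetical joint table:

```python
import numpy as np

p_xy = np.array([[0.30, 0.10],              # hypothetical joint p(x, y)
                 [0.05, 0.55]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

x, y = 1, 1                                 # one particular pair of events
pmi = np.log2(p_xy[x, y] / (p_x[x] * p_y[y]))
print(pmi)                                  # ~0.50 bits > 0: they co-occur more than chance predicts
print(np.log2((p_xy[x, y] / p_y[y]) / p_x[x]))  # same value via log2(p(x | y) / p(x))
```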
- The Mutual Information determines how similar the joint distribution $p(X, Y)$ is to the factored distribution $p(X)\,p(Y)$. It is therefore defined as
    $$I(X; Y) = \mathrm{KL}\big(p(X, Y) \,\|\, p(X)\,p(Y)\big) = \sum_x \sum_y p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)}$$
    - This determines how much information can be extracted about one random variable by observing another.
    - It can also be expressed as $I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$.
    - It can also be expressed as the expected value of the pointwise mutual information (see the sketch below).
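A sketch computing the mutual information of the same hypothetical table as the expected PMI; the result also matches $H(Y) - H(Y \mid X)$ from the conditional-entropy sketch above:

```python
import numpy as np

p_xy = np.array([[0.30, 0.10],              # hypothetical joint p(x, y)
                 [0.05, 0.55]])
p_x = p_xy.sum(axis=1, keepdims=True)
p_y = p_xy.sum(axis=0, keepdims=True)

mi = np.sum(p_xy * np.log2(p_xy / (p_x * p_y)))   # KL(p(X, Y) || p(X) p(Y))
print(mi)             # ~0.36 bits = H(Y) - H(Y | X); 0 iff X and Y are independent
```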
- For continuous random variables, the mutual information can be approximated using the Maximal Information Coefficient (MIC). We define
    $$m(x, y) = \frac{\max_{G \in \mathcal{G}(x, y)} I\big(X(G); Y(G)\big)}{\log_2 \min(x, y)}$$
    where $\mathcal{G}(x, y)$ denotes the set of 2D grids of size $x \times y$, and $X(G), Y(G)$ are discretizations of the variables on this grid. Then, the maximal information coefficient is defined as
    $$\mathrm{MIC} = \max_{x, y \,:\, xy < B} m(x, y)$$
    where $B$ is a sample-size-dependent bound on the number of bins we can use.
    - A MIC of $0$ represents no relationship between the variables.
    - A MIC of $1$ represents a noise-free relationship of any form, not just linear (see the rough sketch below).
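A rough sketch of the idea only, assuming quantile (equal-count) grids and a brute-force search; the published MIC algorithm optimises grid placement far more carefully, so treat this as illustrative rather than faithful. The helper names are mine, and $B = n^{0.6}$ is a commonly used default bound:

```python
import numpy as np

def discrete_mi(counts):
    """Mutual information (bits) of a 2D contingency table of counts."""
    p = counts / counts.sum()
    p_x = p.sum(axis=1, keepdims=True)
    p_y = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / (p_x * p_y)[nz]))

def mic_sketch(x, y, B=None):
    """Approximate MIC: best normalized MI over quantile grids with nx * ny < B."""
    n = len(x)
    B = B or int(n ** 0.6)                  # a common sample-size-dependent bound
    best = 0.0
    for nx in range(2, B):
        for ny in range(2, B):
            if nx * ny >= B:
                continue
            x_edges = np.quantile(x, np.linspace(0, 1, nx + 1))
            y_edges = np.quantile(y, np.linspace(0, 1, ny + 1))
            counts, _, _ = np.histogram2d(x, y, bins=(x_edges, y_edges))
            best = max(best, discrete_mi(counts) / np.log2(min(nx, ny)))
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
print(mic_sketch(x, x ** 2))                   # high (towards 1): noise-free but non-linear relationship
print(mic_sketch(x, rng.uniform(-1, 1, 500)))  # much lower (towards 0): no relationship
```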
Links
- Probability Theory - more on probability, which is the basis of Information Theory.
- Murphy Ch. 2.8