Regret Matching

Regret Matching involves minimizing regret and achieving no-regret outcomes.
- Unconditional Regret Matching - computes action probabilities proportional to the positive average unconditional regrets of the action. Here, we define unconditional regret as follows
  
  The average unconditional regret is given as
  
  We then perform the update as follows (note that the following still needs to be normalized )
Conditional regret matching computes action probabilities proportional to the positive average unconditional regrets of the action. Here, we define conditional regret for choosing instead of as

The average conditional regret is given as

The policy is then updated as follows

In the formula, we have that is the action chosen in the last episode and

Which is a hyperparameter controlling the bias towards higher conditional regret (higher = lower bias). The lower bound ensures the resulting sum of probabilities , when .
The average regret of each agent is bounded by . At the limit when , regret will approach .
If all agents use conditional / unconditional regret matching, then the empirical distribution of joint actions will converge to the set of correlated / coarse-correlated equilibria (see Albrecht, Christianos, and Schafer for more on this ).

We achieve the following bound under the empirical distribution

The Library