-
Regret Matching involves minimizing regret and achieving no-regret outcomes.
-
Unconditional Regret Matching - computes action probabilities proportional to the positive average unconditional regrets of the action. Here, we define unconditional regret as follows
The average unconditional regret is given as
We then perform the update as follows (note that the following still needs to be normalized )
-
-
Conditional regret matching computes action probabilities proportional to the positive average unconditional regrets of the action. Here, we define conditional regret for choosing
instead of as The average conditional regret is given as
The policy is then updated as follows
In the formula, we have that
is the action chosen in the last episode and Which is a hyperparameter controlling the bias towards higher conditional regret (higher
= lower bias). The lower bound ensures the resulting sum of probabilities , when . -
The average regret of each agent is bounded by
. At the limit when , regret will approach . -
If all agents use conditional / unconditional regret matching, then the empirical distribution of joint actions
will converge to the set of correlated / coarse-correlated equilibria (see Albrecht, Christianos, and Schafer for more on this ). We achieve the following bound under the empirical distribution