• Regret Matching involves minimizing regret and achieving no-regret outcomes.

    • Unconditional Regret Matching - computes action probabilities proportional to the positive average unconditional regrets of the action. Here, we define unconditional regret as follows

      The average unconditional regret is given as

      We then perform the update as follows (note that the following still needs to be normalized )

  • Conditional regret matching computes action probabilities proportional to the positive average unconditional regrets of the action. Here, we define conditional regret for choosing instead of as

    The average conditional regret is given as

    The policy is then updated as follows

    In the formula, we have that is the action chosen in the last episode and

    Which is a hyperparameter controlling the bias towards higher conditional regret (higher = lower bias). The lower bound ensures the resulting sum of probabilities , when .

  • The average regret of each agent is bounded by . At the limit when , regret will approach .

  • If all agents use conditional / unconditional regret matching, then the empirical distribution of joint actions will converge to the set of correlated / coarse-correlated equilibria (see Albrecht, Christianos, and Schafer for more on this ).

    We achieve the following bound under the empirical distribution

Links