• Proposed by Yang et al. (2018).¹

  • Interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or neighboring agents.

  • It aims to address the non-stationarity and convergence issues of MARL problems, and the instability that many MARL approaches exhibit as the number of agents grows.

  • The action-value function $Q^j(s, \mathbf{a})$ is factorized using only pairwise local interactions:

    $$Q^j(s, \mathbf{a}) = \frac{1}{N^j} \sum_{k \in \mathcal{N}(j)} Q^j\left(s, a^j, a^k\right)$$

    Let $\mathcal{N}(j)$ be the index set of the neighboring agents of agent $j$ as defined by the problem, and let $N^j = |\mathcal{N}(j)|$ denote its size.

  • The pairwise interaction $Q^j(s, a^j, a^k)$ is approximated using Mean Field Theory.

    Let $a^k$ be represented using a one-hot encoding of each of the $D$ possible actions, i.e. $a^k \in \{0, 1\}^D$.

    The mean action $\bar{a}^j$ based on the neighborhood $\mathcal{N}(j)$ is defined as follows

    $$\bar{a}^j = \frac{1}{N^j} \sum_{k \in \mathcal{N}(j)} a^k, \qquad a^k = \bar{a}^j + \delta a^{j,k}$$

    Where $\delta a^{j,k}$ is a small perturbation of neighbor $k$'s action relative to the mean action.
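    A minimal sketch of the mean-action computation under these definitions, assuming a discrete action space of size num_actions; the helper names are illustrative, not from the paper:

    ```python
    import numpy as np

    def one_hot(action: int, num_actions: int) -> np.ndarray:
        """One-hot encode a discrete action."""
        v = np.zeros(num_actions)
        v[action] = 1.0
        return v

    def mean_action(neighbor_actions: list[int], num_actions: int) -> np.ndarray:
        """Mean action a_bar^j: the average of the neighbors' one-hot actions."""
        encoded = np.stack([one_hot(a, num_actions) for a in neighbor_actions])
        return encoded.mean(axis=0)

    # Agent j with 4 neighbors in a 3-action environment (illustrative values).
    neighbors = [0, 2, 2, 1]
    print(mean_action(neighbors, num_actions=3))  # -> close to [0.25, 0.25, 0.5]
    ```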

  • The pairwise function can then be expressed via a Taylor expansion around the mean action as

    $$Q^j(s, a^j, a^k) = Q^j(s, a^j, \bar{a}^j) + \nabla_{\bar{a}^j} Q^j(s, a^j, \bar{a}^j) \cdot \delta a^{j,k} + \frac{1}{2}\, \delta a^{j,k} \cdot \nabla^2_{\tilde{a}^{j,k}} Q^j(s, a^j, \tilde{a}^{j,k}) \cdot \delta a^{j,k}$$

    Where

    $$R^j_{s,a^j}(a^k) \triangleq \frac{1}{2}\, \delta a^{j,k} \cdot \nabla^2_{\tilde{a}^{j,k}} Q^j(s, a^j, \tilde{a}^{j,k}) \cdot \delta a^{j,k}, \qquad \tilde{a}^{j,k} = \bar{a}^j + \epsilon^{j,k}\, \delta a^{j,k}, \; \epsilon^{j,k} \in [0, 1]$$

    is the Taylor remainder.

    $R^j_{s,a^j}(a^k)$ essentially acts as a random variable which serves as a small perturbation near zero.

  • The first-order terms vanish exactly since $\sum_{k} \delta a^{j,k} = 0$ by the definition of $\bar{a}^j$, and, assuming all agents are homogeneous, the remainders average out to a negligible fluctuation near zero, giving us the following approximation

    $$Q^j(s, \mathbf{a}) = \frac{1}{N^j} \sum_{k \in \mathcal{N}(j)} Q^j(s, a^j, a^k) \approx Q^j\left(s, a^j, \bar{a}^j\right)$$

    That is, the pairwise interactions are simplified to those between agent $j$ and a virtual mean agent $\bar{a}^j$ (abstracted from the mean effect of all of $j$'s neighbors).
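    A quick numerical check of the cancellation argument, with illustrative values (not from the paper): the fluctuations $\delta a^{j,k} = a^k - \bar{a}^j$ sum to zero by construction, so the first-order Taylor terms drop out.

    ```python
    import numpy as np

    num_actions = 3
    neighbors = [0, 2, 2, 1]              # neighbor actions of agent j (illustrative)
    A = np.eye(num_actions)[neighbors]    # one-hot rows a^k, shape (N^j, D)
    a_bar = A.mean(axis=0)                # mean action a_bar^j
    deltas = A - a_bar                    # fluctuations delta a^{j,k}

    # The fluctuations sum to zero, so only the bounded remainders R^j remain.
    print(np.allclose(deltas.sum(axis=0), 0.0))  # True
    ```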

  • The mean-field $Q$ function is updated as follows

    $$Q^j_{t+1}(s, a^j, \bar{a}^j) = (1 - \alpha)\, Q^j_t(s, a^j, \bar{a}^j) + \alpha \left[ r^j + \gamma\, v^j_t(s') \right]$$

    Where $\alpha$ is the learning rate, $r^j$ is agent $j$'s reward, and the mean-field value of the next state $s'$ is

    $$v^j_t(s') = \sum_{a^j} \pi^j_t\left(a^j \mid s', \bar{a}^j\right)\, \mathbb{E}_{\bar{a}^j \sim \pi^{-j}_t}\left[ Q^j_t\left(s', a^j, \bar{a}^j\right) \right]$$
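    A minimal tabular sketch of this update rule; the dictionary-based Q table, the discretization of the mean action into a hashable key, and all numeric values are assumptions for illustration:

    ```python
    import numpy as np

    def mf_q_update(Q, s, a, a_bar_key, r, v_next, alpha=0.1, gamma=0.95):
        """Q(s, a^j, a_bar^j) <- (1 - alpha) * Q(...) + alpha * (r^j + gamma * v^j(s'))."""
        key = (s, a, a_bar_key)
        Q[key] = (1 - alpha) * Q.get(key, 0.0) + alpha * (r + gamma * v_next)
        return Q

    Q = {}
    a_bar = np.array([0.25, 0.25, 0.5])
    # v_next would come from the mean-field value function v^j_t(s'); 0.7 is a placeholder.
    Q = mf_q_update(Q, s=0, a=2, a_bar_key=tuple(np.round(a_bar, 2)), r=1.0, v_next=0.7)
    print(Q)
    ```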

  • The MARL problem is thus converted to finding agent $j$'s best response with respect to the mean action $\bar{a}^j$ of all its neighbors.

  • The policy can be defined using the virtual mean agent as a Boltzmann policy

    $$\pi^j_t\left(a^j \mid s, \bar{a}^j\right) = \frac{\exp\left(\beta\, Q^j_t(s, a^j, \bar{a}^j)\right)}{\sum_{a^{j'} \in \mathcal{A}^j} \exp\left(\beta\, Q^j_t(s, a^{j'}, \bar{a}^j)\right)}$$

    Where $\beta$ is the Boltzmann (exploration) temperature parameter.
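    A minimal softmax sketch of this policy; the Q values and temperature are illustrative:

    ```python
    import numpy as np

    def boltzmann_policy(q_values: np.ndarray, beta: float = 1.0) -> np.ndarray:
        """Boltzmann policy over Q^j(s, a^j, a_bar^j), one entry per action a^j."""
        logits = beta * q_values - (beta * q_values).max()  # shift for numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()

    q = np.array([0.2, 1.5, 0.9])         # Q^j(s, a^j, a_bar^j) for 3 actions
    print(boltzmann_policy(q, beta=2.0))  # puts most mass on the highest-Q action
    ```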

  • For Q-learning (MF-Q), we minimize the loss function

    $$\mathcal{L}(\phi^j) = \left( y^j - Q_{\phi^j}(s, a^j, \bar{a}^j) \right)^2$$

    Where

    $$y^j = r^j + \gamma\, v^{\mathrm{MF}}_{\bar{\phi}^j}(s')$$

    is the target mean-field value, computed with the target-network parameters $\bar{\phi}^j$.

    The gradient is then given by

    $$\nabla_{\phi^j} \mathcal{L}(\phi^j) = \left( y^j - Q_{\phi^j}(s, a^j, \bar{a}^j) \right) \nabla_{\phi^j} Q_{\phi^j}(s, a^j, \bar{a}^j)$$
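    A minimal PyTorch-style sketch of this loss; the network architecture, shapes, and the use of a Boltzmann expectation with $\beta = 1$ for the target value are assumptions for illustration, not the paper's implementation:

    ```python
    import torch
    import torch.nn as nn

    class MFQNet(nn.Module):
        """Q_phi(s, a^j, a_bar^j): scores every action a^j given state and mean action."""
        def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + num_actions, hidden), nn.ReLU(),
                nn.Linear(hidden, num_actions),
            )

        def forward(self, state, a_bar):
            return self.net(torch.cat([state, a_bar], dim=-1))  # (batch, num_actions)

    q_net, target_net = MFQNet(4, 3), MFQNet(4, 3)
    target_net.load_state_dict(q_net.state_dict())

    # One batch of transitions for agent j (illustrative shapes and values).
    s, s_next = torch.randn(8, 4), torch.randn(8, 4)
    a = torch.randint(0, 3, (8,))
    a_bar = torch.softmax(torch.randn(8, 3), dim=-1)
    a_bar_next = torch.softmax(torch.randn(8, 3), dim=-1)
    r, gamma = torch.randn(8), 0.95

    with torch.no_grad():
        # Target mean-field value v^MF(s') from the target network and a Boltzmann policy.
        q_next = target_net(s_next, a_bar_next)
        pi_next = torch.softmax(q_next, dim=-1)
        y = r + gamma * (pi_next * q_next).sum(dim=-1)

    q_sa = q_net(s, a_bar).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = ((y - q_sa) ** 2).mean()  # L(phi^j)
    loss.backward()                  # gradient with respect to phi^j
    ```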

  • For actor-critic methods (MF-AC), the critic is trained as above and the actor is trained with the policy gradient

    $$\nabla_{\theta^j} \mathcal{L}(\theta^j) = \nabla_{\theta^j} \log \pi_{\theta^j}(s)\; Q_{\phi^j}\left(s, a^j, \bar{a}^j\right) \Big|_{a = \pi_{\theta^j}(s)}$$
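    A minimal sketch of the corresponding actor update, reusing the illustrative q_net critic from the previous sketch; the Actor network and all shapes are assumptions, not the paper's implementation:

    ```python
    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """pi_theta(a^j | s): a categorical policy over the agent's actions."""
        def __init__(self, state_dim: int, num_actions: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, num_actions),
            )

        def forward(self, state):
            return torch.distributions.Categorical(logits=self.net(state))

    actor = Actor(4, 3)
    s = torch.randn(8, 4)
    a_bar = torch.softmax(torch.randn(8, 3), dim=-1)

    dist = actor(s)
    a = dist.sample()                  # a^j ~ pi_theta(s)

    with torch.no_grad():              # the critic supplies the score
        q_sa = q_net(s, a_bar).gather(1, a.unsqueeze(1)).squeeze(1)

    # Minimizing -log pi(a^j | s) * Q(s, a^j, a_bar^j) follows the policy gradient above.
    actor_loss = -(dist.log_prob(a) * q_sa).mean()
    actor_loss.backward()
    ```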

Footnotes

  1. Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., and Wang, J. (2018). Mean Field Multi-Agent Reinforcement Learning. ICML 2018.