- Proposed by Yang et al. (2018).[^1]
- Interactions within the population of agents are approximated by those between a single agent and the average effect from the overall population or from neighboring agents.
- It aims to address the non-stationarity and convergence issues of MARL problems and the instability of various MARL approaches.
- The $Q$-function is factorized using only pairwise local interactions. Let $\mathcal{N}(j)$ be the index set of the neighboring agents of agent $j$ as defined by the problem, with $N^j = |\mathcal{N}(j)|$:
  $$Q^j(s, a) = \frac{1}{N^j} \sum_{k \in \mathcal{N}(j)} Q^j(s, a^j, a^k)$$
- The pairwise interaction $Q^j(s, a^j, a^k)$ is approximated using Mean Field Theory. Let each action $a^k$ be represented using a one-hot encoding of the possible actions. The mean action $\bar{a}^j$ based on the neighborhood $\mathcal{N}(j)$ is defined as follows:
  $$\bar{a}^j = \frac{1}{N^j} \sum_{k \in \mathcal{N}(j)} a^k, \qquad a^k = \bar{a}^j + \delta a^{j,k}$$
  where $\delta a^{j,k}$ is a small perturbation of the mean action.
- The $Q^j(s, a^j, a^k)$ function can then be expressed via a second-order Taylor expansion around the mean action as
  $$Q^j(s, a^j, a^k) = Q^j(s, a^j, \bar{a}^j) + \nabla_{\bar{a}^j} Q^j(s, a^j, \bar{a}^j) \cdot \delta a^{j,k} + \frac{1}{2}\, \delta a^{j,k} \cdot \nabla^2_{\tilde{a}^{j,k}} Q^j(s, a^j, \tilde{a}^{j,k}) \cdot \delta a^{j,k}$$
  where $\tilde{a}^{j,k} = \bar{a}^j + \epsilon^{j,k}\, \delta a^{j,k}$ with $\epsilon^{j,k} \in [0, 1]$. Essentially, the Taylor remainder $R^j_{s,a^j}(a^k) \triangleq \frac{1}{2}\, \delta a^{j,k} \cdot \nabla^2_{\tilde{a}^{j,k}} Q^j(s, a^j, \tilde{a}^{j,k}) \cdot \delta a^{j,k}$ acts as a random variable that serves as a small fluctuation near zero.
- Assuming all agents are homogeneous, summing over the neighbors cancels the first-order terms (since $\sum_{k \in \mathcal{N}(j)} \delta a^{j,k} = 0$) and the remainders average out to a small fluctuation, giving the following approximation:
  $$Q^j(s, a) \approx Q^j(s, a^j, \bar{a}^j)$$
  That is, the pairwise interactions are simplified to those between agent $j$ and a virtual mean agent formed from the mean effect of all of $j$'s neighbors (a NumPy sketch after this list illustrates the mean action and this approximation).
- The mean-field $Q$-function is updated as follows:
  $$Q^j_{t+1}(s, a^j, \bar{a}^j) = (1 - \alpha)\, Q^j_t(s, a^j, \bar{a}^j) + \alpha \left[ r^j + \gamma\, v^j_t(s') \right]$$
  where $\alpha$ is the learning rate, $r^j$ is agent $j$'s reward, and $v^j_t(s')$ is the mean-field value of the next state $s'$:
  $$v^j_t(s') = \sum_{a^j} \pi^j_t(a^j \mid s', \bar{a}^j)\, \mathbb{E}_{\bar{a}^j \sim \pi^{-j}_t} \left[ Q^j_t(s', a^j, \bar{a}^j) \right]$$
- The MARL problem is thus converted to finding agent $j$'s best response with respect to the mean action of all its neighbors.
- The policy $\pi^j_t$ can be defined using the virtual mean agent as a Boltzmann (softmax) policy over the mean-field $Q$-values, with $\beta$ the Boltzmann softmax parameter (a tabular sketch after this list shows this policy together with the mean-field $Q$ update):
  $$\pi^j_t(a^j \mid s, \bar{a}^j) = \frac{\exp\!\left( \beta\, Q^j_t(s, a^j, \bar{a}^j) \right)}{\sum_{a^{j'} \in \mathcal{A}^j} \exp\!\left( \beta\, Q^j_t(s, a^{j'}, \bar{a}^j) \right)}$$
- When the $Q$-function is parameterized by weights $\phi^j$, we minimize the loss function
  $$\mathcal{L}(\phi^j) = \left( y^j - Q_{\phi^j}(s, a^j, \bar{a}^j) \right)^2$$
  where $y^j = r^j + \gamma\, v^{\mathrm{MF}}_{\bar{\phi}^j}(s')$ is the target mean-field value, computed with target-network parameters $\bar{\phi}^j$. The gradient (treating the target $y^j$ as a constant) is then given by
  $$\nabla_{\phi^j} \mathcal{L}(\phi^j) = -2 \left( y^j - Q_{\phi^j}(s, a^j, \bar{a}^j) \right) \nabla_{\phi^j} Q_{\phi^j}(s, a^j, \bar{a}^j)$$
- For actor-critic methods, the critic $Q_{\phi^j}$ is trained with the loss above, and the actor $\pi_{\theta^j}$ is trained with the policy gradient (see the PyTorch sketch after this list):
  $$\nabla_{\theta^j} \mathcal{L}(\theta^j) = \nabla_{\theta^j} \log \pi_{\theta^j}(s)\, Q_{\phi^j}(s, a^j, \bar{a}^j) \Big|_{a = \pi_{\theta^j}(s)}$$
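
The following is a minimal NumPy sketch of the mean-action computation and of the approximation $Q^j(s, a) \approx Q^j(s, a^j, \bar{a}^j)$. The toy pairwise $Q$ table, the action count, and the neighbor actions are illustrative assumptions, not values from the paper.

```python
import numpy as np

n_actions = 4
rng = np.random.default_rng(0)

# Illustrative assumption: a toy pairwise Q table indexed by (own action, neighbor action),
# standing in for Q^j(s, a^j, a^k) at a fixed state s.
Q_pair = rng.normal(size=(n_actions, n_actions))

def one_hot(a, n=n_actions):
    v = np.zeros(n)
    v[a] = 1.0
    return v

# Actions of agent j's neighbors N(j), as integer indices, plus agent j's own action.
neighbor_actions = [0, 2, 2, 3, 1]
a_j = 1

# Mean action: average of the neighbors' one-hot actions (a distribution over actions).
mean_a = np.mean([one_hot(a_k) for a_k in neighbor_actions], axis=0)

# Exact factorized value: average pairwise interaction over the neighborhood.
q_exact = np.mean([Q_pair[a_j, a_k] for a_k in neighbor_actions])

# Mean-field approximation: interact once with the "virtual mean agent" mean_a.
# For a Q that is linear in the neighbor's one-hot action this is exact;
# in general the error is controlled by the Taylor remainder term.
q_mf = Q_pair[a_j] @ mean_a

print(mean_a)          # [0.2 0.2 0.4 0.2]
print(q_exact, q_mf)   # identical here (up to floating point), since Q_pair[a_j, .] is linear
```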
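A small tabular sketch of the mean-field $Q$ update and the Boltzmann policy, under stated assumptions: the dictionary-backed $Q$ table, the bucketing of the mean action into a hashable key, and the placeholder states and transition are simplifications made only to keep the example short.

```python
import numpy as np

n_actions, alpha, gamma, beta = 4, 0.1, 0.95, 1.0

# Q^j(s, a^j, mean_a) stored in a dict; the mean action is rounded so it can serve as a key.
# The states "s0", "s1" and the transition below are placeholder assumptions.
Q = {}

def key(s, a, mean_a):
    return (s, a, tuple(np.round(mean_a, 2)))

def q(s, a, mean_a):
    return Q.get(key(s, a, mean_a), 0.0)

def boltzmann_policy(s, mean_a):
    """pi^j(a^j | s, mean_a) proportional to exp(beta * Q^j(s, a^j, mean_a))."""
    logits = beta * np.array([q(s, a, mean_a) for a in range(n_actions)])
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def mf_q_update(s, a, mean_a, r, s_next, mean_a_next):
    """Q <- (1 - alpha) Q + alpha (r + gamma * v), with v the mean-field value of s'."""
    pi_next = boltzmann_policy(s_next, mean_a_next)
    v_next = float(np.dot(pi_next, [q(s_next, b, mean_a_next) for b in range(n_actions)]))
    Q[key(s, a, mean_a)] = (1 - alpha) * q(s, a, mean_a) + alpha * (r + gamma * v_next)

# One illustrative transition for agent j.
mean_a = np.array([0.25, 0.25, 0.25, 0.25])
mf_q_update("s0", a=2, mean_a=mean_a, r=1.0, s_next="s1", mean_a_next=mean_a)
print(q("s0", 2, mean_a))   # 0.1 after a single update from a zero-initialized table
```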
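A hedged PyTorch sketch of the parameterized case: a critic trained on the squared TD loss against a target mean-field value, followed by an actor-critic style policy-gradient step weighted by that critic. The network sizes, the `onehot` helper, the target-network handling, and the random placeholder transition are illustrative assumptions, not the paper's architecture or hyperparameters.

```python
import torch
import torch.nn as nn

# Illustrative sizes and hyperparameters (assumptions, not from the paper).
state_dim, n_actions, gamma, beta = 8, 5, 0.95, 1.0

class MFQNet(nn.Module):
    """Q_phi(s, a^j, mean_a): input is the concatenation of state, own one-hot action, mean action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2 * n_actions, 64), nn.ReLU(), nn.Linear(64, 1)
        )
    def forward(self, s, a_onehot, mean_a):
        return self.net(torch.cat([s, a_onehot, mean_a], dim=-1)).squeeze(-1)

q_net, target_net = MFQNet(), MFQNet()
target_net.load_state_dict(q_net.state_dict())           # target-network parameters
critic_opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

def onehot(idx):
    return nn.functional.one_hot(idx, n_actions).float()

# One placeholder transition (batch of 1) with random data, only to make the sketch runnable.
s, s_next = torch.randn(1, state_dim), torch.randn(1, state_dim)
a_idx = torch.tensor([2])
mean_a = torch.full((1, n_actions), 1.0 / n_actions)
mean_a_next = mean_a.clone()
r = torch.tensor([1.0])

# --- Critic: squared TD loss against the target mean-field value y = r + gamma * v(s') ---
with torch.no_grad():
    q_next = torch.stack(
        [target_net(s_next, onehot(torch.tensor([b])), mean_a_next) for b in range(n_actions)],
        dim=-1,
    )                                                     # shape (1, n_actions)
    pi_next = torch.softmax(beta * q_next, dim=-1)        # Boltzmann policy from the target critic
    y = r + gamma * (pi_next * q_next).sum(-1)            # target mean-field value

critic_loss = (y - q_net(s, onehot(a_idx), mean_a)).pow(2).mean()
critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

# --- Actor: policy-gradient step, log pi_theta(a^j | s) weighted by the critic value ---
probs = torch.softmax(actor(s), dim=-1)
dist = torch.distributions.Categorical(probs=probs)
a_sample = dist.sample()
with torch.no_grad():
    q_val = q_net(s, onehot(a_sample), mean_a)            # critic treated as a constant weight
actor_loss = -(dist.log_prob(a_sample) * q_val).mean()
actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
```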
Footnotes

[^1]: Yang et al. (2018), Mean Field Multi-Agent Reinforcement Learning