• CTDE (Centralized Training, Decentralized Execution) is an approach to training MARL (multi-agent reinforcement learning) agents.
    • It enables conditioning approximate value functions on privileged global information in a computationally tractable manner.
    • CTDE is common because only the policy is needed at inference time; the policy is trained against a value function that centralized training can make more accurate.
    • DDPG plus a centralized training, decentralized execution approach works; this is called MADDPG. It mixes DDPG’s model-free stability with the centralized approach, which requires the critic to consider the actions of all other agents (see the sketch below).
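
A minimal PyTorch sketch of this setup; the Actor/CentralizedCritic names, layer sizes, and dimensions are illustrative assumptions, not the MADDPG reference implementation.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Decentralized actor: at execution time it sees only its own observation."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),  # continuous actions in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)


class CentralizedCritic(nn.Module):
    """Centralized critic: conditions on every agent's observation and action,
    i.e. the privileged information available only during training."""

    def __init__(self, n_agents, obs_dim, act_dim):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, n_agents, obs_dim); all_acts: (batch, n_agents, act_dim)
        x = torch.cat([all_obs.flatten(1), all_acts.flatten(1)], dim=-1)
        return self.net(x)  # joint action-value, shape (batch, 1)
```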

Centralized Critics

  • We can use actor-critic methods, since a centralized critic can be used to train decentralized actors. Empirically, centralized critics show the following advantages (an update-step sketch follows the list):
    • Eased learning
    • Learning coordinated behaviors
    • Improved performance
    • Reduced variance
    • More robustness
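
A sketch of one centralized-training update step, continuing the Actor and CentralizedCritic classes above. Batch shapes, learning rates, and the discount factor are assumed for illustration; a full MADDPG-style implementation would add target networks and a replay buffer.

```python
import torch

n_agents, obs_dim, act_dim, batch = 3, 8, 2, 32
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critic = CentralizedCritic(n_agents, obs_dim, act_dim)
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Stand-in replay batch: joint observations, actions, and a shared reward.
obs = torch.randn(batch, n_agents, obs_dim)
acts = torch.randn(batch, n_agents, act_dim)
reward = torch.randn(batch, 1)
next_obs = torch.randn(batch, n_agents, obs_dim)

# Critic update: the TD target uses every agent's next action, so the critic
# accounts for the choices of all other agents.
with torch.no_grad():
    next_acts = torch.stack(
        [a(next_obs[:, i]) for i, a in enumerate(actors)], dim=1
    )
    target = reward + 0.99 * critic(next_obs, next_acts)
critic_loss = ((critic(obs, acts) - target) ** 2).mean()
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

# Actor update for each agent: act from the LOCAL observation only, but
# score the resulting joint action with the centralized critic.
for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
    joint = acts.clone()
    joint[:, i] = actor(obs[:, i])           # swap in agent i's fresh action
    actor_loss = -critic(obs, joint).mean()  # ascend the critic's value
    opt.zero_grad()
    actor_loss.backward()  # leftover critic grads are cleared next critic step
    opt.step()

# Decentralized execution: only actors[i](local_obs_i) is needed at test time.
```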
