• CTDE (Centralized Training, Decentralized Execution) is an approach to training MARL (multi-agent reinforcement learning) agents.
    • It enables conditioning approximate value functions on privileged global information in a computationally tractable manner.
    • CTDE is common because only the policy is needed at inference time; the policy is trained against a value function that centralized training can make more accurate.
    • DDPG plus a centralized training, decentralized execution approach works; this is called MADDPG. It mixes DDPG’s model-free stability with the centralized approach, which requires the critic to consider the actions of all other agents (see the sketch below).
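
A minimal PyTorch sketch of this setup; the Actor/CentralizedCritic names, layer sizes, and dimensions are illustrative assumptions, not the MADDPG reference implementation.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Decentralized actor: at execution time it sees only its own observation."""

    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, act_dim), nn.Tanh(),  # continuous actions in [-1, 1]
        )

    def forward(self, obs):
        return self.net(obs)


class CentralizedCritic(nn.Module):
    """Centralized critic: conditions on every agent's observation and action,
    i.e. the privileged information available only during training."""

    def __init__(self, n_agents, obs_dim, act_dim):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, all_obs, all_acts):
        # all_obs: (batch, n_agents, obs_dim); all_acts: (batch, n_agents, act_dim)
        x = torch.cat([all_obs.flatten(1), all_acts.flatten(1)], dim=-1)
        return self.net(x)  # joint action-value, shape (batch, 1)
```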

Centralized Critics

  • We can use actor-critic methods, since a centralized critic can be used to train decentralized actors. Empirically, centralized critics show the following advantages (an update-step sketch follows the list):
    • Eased learning
    • Learning coordinated behaviors
    • Improved performance
    • Reduced variance
    • More robustness
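
A sketch of one centralized-training update step, continuing the Actor and CentralizedCritic classes above. Batch shapes, learning rates, and the discount factor are assumed for illustration; a full MADDPG-style implementation would add target networks and a replay buffer.

```python
import torch

n_agents, obs_dim, act_dim, batch = 3, 8, 2, 32
actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
critic = CentralizedCritic(n_agents, obs_dim, act_dim)
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# Stand-in replay batch: joint observations, actions, and a shared reward.
obs = torch.randn(batch, n_agents, obs_dim)
acts = torch.randn(batch, n_agents, act_dim)
reward = torch.randn(batch, 1)
next_obs = torch.randn(batch, n_agents, obs_dim)

# Critic update: the TD target uses every agent's next action, so the critic
# accounts for the choices of all other agents.
with torch.no_grad():
    next_acts = torch.stack(
        [a(next_obs[:, i]) for i, a in enumerate(actors)], dim=1
    )
    target = reward + 0.99 * critic(next_obs, next_acts)
critic_loss = ((critic(obs, acts) - target) ** 2).mean()
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

# Actor update for each agent: act from the LOCAL observation only, but
# score the resulting joint action with the centralized critic.
for i, (actor, opt) in enumerate(zip(actors, actor_opts)):
    joint = acts.clone()
    joint[:, i] = actor(obs[:, i])           # swap in agent i's fresh action
    actor_loss = -critic(obs, joint).mean()  # ascend the critic's value
    opt.zero_grad()
    actor_loss.backward()  # leftover critic grads are cleared next critic step
    opt.step()

# Decentralized execution: only actors[i](local_obs_i) is needed at test time.
```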
