• Notation outlined for reinforcement learning and game theory remains relevant unless overwritten here.

  • $\mathcal{T}$ - the transition probability of the environment.
  • $x_i^t$ - denotes a quantity $x$ for agent $i$ taken at time step $t$.¹
  • $x^t$ - denotes a joint quantity (for all agents) taken at time step $t$.
  • $x_i$ - pertains to the $i$-th agent.
  • $x_{-i}$ - pertains to everything but the $i$-th agent (see the example after this list).
  • $x_I$ - pertains to the agents from the set $I$.
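
As a quick example of how these indexing conventions combine (using actions; the symbols $a$ and $n$ here are my assumptions, not defined above):

$$a^t = (a_1^t, \dots, a_n^t) = (a_i^t, a_{-i}^t)$$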

  • $\mu$ - the initial state distribution.
  • $O_i$ - the set of observations available to agent $i$.
  • $\mathcal{O}_i$ - agent $i$'s observation function (a common signature is given after this list).
  • $\mathcal{O}$ - the joint observation function.
  • $o$ - a generic observation.
  • $b$ - a belief state.
  • $A_i$ - a generic environment action space for agent $i$.
  • $M_i$ - a generic communication action space for agent $i$.
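
A common signature for the observation function, assuming the usual partially observable setup (an assumption on my part, not stated in the list), is

$$\mathcal{O}_i(o_i^t \mid a^{t-1}, s^t),$$

the probability that agent $i$ receives observation $o_i^t$ after the joint action $a^{t-1}$ leads to state $s^t$.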

  • $h^t$ - the history at time step $t$.
  • $\hat{h}^{t_1:t_2}$ - the full history starting from time step $t_1$ up to and including time step $t_2$.
  • $h_i^t$ - the history of agent $i$ at time step $t$.
  • $\hat{H}$ - the set of full histories.
  • $U_i(\pi)$ - the return of agent $i$ under policy $\pi$ (see the expansion after this list).
  • $u_i(\hat{h})$ - the discounted return of agent $i$ in $\hat{h}$.
  • $s(\hat{h})$ - the last state in history $\hat{h}$.
  • $o^{0:t}$ - the history of observations.
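
To make the return notation concrete, one standard formulation (assuming a discount factor $\gamma$ and per-step rewards $r_i^t$, neither of which is defined in this list) is

$$u_i(\hat{h}) = \sum_{t \geq 0} \gamma^t r_i^t, \qquad U_i(\pi) = \mathbb{E}_{\hat{h} \sim \pi}\left[u_i(\hat{h})\right].$$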

  • $\oplus$ - concatenation. Sometimes the operator is dropped and the concatenation is written by juxtaposition.
  • $\pi_c$ - a special joint policy wherein the actions of the agents can be correlated with each other.
  • $\xi_i$ - action modifier for correlated equilibrium.
  • $W(\pi)$ - the social welfare of a joint policy $\pi$ (see the formulas after this list).
  • $F(\pi)$ - the social fairness of a joint policy $\pi$.
  • $\text{Regret}_i^{e_1:e_2}$ - the regret of the $i$-th agent when examining the history from episode $e_1$ to $e_2$ (inclusive).
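
If welfare and fairness follow the most common definitions (an assumption; the sum and product of expected returns, respectively), and regret compares against the best fixed policy in hindsight, then

$$W(\pi) = \sum_i U_i(\pi), \qquad F(\pi) = \prod_i U_i(\pi), \qquad \text{Regret}_i^{e_1:e_2} = \max_{\pi_i'} \sum_{e=e_1}^{e_2} \Big( U_i(\pi_i', \pi_{-i}^e) - U_i(\pi^e) \Big),$$

where $\pi^e$ is the joint policy used in episode $e$.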

  • $d^\pi$ - empirical distribution obtained from following policy $\pi$.
  • $\text{minimax}_i(s)$ - minimax solution for agent $i$ starting at state $s$.
  • $\Gamma_s$ - JAL-GT game starting from $s$.
  • $\Gamma_{s,i}$ - JAL-GT game starting from $s$, entry for agent $i$.
  • $\hat{\pi}_j^i$ - agent $i$'s model of the policy of agent $j$ (see the sketch after this list).
  • $N(a)$ - number of times action $a$ was selected.
  • $N(s, a)$ - number of times action $a$ was selected when in state $s$.
  • $Q_i(s, a)$ - expected value for agent $i$ of taking action $a$ when in state $s$.
  • $Q_i(h_i, a_i)$ - action value given the history $h_i$ and action $a_i$ of agent $i$.
  • $\text{VI}$ - value of information.
  • $\bar{\pi}$ - average of past policies.
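
A minimal sketch of how the count-based quantities above are typically used: agent $i$ models agent $j$'s policy as the empirical frequency of $j$'s past actions, as in fictitious play. The class and method names are my own illustration, not from the text.

```python
from collections import defaultdict

class EmpiricalOpponentModel:
    """Models another agent's policy as normalized action counts."""

    def __init__(self):
        self.counts = defaultdict(int)  # N(a): times action a was observed

    def update(self, action):
        self.counts[action] += 1

    def policy(self):
        # Empirical distribution over observed actions; assumes at least
        # one action has been observed.
        total = sum(self.counts.values())
        return {a: n / total for a, n in self.counts.items()}

model = EmpiricalOpponentModel()
for a in ["rock", "paper", "paper", "scissors"]:
    model.update(a)
print(model.policy())  # {'rock': 0.25, 'paper': 0.5, 'scissors': 0.25}
```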

  • $D_i$ - difference reward for agent $i$ (see the formula after this list).
  • $a^c$ - the default action.
  • $c$ - common information (in centralized value functions).
  • $u_i^t$ - the utility of agent $i$ at time step $t$.
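
One common construction for difference rewards (an assumption on my part, mirroring the standard formulation in the difference-rewards literature) compares the shared reward with the reward obtained when agent $i$'s action is replaced by the default action:

$$D_i(s, a) = R(s, a) - R\big(s, (a_{-i}, a^c)\big),$$

where $R$ is the shared reward function (not defined in this list).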

  • $\theta$ - parameters (for neural network models).
  • $\phi_j^i$ - parameters for agent $i$'s model of agent $j$'s policy.
  • $e(x; \theta_e)$ - encoder network taking in input $x$ and with parameters $\theta_e$.
  • $z_i^t$ - output of the encoder network for agent $i$ at timestep $t$.
  • $d(z; \theta_d)$ - decoder network taking in a latent representation $z$ and with parameters $\theta_d$.
  • $\theta_{\text{shared}}$ - indicates shared parameters (see the sketch after this list).
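
A minimal sketch of the encoder/decoder notation, assuming PyTorch; the module names and sizes are my own assumptions. "Shared parameters" here means all agents reuse the same module instances.

```python
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM = 8, 4

# One encoder e(x; theta_e) and one decoder d(z; theta_d), shared by all agents.
encoder = nn.Sequential(nn.Linear(OBS_DIM, 16), nn.ReLU(), nn.Linear(16, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 16), nn.ReLU(), nn.Linear(16, OBS_DIM))

observations = {i: torch.randn(OBS_DIM) for i in range(3)}  # one observation per agent

# Because every agent calls the same encoder/decoder objects, gradients
# from all agents accumulate in the same (shared) parameters.
latents = {i: encoder(o) for i, o in observations.items()}  # z_i^t
reconstructions = {i: decoder(z) for i, z in latents.items()}
```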

  • $\Pi_i^k$ - population of policies for agent $i$ at time $k$.
  • $M^k$ - meta-game for the $k$-th generation (see the note after this list).
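
In population-based training (e.g. PSRO-style methods), the meta-game is typically the matrix of expected returns between population members; under that reading (an assumption on my part, shown here for two agents):

$$M^k[m, n] = \big( U_1(\pi_1^m, \pi_2^n),\; U_2(\pi_1^m, \pi_2^n) \big), \qquad \pi_1^m \in \Pi_1^k,\; \pi_2^n \in \Pi_2^k.$$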

Footnotes

  1. Note that sometimes we may write the agent index as a superscript. However, we will always use $i$ (and $j$) for agents and $t$ for time.