• Notation outlined for reinforcement learning and game theory remains relevant unless overwritten here.

  • $\mathcal{T}$ - the transition probability of the environment.
  • $x_i^t$ - denotes a quantity $x$ for agent $i$ taken at time step $t$.¹
  • $x^t$ - denotes a joint quantity (for all agents) taken at time step $t$.
  • $x_i$ - pertains to the $i$-th agent.
  • $x_{-i}$ - pertains to everything but the $i$-th agent (see the example after this list).
  • $x_I$ - pertains to the agents from the set $I$.
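
As a quick example of how these indexing conventions combine (using actions; the symbols $a$ and $n$ here are my assumptions, not defined above):

$$a^t = (a_1^t, \dots, a_n^t) = (a_i^t, a_{-i}^t)$$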

  • $\mu$ - the initial state distribution.
  • $O_i$ - the set of observations available to agent $i$.
  • $\mathcal{O}_i$ - agent $i$'s observation function (a common signature is given after this list).
  • $\mathcal{O}$ - the joint observation function.
  • $o$ - a generic observation.
  • $b$ - a belief state.
  • $A_i$ - a generic environment action space for agent $i$.
  • $M_i$ - a generic communication action space for agent $i$.
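
A common signature for the observation function, assuming the usual partially observable setup (an assumption on my part, not stated in the list), is

$$\mathcal{O}_i(o_i^t \mid a^{t-1}, s^t),$$

the probability that agent $i$ receives observation $o_i^t$ after the joint action $a^{t-1}$ leads to state $s^t$.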

  • $h^t$ - the history at time step $t$.
  • $\hat{h}^{t_1:t_2}$ - the full history starting from time step $t_1$ up to and including time step $t_2$.
  • $h_i^t$ - the history of agent $i$ at time step $t$.
  • $\hat{H}$ - the set of full histories.
  • $U_i(\pi)$ - the return of agent $i$ under policy $\pi$ (see the expansion after this list).
  • $u_i(\hat{h})$ - the discounted return of agent $i$ in $\hat{h}$.
  • $s(\hat{h})$ - the last state in history $\hat{h}$.
  • $o^{0:t}$ - the history of observations.
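
To make the return notation concrete, one standard formulation (assuming a discount factor $\gamma$ and per-step rewards $r_i^t$, neither of which is defined in this list) is

$$u_i(\hat{h}) = \sum_{t \geq 0} \gamma^t r_i^t, \qquad U_i(\pi) = \mathbb{E}_{\hat{h} \sim \pi}\left[u_i(\hat{h})\right].$$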

  • $\oplus$ - concatenation. Sometimes the operator is dropped and the concatenation is written by juxtaposition.
  • $\pi_c$ - a special joint policy wherein the actions of the agents can be correlated with each other.
  • $\xi_i$ - action modifier for correlated equilibrium.
  • $W(\pi)$ - the social welfare of a joint policy $\pi$ (see the formulas after this list).
  • $F(\pi)$ - the social fairness of a joint policy $\pi$.
  • $\text{Regret}_i^{e_1:e_2}$ - the regret of the $i$-th agent when examining the history from episode $e_1$ to $e_2$ (inclusive).
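
If welfare and fairness follow the most common definitions (an assumption; the sum and product of expected returns, respectively), and regret compares against the best fixed policy in hindsight, then

$$W(\pi) = \sum_i U_i(\pi), \qquad F(\pi) = \prod_i U_i(\pi), \qquad \text{Regret}_i^{e_1:e_2} = \max_{\pi_i'} \sum_{e=e_1}^{e_2} \Big( U_i(\pi_i', \pi_{-i}^e) - U_i(\pi^e) \Big),$$

where $\pi^e$ is the joint policy used in episode $e$.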

  • $d^\pi$ - empirical distribution obtained from following policy $\pi$.
  • $\text{minimax}_i(s)$ - minimax solution for agent $i$ starting at state $s$.
  • $\Gamma_s$ - JAL-GT game starting from $s$.
  • $\Gamma_{s,i}$ - JAL-GT game starting from $s$, entry for agent $i$.
  • $\hat{\pi}_j^i$ - agent $i$'s model of the policy of agent $j$ (see the sketch after this list).
  • $N(a)$ - number of times action $a$ was selected.
  • $N(s, a)$ - number of times action $a$ was selected when in state $s$.
  • $Q_i(s, a)$ - expected value for agent $i$ of taking action $a$ when in state $s$.
  • $Q_i(h_i, a_i)$ - action value given the history $h_i$ and action $a_i$ of agent $i$.
  • $\text{VI}$ - value of information.
  • $\bar{\pi}$ - average of past policies.
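
A minimal sketch of how the count-based quantities above are typically used: agent $i$ models agent $j$'s policy as the empirical frequency of $j$'s past actions, as in fictitious play. The class and method names are my own illustration, not from the text.

```python
from collections import defaultdict

class EmpiricalOpponentModel:
    """Models another agent's policy as normalized action counts."""

    def __init__(self):
        self.counts = defaultdict(int)  # N(a): times action a was observed

    def update(self, action):
        self.counts[action] += 1

    def policy(self):
        # Empirical distribution over observed actions; assumes at least
        # one action has been observed.
        total = sum(self.counts.values())
        return {a: n / total for a, n in self.counts.items()}

model = EmpiricalOpponentModel()
for a in ["rock", "paper", "paper", "scissors"]:
    model.update(a)
print(model.policy())  # {'rock': 0.25, 'paper': 0.5, 'scissors': 0.25}
```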

  • $D_i$ - difference reward for agent $i$ (see the formula after this list).
  • $a^c$ - the default action.
  • $c$ - common information (in centralized value functions).
  • $u_i^t$ - the utility of agent $i$ at time step $t$.
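
One common construction for difference rewards (an assumption on my part, mirroring the standard formulation in the difference-rewards literature) compares the shared reward with the reward obtained when agent $i$'s action is replaced by the default action:

$$D_i(s, a) = R(s, a) - R\big(s, (a_{-i}, a^c)\big),$$

where $R$ is the shared reward function (not defined in this list).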

  • $\theta$ - parameters (for neural network models).
  • $\phi_j^i$ - parameters for agent $i$'s model of agent $j$'s policy.
  • $e(x; \theta_e)$ - encoder network taking in input $x$ and with parameters $\theta_e$.
  • $z_i^t$ - output of the encoder network for agent $i$ at timestep $t$.
  • $d(z; \theta_d)$ - decoder network taking in a latent representation $z$ and with parameters $\theta_d$.
  • $\theta_{\text{shared}}$ - indicates shared parameters (see the sketch after this list).
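
A minimal sketch of the encoder/decoder notation, assuming PyTorch; the module names and sizes are my own assumptions. "Shared parameters" here means all agents reuse the same module instances.

```python
import torch
import torch.nn as nn

OBS_DIM, LATENT_DIM = 8, 4

# One encoder e(x; theta_e) and one decoder d(z; theta_d), shared by all agents.
encoder = nn.Sequential(nn.Linear(OBS_DIM, 16), nn.ReLU(), nn.Linear(16, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(LATENT_DIM, 16), nn.ReLU(), nn.Linear(16, OBS_DIM))

observations = {i: torch.randn(OBS_DIM) for i in range(3)}  # one observation per agent

# Because every agent calls the same encoder/decoder objects, gradients
# from all agents accumulate in the same (shared) parameters.
latents = {i: encoder(o) for i, o in observations.items()}  # z_i^t
reconstructions = {i: decoder(z) for i, z in latents.items()}
```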

  • $\Pi_i^k$ - population of policies for agent $i$ at time $k$.
  • $M^k$ - meta-game for the $k$-th generation (see the note after this list).
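
In population-based training (e.g. PSRO-style methods), the meta-game is typically the matrix of expected returns between population members; under that reading (an assumption on my part, shown here for two agents):

$$M^k[m, n] = \big( U_1(\pi_1^m, \pi_2^n),\; U_2(\pi_1^m, \pi_2^n) \big), \qquad \pi_1^m \in \Pi_1^k,\; \pi_2^n \in \Pi_2^k.$$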

Footnotes

  1. Note that sometimes we may write the agent index as a superscript. However, we will always use $i$ (and $j$) for agents and $t$ for time.