Notation outlined for reinforcement learning and game theory remains relevant unless overridden here.
- $\mathcal{T}$ - the transition probability of the environment.
- $x_i^t$ - denotes something $x$ for agent $i$ taken at time step $t$.¹
- $x^t$ - denotes something joint (for all agents) taken at time step $t$.
- $x_i$ - pertains to the $i$-th agent.
- $x_{-i}$ - pertains to everything but the $i$-th agent.
- $x_C$ - pertains to the agents from the set $C$.
- $\mu$ - initial state distribution.
- $O_i$ - the set of observations available to agent $i$.
- $\mathcal{O}_i$ - agent $i$'s observation function.
- $\mathcal{O}$ - the joint observation function.
- $o$ - a generic observation.
- $b$ - a belief state.
- $X_i$ - a generic environment action space for agent $i$.
- $M_i$ - a generic communication action space for agent $i$.
- $h^t$ - history at time step $t$.
- $\hat{h}^t$ - the full history starting from time step $0$ up to and including time step $t$.
- $h_i^t$ - the history of agent $i$ at time step $t$.
- $\hat{H}$ - the set of full histories.
- $U_i(\pi)$ - the return of agent $i$ under policy $\pi$ (see the worked definitions after this list).
- $u_i(\hat{h})$ - the discounted return of agent $i$ in $\hat{h}$.
- $s(\hat{h})$ - the last state in history $\hat{h}$.
- $o^{0:t}$ - the history of observations.
- $\oplus$ - concatenation. Sometimes this is dropped in favor of the notation $(h, a)$.
- $\pi^c$ - a special joint policy wherein the actions of the agents can be correlated with each other.
- $\xi_i$ - action modifier for correlated equilibrium.
- $W(\pi)$ - the social welfare of a joint policy $\pi$ (see the worked definitions after this list).
- $F(\pi)$ - the social fairness of a joint policy $\pi$.
- $\text{Regret}_i^{e:e'}$ - the regret of the $i$-th agent when examining the history from episode $e$ to episode $e'$ (inclusive).
- $d^\pi$ - empirical distribution obtained from following policy $\pi$.
- $V_i^{mm}(s)$ - minimax solution for agent $i$ starting at state $s$.
- $\Gamma_s$ - JAL-GT game starting from $s$.
- $\Gamma_{s,i}$ - JAL-GT game starting from $s$ and entry for agent $i$.
- $\hat{\pi}_j^i$ - agent $i$'s model of the policy of agent $j$.
- $N(a)$ - number of times action $a$ was selected.
- $N(s, a)$ - number of times action $a$ was selected when in state $s$.
- $AV_i(s, a)$ - expected value for agent $i$ of taking action $a$ when in state $s$.
- $Q_i(h_i, a_i)$ - action value given the history $h_i$ and action $a_i$ of agent $i$.
- $VI$ - value of information.
- $\bar{\pi}$ - average of past policies.
- $D_i$ - difference reward for agent $i$ (see the formula after this list).
- $a^d$ - default action.
- $z$ - common information (in centralized value functions).
- $u_i^t$ - the utility of agent $i$ at time step $t$.
- $\theta$ - parameters (for neural network models).
- $\phi_j^i$ - parameters for agent $i$'s model of agent $j$'s policy.
- $e(x; \theta_e)$ - encoder network taking in input $x$ and with parameters $\theta_e$ (a minimal sketch follows this list).
- $m_i^t$ - output of the encoder network for agent $i$ at timestep $t$.
- $d(m; \theta_d)$ - decoder network taking in a latent representation $m$ and with parameters $\theta_d$.
- $\theta$ without an agent subscript - indicates shared parameters.
- $\Pi_i^k$ - population of policies for agent $i$ at time $k$.
- $G^k$ - meta-game for the $k$-th generation.
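As a worked companion to the return, welfare, and fairness entries above, a minimal set of definitions, assuming the standard discounted-return convention and the common sum and product forms of welfare and fairness (these exact forms are assumptions, not fixed by this page):

```latex
% Discounted return of agent i in a full history \hat{h},
% and expected return under a joint policy \pi (assumed conventions)
u_i(\hat{h}) = \sum_{t \geq 0} \gamma^t r_i^t
\qquad
U_i(\pi) = \mathbb{E}_{\hat{h} \sim \pi}\left[ u_i(\hat{h}) \right]

% Social welfare and social fairness of a joint policy (assumed forms)
W(\pi) = \sum_{i} U_i(\pi)
\qquad
F(\pi) = \prod_{i} U_i(\pi)
```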
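For the credit-assignment entries, one standard way the difference reward is written, assuming a shared reward function $R$ and the default action $a^d$ substituted in place of agent $i$'s action (an assumed formulation, shown for illustration):

```latex
% Difference reward: agent i's marginal contribution, measured by
% replacing its action a_i with the default action a^d (assumed form)
D_i(s, a) = R(s, a) - R\big(s, (a_{-i}, a^d)\big)
```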
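To make the encoder and decoder notation concrete, here is a minimal NumPy sketch, assuming single affine layers; the function names, dimensions, and initialization are hypothetical illustrations rather than anything defined on this page:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: observation size and latent size.
obs_dim, latent_dim = 8, 3

# theta_e: parameters of the encoder e(x; theta_e); theta_d: parameters of
# the decoder d(m; theta_d). Single affine layers, purely for illustration.
theta_e = {"W": rng.normal(size=(latent_dim, obs_dim)), "b": np.zeros(latent_dim)}
theta_d = {"W": rng.normal(size=(obs_dim, latent_dim)), "b": np.zeros(obs_dim)}

def encode(x, theta):
    """e(x; theta_e): map an input x to a latent representation m."""
    return np.tanh(theta["W"] @ x + theta["b"])

def decode(m, theta):
    """d(m; theta_d): map a latent representation m back to input space."""
    return theta["W"] @ m + theta["b"]

# m_i^t: encoder output for agent i at time step t, for some observation x.
x = rng.normal(size=obs_dim)
m = encode(x, theta_e)
x_hat = decode(m, theta_d)
print(m.shape, x_hat.shape)  # (3,) (8,)
```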
Footnotes
1. Note that sometimes we may write the agent as the superscript. However, we will always use $i$ and $j$ for agents and $t$ for time. ↩