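- Reinforcement Learning can be thought of as an interplay between two processes: improving the policy greedily with respect to the current value function (policy improvement), and estimating the value function of the current policy (policy evaluation). The two pull against each other in the short term, but under standard conditions (for example, a finite MDP) they converge together to the optimal policy and value function.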
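The back-and-forth between evaluation and improvement can be sketched as tabular policy iteration. This is a minimal illustration, not a reference implementation: the two-state MDP, the array layouts `P[a, s, s']` and `R[s, a]`, and the function name `policy_iteration` are all assumptions made up for the example.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Tabular policy iteration.

    P[a, s, s'] : transition probabilities (hypothetical layout)
    R[s, a]     : expected immediate reward (hypothetical layout)
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, states]            # shape (S, S)
        R_pi = R[states, policy]            # shape (S,)
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        new_policy = np.argmax(Q, axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# Toy MDP (assumed for illustration): action 0 stays, action 1 switches state.
# Staying in state 1 pays reward 1; everything else pays 0.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],    # action 0: stay
              [[0.0, 1.0], [1.0, 0.0]]])   # action 1: switch
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])

policy, V = policy_iteration(P, R, gamma=0.9)
print(policy, V)
```

Each pass makes the value function consistent with the current policy and then makes the policy greedy with respect to that value function; the loop stops when neither step changes anything.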
Topics
Papers
- Eysenbach et al. [1] propose DIAYN (Diversity Is All You Need) for hierarchical RL: a method for learning skills, that is, latent-conditioned, consistent policies, without the use of a reward function.
- For the skills to be useful, they must be trained to maximize coverage over the set of possible behaviors while remaining distinct enough from each other.
- The paper uses maximum entropy policies so that each skill acts as diversely as possible, which yields a diverse mixture of skills. The method generalizes readily to other tasks.
- Different skills specialize in visiting different states.
- Skills are distinguished by the states they visit rather than by the actions they take. This way, actions that have the same effect on the environment are indistinguishable.
- Skills are diverse and suited for complex tasks.
- The skills can be learnt with supervision as needed.
- Learnt skills can be used for imitating an expert.
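To make the "states distinguish skills" idea concrete, here is a small sketch of the DIAYN pseudo-reward as I understand it from the paper: a discriminator q(z|s) tries to infer the skill z from the visited state s, and the skill is rewarded with log q(z|s) - log p(z). The function names and the hand-written logits below are hypothetical; in practice the logits would come from a learned discriminator network, and the uniform prior over skills is an assumption of this sketch.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def diayn_pseudo_reward(disc_logits, skill, n_skills):
    """Pseudo-reward r = log q(z|s) - log p(z).

    disc_logits : discriminator logits over skills for one state (hypothetical input)
    skill       : index z of the skill that was executed
    n_skills    : number of skills; p(z) is assumed uniform, 1 / n_skills
    """
    log_q = log_softmax(np.asarray(disc_logits, dtype=float))[..., skill]
    log_p = -np.log(n_skills)
    return log_q - log_p

# The discriminator cannot tell the skills apart in this state: zero reward.
r_uninformative = diayn_pseudo_reward(np.zeros(4), skill=0, n_skills=4)

# The state is strongly associated with skill 0: positive reward for skill 0,
# negative reward for any other skill that ends up there.
logits = np.array([5.0, 0.0, 0.0, 0.0])
r_own = diayn_pseudo_reward(logits, skill=0, n_skills=4)
r_other = diayn_pseudo_reward(logits, skill=1, n_skills=4)
print(r_uninformative, r_own, r_other)
```

The reward depends only on the state, never on the action, so a skill is pushed toward states that identify it, while the maximum entropy policy keeps its action choices as random as possible.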
Links
- Sutton and Barto
- Powell
- OpenAI Spinning Up - has various modern algorithms
Footnotes
- [1] Eysenbach, Gupta, Ibarz, and Levine (2018). Diversity Is All You Need: Learning Skills Without a Reward Function.