[^Oguntola__2023] Test
Characterizes the performance of model-based and model-free learning in the context of human-robot interaction (HRI).
- Rationale: There is an ongoing debate over whether to prefer model-based or model-free learning, especially in the context of HRI.
- The paper examines an HRI system where the robot optimizes a reward function in the presence of a human who optimizes their own reward in response to the robot’s actions.
- Key Findings:
- With a perfect theory of mind (ToM), the robot would need fewer samples than black-box model-based learning, and it could learn from off-policy data.
- Black-box model-based approaches can fail to improve even with a lot of data unless that data comes from the right distribution.
- Theory of Mind is robust to wrong assumptions, including cases where the human deviates from the behavior that ToM predicted.
- Theory of Mind is easily transferable.
- Based on the researchers’ experiments, model-based methods do not sufficiently explore the search space and get stuck in low-reward local optima.
- Given a good model, learning its parameters yields better performance than learning from scratch.
- Limitations: The “human” aspect of the simulated HRI system may not be representative of real life. The results were also only derived from one particular task.
- The robot figures out the human’s reward function via inverse RL.
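
The interaction structure in these notes (a human who best-responds to the robot, and a robot that plans against a learned model of the human) can be illustrated with a minimal sketch. Everything here is a hypothetical toy and not the paper's setup: the discrete action grid, the quadratic reward forms, the scalar parameter `theta`, and the grid search in `fit_theta`, which is only a simple stand-in for inverse RL.

```python
# Toy illustration only: all reward forms and names below are assumptions.
ACTIONS = [-1.0, -0.5, 0.0, 0.5, 1.0]

def human_reward(u_h, u_r, theta):
    # Assumed form: the human prefers actions close to theta * u_r.
    return -(u_h - theta * u_r) ** 2

def human_best_response(u_r, theta):
    # The human optimizes their own reward given the robot's action.
    return max(ACTIONS, key=lambda u_h: human_reward(u_h, u_r, theta))

def robot_reward(u_r, u_h):
    # Assumed form: the robot wants the human near 0.5, with a small effort cost.
    return -(u_h - 0.5) ** 2 - 0.1 * u_r ** 2

def robot_plan(theta):
    # The robot plans against the human response predicted by its model.
    return max(
        ACTIONS,
        key=lambda u_r: robot_reward(u_r, human_best_response(u_r, theta)),
    )

def fit_theta(observations, candidates):
    # Stand-in for inverse RL: pick the reward parameter whose predicted
    # best responses match the observed (u_r, u_h) pairs most closely.
    def loss(theta):
        return sum(
            (human_best_response(u_r, theta) - u_h) ** 2
            for u_r, u_h in observations
        )
    return min(candidates, key=loss)

# Observations generated by a human whose true parameter is 1.0.
observations = [(u_r, human_best_response(u_r, 1.0)) for u_r in ACTIONS]
theta_hat = fit_theta(observations, candidates=[-1.0, 0.0, 0.5, 1.0])
robot_action = robot_plan(theta_hat)
```

Because the model here has a single parameter, a handful of observations is enough to recover it, which is the intuition behind the sample-complexity advantage listed under the key findings.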
Off-policy approach: Collect a training dataset of human interactions and fit a neural network to estimate the human’s cost function. This ignores the fact that the predictive model influences the behavior of the human.
On-policy approach: Data collection and training are done iteratively.
Idealized exploration approach: Aims to address exploration issues in both the off-policy and on-policy approaches. Data is collected under the optimal policy and mixed with the on-policy data.
Model-free learning approach: Use policy gradient methods to train the policy directly.
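
The difference between the off-policy, on-policy, and idealized exploration data-collection schemes can be sketched on a toy problem. This is an illustration under stated assumptions, not the paper's experiment: `true_human`, the one-nearest-neighbour model (used in place of a neural network), and the assumption that the robot's reward equals the predicted human response are all hypothetical.

```python
def true_human(u_r):
    # Hypothetical ground-truth human response, unknown to the robot.
    return 1.0 - abs(u_r - 0.5)

def fit_model(data):
    # Black-box stand-in: 1-nearest-neighbour lookup over (u_r, u_h) pairs.
    def predict(u_r):
        nearest = min(data, key=lambda pair: abs(pair[0] - u_r))
        return nearest[1]
    return predict

def plan(model, actions):
    # Assumption: the robot's reward equals the predicted human response.
    return max(actions, key=model)

def off_policy(actions, logged):
    # Fit once on a fixed logged dataset, then plan.
    data = [(u, true_human(u)) for u in logged]
    return plan(fit_model(data), actions)

def on_policy(actions, logged, rounds):
    # Alternate between planning with the current model and collecting
    # data at the action the plan actually takes.
    data = [(u, true_human(u)) for u in logged]
    for _ in range(rounds):
        u = plan(fit_model(data), actions)
        data.append((u, true_human(u)))
    return plan(fit_model(data), actions)

def idealized_exploration(actions, logged, rounds, optimal_action):
    # Mix in data collected at the optimal action, then continue on-policy.
    return on_policy(actions, list(logged) + [optimal_action], rounds)

actions = [0.0, 0.25, 0.5, 0.75, 1.0]
logged = [0.0, 1.0]  # logged data that misses the high-reward region
a_off = off_policy(actions, logged)
a_on = on_policy(actions, logged, rounds=5)
a_ideal = idealized_exploration(actions, logged, rounds=5, optimal_action=0.5)
```

In this toy, the greedy on-policy loop keeps revisiting the action it already believes is best, so it never samples near 0.5 and stays where the off-policy model left it. Adding data from the optimal action fixes the model in the region that matters.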
Swamy et al. (2020). On the Utility of Model Learning in HRI.