Interactive Reward Tuning
Interactive Visualization for Preference Elicitation

Accepted paper @ IROS 2024 Abu Dhabi

Authors

Danqing Shi, Shibei Zhu, Tino Weinkauf, Antti Oulasvirta

Abstract

In reinforcement learning, tuning the reward weights in the reward function is necessary to align behavior with user preferences. However, current approaches, which use pairwise comparisons for preference elicitation, are inefficient because they miss much of the human ability to explore and judge groups of candidate solutions. The paper presents a novel visualization-based approach that better exploits the user’s ability to quickly recognize promising directions for reward tuning. It breaks down the tuning problem by using the visual information-seeking principle: overview first, zoom and filter, then details-on-demand. Following this principle, we built a visualization system comprising two interactively linked views: 1) an embedding view showing a contextual overview of all sampled behaviors and 2) a sample view displaying selected behaviors together with visualizations of their detailed time-series data. A user can efficiently explore large sets of samples by iterating between these two views. The paper demonstrates that the proposed approach can tune rewards for challenging behaviors. A simulation-based evaluation shows that the system reaches optimal solutions with fewer queries than baselines.
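For concreteness, the loop the abstract describes can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's implementation: it assumes a linear reward model r = w · φ(behavior), a PCA projection standing in for the embedding-view overview, and a Bradley-Terry-style logistic update for each pairwise preference; all function names and dimensions below are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

def behavior_features(num_samples=200, dim=4):
    # Stand-in for feature vectors phi(behavior) summarizing sampled rollouts.
    return rng.normal(size=(num_samples, dim))

def reward(phi, w):
    # Weighted-sum reward r = w . phi (linear reward model, an assumption).
    return phi @ w

def preference_update(w, phi_preferred, phi_rejected, lr=0.1):
    # Bradley-Terry-style gradient step: raise the preferred behavior's
    # reward relative to the rejected one's.
    margin = reward(phi_preferred, w) - reward(phi_rejected, w)
    mistake_prob = 1.0 / (1.0 + np.exp(margin))  # sigmoid(-(r_pref - r_rej))
    return w + lr * mistake_prob * (phi_preferred - phi_rejected)

# Overview first: project all sampled behaviors to 2D (PCA here) so the
# user can scan the whole sample set in the embedding view.
phi = behavior_features()
centered = phi - phi.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
embedding_2d = centered @ vt[:2].T  # coordinates plotted in the overview

# Details on demand: the user inspects two behaviors from a selected
# region and states a preference; the reward weights are updated.
w = np.zeros(phi.shape[1])
w = preference_update(w, phi[0], phi[1])  # user preferred sample 0 over 1

Iterating this select-compare-update loop is the outer loop the two linked views support interactively; which embedding method and preference model the system actually uses is specified in the paper.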

Cite

@inproceedings{shi2024interactive,
  title={Interactive Reward Tuning: Interactive Visualization for Preference Elicitation},
  author={Shi, Danqing and Zhu, Shibei and Weinkauf, Tino and Oulasvirta, Antti},
  booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2024},
  organization={IEEE}
}