Diffused Task-Agnostic Milestone Planner

Robot Learning Lab, Seoul National University

NeurIPS 2023

Abstract

Addressing decision-making problems by using sequence modeling to predict future trajectories has shown promising results in recent years. In this paper, we take a step further and apply sequence prediction to wider areas such as long-term planning, vision-based control, and multi-task decision-making. To this end, we propose a method that uses a diffusion-based generative sequence model to plan a series of milestones in a latent space and has an agent follow the milestones to accomplish a given task. The proposed method can learn control-relevant, low-dimensional latent representations of milestones, which makes it possible to efficiently perform long-term planning and vision-based control. Furthermore, our approach exploits the generation flexibility of the diffusion model, which makes it possible to plan diverse trajectories for multi-task decision-making. We demonstrate the proposed method across offline reinforcement learning (RL) benchmarks and a visual manipulation environment. The results show that our approach outperforms offline RL methods in solving long-horizon, sparse-reward tasks and multi-task problems, while also achieving state-of-the-art performance on the most challenging vision-based manipulation benchmark.

Video

Overview

We propose a method named Diffused Task-Agnostic Milestone Planner (DTAMP), which uses a diffusion model to predict milestones in a latent space and guides an agent to a given goal by having it follow those milestones.
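To make the plan-then-follow idea concrete, here is a minimal toy sketch in Python. It is not the DTAMP implementation: the names `encode`, `plan_milestones`, `actor`, and `follow_milestones` are hypothetical, the encoder is an identity map, and the diffusion model is replaced by plain linear interpolation so that the example stays self-contained. It only illustrates the control flow of planning a milestone sequence once and then steering toward each milestone in turn.

```python
import math


def encode(state):
    # Hypothetical encoder: identity map. The actual method learns a
    # low-dimensional latent representation instead.
    return tuple(state)


def plan_milestones(z_start, z_goal, num_milestones):
    # Placeholder planner (assumption): evenly spaced latent points between
    # start and goal. A diffusion model would sample this sequence instead.
    plan = []
    for i in range(1, num_milestones + 1):
        t = i / (num_milestones + 1)
        plan.append(tuple(a + t * (b - a) for a, b in zip(z_start, z_goal)))
    return plan


def actor(state, z_target, max_step=0.5):
    # Hypothetical goal-conditioned policy: move straight toward the target,
    # with the step length capped at max_step.
    delta = [t - s for s, t in zip(state, z_target)]
    dist = math.hypot(*delta)
    if dist <= max_step:
        return tuple(delta)
    scale = max_step / dist
    return tuple(d * scale for d in delta)


def follow_milestones(start, goal, num_milestones=3, tol=0.1, max_steps=200):
    # Plan once, then chase each milestone in order, ending with the goal.
    state = tuple(start)
    z_goal = encode(goal)
    targets = plan_milestones(encode(state), z_goal, num_milestones) + [z_goal]
    steps = 0
    for target in targets:
        while math.dist(encode(state), target) > tol and steps < max_steps:
            action = actor(state, target)
            state = tuple(s + a for s, a in zip(state, action))
            steps += 1
    return state, steps


final_state, steps_taken = follow_milestones((0.0, 0.0), (4.0, 3.0))
print(final_state, steps_taken)
```

In this toy setting the planner is called a single time per episode and the per-step work is only the cheap `actor` call. Whether that matches the structure of the real system in detail is an assumption of the sketch.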

Visualization of Planned Milestones

Antmaze-medium

Antmaze-large

Kitchen

Implied tasks: 'open microwave', 'move kettle', 'turn on light', 'open cabinet'

Calvin

Implied tasks: 'place block in drawer', 'move slider left', 'turn on led'

Results

Averaged score / inference time comparison

The figures compare DTAMP to the other baselines in terms of inference time and averaged score. DTAMP achieves the highest averaged score, and its inference is 300 to 1,000 times faster than that of the other sequence modeling methods (TT: Trajectory Transformer, DD: Decision Diffuser).

CALVIN benchmark

DTAMP outperforms the other baselines and achieves state-of-the-art performance on the CALVIN benchmark tasks.
"Number of tasks" is the number of tasks implied by a single goal image, and "DTAMP+Replanning" is a variant of DTAMP that allows the agent to replan the milestones during an episode.
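The replanning variant can be illustrated with a small toy loop in Python. This is a sketch under stated assumptions, not the authors' procedure: the state is a single number, `plan_milestones` is a hypothetical stand-in that interpolates linearly instead of sampling from a diffusion model, and the fixed `replan_every` schedule is an assumption, since the page does not say when replanning is triggered. The sketch says nothing about benchmark performance; it only shows where a replanning call would sit in an episode.

```python
def plan_milestones(z_start, z_goal, num_milestones):
    # Placeholder planner (assumption): evenly spaced points between the
    # current state and the goal, standing in for a learned planner.
    return [
        z_start + (z_goal - z_start) * i / (num_milestones + 1)
        for i in range(1, num_milestones + 1)
    ]


def run_episode(start, goal, replan_every=None, num_milestones=3,
                max_step=0.5, tol=0.05, horizon=100):
    # Returns (final_state, steps_used, number_of_plans_made).
    state = float(start)
    targets = plan_milestones(state, goal, num_milestones) + [goal]
    num_plans = 1
    for t in range(horizon):
        if abs(state - goal) <= tol:
            return state, t, num_plans
        # Hypothetical schedule: replan from the current state every
        # `replan_every` steps; None means plan only once.
        if replan_every and t > 0 and t % replan_every == 0:
            targets = plan_milestones(state, goal, num_milestones) + [goal]
            num_plans += 1
        # Skip milestones that have already been reached.
        while len(targets) > 1 and abs(state - targets[0]) <= tol:
            targets.pop(0)
        # Move toward the current milestone with a capped step size.
        delta = targets[0] - state
        state += max(-max_step, min(max_step, delta))
    return state, horizon, num_plans


print(run_episode(0.0, 4.0))                  # single plan
print(run_episode(0.0, 4.0, replan_every=3))  # periodic replanning
```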

BibTeX

@InProceedings{hong2023dtamp,
     author    = {Hong, Mineui and Kang, Minjae and Oh, Songhwai},
     title     = {Diffused Task-Agnostic Milestone Planner},
     booktitle = {Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)},
     month     = {December},
     year      = {2023}
 }