Make-An-Agent

A Generalizable Policy Network Generator with Behavior-Prompted Diffusion

NeurIPS 2024

¹University of Maryland, College Park  ²Tsinghua University, IIIS  ³UC San Diego

Make-An-Agent synthesizes policy network parameters using agents' trajectories as prompts.
Train once, use everywhere.

Abstract

Can we generate a control policy for an agent using just one demonstration of desired behaviors as a prompt, as effortlessly as creating an image from a textual description? In this paper, we present Make-An-Agent, a novel policy parameter generator that leverages the power of conditional diffusion models for behavior-to-policy generation.

Guided by behavior embeddings that encode trajectory information, our policy generator synthesizes latent parameter representations, which can then be decoded into policy networks. Trained on policy network checkpoints and their corresponding trajectories, our generation model demonstrates remarkable versatility and scalability on multiple tasks and generalizes strongly to unseen tasks, producing well-performing policies from only few-shot demonstrations. We showcase its efficacy and efficiency across various domains and tasks, including varying objectives and behaviors, and even different robot manipulators. Beyond simulation, we directly deploy policies generated by Make-An-Agent on real-world robots for locomotion tasks.
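To make the last step of that pipeline concrete, here is a minimal sketch of how a flat parameter vector could be turned back into a runnable policy. It assumes the policy is a small MLP with tanh activations; the layer sizes, the function names (`unflatten_mlp`, `policy_forward`, `num_params`), and the random vector standing in for a decoded output are all hypothetical and not taken from the paper.

```python
import numpy as np

def num_params(layer_sizes):
    """Total number of weights and biases in an MLP with these layer sizes."""
    return sum(i * o + o for i, o in zip(layer_sizes[:-1], layer_sizes[1:]))

def unflatten_mlp(flat, layer_sizes):
    """Split a flat parameter vector into (weight, bias) pairs, layer by layer."""
    params, offset = [], 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        w = flat[offset:offset + n_in * n_out].reshape(n_in, n_out)
        offset += n_in * n_out
        b = flat[offset:offset + n_out]
        offset += n_out
        params.append((w, b))
    if offset != flat.size:
        raise ValueError(f"expected {offset} parameters, got {flat.size}")
    return params

def policy_forward(params, obs):
    """Run the MLP: tanh hidden layers and a tanh-squashed action output."""
    h = obs
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return np.tanh(h @ w + b)

# Hypothetical sizes: 6-D observation, two hidden layers of 16 units, 2-D action.
layer_sizes = [6, 16, 16, 2]
rng = np.random.default_rng(0)

# A random vector stands in for the output of the (assumed) parameter decoder.
flat = rng.normal(scale=0.1, size=num_params(layer_sizes))
params = unflatten_mlp(flat, layer_sizes)
action = policy_forward(params, rng.normal(size=6))
```

The point of the sketch is only that a generator which outputs one vector of the right length is enough to define a complete policy, provided the architecture is fixed in advance.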



Effectiveness, Versatility, Generalizability

Test trajectories are drawn from the same RL training buffer, and evaluations are run under environmental randomness (4 seeds).
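As an illustration of what evaluating under environmental randomness over 4 seeds might look like, the sketch below averages success rate across seeds. `ToyReachEnv` is a hypothetical stand-in environment (a 2-D point reaching a randomly placed goal), not MetaWorld or Robosuite, and `go_to_goal` is a hand-written controller used in place of a generated policy.

```python
import numpy as np

class ToyReachEnv:
    """Hypothetical stand-in task: move a 2-D point onto a random goal."""

    def __init__(self, seed, horizon=50):
        self.rng = np.random.default_rng(seed)  # the seed controls goal placement
        self.horizon = horizon

    def reset(self):
        self.pos = np.zeros(2)
        self.goal = self.rng.uniform(-1.0, 1.0, size=2)
        self.t = 0
        return np.concatenate([self.pos, self.goal])

    def step(self, action):
        self.pos = self.pos + 0.1 * np.clip(action, -1.0, 1.0)
        self.t += 1
        success = bool(np.linalg.norm(self.pos - self.goal) < 0.05)
        done = success or self.t >= self.horizon
        return np.concatenate([self.pos, self.goal]), done, success

def evaluate(policy, seeds, episodes_per_seed=10):
    """Return mean and std of the per-seed success rate, plus the raw rates."""
    per_seed = []
    for seed in seeds:
        env = ToyReachEnv(seed)
        wins = 0
        for _ in range(episodes_per_seed):
            obs, done, success = env.reset(), False, False
            while not done:
                obs, done, success = env.step(policy(obs))
            wins += int(success)
        per_seed.append(wins / episodes_per_seed)
    return float(np.mean(per_seed)), float(np.std(per_seed)), per_seed

def go_to_goal(obs):
    """Proportional controller: obs = [pos, goal]; the env clips the action."""
    return 10.0 * (obs[2:] - obs[:2])

mean_success, std_success, per_seed = evaluate(go_to_goal, seeds=[0, 1, 2, 3])
```

Reporting the spread across seeds alongside the mean helps separate the quality of a policy from the luck of a particular initial configuration.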

Make-An-Agent can generate optimal policies for a wide variety of tasks with few-shot trajectories as conditional inputs.

Make-An-Agent demonstrates strong generalizability, synthesizing well-performing policies for unfamiliar tasks and robots.

Benchmark visualization in simulators

Multi-task training on MetaWorld and Robosuite: 13 tasks, cross-domain

Unseen task/robot generalization: 11 tasks

Real-world deployment

Synthesizing policy networks from trajectories collected in the IsaacGym simulator and deploying them on real-world locomotion tasks.


Avoid stepping on a bouquet while moving across a mat.

Navigating swiftly around the goal and the ball.



Methodology

(a) A contrastive behavior embedding to process trajectory data. (b) An autoencoder to encode and decode policy network parameters. (c) A policy network generator using a conditional latent diffusion model as the backbone.
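The way these three components fit together at generation time can be sketched at the level of shapes. The code below is an assumption-laden illustration, not the authors' implementation: random, untrained linear maps stand in for the behavior embedding, the noise predictor, and the parameter decoder, all dimensions and names (`embed_behavior`, `predict_noise`, `decode_params`, `generate_policy_params`) are hypothetical, and the sampler is standard DDPM ancestral sampling with a linear noise schedule. Contrastive training of the embedding and the encoder half of the autoencoder are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical dimensions and number of diffusion steps.
TRAJ_DIM, EMB_DIM, LATENT_DIM, PARAM_DIM, T = 64, 16, 8, 418, 50

# Untrained random weights stand in for the three learned modules.
W_embed = rng.normal(scale=0.1, size=(TRAJ_DIM, EMB_DIM))
W_eps = rng.normal(scale=0.1, size=(LATENT_DIM + EMB_DIM + 1, LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM, PARAM_DIM))

def embed_behavior(traj):
    """(a) Map a flattened trajectory to a unit-norm behavior embedding."""
    e = traj @ W_embed
    return e / (np.linalg.norm(e) + 1e-8)

def predict_noise(z_t, t, cond):
    """(c) Noise predictor conditioned on the embedding and the timestep."""
    inp = np.concatenate([z_t, cond, [t / T]])
    return np.tanh(inp @ W_eps)

def decode_params(z):
    """(b) Decoder half of the autoencoder: latent -> flat policy parameters."""
    return z @ W_dec

# Linear beta schedule, as in standard DDPM.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def generate_policy_params(traj, sample_rng):
    """Embed the prompt, denoise a latent from pure noise, then decode it."""
    cond = embed_behavior(traj)
    z = sample_rng.normal(size=LATENT_DIM)
    for t in reversed(range(T)):
        eps = predict_noise(z, t, cond)
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = sample_rng.normal(size=LATENT_DIM) if t > 0 else np.zeros(LATENT_DIM)
        z = mean + np.sqrt(betas[t]) * noise
    return decode_params(z)

traj_a = rng.normal(size=TRAJ_DIM)
traj_b = rng.normal(size=TRAJ_DIM)
params_a = generate_policy_params(traj_a, np.random.default_rng(1))
params_b = generate_policy_params(traj_b, np.random.default_rng(1))
```

Running the sampler in the latent space rather than directly on the full parameter vector keeps the diffusion model small; the decoder is then responsible for expanding the latent into parameters of the fixed policy architecture.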

Policy synthesis, not memorizing

Trajectories used as conditional inputs vs. trajectories produced by policies that Make-An-Agent synthesizes.
Make-An-Agent generates diverse and more efficient policies.

BibTeX

@article{liang2024make,
  title={Make-An-Agent: A Generalizable Policy Network Generator with Behavior-Prompted Diffusion},
  author={Liang, Yongyuan and Xu, Tingqiang and Hu, Kaizhe and Jiang, Guangqi and Huang, Furong and Xu, Huazhe},
  journal={arXiv preprint arXiv:2407.10973},
  year={2024}
}